Thursday 9 February 2012

BOINC Server Complex and Scheduling

The Berkeley Open Infrastructure for Network Computing (BOINC) provides a generic
framework for implementing distributed computation applications within a
heterogeneous environment. The system is currently deployed in a beta test
for the SETI@home and Astropulse distributed applications. It is designed as a
software platform that draws on computing resources from volunteer computers.
Computing resources are allocated to the computational problem through the
assignment of work units, each of which covers a subset of the entire
computational problem. Work units for the MD4 hashing problem were sized to be
small enough to distribute work evenly among the participating nodes and large
enough to minimize start-up and reporting times. For the four participating
nodes in the project, the entire password space of all five-character
passwords was divided into 47 even work units.
Each work unit therefore contained 94^4 × 2 possible passwords. Work unit input
files contained three fields: the first password to check, the last password to
check, and the target hash to match.
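
To make the division concrete, the following is a minimal sketch, assuming the
94 printable ASCII characters (codes 33 through 126) as the password alphabet,
of how the five-character space splits into 47 even ranges and how a
three-field input line could be produced for each work unit. The helper names
and the single-line file layout are illustrative assumptions, not the project's
actual work generator.

    // Hypothetical sketch: divide the 5-character password space (94 printable
    // ASCII characters) into 47 even work units and print each unit's three
    // fields: first password, last password, target hash.
    #include <cstdint>
    #include <iostream>
    #include <string>

    // Map an index in [0, 94^5) to a 5-character password over ASCII 33..126.
    std::string index_to_password(uint64_t index) {
        std::string pw(5, ' ');
        for (int pos = 4; pos >= 0; --pos) {
            pw[pos] = static_cast<char>('!' + index % 94);  // '!' is ASCII 33
            index /= 94;
        }
        return pw;
    }

    int main() {
        const uint64_t total = 94ULL * 94 * 94 * 94 * 94;  // 94^5 passwords
        const uint64_t units = 47;
        const uint64_t per_unit = total / units;            // 94^4 * 2, divides evenly

        // Placeholder target; the real input files carried the actual MD4 hash.
        const std::string target_hash = "0123456789abcdef0123456789abcdef";

        for (uint64_t i = 0; i < units; ++i) {
            uint64_t first = i * per_unit;
            uint64_t last  = first + per_unit - 1;
            std::cout << index_to_password(first) << ' '
                      << index_to_password(last)  << ' '
                      << target_hash << '\n';
        }
        return 0;
    }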

The BOINC architecture consists of a server complex handling scheduling,
central result processing, and participant account management; a core software
client running on participant nodes; and project-specific client software also
running on participant nodes. The server complex consists of a database, a web
server, and five BOINC-specific processes. Communication between participant
nodes and the server complex is handled via standard HTTP. Database functions
in BOINC are implemented for MySQL; however, few, if any, MySQL-specific
functions are used, so it could easily be replaced by a comparable database. A
straightforward schema is used to store information on application versions,
participant hosts, user accounts, work units, and results. Additional tables
are included for tracking user participation levels and managing teams.
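
As a rough sketch of the kind of information that schema tracks, the record
types below mirror the categories just listed; the field names are
representative guesses for illustration, not BOINC's actual column names.

    // Rough sketch of the kinds of records the BOINC schema tracks. The field
    // names are representative guesses, not the exact columns BOINC uses.
    #include <string>

    struct AppVersion  { int id; int app_id; int version_num; std::string platform; };
    struct Host        { int id; int user_id; std::string domain_name; double speed_estimate; };
    struct UserAccount { int id; std::string name; std::string email; double total_credit; };
    struct Workunit    { int id; int app_id; std::string name; double estimated_fpops; };
    struct Result      { int id; int workunit_id; int host_id; int outcome; std::string output_file; };

    int main() { return 0; }  // record definitions only; nothing to run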

BOINC uses HTTP both for interaction with the user for account management and
for RPC communication between client nodes and the server complex. User account
management is accessed using a standard web browser to interact with a web site
implemented mainly in PHP. This web site allows users to manage preferences for
job execution across BOINC projects. RPC communication between the BOINC core
client and the server complex is handled by two cgi programs. The first, named
cgi, handles work scheduling based on client requests for work. The second,
named file_upload_handler, receives completed work results from clients for
processing by the server complex. The MD4 password hashing project used the
standard web interface, cgi, and file_upload_handler from a standard BOINC
installation.

The first BOINC process is the feeder, which is responsible for accessing the
database and selecting candidate work units to be processed by clients. The
second process is the assimilator, which processes work unit results and
integrates them into the project. As valid results become available, the
assimilator performs any project-specific processing, including storage of
results in the project database. The third process is the transitioner, which
handles state transitions of work units. The fourth process is the validator,
which tests the validity of work unit results. Result validation comes into
play when redundant computing is used to compare the results of the same work
unit returned by two or more clients. The final process is the file_deleter,
which examines the state of work units to determine which input and output
files are no longer needed.
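
To make the assimilator's role concrete for this project, the sketch below
shows the kind of project-specific step it would perform: recording a cracked
password when a valid result reports one. The record types and function name
are simplified stand-ins for illustration; a real BOINC assimilator links
against the scheduler library and implements its callback instead.

    // Simplified stand-ins for BOINC's work unit and result records; a real
    // assimilator receives its types from the BOINC scheduler library. This is
    // only an illustrative sketch of the project-specific step.
    #include <fstream>
    #include <iostream>
    #include <string>
    #include <vector>

    struct ResultRecord {
        std::string name;
        std::string output;   // e.g. the cracked password, or empty if none found
        bool valid = false;
    };

    struct WorkunitRecord {
        std::string name;
    };

    // Project-specific assimilation: record any password found for this work unit.
    int assimilate_workunit(const WorkunitRecord& wu,
                            const std::vector<ResultRecord>& results) {
        for (const ResultRecord& r : results) {
            if (r.valid && !r.output.empty()) {
                std::ofstream log("found_passwords.txt", std::ios::app);
                log << wu.name << ' ' << r.output << '\n';
                std::cout << "Work unit " << wu.name
                          << ": password found: " << r.output << '\n';
                return 0;
            }
        }
        std::cout << "Work unit " << wu.name << ": no match in this range\n";
        return 0;
    }

    int main() {
        WorkunitRecord wu{"md4_wu_07"};
        std::vector<ResultRecord> results{{"md4_wu_07_0", "p@ss1", true}};
        return assimilate_workunit(wu, results);
    }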

Scheduling of work units is handled by the cgi interfaces and the BOINC server
processes. A participating node contacts the scheduling server through the web
server interface named cgi, identifying itself and the amount of work it can
accept based on high and low water marks for processing time, available memory,
and disk space. The cgi program then selects viable work units from the shared
memory segment filled by the feeder process, basing its selection on client
capabilities and favoring work units that other participating nodes have
evaluated as infeasible. The work units made available by the feeder are also
affected by configurations requiring redundant computing and result validation:
work units that have already been processed may need additional results from
other participating nodes to reach a quorum of identical results before a
result is accepted. The MD4 password hashing project did not implement
redundant computing.
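
A much simplified sketch of that selection step is shown below, assuming each
shared-memory slot carries a rough compute and disk estimate; the real cgi
program applies many more checks, and the field names and selection rule here
are invented for illustration.

    // Very simplified sketch of scheduler-side work unit selection. The real
    // BOINC cgi program reads candidates from a shared memory segment kept
    // full by the feeder; these fields and this rule are illustrative only.
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct CandidateWorkunit {
        uint64_t estimated_fpops;    // estimated floating point operations
        uint64_t disk_bytes_needed;  // input/output file footprint
        bool assigned = false;
    };

    struct ClientRequest {
        double fpops_per_sec;      // benchmarked speed of the requesting host
        double seconds_requested;  // derived from the client's low/high water marks
        uint64_t disk_bytes_free;
    };

    // Pick work units until the requested amount of compute time is covered.
    std::vector<size_t> select_work(std::vector<CandidateWorkunit>& slots,
                                    const ClientRequest& req) {
        std::vector<size_t> chosen;
        double seconds_filled = 0;
        for (size_t i = 0;
             i < slots.size() && seconds_filled < req.seconds_requested; ++i) {
            CandidateWorkunit& wu = slots[i];
            if (wu.assigned || wu.disk_bytes_needed > req.disk_bytes_free) continue;
            wu.assigned = true;
            chosen.push_back(i);
            seconds_filled += wu.estimated_fpops / req.fpops_per_sec;
        }
        return chosen;
    }

    int main() {
        std::vector<CandidateWorkunit> slots{
            {2000000000000ULL, 1 << 20},
            {2000000000000ULL, 1 << 20},
            {2000000000000ULL, 1 << 20},
        };
        ClientRequest req{1.0e9, 3600.0, 1ULL << 30};  // 1 GFLOPS host asking for one hour
        std::vector<size_t> chosen = select_work(slots, req);
        std::printf("assigned %zu work units\n", chosen.size());
        return 0;
    }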

Configuration of the MD4 password hashing project manipulated the estimated
running time of the individual work units in order to accommodate the
scheduling algorithm. Since the entire MD4 password search required less
running time than the low water mark of a default BOINC client, the first node
requesting work would have consumed all available work units. To circumvent
this scheduling problem, the work units were created with estimated running
times far greater than their actual running times. This caused the scheduling
algorithm to distribute the work units more evenly across the participating
nodes.
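
The sketch below illustrates that adjustment under assumed numbers: the
per-password cost and the inflation factor are placeholders, but the idea
matches the text above: report an estimated operation count far above the true
cost so that a node's request, sized by its water marks, is satisfied by only a
few work units at a time.

    // Illustrative sketch of inflating a work unit's estimated running time so
    // the scheduler hands each node only a few units at a time. The per-password
    // cost and inflation factor are assumptions for illustration; the work
    // generator would set the equivalent estimate when creating each work unit.
    #include <cstdint>
    #include <iostream>

    int main() {
        const uint64_t passwords_per_unit = 94ULL * 94 * 94 * 94 * 2;  // 94^4 * 2
        const uint64_t fpops_per_password = 1000;  // rough guess at cost of one MD4 check
        const uint64_t actual_estimate = passwords_per_unit * fpops_per_password;

        // Report an estimate far above the true cost so a single node's request
        // is satisfied by only a handful of work units.
        const uint64_t inflation_factor = 1000;
        const uint64_t reported_estimate = actual_estimate * inflation_factor;

        std::cout << "actual estimate:   " << actual_estimate << " fpops\n"
                  << "reported estimate: " << reported_estimate << " fpops\n";
        return 0;
    }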
