BDMPI - Big Data Message Passing Interface
Release 0.1
BDMPI programs are message-passing distributed memory parallel programs written using a subset of MPI's API. As such, BDMPI program development is nearly identical to developing MPI-based parallel programs. There are many online resources providing guidelines and documentation for developing MPI-based programs, all of which apply for BDMPI as well.
Though it is beyond the scope of this documentation to provide a tutorial on how to develop MPI programs, the following is a list of items that anybody using BDMPI should be aware of:

- BDMPI programs must include the mpi.h or bdmpi.h header file and must call MPI_Init() and MPI_Finalize() before and after finishing with their work. MPI_Init() and MPI_Finalize() should be the first and last functions called in main(), respectively.
- BDMPI programs should exit from the main() function by returning EXIT_SUCCESS. A minimal skeleton that follows these rules is sketched below.
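To make these requirements concrete, here is a minimal sketch of a BDMPI program; the file name and the use of MPI_Comm_size()/MPI_Comm_rank() are illustrative, not taken from BDMPI's own examples:

```c
/* minimal.c -- illustrative skeleton only */
#include <stdio.h>
#include <stdlib.h>
#include <bdmpi.h>     /* or <mpi.h> */

int main(int argc, char *argv[])
{
  int rank, size;

  MPI_Init(&argc, &argv);               /* first MPI call in main() */

  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("Hello from process %d of %d\n", rank, size);

  MPI_Finalize();                       /* last MPI call in main()  */
  return EXIT_SUCCESS;                  /* exit via EXIT_SUCCESS    */
}
```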
BDMPI implements a subset of the MPI specification that includes functions for querying and creating communicators and for performing point-to-point and collective communications. For each of these functions, BDMPI provides a variant that starts with the MPI_ prefix and a variant that starts with the BDMPI_ prefix. The calling sequence of the first variant is identical to the MPI specification, whereas the calling sequence of the second variant has been modified to make it 64-bit compliant (e.g., most of the sizes that MPI assumes to be int have been replaced with either size_t or ssize_t).

Since the calling sequence of these functions is the same as that in the MPI specification (available at http://www.mpi-forum.org), it is not included here.
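As a rough illustration of the difference between the two variants, a point-to-point send might look as follows. The exact BDMPI_Send() prototype, and in particular whether its count argument is a size_t, is an assumption here and should be checked against bdmpi.h:

```c
#include <bdmpi.h>

/* Illustrative sketch only; the BDMPI_Send() count type is assumed. */
void send_vector(double *buf, size_t n, int dest)
{
  /* MPI_-prefixed variant: calling sequence identical to the MPI
     specification, so the count argument is an int. */
  MPI_Send(buf, (int)n, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);

  /* BDMPI_-prefixed variant: 64-bit-compliant calling sequence, so the
     count (assumed to be a size_t) may exceed 2^31 - 1 elements. */
  BDMPI_Send(buf, n, BDMPI_DOUBLE, dest, 1, BDMPI_COMM_WORLD);
}
```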
BDMPI provides a small set of additional functions that an application can use to improve its out-of-core execution performance and to obtain information about how the different MPI slave processes are distributed among the nodes. Since these functions are not part of the MPI specification, their names start only with the BDMPI_ prefix.
These functions are organized in three main groups that are described in the following subsections.
Since BDMPI organizes the MPI processes into groups of slave processes, each group running on a different node, it is sometimes beneficial for the application to know how many slaves within a communicator are running on the same node and the total number of nodes that are involved. To achieve this, BDMPI provides a set of functions that can be used to interrogate each communicator in order to obtain information such as the number of slaves per node, the rank of a process among the other processes on its own node, the number of nodes, and their ranks.
In addition, BDMPI defines some predefined communicators that describe the processes of each node. These are described in Predefined communicators.
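For instance, because BDMPI_COMM_NODE (see Predefined communicators below) groups the slaves spawned on the same node, the standard communicator-query calls can be used to obtain node-local placement information; the function and variable names in this sketch are illustrative:

```c
#include <stdio.h>
#include <bdmpi.h>

/* Report how a process relates to the other slaves on its own node. */
void report_node_placement(void)
{
  int grank, lrank, lsize;

  MPI_Comm_rank(MPI_COMM_WORLD, &grank);    /* global rank           */
  MPI_Comm_rank(BDMPI_COMM_NODE, &lrank);   /* rank within this node */
  MPI_Comm_size(BDMPI_COMM_NODE, &lsize);   /* slaves on this node   */

  printf("process %d is slave %d of %d on its node\n", grank, lrank, lsize);
}
```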
On a system with a quad-core processor it may be reasonable to allow 3-4 slaves to run concurrently. However, if that system's I/O subsystem consists of only a single disk, then in order to prevent I/O contention, it may be beneficial to limit the number of slaves that perform I/O at the same time. To achieve such synchronization, BDMPI provides a set of functions that can be used to implement mutex-like synchronization among the processes running on the same slave node. The number of slave processes that can be in a critical section concurrently is controlled by the -nc option of bdmprun (Options of bdmprun).
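A typical use is to bracket an I/O-heavy phase with the critical-section entry/exit calls; the function names BDMPI_Entercritical() and BDMPI_Exitcritical() used below are assumed and should be verified against bdmpi.h:

```c
#include <bdmpi.h>

/* Sketch: serialize the I/O-heavy phase among the slaves of a node while
   leaving the compute phase fully concurrent. */
void process_chunk(void)
{
  BDMPI_Entercritical();   /* assumed name; at most -nc slaves proceed   */
  /* ... read this slave's chunk of data from disk ... */
  BDMPI_Exitcritical();    /* assumed name; let another slave do its I/O */

  /* ... compute on the in-memory data concurrently with other slaves ... */
}
```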
BDMPI exposes various functions that relate to its sbmalloc subsystem (Efficient loading & saving of a process's address space). An application can use these functions to explicitly allocate sbmalloc-handled memory and also to force the loading/saving of memory regions that were previously allocated by sbmalloc (see the discussion in Issues related to sbmalloc).
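The sketch below illustrates the intended usage pattern; the function names (BDMPI_sbmalloc(), BDMPI_sbload(), BDMPI_sbunload(), BDMPI_sbfree()) are assumptions based on the description above and should be checked against bdmpi.h:

```c
#include <bdmpi.h>

/* Sketch: allocate a large sbmalloc-handled buffer and explicitly control
   when it is resident in memory. All function names are assumed. */
void sbmalloc_example(size_t n)
{
  double *buf = (double *)BDMPI_sbmalloc(n * sizeof(double));

  BDMPI_sbload(buf);      /* force the region into memory before use */
  for (size_t i = 0; i < n; i++)
    buf[i] = (double)i;
  BDMPI_sbunload(buf);    /* allow it to be saved back to disk       */

  BDMPI_sbfree(buf);
}
```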
The following are the communicators that are predefined in BDMPI. Note that both the ones with the MPI_ and the BDMPI_ prefix are defined.
MPI Name | BDMPI Name | Description |
---|---|---|
MPI_COMM_WORLD | BDMPI_COMM_WORLD | Contains all processes |
MPI_COMM_SELF | BDMPI_COMM_SELF | Contains only the calling process |
MPI_COMM_NULL | BDMPI_COMM_NULL | A null group |
N/A | BDMPI_COMM_NODE | Contains all processes in a node |
N/A | BDMPI_COMM_CWORLD | Contains all processes in cyclic order |
The last two communicators are BDMPI specific. BDMPI_COMM_NODE is used to describe the group of slave processes that were spawned by the same master process (i.e., bdmprun). Unless multiple instances of bdmprun were started on a single node, these processes will be the ones running on that node for the program.
The ranks of the processes in the MPI_COMM_WORLD/BDMPI_COMM_WORLD communicator are ordered in increasing order based on the order of the hosts specified in mpiexec's hostfile and the number of slaves spawned by bdmprun. For example, consider the execution of the helloworld program (see Running BDMPI Programs):
mpiexec -hostfile ~/machines -np 3 bdmprun -ns 4 build/Linux-x86_64/test/helloworld
with the hostfile:
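The contents of the hostfile are implied by the rank assignments described next; presumably it lists the three hosts, one per line:

```
bd1-umh
bd2-umh
bd3-umh
```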
This execution will create an MPI_COMM_WORLD/BDMPI_COMM_WORLD communicator in which the processes with ranks 0–3 are the slave processes on bd1-umh, ranks 4–7 are the slave processes on bd2-umh, and ranks 8–11 are the slave processes on bd3-umh. The BDMPI_COMM_CWORLD communicator also includes all processes but assigns ranks in a cyclic fashion based on the hosts specified in mpiexec's hostfile. For example, the ranks of the four slaves on bd1-umh for the above helloworld execution will be 0, 3, 6, and 9; on bd2-umh they will be 1, 4, 7, and 10; and on bd3-umh they will be 2, 5, 8, and 11.
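In general, with nnodes hosts in the hostfile and ns slaves per bdmprun (here nnodes = 3 and ns = 4), the two orderings described above correspond to the following mapping; the function and variable names are illustrative:

```c
/* Mapping between a slave's placement and its ranks in the two world
   communicators. node  = 0-based index of the host in the hostfile,
                  lrank = 0-based rank of the slave within its node. */
int block_rank(int node, int lrank, int ns)       /* MPI_COMM_WORLD order */
{
  return node * ns + lrank;                       /* bd1-umh: 0, 1, 2, 3  */
}

int cyclic_rank(int node, int lrank, int nnodes)  /* BDMPI_COMM_CWORLD    */
{
  return lrank * nnodes + node;                   /* bd1-umh: 0, 3, 6, 9  */
}
```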
The following are the datatypes that are predefined in BDMPI. Note that both the ones with the MPI_ and the BDMPI_ prefix are defined.
MPI Name | BDMPI Name | C equivalent |
---|---|---|
MPI_CHAR | BDMPI_CHAR | char |
MPI_SIGNED_CHAR | BDMPI_SIGNED_CHAR | signed char |
MPI_UNSIGNED_CHAR | BDMPI_UNSIGNED_CHAR | unsigned char |
MPI_BYTE | BDMPI_BYTE | unsigned char |
MPI_WCHAR | BDMPI_WCHAR | wchar_t |
MPI_SHORT | BDMPI_SHORT | short |
MPI_UNSIGNED_SHORT | BDMPI_UNSIGNED_SHORT | unsigned short |
MPI_INT | BDMPI_INT | int |
MPI_UNSIGNED | BDMPI_UNSIGNED | unsigned int |
MPI_LONG | BDMPI_LONG | long |
MPI_UNSIGNED_LONG | BDMPI_UNSIGNED_LONG | unsigned long |
MPI_LONG_LONG_INT | BDMPI_LONG_LONG_INT | long long int |
MPI_UNSIGNED_LONG_LONG | BDMPI_UNSIGNED_LONG_LONG | unsigned long long int |
MPI_INT8_T | BDMPI_INT8_T | int8_t |
MPI_UINT8_T | BDMPI_UINT8_T | uint8_t |
MPI_INT16_T | BDMPI_INT16_T | int16_t |
MPI_UINT16_T | BDMPI_UINT16_T | uint16_t |
MPI_INT32_T | BDMPI_INT32_T | int32_t |
MPI_UINT32_T | BDMPI_UINT32_T | uint32_t |
MPI_INT64_T | BDMPI_INT64_T | int64_t |
MPI_UINT64_T | BDMPI_UINT64_T | uint64_t |
MPI_SIZE_T | BDMPI_SIZE_T | size_t |
MPI_SSIZE_T | BDMPI_SSIZE_T | ssize_t |
MPI_FLOAT | BDMPI_FLOAT | float |
MPI_DOUBLE | BDMPI_DOUBLE | double |
MPI_FLOAT_INT | BDMPI_FLOAT_INT | struct {float, int} |
MPI_DOUBLE_INT | BDMPI_DOUBLE_INT | struct {double, int} |
MPI_LONG_INT | BDMPI_LONG_INT | struct {long, int} |
MPI_SHORT_INT | BDMPI_SHORT_INT | struct {short, int} |
MPI_2INT | BDMPI_2INT | struct {int, int} |
The following are the reduction operations that are supported by BDMPI. Note that both the ones with the MPI_ and the BDMPI_ prefix are defined.
MPI Name | BDMPI Name | Description |
---|---|---|
MPI_OP_NULL | BDMPI_OP_NULL | null |
MPI_MAX | BDMPI_MAX | max reduction |
MPI_MIN | BDMPI_MIN | min reduction |
MPI_SUM | BDMPI_SUM | sum reduction |
MPI_PROD | BDMPI_PROD | prod reduction |
MPI_LAND | BDMPI_LAND | logical and reduction |
MPI_BAND | BDMPI_BAND | bit-wise and reduction |
MPI_LOR | BDMPI_LOR | logical or reduction |
MPI_BOR | BDMPI_BOR | bit-wise or reduction |
MPI_LXOR | BDMPI_LXOR | logical xor reduction |
MPI_BXOR | BDMPI_BXOR | bit-wise xor reduction |
MPI_MAXLOC | BDMPI_MAXLOC | max value and location reduction |
MPI_MINLOC | BDMPI_MINLOC | min value and location reduction |