Ubuntu Manpage: fi_msg - Message data transfer operations

Provided by: libfabric-dev_1.17.0-3build2_amd64

NAME

       fi_msg - Message data transfer operations

       fi_recv / fi_recvv / fi_recvmsg
              Post a buffer to receive an incoming message

       fi_send / fi_sendv / fi_sendmsg fi_inject / fi_senddata : Initiate an operation to send a message

SYNOPSIS

              #include <rdma/fi_endpoint.h>

              ssize_t fi_recv(struct fid_ep *ep, void * buf, size_t len,
                  void *desc, fi_addr_t src_addr, void *context);

              ssize_t fi_recvv(struct fid_ep *ep, const struct iovec *iov, void **desc,
                  size_t count, fi_addr_t src_addr, void *context);

              ssize_t fi_recvmsg(struct fid_ep *ep, const struct fi_msg *msg,
                  uint64_t flags);

              ssize_t fi_send(struct fid_ep *ep, const void *buf, size_t len,
                  void *desc, fi_addr_t dest_addr, void *context);

              ssize_t fi_sendv(struct fid_ep *ep, const struct iovec *iov,
                  void **desc, size_t count, fi_addr_t dest_addr, void *context);

              ssize_t fi_sendmsg(struct fid_ep *ep, const struct fi_msg *msg,
                  uint64_t flags);

              ssize_t fi_inject(struct fid_ep *ep, const void *buf, size_t len,
                  fi_addr_t dest_addr);

              ssize_t fi_senddata(struct fid_ep *ep, const void *buf, size_t len,
                  void *desc, uint64_t data, fi_addr_t dest_addr, void *context);

              ssize_t fi_injectdata(struct fid_ep *ep, const void *buf, size_t len,
                  uint64_t data, fi_addr_t dest_addr);

ARGUMENTS

       ep     Fabric endpoint on which to initiate send or post receive buffer.

       buf    Data buffer to send or receive.

       len    Length of data buffer to send or receive, specified in bytes.  Valid transfers are from 0 bytes up
              to the endpoint’s max_msg_size.

       iov    Vectored data buffer.

       count  Count of vectored data entries.

       desc   Descriptor associated with the data buffer.  See fi_mr(3).

       data   Remote CQ data to transfer with the sent message.

       dest_addr
              Destination address for connectionless transfers.  Ignored for connected endpoints.

       src_addr
              Source  address to receive from for connectionless transfers.  Applies only to connectionless end‐
              points with the FI_DIRECTED_RECV capability enabled, otherwise this field is ignored.  If  set  to
              FI_ADDR_UNSPEC, any source address may match.

       msg    Message descriptor for send and receive operations.

       flags  Additional flags to apply for the send or receive operation.

       context
              User  specified  pointer to associate with the operation.  This parameter is ignored if the opera‐
              tion will not generate a successful completion, unless an op flag specifies the context  parameter
              be used for required input.

DESCRIPTION

       The  send  functions – fi_send, fi_sendv, fi_sendmsg, fi_inject, and fi_senddata – are used to transmit a
       message from one endpoint to another endpoint.  The main difference between send functions are the number
       and type of parameters that they accept as input.  Otherwise, they perform  the  same  general  function.
       Messages  sent  using fi_msg operations are received by a remote endpoint into a buffer posted to receive
       such messages.

       The receive functions – fi_recv, fi_recvv, fi_recvmsg – post a data buffer to an endpoint to receive  in‐
       bound messages.  Similar to the send operations, receive operations operate asynchronously.  Users should
       not touch the posted data buffer(s) until the receive operation has completed.

       An  endpoint  must  be enabled before an application can post send or receive operations to it.  For con‐
       nected endpoints, receive buffers may be posted prior to connect or accept being called on the  endpoint.
       This  ensures  that  buffers  are available to receive incoming data immediately after the connection has
       been established.

       Completed message operations are reported to the user through one or  more  event  collectors  associated
       with  the  endpoint.   Users provide context which are associated with each operation, and is returned to
       the user as part of the event completion.  See fi_cq for completion event details.

   fi_send
       The call fi_send transfers the data contained in the user-specified data buffer  to  a  remote  endpoint,
       with message boundaries being maintained.

   fi_sendv
       The  fi_sendv  call adds support for a scatter-gather list to fi_send.  The fi_sendv transfers the set of
       data buffers referenced by the iov parameter to a remote endpoint as a single message.

   fi_sendmsg
       The fi_sendmsg call supports data transfers over both connected and connectionless  endpoints,  with  the
       ability to control the send operation per call through the use of flags.  The fi_sendmsg function takes a
       struct fi_msg as input.

              struct fi_msg {
                  const struct iovec *msg_iov; /* scatter-gather array */
                  void               **desc;   /* local request descriptors */
                  size_t             iov_count;/* # elements in iov */
                  fi_addr_t          addr;     /* optional endpoint address */
                  void               *context; /* user-defined context */
                  uint64_t           data;     /* optional message data */
              };

   fi_inject
       The  send  inject  call  is an optimized version of fi_send with the following characteristics.  The data
       buffer is available for reuse immediately on return from the call, and no CQ entry will be written if the
       transfer completes successfully.

       Conceptually, this means that the fi_inject function behaves as if the FI_INJECT transfer flag were  set,
       selective  completions  are enabled, and the FI_COMPLETION flag is not specified.  Note that the CQ entry
       will be suppressed even if the default behavior of the endpoint is to write CQ entries for all successful
       completions.  See the flags discussion below for more details.  The requested message size  that  can  be
       used with fi_inject is limited by inject_size.

   fi_senddata
       The  send  data  call  is  similar  to  fi_send, but allows for the sending of remote CQ data (see FI_RE‐
       MOTE_CQ_DATA flag) as part of the transfer.

   fi_injectdata
       The inject data call is similar to fi_inject, but allows for the sending of remote CQ  data  (see  FI_RE‐
       MOTE_CQ_DATA flag) as part of the transfer.

   fi_recv
       The fi_recv call posts a data buffer to the receive queue of the corresponding endpoint.  Posted receives
       are  searched  in  the  order  in which they were posted in order to match sends.  Message boundaries are
       maintained.  The order in which the receives complete is dependent on the  endpoint  type  and  protocol.
       For  connectionless  endpoints,  the  src_addr  parameter can be used to indicate that a buffer should be
       posted to receive incoming data from a specific remote endpoint.

   fi_recvv
       The fi_recvv call adds support for a scatter-gather list to fi_recv.  The fi_recvv posts the set of  data
       buffers referenced by the iov parameter to a receive incoming data.

   fi_recvmsg
       The  fi_recvmsg  call supports posting buffers over both connected and connectionless endpoints, with the
       ability to control the receive operation per call through the use  of  flags.   The  fi_recvmsg  function
       takes a struct fi_msg as input.

FLAGS

The fi_recvmsg and fi_sendmsg calls allow the user to specify flags which can change the default message
handling of the endpoint. Flags specified with fi_recvmsg / fi_sendmsg override most flags previously
configured with the endpoint, except where noted (see fi_endpoint.3). The following list of flags are
usable with fi_recvmsg and/or fi_sendmsg.

FI_REMOTE_CQ_DATA
Applies to fi_sendmsg and fi_senddata. Indicates that remote CQ data is available and should be
sent as part of the request. See fi_getinfo for additional details on FI_REMOTE_CQ_DATA.

FI_CLAIM
Applies to posted receive operations for endpoints configured for FI_BUFFERED_RECV or FI_VARI‐
ABLE_MSG. This flag is used to retrieve a message that was buffered by the provider. See the
Buffered Receives section for details.

FI_COMPLETION
Indicates that a completion entry should be generated for the specified operation. The endpoint
must be bound to a completion queue with FI_SELECTIVE_COMPLETION that corresponds to the specified
operation, or this flag is ignored.

FI_DISCARD
Applies to posted receive operations for endpoints configured for FI_BUFFERED_RECV or FI_VARI‐
ABLE_MSG. This flag is used to free a message that was buffered by the provider. See the
Buffered Receives section for details.

FI_MORE
Indicates that the user has additional requests that will immediately be posted after the current
call returns. Use of this flag may improve performance by enabling the provider to optimize its
access to the fabric hardware.

FI_INJECT
Applies to fi_sendmsg. Indicates that the outbound data buffer should be returned to user immedi‐
ately after the send call returns, even if the operation is handled asynchronously. This may re‐
quire that the underlying provider implementation copy the data into a local buffer and transfer
out of that buffer. This flag can only be used with messages smaller than inject_size.

FI_MULTI_RECV
Applies to posted receive operations. This flag allows the user to post a single buffer that will
receive multiple incoming messages. Received messages will be packed into the receive buffer un‐
til the buffer has been consumed. Use of this flag may cause a single posted receive operation to
generate multiple events as messages are placed into the buffer. The placement of received data
into the buffer may be subjected to provider specific alignment restrictions.

The buffer will be released by the provider when the available buffer space falls below the specified
minimum (see FI_OPT_MIN_MULTI_RECV). Note that an entry to the associated receive completion queue will
always be generated when the buffer has been consumed, even if other receive completions have been sup‐
pressed (i.e. the Rx context has been configured for FI_SELECTIVE_COMPLETION). See the FI_MULTI_RECV
completion flag fi_cq(3).

FI_INJECT_COMPLETE
Applies to fi_sendmsg. Indicates that a completion should be generated when the source buffer(s)
may be reused.

FI_TRANSMIT_COMPLETE
Applies to fi_sendmsg and fi_recvmsg. For sends, indicates that a completion should not be gener‐
ated until the operation has been successfully transmitted and is no longer being tracked by the
provider. For receive operations, indicates that a completion may be generated as soon as the
message has been processed by the local provider, even if the message data may not be visible to
all processing elements. See fi_cq(3) for target side completion semantics.

FI_DELIVERY_COMPLETE
Applies to fi_sendmsg. Indicates that a completion should be generated when the operation has
been processed by the destination.

FI_FENCE
Applies to transmits. Indicates that the requested operation, also known as the fenced operation,
and any operation posted after the fenced operation will be deferred until all previous operations
targeting the same peer endpoint have completed. Operations posted after the fencing will see
and/or replace the results of any operations initiated prior to the fenced operation.

The ordering of operations starting at the posting of the fenced operation (inclusive) to the posting of
a subsequent fenced operation (exclusive) is controlled by the endpoint’s ordering semantics.

FI_MULTICAST
Applies to transmits. This flag indicates that the address specified as the data transfer desti‐
nation is a multicast address. This flag must be used in all multicast transfers, in conjunction
with a multicast fi_addr_t.

Buffered Receives

       Buffered  receives  indicate that the networking layer allocates and manages the data buffers used to re‐
       ceive network data transfers.  As a result, received messages must be copied from the network buffers in‐
       to application buffers for processing.  However, applications can avoid this copy if  they  are  able  to
       process the message in place (directly from the networking buffers).

       Handling buffered receives differs based on the size of the message being sent.  In general, smaller mes‐
       sages are passed directly to the application for processing.  However, for large messages, an application
       will  only  receive the start of the message and must claim the rest.  The details for how small messages
       are reported and large messages may be claimed are described below.

       When a provider receives a message, it will write an entry to the completion queue  associated  with  the
       receiving  endpoint.   For  discussion  purposes,  the  completion  queue is assumed to be configured for
       FI_CQ_FORMAT_DATA.  Since buffered receives are not associated with application posted  buffers,  the  CQ
       entry op_context will point to a struct fi_recv_context.

              struct fi_recv_context {
                  struct fid_ep *ep;
                  void *context;
              };

       The  `ep'  field  will point to the receiving endpoint or Rx context, and `context' will be NULL.  The CQ
       entry’s `buf' will point to a provider managed buffer where the start of the received message is located,
       and `len' will be set to the total size of the message.

       The maximum sized message that a provider can  buffer  is  limited  by  an  FI_OPT_BUFFERED_LIMIT.   This
       threshold can be obtained and may be adjusted by the application using the fi_getopt and fi_setopt calls,
       respectively.   Any  adjustments  must  be  made prior to enabling the endpoint.  The CQ entry `buf' will
       point to a buffer of received data.  If the sent message is larger than the buffered amount, the CQ entry
       `flags' will have the FI_MORE bit set.  When the FI_MORE bit  is  set,  `buf'  will  reference  at  least
       FI_OPT_BUFFERED_MIN bytes of data (see fi_endpoint.3 for more info).

       After  being  notified that a buffered receive has arrived, applications must either claim or discard the
       message.  Typically, small messages are processed and discarded, while large messages are claimed.   How‐
       ever, an application is free to claim or discard any message regardless of message size.

       To  claim a message, an application must post a receive operation with the FI_CLAIM flag set.  The struct
       fi_recv_context returned as part of the notification must be provided as the receive operation’s context.
       The struct fi_recv_context contains a `context' field.  Applications  may  modify  this  field  prior  to
       claiming  the  message.   When the claim operation completes, a standard receive completion entry will be
       generated on the completion queue.  The `context' of the associated CQ entry will be set to the `context'
       value passed in through the fi_recv_context structure, and the CQ entry flags will have the FI_CLAIM  bit
       set.

       Buffered  receives  that  are not claimed must be discarded by the application when it is done processing
       the CQ entry data.  To discard a message, an application must post a receive operation with  the  FI_DIS‐
       CARD  flag  set.  The struct fi_recv_context returned as part of the notification must be provided as the
       receive operation’s context.  When the FI_DISCARD flag is set for a receive operation, the receive  input
       buffer(s) and length parameters are ignored.

       IMPORTANT:  Buffered  receives must be claimed or discarded in a timely manner.  Failure to do so may re‐
       sult in increased memory usage for network buffering or communication stalls.  Once  a  buffered  receive
       has  been  claimed or discarded, the original CQ entry `buf' or struct fi_recv_context data may no longer
       be accessed by the application.

       The use of the FI_CLAIM and FI_DISCARD operation flags is also described with respect to  tagged  message
       transfers  in  fi_tagged.3.  Buffered receives of tagged messages will include the message tag as part of
       the CQ entry, if available.

       The handling of buffered receives follows all message ordering restrictions assigned to an endpoint.  For
       example, completions may indicate the order in which received messages arrived at the receiver  based  on
       the endpoint attributes.

Variable Length Messages

       Variable length messages, or simply variable messages, are transfers where the size of the message is un‐
       known to the receiver prior to the message being sent.  It indicates that the recipient of a message does
       not  know  the amount of data to expect prior to the message arriving.  It is most commonly used when the
       size of message transfers varies greatly, with very large messages interspersed with  much  smaller  mes‐
       sages,  making  receive side message buffering difficult to manage.  Variable messages are not subject to
       max message length restrictions (i.e. struct fi_ep_attr::max_msg_size limits), and may be up to the maxi‐
       mum value of size_t (e.g. SIZE_MAX) in length.

       Variable length messages support requests that the provider  allocate  and  manage  the  network  message
       buffers.   As  a result, the application requirements and provider behavior is identical as those defined
       for supporting the FI_BUFFERED_RECV mode bit.  See the Buffered Receive section above for  details.   The
       main  difference is that buffered receives are limited by the fi_ep_attr::max_msg_size threshold, whereas
       variable length messages are not.

       Support for variable messages is indicated through the FI_VARIABLE_MSG capability bit.

NOTES

       If an endpoint has been configured with FI_MSG_PREFIX, the application must include buffer space of  size
       msg_prefix_size,  as  specified by the endpoint attributes.  The prefix buffer must occur at the start of
       the data referenced by the buf parameter, or be referenced by the first IO vector.  Message prefix  space
       cannot be split between multiple IO vectors.  The size of the prefix buffer should be included as part of
       the total buffer length.

RETURN VALUE

       Returns  0 on success.  On error, a negative value corresponding to fabric errno is returned.  Fabric er‐
       rno values are defined in rdma/fi_errno.h.

       See the discussion below for details handling FI_EAGAIN.

ERRORS

       -FI_EAGAIN
              Indicates that the underlying provider currently lacks the resources needed to  initiate  the  re‐
              quested  operation.   The  reasons for a provider returning FI_EAGAIN are varied.  However, common
              reasons include insufficient internal buffering or full processing queues.

       Insufficient internal buffering is often associated with operations that use FI_INJECT.  In  such  cases,
       additional buffering may become available as posted operations complete.

       Full processing queues may be a temporary state related to local processing (for example, a large message
       is  being  transferred), or may be the result of flow control.  In the latter case, the queues may remain
       blocked until additional resources are made available at the remote side of the transfer.

       In all cases, the operation may be retried after additional resources become available.  It  is  strongly
       recommended  that  applications check for transmit and receive completions after receiving FI_EAGAIN as a
       return value, independent of the operation which failed.  This is particularly important in  cases  where
       manual progress is employed, as acknowledgements or flow control messages may need to be processed in or‐
       der to resume execution.

AUTHORS

       OpenFabrics.

Libfabric Programmer’s Manual                      2022-12-11                                          fi_msg(3)