Provided by: libglobus-ftp-control-doc_9.10-3build1_all bug

NAME

       globus_ftp_extensions - GridFTP: Protocol Extensions to FTP for the Grid.

Introduction

       This section defines extensions to the FTP specification STD 9, RFC 959, FILE TRANSFER PROTOCOL (FTP)
       (October 1985) These extensions provide striped data transfer, parallel data transfer, extended data
       transfer, data buffer size configuration, and data channel authentication.

       The following new commands are introduced in this specification

       • Striped Passive (SPAS)Striped Data Port (SPOR)Extended Retrieve (ERET)Extended Store (ESTO)Set Data Buffer Size (SBUF)Data Channel Authentication Mode (DCAU)

       A  new transfer mode (extended-block mode) is introduced for parallel and striped data transfers. Also, a
       set of extension options to RETR are added to control striped data layout and parallelism.

       The following new feature names are to be included in the FTP server's response to FEAT if it  implements
       the following sets of functionality

       PARALLEL
           The server supports the SPOR, SPAS, the RETR options mentioned above, and extended block mode.

       ESTO
           The server implements the ESTO command as described in this document.

       ERET
           The server implements the ERET command as described in this document.

       SBUF
           The server implements the SBUF command as described in this document.

       DCAU
           The  server implements the DCAU command as described in this document, including the requirement that
           data channels are authenticated by default, if RFC 2228  authentication  is  used  to  establish  the
           control channel.

Terminology

       Parallel transfer
           From a single data server, splitting file data for transfer over multiple data connections.

       Striped transfer
           Distributing  a  file's data over multiple independent data nodes, and transerring over multiple data
           connections.

       Data Node
           In a striped data transfer, a data node is one of  the  stripe  destinations  returned  in  the  SPAS
           command, or one of the stripe destinations sent in the SPOR command.

       DTP
           The  data  transfer  process  establishes  and manages the data connection. The DTP can be passive or
           active.

       PI
           The protocol interpreter. The user and server sides of the protocol have distinct  roles  implemented
           in a user-PI and a server-PI.

FTP Standards Used

       • RFC 959, FILE TRANSFER PROTOCOL (FTP), J. Postel, R. Reynolds (October 1985)

         • Commands used by GridFTP

           • USER

           • PASS

           • ACCT

           • CWD

           • CDUP

           • QUIT

           • REIN

           • PORT

           • PASV

           • TYPE

           • MODE

           • RETR

           • STOR

           • STOU

           • APPE

           • ALLO

           • REST

           • RNFR

           • RNTO

           • ABOR

           • DELE

           • RMD

           • MKD

           • PWD

           • LIST

           • NLST

           • SITE

           • SYST

           • STAT

           • HELP

           • NOOP

         • Features used by GridFTP

           • ASCII and Image types

           • Stream mode

           • File structure

       • RFC 2228, FTP Security Extensions, Horowitz, M. and S. Lunt (October 1997)

         • Commands used by GridFTP

           • AUTH

           • ADAT

           • MIC

           • CONF

           • ENC

         • Features used by GridFTP

           • GSSAPI authentication

       • RFC  2389,  Feature  negotiation  mechanism for the File Transfer Protocol, P. Hethmon , R. Elz (August
         1998)

         • Commands used by GridFTP

           • FEAT

           • OPTS

         • Features used by GridFTP

       • FTP Extensions, R. Elz, P. Hethmon (September 2000)

         • Commands used by GridFTP

           • SIZE

         • Features used by GridFTP

           • Restart of a stream mode transfer

Striped Passive (SPAS)

       This extension is used to establish a vector of data socket listeners for for a server with one  or  more
       stripes.  This  command  MUST  be  used in conjunction with the extended block mode. The response to this
       command includes a list of host and port addresses the server is listening on.

       Due to the nature of the extended block mode protocol,  SPAS  must  be  used  in  conjunction  with  data
       transfer  commands  which  receive  data  (such as STOR, ESTO, or APPE) and can not be used with commands
       which send data on the data channels.

       Syntax

       The syntax of the SPAS command is:

       spas = "SPAS" <CRLF>

       Responses

       The server-PI will respond to the SPAS command with a 229 reply giving the list of host-port strings  for
       the remote server-DTP or user-DTP to connect to.

       spas-response = "229-Entering Striped Passive Mode" CRLF
                        1*(<SP> host-port CRLF)
                        229 End

       Where the command is correctly parsed, but the server-DTP cannot process the SPAS request, it must return
       the same error responses as the PASV command.

       OPTS for SPAS

       There are no options in this SPAS specification, and hence there is no OPTS command defined.

Striped Data Port (SPOR)

       This  extension  is  to  be  used  as  a  complement to the SPAS command to implement striped third-party
       transfers. This command MUST always be used in conjunction with the extended block mode. The argument  to
       SPOR is a vector of host/TCP listener port pairs to which the server is to connect. This

       Due  to  the  nature  of  the  extended  block  mode protocol, SPOR must be used in conjunction with data
       transfer commands which send data (such as RETR, ERET, LIST, or NLST) and can not be used  with  commands
       which receive data on the data channels.

       Syntax

       The syntax of the SPOR command is:

       SPOR 1*(<SP> <host-port>) <CRLF>

       The host-port sequence in the command structure MUST match the host-port replies to a SPAS command.

       Responses

       The  server-PI  will respond to the SPOR command with the same response set as the PORT command described
       in the ftp specification.

       OPTS for SPOR

       There are no options in this SPOR specification, and hence there is no OPTS command defined.

Extended Retrieve (ERET)

       The extended retrieve extension is used  to  request  that  a  retrieve  be  done  with  some  additional
       processing on the server. This command an extensible way of providing server-side data reduction or other
       modifications  to  the  RETR  command. This command is used in place of OPTS to the RETR command to allow
       server side processing to be done with a single round trip (one command sent to  the  server  instead  of
       two) for latency-critical applications.

       ERET  may  be  used  with  either the data transports defined in RFC 959, or using extended block mode as
       defined in this document. Using an ERET creates a new virtual file which will be sent, with it's own size
       and byte range starting at zero. Restart markers generated while processing an ERET are relative  to  the
       beginning of this view of the file.

       Syntax

       The syntax of the ERET command is

       ERET <SP> <retrieve-mode> <SP> <filename>

       retrieve-mode ::= P <SP> <offset> <SP> <size>
       offset ::= 64 bit integer
       size ::= 64 bit integer

       The  retrieve-mode  defines  behavior  of  the  extended-retrieve mode. There is one mode defined by this
       specification, but other general purpose or application-specific ones may be added later.

       modes_ERET Extended Retrieve Modes

       Partial Retrieve Mode (P)
           A section of the file will be retrieved from the data server. The section is defined by the  starting
           offset  and  extent  size  parameters. When used with extended block mode, the extended block headers
           sent along with data will send the data with offset of 0 meaning the beginning of the section of  the
           file which was requested.

Extended Store (ESTO)

       The  extended  store extension is used to request that a store be done with some additional processing on
       the server. Arbitrary data processing algorithms may be added by defining  additional  ESTO  store-modes.
       Similar to the ERET, the ESTO command expects data sent to satisfy the request to be sent as if it were a
       new file with data block offset 0 being beginning the beginning of the new file.

       The format of the ESTO command is

       ESTO <SP> <store-mode> <filename>

       store-mode ::= A <SP> <offset>

       The  store-mode  defines  the  behavior  of  the  extended  store.  There  is  one  mode  defined by this
       specification, but others may be added later.

       Extended Store Modes

       Adjusted store (A)
           The data in the file is to stored with offset added to the file pointer before storing the blocks  of
           the  file.  In extended block mode, this value is added to the offset in the extended block header by
           the server when writing to disk. Extended block headers should therefore send the  beginning  of  the
           byte  range  on  the  data  channel  with  offset of zero. In stream mode, the offset is added to the
           implicit offset of 0 for the beginning of the data before writing. If a stream mode restart marker is
           used in conjunction with this ESTO mode, the restart marker's offset is added to the offset passed as
           the parameter to the adjusted store.

Set Buffer Size (SBUF)

       This extension adds the capability of a client to set the TCP buffer size for subsequent data connections
       to a value. This replaces the server-specific commands SITE RBUFSIZE, SITE RETRBUFSIZE, SITE RBUFSZ, SITE
       SBUFSIZE, SITE SBUFSZ, and SITE BUFSIZE. Clients may wish to consider supporting these other commands  to
       ensure wider compatibility.

       Syntax

       The syntax of the SBUF command is

       sbuf = SBUF <SP> <buffer-size>

       buffer-size ::= <number>

       The  buffer-size  value is the TCP buffer size in bytes. The TCP window size should be set accordingly by
       the server.

       Response Codes

       If the server-PI is able to set the buffer size state to the requested buffer-size, then it will return a
       200 reply.

       Note
           Even if the SBUF is accepted by the server, an error may occur later when the  data  connections  are
           actually created, depending on how the server or client operating systems' TCP implementations.

Data Channel Authentication (DCAU)

       This  extension  provides  a method for specifying the type of authentication to be performed on FTP data
       channels. This extension may only be used when the control connection was authenticated  using  RFC  2228
       Security extensions.

       The format of the DCAU command is

       DCAU <SP> <authentication-mode> <CRLF>

       authentication-mode ::= <no-authentication>
                             | <authenticate-with-self>
                             | <authenticate-with-subject>

       no-authentication ::= N
       authenticate-with-self ::= A
       authenticate-with-subject ::= S <subject-name>

       subject-name ::= string

       Authentication Modes

           • No authentication (N)
              No authentication handshake will be done upon data connection establishment.

           • Self authentication (A)
              A  security-protocol specific authentication will be used on the data channel. The identity of the
             remote data connection will be the same as the identity of the  user  which  authenticated  to  the
             control connection.

           • Subject-name authentication (S)
              A  security-protocol specific authentication will be used on the data channel. The identity of the
             remote data connection MUST match the supplied subject-name string.

       The default data channel authentication mode is A for FTP sessions which are RFC 2228 authenticated---the
       client must explicitly send a DCAU N message to  disable  it  if  it  does  not  implement  data  channel
       authentication.

       If  the  security  handshake  fails,  the  server  should  return  the  error  response 432 (Data channel
       authentication failed).

Extended Block Mode

       The striped and parallel data transfer methods described above  require  an  extended  transfer  mode  to
       support  out-of-sequence  data  delivery, and partial data transmission per data connection. The extended
       block mode described here extends the block mode header to provide support for these  as  well  as  large
       blocks, and end-of-data synchronization.

       Clients indicate that they want to use extended block mode by sending the command

       MODE <SP> E <CRLF>

       on the control channel before a transfer command is sent.

       The structure of the extended block header is

       Extended Block Header

       +----------------+-------/-----------+------/------------+
       | Descriptor     |    Byte Count     |    Offset Count   |
       |         8 bits |        64 bits    |          64 bits  |
       +----------------+-------/-----------+------/------------+

       The  descriptor  codes  are  indicated by bit flags in the descriptor byte. Six codes have been assigned,
       where each code number is the decimal value of the corresponding bit in the byte.

       Code     Meaning

        128     End of data block is EOR (Legacy)
         64     End of data block is EOF
         32     Suspected errors in data block
         16     Data block is a restart marker
          8     End of data block is EOD for a parallel/striped transfer
          4     Sender will close the data connection

       With this encoding, more than one descriptor coded condition may exist for a particular  block.  As  many
       bits as necessary may be flagged.

       Some  additional  protocol  is added to the extended block mode data channels, to properly handle end-of-
       file detection in the presence of an unknown number of data streams.

       • When no more data is to be sent on the data channel, then the sender will mark the last block, or  send
         a zero-length block after the last block with the EOD bit (8) set in the extended block header.

       • After  receiving an EOD the data connection can be cached for use in a subsequent transfer. To signifiy
         that the data connection will be closed the sender sets the close bit (4) in the  header  on  the  last
         message sent.

       • The  sender  communicates  end of file by sending an EOF message to all servers receiving data. The EOF
         message format follows.

       Extended Block EOF Header

       +----------------+-------/--------+------/---------------+
       | Descriptor     |     unused     |  EOD count expected  |
       |         8 bits |     64 bits    |        64 bits       |
       +----------------+-------/--------+------/---------------+

       EOF Descriptor. The EOF header descriptor has the same definition as  the  regular  data  message  header
       described above.

       EOD  Count  Expected.  This  64  bit  field  represents the total number of data connections that will be
       established with the server receiving the file. This number is used by the receiver to determine  it  has
       received  all  of the data. When the number of EOD messages received equals the number represented by the
       'EOD Count Expected' field the receiver has hit end of file.

       Simply waiting for EOD on all open data connections is not sufficient. It is possible that  the  receiver
       reads  an  EOD message on all of its open data connects while an additional data connection is in flight.
       If the receiver were to assume it reached end of file it would fail to receive the data on the in  flight
       connection.

       To  handle  EOF  in the multi-striped server case a 126 response has been introduced. When receiving data
       from a striped server a client makes a control connection to a single host, but several host  may  create
       several data connections back to the client. Each host can independently decide how many data connections
       it  will  use,  but  only  a  single  EOF message may be sent to back to the client, therefore it must be
       possible to aggregate the total number of data connections used in the transfer across the  stripes.  The
       126 response serves this purpose.

       The 126 is an intermediate response to RETR command. It has the following format.

       126 <SP> 1*(count of data connections)

       Several  'Count of data connections' can be in a single reply. They correspond to the stripes returned in
       the response to the SPAS command.

       Discussion of protocol change to enable bidirectional data channels brought up the following  problem  if
       doing bidirectional data channels

       If  the  client  is  pasv, and sending to a multi-stripe server, then the server creates data connections
       connections; since the client didn't do SPAS, it cannot associate HOST/PORT pairs on the data connections
       with stripes on the server (it doesn't even know how many there are). it cannot reliably determine  which
       nodes  to  send  data to. (Becomes even more complex in the third-party transfer case, because the sender
       may have multiple stripes of data.) The basic problem is that we need to know logical stripe  numbers  to
       know where to send the data.

       EOF Handling in Extended Block Mode

       If  you are in either striped or parallel mode, you will get exactly one EOF on each SPAS-specified ports
       (stripes). Hosts in extended block mode must be prepared to accept an arbitrary number of connections  on
       each SPOR port before the EOF block is sent.

       Restarting

       In general, opaque restart markers passed via the block header should not be used in extended block mode.
       Instead,  the  destination server should send extended data marker responses over the control connection,
       in the following form:

       extended-mark-response = "111" <SP> "Range Marker" <SP> <byte-ranges-list>

       byte-ranges-list       = <byte-range> [ *("," <byte-range>) ]
       byte-range             = <start-offset> "-" <end-offset>

       start-offset         ::= <number>
       end-offset           ::= <number>

       The byte ranges in the marker are an incremental set of byte ranges which have been stored to disk by the
       data server. The complete restart marker is a concatenation of all byte ranges received by the client  in
       111 responses.

       The  client  MAY  combine adjacent ranges received over several range responses into any number of ranges
       when sending the REST command to the server to restart a transfer.

       For example, the client, on receiving the responses:

       111 Range Marker 0-29
       111 Range Marker 30-89

       may send, equivalently,

       REST 0-29,30-89
       REST 0-89
       REST 30-59,0-29,60-89

       to restart the transfer after those 90 bytes have been received.

       The server MAY indicate that a given range of  data  has  been  received  in  multiple  subsequent  range
       markers. The client MUST be able to handle this. For example:

       111 Range Marker 30-59
       111 Range Marker 0-89

       is equivalent to

       111 Range Marker 30-59
       111 Range Marker 0-29,60-89

       Similarly,  the  client,  if  it  is  doing  no  processing  of  the  restart markers, MAY send redundant
       information in a restart.

       Should these be allowed as restart markers for stream mode?

       Performance Monitoring

       In order to monitor the performance of extended block mode transfer, an additional preliminary reply  MAY
       be transmitted over the control channel. This reply is of the form:

       extended-perf-response  = "112-Perf Marker" CRLF
                                 <SP> "Timestamp:" <SP> <timestamp> CRLF
                     <SP> "Stripe Index:" <SP> <stripe-number> CRLF
                     <SP> "Stripe Bytes Transferred:" <SP> <byte count> CRLF
                     <SP> "Total Stripe Count:" <SP> <stripe count> CRLF
                                 "112 End" CRLF

       timestamp               = <number> [ "." <digit> ]

       <timestamp> is seconds since the epoch

       The  performance  marker  can contain these or any other perf-line facts which provide useful information
       about the current performance.

       All perf-line facts represent an instantaneous state of the transfer at the given timestamp. The  meaning
       of the facts are

       • Timestamp - The time at which the server computed the performance information. This is in seconds since
         the epoch (00:00:00 UTC, January 1, 1970).

       • Stripe  Index  -  the  index  (0-number  of stripes on the STOR side of the transfer) which this marker
         pertains to.

       • Stripe Bytes Transferred - The number of bytes which have been received on this stripe.

       A transfer start time can be specified by a perf marker with 'Stripe Bytes Transferred' set to zero. Only
       the first marker per stripe can be used to specify the start time of that stripe. Any subsequent  markers
       with 'Stripe Bytes Transferred' set to zero simply indicates no data transfer over the interval.

       A  server should send a 'start' marker for each stripe. A server should also send a final perf marker for
       each stripe. This is a marker with 'Stripe Bytes Transferred' set to the total  transfer  size  for  that
       stripe.

Options to RETR

       The  options  described  in  this  section  provide  a  means to convey striping and transfer parallelism
       information to the server-DTP. For the RETR  command,  the  Client-FTP  may  specify  a  parallelism  and
       striping  mode  it  wishes  the  server-DTP  to use. These options are only used by the server-DTP if the
       retrieve operation is done in extended block mode. These options are implemented as RFC 2389 extensions.

       The format of the RETR OPTS is specified by:

       retr-opts     = "OPTS" <SP> "RETR" [<SP> option-list] CRLF
       option-list   = [ layout-opts ";" ] [ parallel-opts ";" ]
       layout-opts   = "StripeLayout=Partitioned"
                     | "StripeLayout=Blocked;BlockSize=" <block-size>
       parallel-opts = "Parallelism=" <starting-parallelism> ","
                                      <minimum-parallelism>  ","
                                      <maximum-parallelism>

       block-size           ::= <number>
       starting-parallelism ::= <number>
       minimum-parallelism  ::= <number>
       maximum-parallelism  ::= <number>

       Layout Options

       The layout option is used by the source data node to send sections of the data file  to  the  appropriate
       destination stripe. The various StripeLayout parameters are to be implemented as follows:

       Partitioned
           A  partitioned data layout is one where the data is distributed evenly on the destination data nodes.
           Only one contiguous section of data is stored on each data node. A data node is defined here a single
           host-port mentioned in the SPOR command

       Blocked
           A blocked data layout is  one  where  the  data  is  distributed  in  round-robin  fashion  over  the
           destination data nodes. The data distribution is ordered by the order of the host-port specifications
           in the SPOR command. The block-size defines the size of blocks to be distributed.

       PLVL Parallelism Options

       The  parallelism option is used by the source data node to control how many parallel data connections may
       be established to each destination data node. This extension option provides for both a  fixed  level  of
       parallelism,  and  for  adapting  the  parallelism to the host/network connection, within a range. If the
       starting-parallelism option is set, then the server-DTP will  make  starting-parallelism  connections  to
       each  destination  data  node.  If  the minimum-parallelism option is set, then the server may reduce the
       number of parallel connections per destination data node to this value. If the maximum-parallelism option
       is set, then the server may increase the number of parallel connections to per destination data  node  to
       at most this value.

References

       [1] Postel, J. and Reynolds, J., '<a href='ftp://ftp.isi.edu/in-notes/rfc959.txt'> FILE TRANSFER PROTOCOL
       (FTP)</a>', STD 9, RFC 959, October 1985.

       [2]  Hethmon,  P.  and  Elz,  R.,  '<a href='ftp://ftp.isi.edu/in-notes/rfc2389.txt'> Feature negotiation
       mechanism for the File Transfer Protocol</a>', RFC 2389, August 1998.

       [3]  Horowitz,  M.  and  Lunt,  S.,  '<a  href='ftp://ftp.isi.edu/in-notes/rfc2228.txt'>   FTP   Security
       Extensions</a>', RFC 2228, October 1997.

       [4] Elz, R. and Hethom, P., '<a href='http://www.ietf.org/internet-drafts/draft-ietf-ftpext-mlst-13.txt'>
       FTP Extensions</a>', IETF Draft, May 2001.

Appendix I: Implementation under GSI

       There  are several security components in this document which are extensions to the behavior of RFC 2228.
       These appendix  attempts  to  clarify  the  protocol  how  these  extensions  map  to  the  OpenSSL-based
       implementation of the GSSAPI known as GSI (Grid Security Infrastructure).

       A  client  implementation  which  communicates  with  a  server  which supports the DCAU extension should
       delegate a limited credential set (using the GSS_C_DELEG_FLAG  and  GSS_C_GLOBUS_LIMITED_DELEG_PROXY_FLAG
       flags  to  gss_init_sec_context()).  If delegation is not performed, the client MUST request that DCAU be
       disable by requesting DCAU N, or the server will be unable to perform the default of DCAU A as  described
       by this document.

       When  DCAU  mode 'A' or 'S' is used, a separate security context is established on each data channel. The
       context  is  established   by   performing   the   GSSAPI   handshake   with   the   active-DTP   calling
       gss_init_sec_context()  and  the passive-DTP calling gss_accept_sec_context(). No delegation need be done
       on these data channels.

       Data channel protection via the PROT command MUST always be used in conjunction with the DCAU A or DCAU S
       commands. If a PROT level is set, then messages will be wrapped according to RFC 2228  Appendix  I  using
       the contexts established on each data channel. Tokens transferred over the data channels when either PROT
       or  DCAU  is  used  are  not framed in any way when using GSI. (When implementing this specification with
       other GSSAPI mechanisms, a 4 byte, big endian, binary token length should proceed all tokens).

       If the DCAU mode or the PROT mode is changed  between  file  transfers  when  caching  data  channels  in
       extended  block  mode, all open data channels must be closed. This is because the GSI implementation does
       not support changing levels of protection on an existing connection.

globus_ftp_control                                Version 9.10                          globus_ftp_extensions(3)