Provided by: freebsd-manpages_12.2-1_all bug

NAME

       pNFSserver — NFS Version 4.1 Parallel NFS Protocol Server

DESCRIPTION

       A  set of FreeBSD servers may be configured to provide a pnfs(4) service.  One FreeBSD system needs to be
       configured as a MetaData Server (MDS) and at least one additional FreeBSD system needs to  be  configured
       as one or more Data Servers (DS)s.

       These  FreeBSD  systems  are  configured to be NFSv4.1 servers, see nfsd(8) and exports(5) if you are not
       familiar with configuring a NFSv4.1 server.

DS server configuration

       The DS(s) need to be configured as NFSv4.1 server(s), with  a  top  level  exported  directory  used  for
       storage  of  data files.  This directory must be owned by “root” and would normally have a mode of “700”.
       Within this directory there needs to be additional directories  named  ds0,...,dsN  (where  N  is  19  by
       default)  also  owned  by  “root”  with  mode  “700”.  These are the directories where the data files are
       stored.  The following command can be run by root when in the top  level  exported  directory  to  create
       these subdirectories.

             jot -w ds 20 0 | xargs mkdir -m 700

       Note that “20” is the default and can be set to a larger value on the MDS as shown below.

       The  top  level  exported  directory  used for storage of data files must be exported to the MDS with the
       “maproot=root sec=sys” export options so that the MDS can create entries  in  these  subdirectories.   It
       must  also  be  exported  to  all pNFS aware clients, but these clients do not require the “maproot=root”
       export option and this directory should be exported to them with the same options as used by the  MDS  to
       export file system(s) to the clients.

       It  is  possible  to  have  multiple  DSs  on  the same FreeBSD system, but each of these DSs must have a
       separate top level exported directory used for storage of data files  and  each  of  these  DSs  must  be
       mountable  via  a  separate IP address.  Alias addresses can be set on the DS server system for a network
       interface via ifconfig(8) to create these different IP addresses.  Multiple DSs on the same server may be
       useful when data for different file systems on the MDS are being stored on different file system  volumes
       on the FreeBSD DS system.

MDS server configuration

       The  MDS  must  be  a  separate  FreeBSD  system  from  the  FreeBSD DS system(s) and NFS clients.  It is
       configured as a NFSv4.1 server with file system(s) exported to clients.  However, the “-p”  command  line
       argument for nfsd is used to indicate that it is running as the MDS for a pNFS server.

       The DS(s) must all be mounted on the MDS using the following mount options:

             nfsv4,minorversion=1,soft,retrans=2

       so  that  they  can  be defined as DSs in the “-p” option.  Normally these mounts would be entered in the
       fstab(5) on the MDS.  For example, if there are four DSs named nfsv4-data[0-3], the fstab(5) lines  might
       look like:

       nfsv4-data0:/ /data0 nfs rw,nfsv4,minorversion=1,soft,retrans=2 0 0
       nfsv4-data1:/ /data1 nfs rw,nfsv4,minorversion=1,soft,retrans=2 0 0
       nfsv4-data2:/ /data2 nfs rw,nfsv4,minorversion=1,soft,retrans=2 0 0
       nfsv4-data3:/ /data3 nfs rw,nfsv4,minorversion=1,soft,retrans=2 0 0

       The  nfsd(8)  command line option “-p” indicates that the NFS server is a pNFS MDS and specifies what DSs
       are to be used.
       For the above fstab(5) example, the nfsd(8) nfs_server_flags line in your rc.conf(5) might look like:

       nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0,nfsv4-data1:/data1,nfsv4-data2:/data2,nfsv4-data3:/data3"

       This example specifies that the data files should be distributed over the four DSs and File layouts  will
       be  issued  to  pNFS enabled clients.  If issuing Flexible File layouts is desired for this case, setting
       the sysctl “vfs.nfsd.default_flexfile” non-zero in your sysctl.conf(5) file will make the  pNFSserver  do
       that.
       Alternately,  this  variant  of “nfs_server_flags” will specify that two way mirroring is to be done, via
       the “-m” command line option.

       nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0,nfsv4-data1:/data1,nfsv4-data2:/data2,nfsv4-data3:/data3 -m 2"

       With two way mirroring, the data file for each exported file on the MDS will be stored on two of the DSs.
       When mirroring is enabled, the server will always issue Flexible File layouts.

       It is also possible to specify which DSs are to be used to store data files for  specific  exported  file
       systems  on  the MDS.  For example, if the MDS has exported two file systems “/export1” and “/export2” to
       clients, the following variant of “nfs_server_flags” will specify that data files for “/export1” will  be
       stored on nfsv4-data0 and nfsv4-data1, whereas the data files for “/export2” will be store on nfsv4-data2
       and nfsv4-data3.

       nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0#/export1,nfsv4-data1:/data1#/export1,nfsv4-data2:/data2#/export2,nfsv4-data3:/data3#/export2"

       This  can be used by system administrators to control where data files are stored and might be useful for
       control of storage use.  For this case, it may be convenient to co-locate more than one of the DSs on the
       same FreeBSD server, using separate file systems on the DS system for storage of the respective DS's data
       files.  If mirroring is desired for this case, the “-m” option also needs to be specified.  There must be
       enough DSs assigned to each exported file system on the MDS to support the level of mirroring.  The above
       example would be fine for two way mirroring, but four way mirroring would not work, since there are  only
       two DSs assigned to each exported file system on the MDS.

       The  number  of  subdirectories in each DS is defined by the “vfs.nfs.dsdirsize” sysctl on the MDS.  This
       value can be increased from the default of 20, but only when the nfsd(8) is not  running  and  after  the
       additional  ds20,...  subdirectories  have  been created on all the DSs.  For a service that will store a
       large number of files this sysctl should be set much  larger,  to  avoid  the  number  of  entries  in  a
       subdirectory from getting too large.

Client mounts

       Once operational, NFSv4.1 FreeBSD client mounts done with the “pnfs” option should do I/O directly on the
       DSs.  The clients mounting the MDS must be running the nfscbd daemon for pNFS to work.  Set

             nfscbd_enable="YES"

       in  the  rc.conf(5) on these clients.  Non-pNFS aware clients or NFSv3 mounts will do all I/O RPCs on the
       MDS, which acts as a proxy for the appropriate DS(s).

Backing up a pNFS service

       Since the data is separated from the metadata, the simple way to back up a pNFS service is to do so  from
       an  NFS client that has the service mounted on it.  If you back up the MDS exported file system(s) on the
       MDS, you must do it in such a way that the “system” namespace extended attributes get backed up.

Handling of failed mirrored DSs

       When a mirrored DS fails, it can be disabled one of three ways:

       1 - The MDS detects a problem when trying to do proxy operations on the DS.  This can take  a  couple  of
       minutes after the DS failure or network partitioning occurs.

       2  -  A  pNFS  client  can  report  an I/O error that occurred for a DS to the MDS in the arguments for a
       LayoutReturn operation.

       3 - The system administrator can perform the pnfsdskill(8) command on the  MDS  to  disable  it.  If  the
       system  administrator  does a pnfsdskill(8) and it fails with ENXIO (Device not configured) that normally
       means the DS was already disabled via #1 or #2. Since doing this is harmless, once a system administrator
       knows that there is a problem with a mirrored DS, doing the command is recommended.

       Once a system administrator knows that a mirrored DS has malfunctioned or has been  network  partitioned,
       they should do the following as root/su on the MDS:

             # pnfsdskill <mounted-on-path-of-DS>
             # umount -N <mounted-on-path-of-DS>

       Note  that  the  <mounted-on-path-of-DS>  must  be  the exact mounted-on path string used when the DS was
       mounted on the MDS.

       Once the mirrored DS has been disabled, the pNFS service should continue to function,  but  file  updates
       will  only  happen on the DS(s) that have not been disabled. Assuming two way mirroring, that implies the
       one DS of the pair stored in the “pnfsd.dsfile” extended attribute for the file on  the  MDS,  for  files
       stored on the disabled DS.

       The next step is to clear the IP address in the “pnfsd.dsfile” extended attribute on all files on the MDS
       for  the  failed DS.  This is done so that, when the disabled DS is repaired and brought back online, the
       data files on this DS will not be used, since they may be out of date.  The command that  clears  the  IP
       address is pnfsdsfile(8) with the “-r” option.

       For example:
       # pnfsdsfile -r nfsv4-data3 yyy.c
       yyy.c:  nfsv4-data2.home.rick   ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000    0.0.0.0 ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000

       replaces nfsv4-data3 with an IPv4 address of 0.0.0.0, so that nfsv4-data3 will not get used.

       Normally  this  will  be  called within a find(1) command for all regular files in the exported directory
       tree and must be done on the MDS.  When used with find(1), you will probably also want the “-q” option so
       that it won't spit out the results for every file.  If  the  disabled/repaired  DS  is  nfsv4-data3,  the
       commands done on the MDS would be:

       # cd <top-level-exported-dir>
       # find . -type f -exec pnfsdsfile -q -r nfsv4-data3 {} ;

       There  is a problem with the above command if the file found by find(1) is renamed or unlinked before the
       pnfsdsfile(8) command is done on it.  This should normally generate an error message.  A simple unlink is
       harmless but a link/unlink or rename might result in the file not having been  processed  under  its  new
       name.   To  check  that  all  files  have  their  IP  addresses set to 0.0.0.0 these commands can be used
       (assuming the sh(1) shell):

       # cd <top-level-exported-dir>
       # find . -type f -exec pnfsdsfile {} ; | sed "/nfsv4-data3/!d"

       Any line(s) printed require the pnfsdsfile(8) with “-r” to  be  done  again.   Once  this  is  done,  the
       replaced/repaired  DS can be brought back online.  It should have empty ds0,...,dsN directories under the
       top level exported directory for storage of data files just like it did when first set up.  Mount  it  on
       the MDS exactly as you did before disabling it.  For the nfsv4-data3 example, the command would be:

       # mount -t nfs -o nfsv4,minorversion=1,soft,retrans=2 nfsv4-data3:/ /data3

       Then restart the nfsd to re-enable the DS.

       # /etc/rc.d/nfsd restart

       Now, new files can be stored on nfsv4-data3, but files with the IP address zeroed out on the MDS will not
       yet  use the repaired DS (nfsv4-data3).  The next step is to go through the exported file tree on the MDS
       and, for each of the files with an IPv4 address of 0.0.0.0 in its extended attribute, copy the file  data
       to  the  repaired DS and re-enable use of this mirror for it.  This command for copying the file data for
       one MDS file is pnfsdscopymr(8) and it will also normally be used in a find(1).  For  the  example  case,
       the commands on the MDS would be:

       # cd <top-level-exported-dir>
       # find . -type f -exec pnfsdscopymr -r /data3 {} ;

       When  this  completes,  the  recovery  should  be  complete  or at least nearly so.  As noted above, if a
       link/unlink or rename occurs on a file name while the above find(1)  is  in  progress,  it  may  not  get
       copied.  To check for any file(s) not yet copied, the commands are:

       # cd <top-level-exported-dir>
       # find . -type f -exec pnfsdsfile {} ; | sed "/0.0.0.0/!d"

       If  this  command  prints out any file name(s), these files must have the pnfsdscopymr(8) command done on
       them to complete the recovery.

       # pnfsdscopymr -r /data3 <file-path-reported>

       If this commmand fails with the error
       “pnfsdscopymr: Copymr failed for file <path>: Device not configured”
       repeatedly, this may be caused by a Read/Write layout that has not been returned.  The only  way  to  get
       rid of such a layout is to restart the nfsd(8).

       All of these commands are designed to be done while the pNFS service is running and can be re-run safely.

       For a more detailed discussion of the setup and management of a pNFS service see:

             http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt

SEE ALSO

       nfsv4(4),  pnfs(4),  exports(5),  fstab(5),  rc.conf(5), sysctl.conf(5), nfscbd(8), nfsd(8), nfsuserd(8),
       pnfsdscopymr(8), pnfsdsfile(8), pnfsdskill(8)

HISTORY

       The pNFSserver command first appeared in FreeBSD 12.0.

BUGS

       Since the MDS cannot be mirrored, it is a single point of failure just as a non pNFS server is.  For non-
       mirrored configurations, all FreeBSD systems used in the service are single points of failure.

Debian                                           August 8, 2018                                    PNFSSERVER(4)