Ubuntu Manpage: nccopy - Copy a netCDF file, optionally changing format, compression, or chunking in the output.

Provided by: netcdf-bin_4.9.2-5ubuntu4_amd64

NAME

       nccopy - Copy a netCDF file, optionally changing format, compression, or chunking in the output.

SYNOPSIS


       nccopy  [-k  kind_name ] [-kind_code] [-d  n ] [-s] [-c  chunkspec ] [-u] [-w] [-[v|V] var1,...]  [-[g|G]
              grp1,...]  [-m  bufsize ] [-h  chunk_cache ] [-e  cache_elems ] [-r] [-F  filterspec ]  [-L   n  ]
              [-M  n ]  infile  outfile

DESCRIPTION

       The  nccopy utility copies an input netCDF file in any supported format variant to an output netCDF file,
       optionally converting the output to any compatible  netCDF  format  variant,  compressing  the  data,  or
       rechunking  the  data.   For  example,  if  built with the netCDF-3 library, a netCDF classic file may be
       copied to a netCDF 64-bit offset file, permitting larger variables.  If built with the netCDF-4  library,
       a  netCDF classic file may be copied to a netCDF-4 file or to a netCDF-4 classic model file as well, per‐
       mitting data compression, efficient schema changes, larger variable sizes, and use of other netCDF-4 fea‐
       tures.

       If no output format is specified, with either -k kind_name or -kind_code, then the output  will  use  the
       same format as the input, unless the input is classic or 64-bit offset and either chunking or compression
       is  specified,  in which case the output will be netCDF-4 classic model format.  Attempting some kinds of
       format conversion will result in an error, if the conversion is not possible.  For example, an attempt to
       copy a netCDF-4 file that uses features of the enhanced model, such as groups or variable-length strings,
       to any of the other kinds of netCDF formats that use the classic model will result in an error.

       nccopy also serves as an example of a generic netCDF-4 program, with its ability to read any valid netCDF
       file and handle nested groups, strings, and user-defined types,  including  arbitrarily  nested  compound
       types, variable-length types, and data of any valid netCDF-4 type.

       If  DAP  support was enabled when nccopy was built, the file name may specify a DAP URL. This may be used
       to convert data on DAP servers to local netCDF files.

OPTIONS

-k kind_name
Use format name to specify the kind of file to be created and, by inference, the data model (i.e.
netcdf-3 (classic) or netcdf-4 (enhanced)). The possible arguments are:

'nc3' or 'classic' => netCDF classic format

'nc6' or '64-bit offset' => netCDF 64-bit offset format

'nc4' or 'netCDF-4' => netCDF-4 format (enhanced data model)

'nc7' or 'netCDF-4 classic model' => netCDF-4 classic model format

Note: The old format numbers '1', '2', '3', '4', equivalent to the format names 'nc3', 'nc6',
'nc4', and 'nc7' respectively, are still accepted but deprecated, because format numbers are
easily confused with format names.

[-kind_code]
Use format numeric code (instead of format name) to specify the kind of file to be created and, by
inference, the data model (i.e. netcdf-3 (classic) versus netcdf-4 (enhanced)). The numeric codes
are:

3 => netCDF classic format

6 => netCDF 64-bit offset format

4 => netCDF-4 format (enhanced data model)

7 => netCDF-4 classic model format

The numeric code "7" is used because "7=3+4", specifying the format that uses the netCDF-3 data model for
compatibility with the netCDF-4 storage format for performance. Credit is due to NCO for use of these
numeric codes instead of the old and confusing format numbers.
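
The three naming schemes above (format names, numeric codes, and the deprecated format numbers) can be
lined up with a small lookup. This is only a sketch: old_number_to_kind and kind_to_code are hypothetical
helper functions, not part of nccopy, and in.nc and out.nc are placeholder file names.

```shell
# old_number_to_kind N: translate a deprecated format number to its format name.
old_number_to_kind() {
    case $1 in
        1) printf '%s\n' nc3 ;;   # netCDF classic
        2) printf '%s\n' nc6 ;;   # netCDF 64-bit offset
        3) printf '%s\n' nc4 ;;   # netCDF-4 (enhanced data model)
        4) printf '%s\n' nc7 ;;   # netCDF-4 classic model
        *) echo "unknown format number: $1" >&2; return 1 ;;
    esac
}

# kind_to_code NAME: translate a format name to the equivalent numeric code option.
kind_to_code() {
    case $1 in
        nc3|classic)                  printf '%s\n' -3 ;;
        nc6|"64-bit offset")          printf '%s\n' -6 ;;
        nc4|netCDF-4)                 printf '%s\n' -4 ;;
        nc7|"netCDF-4 classic model") printf '%s\n' -7 ;;
        *) echo "unknown kind name: $1" >&2; return 1 ;;
    esac
}

# Both of these request netCDF-4 classic model output:
#   nccopy -k nc7 in.nc out.nc
#   nccopy -7 in.nc out.nc
```

Note that the deprecated number '3' means netCDF-4, whereas the numeric code 3 means classic, which is
the confusion the newer codes are meant to avoid.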

-d n
For netCDF-4 output, including netCDF-4 classic model, specify deflation level (level of compres‐
sion) for variable data output. 0 corresponds to no compression and 9 to maximum compression,
with higher levels of compression requiring marginally more time to compress or uncompress than
lower levels. As a side effect, specifying a compression level of 0 (via "-d 0") turns off
deflation altogether. Compression achieved may also depend on output chunking parameters. If
this option is specified for a classic format or 64-bit offset format input file, it is not neces‐
sary to also specify that the output should be netCDF-4 classic model, as that will be the de‐
fault. If this option is not specified and the input file has compressed variables, the compres‐
sion will still be preserved in the output, using the same chunking as in the input by default.

Note that nccopy requires all variables to be compressed using the same compression level, but the
API has no such restriction. With a program you can customize compression for each variable inde‐
pendently.

-s For netCDF-4 output, including netCDF-4 classic model, specify shuffling of variable data bytes
before compression or after decompression. Shuffling refers to interlacing of bytes in a chunk so
that the first bytes of all values are contiguous in storage, followed by all the second bytes,
and so on, which often improves compression. This option is ignored unless a non-zero deflation
level is specified. Using -d0 to specify no deflation on input data that has been compressed and
shuffled turns off both compression and shuffling in the output.
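
A minimal sketch of combining -d and -s follows. The helper deflate_args is hypothetical (it is not part
of nccopy), and in.nc and out.nc are placeholder file names. It emits -s only for a non-zero level,
since shuffling is ignored otherwise.

```shell
# deflate_args LEVEL: print nccopy flags for a deflation level between 0 and 9.
deflate_args() {
    level=$1
    case $level in
        [0-9]) ;;
        *) echo "deflation level must be 0-9" >&2; return 1 ;;
    esac
    if [ "$level" -eq 0 ]; then
        printf '%s\n' "-d 0"            # no deflation, so no shuffling either
    else
        printf '%s\n' "-d $level -s"    # deflate and shuffle
    fi
}

# Run the copy only when nccopy and the placeholder input file are present.
if command -v nccopy >/dev/null 2>&1 && [ -f in.nc ]; then
    # Word splitting of the flag string is intentional here.
    nccopy $(deflate_args 4) in.nc out.nc
    # ncdump -s shows special attributes such as _DeflateLevel and _Shuffle.
    ncdump -hs out.nc | grep -E '_DeflateLevel|_Shuffle'
fi
```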

-u Convert any unlimited size dimensions in the input to fixed size dimensions in the output. This
can speed up variable-at-a-time access, but slow down record-at-a-time access to multiple vari‐
ables along an unlimited dimension.

-w Keep output in memory (as a diskless netCDF file) until output is closed, at which time the output
file is written to disk. This can greatly speed up operations such as converting unlimited dimensions
to fixed size (-u option), chunking, rechunking, or compressing the input. It requires that
available memory be large enough to hold the output file. This option may provide a larger
speedup than careful tuning of the -m, -h, or -e options, and it's certainly a lot simpler.

-c chunkspec
For netCDF-4 output, including netCDF-4 classic model, specify chunking (multidimensional tiling)
for variable data in the output. This is useful to specify the units of disk access, compression,
or other filters such as checksums. Changing the chunking in a netCDF file can also greatly
speed up access, by choosing chunk shapes that are appropriate for the most common access patterns.

The chunkspec argument has several forms. The first form is the original, deprecated form and is a
string of comma-separated associations, each specifying a dimension name, a '/' character, and op‐
tionally the corresponding chunk length for that dimension. No blanks should appear in the
chunkspec string, except possibly escaped blanks that are part of a dimension name. A chunkspec
names at least one dimension, and may omit dimensions which are not to be chunked or for which the
default chunk length is desired. If a dimension name is followed by a '/' character but no subse‐
quent chunk length, the actual dimension length is assumed. If copying a classic model file to a
netCDF-4 output file and not naming all dimensions in the chunkspec, unnamed dimensions will also
use the actual dimension length for the chunk length. An example of a chunkspec for variables
that use 'm' and 'n' dimensions might be 'm/100,n/200' to specify 100 by 200 chunks. To see the
chunking resulting from copying with a chunkspec, use the '-s' option of ncdump on the output
file.

The chunkspec '/' that omits all dimension names and corresponding chunk lengths specifies that no
chunking is to occur in the output, so can be used to unchunk all the chunked variables.

As an I/O optimization, nccopy has a threshold for the minimum size of non-record variables that
get chunked, currently 8192 bytes. The -M flag can be used to override this value.

Note that nccopy requires variables that share a dimension to also share the chunk size associated
with that dimension, but the programming interface has no such restriction. If you need to cus‐
tomize chunking for variables independently, you will need to use the second form of chunkspec.
This second form of chunkspec has this syntax: var:n1,n2,...,nn. This assumes that the variable
named "var" has rank n. The chunking to be applied to each dimension of the variable is specified
by the values of n1 through nn. This second form of chunking specification can be repeated multi‐
ple times to specify the exact chunking for different variables. If the variable is specified but
no chunk sizes are specified (i.e. -c var: ) then chunking is disabled for that variable. If the
same variable is specified more than once, the second and later specifications are ignored. Also,
this second form, per-variable chunking, takes precedence over any per-dimension chunking except
the bare "/" case.

The third form of the chunkspec has the syntax: var:compact or var:contiguous. This explicitly
attempts to set the variable storage type as compact or contiguous, respectively. These may be
overridden if other flags require the variable to be chunked.
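
The chunkspec forms described above can be sketched as follows. The helper dim_chunkspec is
hypothetical (not part of nccopy); it builds the first, per-dimension form from NAME=LEN pairs. The
variable name temp and the file names in the comments are placeholders.

```shell
# dim_chunkspec NAME=LEN ...: build a per-dimension chunkspec such as m/100,n/200.
# A pair with an empty LEN (or no '=') yields "NAME/", meaning the full dimension length.
dim_chunkspec() {
    spec=
    for pair in "$@"; do
        case $pair in
            *=*) ;;
            *) pair="$pair=" ;;
        esac
        name=${pair%%=*}
        len=${pair#*=}
        spec="${spec:+$spec,}$name/$len"
    done
    printf '%s\n' "$spec"
}

# First form (per-dimension):
#   nccopy -c "$(dim_chunkspec m=100 n=200)" in.nc out.nc
# Second form (per-variable; temp is assumed to have rank 3):
#   nccopy -c temp:10,100,200 in.nc out.nc
# Third form (explicit storage type):
#   nccopy -c temp:contiguous in.nc out.nc
# Special case, unchunk everything:
#   nccopy -c / in.nc out.nc
```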

-v var1,...
The output will include data values for the specified variables, in addition to the declarations
of all dimensions, variables, and attributes. One or more variables must be specified by name in
the comma-delimited list following this option. The list must be a single argument to the command,
hence cannot contain unescaped blanks or other white space characters. The named variables must be
valid netCDF variables in the input-file. A variable within a group in a netCDF-4 file may be
specified with an absolute path name, such as "/GroupA/GroupA2/var". Use of a relative path name
such as 'var' or "grp/var" specifies all matching variable names in the file. The default, with‐
out this option, is to include data values for all variables in the output.

-V var1,...
The output will include only the specified variables, along with all dimensions and all global or
group attributes. One or more variables must be specified by name in the comma-delimited list following
this option. The list must be a single argument to the command, hence cannot contain unescaped
blanks or other white space characters. The named variables must be valid netCDF variables in the
input-file. A variable within a group in a netCDF-4 file may be specified with an absolute path
name, such as '/GroupA/GroupA2/var'. Use of a relative path name such as 'var' or 'grp/var' spec‐
ifies all matching variable names in the file. The default, without this option, is to include
all variables in the output.
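
Because the variable list must be a single comma-delimited argument, it can be convenient to assemble it
from separate names. The following is a sketch only: join_commas is a hypothetical helper, not part of
nccopy, and as a simplification it rejects names containing blanks rather than escaping them. The
variable and file names in the comments are placeholders.

```shell
# join_commas NAME ...: join names into one comma-delimited argument.
join_commas() {
    list=
    for name in "$@"; do
        case $name in
            *" "*) echo "name contains a blank: $name" >&2; return 1 ;;
        esac
        list="${list:+$list,}$name"
    done
    printf '%s\n' "$list"
}

# Copy data values for two variables, keeping all declarations:
#   nccopy -v "$(join_commas temp /GroupA/GroupA2/var)" in.nc out.nc
# Copy only those two variables:
#   nccopy -V "$(join_commas temp /GroupA/GroupA2/var)" in.nc out.nc
```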

-g grp1,...
The output will include data values only for the specified groups. One or more groups must be
specified by name in the comma-delimited list following this option. The list must be a single ar‐
gument to the command. The named groups must be valid netCDF groups in the input-file. The de‐
fault, without this option, is to include data values for all groups in the output.

-G grp1,...
The output will include only the specified groups. One or more groups must be specified by name
in the comma-delimited list following this option. The list must be a single argument to the com‐
mand. The named groups must be valid netCDF groups in the input-file. The default, without this
option, is to include all groups in the output.

-m bufsize
An integer or floating-point number that specifies the size, in bytes, of the copy buffer used to
copy large variables. A suffix of K, M, G, or T multiplies the copy buffer size by one thousand,
million, billion, or trillion, respectively. The default is 5 Mbytes, but will be increased if
necessary to hold at least one chunk of netCDF-4 chunked variables in the input file. You may
want to specify a value larger than the default for copying large files over high latency net‐
works. Using the '-w' option may provide better performance, if the output fits in memory.

-h chunk_cache
For netCDF-4 output, including netCDF-4 classic model, an integer or floating-point number that
specifies the size in bytes of chunk cache allocated for each chunked variable. This is not a
property of the file, but merely a performance tuning parameter for avoiding compressing or decom‐
pressing the same data multiple times while copying and changing chunk shapes. A suffix of K, M,
G, or T multiplies the chunk cache size by one thousand, million, billion, or trillion, respec‐
tively. The default is 4.194304 Mbytes (or whatever was specified for the configure-time constant
CHUNK_CACHE_SIZE when the netCDF library was built). Ideally, the nccopy utility should accept
only one memory buffer size and divide it optimally between a copy buffer and chunk cache, but no
general algorithm for computing the optimum chunk cache size has been implemented yet. Using the
'-w' option may provide better performance, if the output fits in memory.
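
The suffix arithmetic for -m and -h can be checked with a small helper. This is a sketch only:
size_to_bytes is a hypothetical function, not part of nccopy, and it simply applies the decimal
multipliers described above to an integer or floating-point value.

```shell
# size_to_bytes SPEC: convert a size such as 5M or 4.194304M to a byte count.
size_to_bytes() {
    spec=$1
    case $spec in
        *[KMGT]) num=${spec%?}; suffix=${spec#"$num"} ;;
        *)       num=$spec;     suffix= ;;
    esac
    case $suffix in
        K) mult=1000 ;;
        M) mult=1000000 ;;
        G) mult=1000000000 ;;
        T) mult=1000000000000 ;;
        *) mult=1 ;;
    esac
    # awk handles the floating-point case; %.0f rounds to whole bytes.
    awk -v n="$num" -v m="$mult" 'BEGIN { printf "%.0f\n", n * m }'
}

size_to_bytes 5M           # default copy buffer: 5000000
size_to_bytes 4.194304M    # default chunk cache: 4194304
```

The second result, 4194304 bytes, is the same as 4 x 1024 x 1024, which explains the otherwise odd
default of 4.194304 Mbytes.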

-e cache_elems
For netCDF-4 output, including netCDF-4 classic model, specifies number of chunks that the chunk
cache can hold. A suffix of K, M, G, or T multiplies the number of chunks that can be held in the
cache by one thousand, million, billion, or trillion, respectively. This is not a property of the
file, but merely a performance tuning parameter for avoiding compressing or decompressing the same
data multiple times while copying and changing chunk shapes. The default is 1009 (or whatever was
specified for the configure-time constant CHUNK_CACHE_NELEMS when the netCDF library was built).
Ideally, the nccopy utility should determine an optimum value for this parameter, but no general
algorithm for computing the optimum number of chunk cache elements has been implemented yet.

-r Read netCDF classic or 64-bit offset input file into a diskless netCDF file in memory before copy‐
ing. Requires that input file be small enough to fit into memory. For nccopy, this doesn't seem
to provide any significant speedup, so may not be a useful option.

-L n Set the log level; only usable if nccopy supports netCDF-4 (enhanced).

-M n Set the minimum chunk size; only usable if nccopy supports netCDF-4 (enhanced).

-F filterspec
For netCDF-4 output, including netCDF-4 classic model, specify a filter to apply to a specified
set of variables in the output. As a rule, the filter is a compression/decompression algorithm
with a unique numeric identifier assigned by the HDF Group (see https://support.hdfgroup.org/ser‐
vices/filters.html).

The filterspec argument has this general form.
fqn1|fqn2...,filterid,param1,param2...paramn or *,filterid,param1,param2...paramn
An fqn (fully qualified name) is the name of a variable prefixed by its containing groups with the group
names separated by forward slash ('/'). An example might be /g1/g2/var. Alternatively, just the variable
name can be given if it is in the root group: e.g. var. Backslash escapes may be used as needed. A note
of warning: the '|' separator is a bash reserved character, so you will probably need to put the filter
spec in some kind of quotes or otherwise escape it.

The filterid is an unsigned positive integer representing the id assigned by the HDF Group to the
filter. Following the id is a sequence of parameters defining the operation of the filter. Each
parameter is a 32-bit unsigned integer.

This option may be repeated multiple times with different variable names.
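
A filterspec can be assembled and quoted as in the sketch below. The helper make_filterspec is
hypothetical (not part of nccopy). The filter id 307 with parameter 9 is an assumption used for
illustration: it is taken here to be the bzip2 filter with a compression level, which should be checked
against the HDF Group list, and the matching filter plugin would need to be installed. The variable and
file names are placeholders.

```shell
# make_filterspec VARS FILTERID [PARAM ...]: join the fields with commas.
# VARS is either '*' or fully qualified names separated by '|'.
make_filterspec() {
    spec=$1
    shift
    for field in "$@"; do
        spec="$spec,$field"
    done
    printf '%s\n' "$spec"
}

# The double quotes keep the shell from treating '|' as a pipe or expanding '*':
#   nccopy -F "$(make_filterspec '/g1/g2/var|temp' 307 9)" in.nc out.nc
#   nccopy -F "$(make_filterspec '*' 307 9)" in.nc out.nc
```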

EXAMPLES

       Make a copy of foo1.nc, a netCDF file of any type, to foo2.nc, a netCDF file of the same type:

              nccopy foo1.nc foo2.nc

       Note that the above copy will not be as fast as use of cp or other simple copy utility, because the  file
       is copied using only the netCDF API.  If the input file has extra bytes after the end of the netCDF data,
       those  will not be copied, because they are not accessible through the netCDF interface.  If the original
       file was generated in "No fill" mode so that fill values are not stored for padding for  data  alignment,
       the output file may have different padding bytes.

       Convert  a  netCDF-4  classic  model file, compressed.nc, that uses compression, to a netCDF-3 file clas‐
       sic.nc:

              nccopy -k classic compressed.nc classic.nc

       Note that 'nc3' could be used instead of 'classic'.

       Download the variable 'time_bnds' and its associated attributes from an OPeNDAP server and copy  the  re‐
       sult to a netCDF file named 'tb.nc':

              nccopy 'http://test.opendap.org/opendap/data/nc/sst.mnmean.nc.gz?time_bnds' tb.nc

       Note  that  URLs  that  name  specific variables as command-line arguments should generally be quoted, to
       avoid the shell interpreting special characters such as '?'.

       Compress all the variables in the input file foo.nc, a netCDF file  of  any  type,  to  the  output  file
       bar.nc:

              nccopy -d1 foo.nc bar.nc

       If  foo.nc  was  a  classic  or 64-bit offset netCDF file, bar.nc will be a netCDF-4 classic model netCDF
       file, because the classic and 64-bit offset format variants don't support compression.  If foo.nc  was  a
       netCDF-4  file  with  some variables compressed using various deflation levels, the output will also be a
       netCDF-4 file of the same type, but all the variables, including any uncompressed variables in the input,
       will now use deflation level 1.

       Assume the input data includes gridded variables that use time, lat, lon dimensions, with 1000  times  by
       1000  latitudes  by  1000  longitudes,  and that the time dimension varies most slowly.  Also assume that
       users want quick access to data at all times for a small set of lat-lon points.  Accessing data for  1000
       times would typically require accessing 1000 disk blocks, which may be slow.

       Reorganizing  the data into chunks on disk that have all the time in each chunk for a few lat and lon co‐
       ordinates would greatly speed up such access.  To chunk the data in the input file slow.nc, a netCDF file
       of any type, to the output file fast.nc, you could use:

              nccopy -c time/1000,lat/40,lon/40 slow.nc fast.nc

       to specify data chunks of 1000 times, 40 latitudes, and 40 longitudes.  If you had enough memory to  con‐
       tain the output file, you could speed up the rechunking operation significantly by creating the output in
       memory before writing it to disk on close (using the -w flag):

              nccopy -w -c time/1000,lat/40,lon/40 slow.nc fast.nc

       Alternatively, one could write this using the alternate, variable-specific chunking specification,
       assuming that time, lat, and lon are variables:

              nccopy -c time:1000 -c lat:40 -c lon:40 slow.nc fast.nc
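
       The effect of the 1000 by 40 by 40 chunk shape used above can be worked out with shell arithmetic.
       This is a sketch under one stated assumption that is not in the example itself: each value occupies
       4 bytes (for instance, a float), and sizes are before compression.

```shell
# Grid and chunk shapes from the example above.
ntime=1000; nlat=1000; nlon=1000
ctime=1000; clat=40;   clon=40
bytes_per_value=4      # assumption: 4-byte values such as float

# Number of chunks along each dimension, rounded up.
chunks_time=$(( (ntime + ctime - 1) / ctime ))   # 1
chunks_lat=$((  (nlat  + clat  - 1) / clat  ))   # 25
chunks_lon=$((  (nlon  + clon  - 1) / clon  ))   # 25

total_chunks=$(( chunks_time * chunks_lat * chunks_lon ))      # 625
chunk_bytes=$(( ctime * clat * clon * bytes_per_value ))       # 6400000

echo "chunks per variable: $total_chunks"
echo "uncompressed bytes per chunk: $chunk_bytes"
```

       With this layout, all 1000 times for a single lat-lon point fall within one chunk. Under the
       4-byte assumption each chunk holds 6400000 bytes, which is larger than the default chunk cache
       size given under -h, so a larger -h value may be worth trying when rechunking this way.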

Chunking Rules

The complete set of chunking rules is captured here. As a rough summary, these rules preserve all
chunking properties from the input file. These rules apply only when the selected output format supports
chunking, i.e. for the netCDF-4 variants.

The variable specific chunking specification should be obvious and translates directly to the correspond‐
ing "nc_def_var_chunking" API call.

The original per-dimension chunking specification requires some interpretation by nccopy. The following
rules are applied in the given order independently for each variable to be copied from input to output.
The rules are written assuming we are trying to determine the chunking for a given output variable Vout
that comes from an input variable Vin.

1. If there is no '-c' option that applies to a variable and the corresponding input variable is con‐
tiguous or the input is some netcdf-3 variant, then let the netcdf-c library make all chunking de‐
cisions.

2. For each dimension of Vout explicitly specified on the command line (using the '-c' option), apply
the chunking value for that dimension regardless of input format or input properties.

3. For dimensions of Vout not named on the command line in a '-c' option, preserve chunk sizes from
the corresponding input variable, if it is chunked.

4. If Vin is contiguous, and none of its dimensions are named on the command line, and chunking is
not mandated by other options, then make Vout be contiguous.

5. If the input variable is contiguous (or is some netcdf-3 variant) and there are no options
requiring chunking, or the '/' special case for the '-c' option is specified, then the output variable
Vout is marked as contiguous.
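
Rules 2 and 3 above amount to a per-dimension precedence, which can be sketched as below. This is an
illustration only, not the nccopy implementation: out_chunk_len is a hypothetical function, it ignores
the per-variable form and the '/' special case, and it prints "default" wherever the remaining rules or
the netcdf-c library would decide.

```shell
# out_chunk_len CLI_LEN INPUT_LEN
#   CLI_LEN   chunk length given for this dimension with -c, or empty
#   INPUT_LEN chunk length of this dimension in Vin, or empty if Vin is not chunked
out_chunk_len() {
    cli_len=$1
    in_len=$2
    if [ -n "$cli_len" ]; then
        printf '%s\n' "$cli_len"    # rule 2: the command line wins
    elif [ -n "$in_len" ]; then
        printf '%s\n' "$in_len"     # rule 3: preserve the input chunk length
    else
        printf '%s\n' default       # left to the other rules or the library
    fi
}

out_chunk_len 40 10    # prints 40
out_chunk_len "" 10    # prints 10
```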