Provided by: samtools_1.19.2-1build2_amd64

NAME

       samtools-cram-size - list a breakdown of data types in a CRAM file

SYNOPSIS

       samtools cram-size [-ve] [-o file] in.cram

DESCRIPTION

       Produces a summary of CRAM block Content ID numbers and the Data Series stored within them.
       Optionally, a more detailed breakdown of how each Data Series is encoded per container can also be
       listed using the -e or --encodings option.

       CRAM  permits  mixing  multiple Data Series into a single block.  In this case it is not possible to tell
       the relative proportion that the Data Series consume within that  block.   CRAM  also  permits  different
       encodings  and  block Content ID assignment per container, although this would be highly unusual.  Htslib
       will always assign the same Data Series to a block with  a  consistent  Content  ID,  although  the  CRAM
       Encoding may change.

       Each CRAM block has a compression method.  These are not necessarily consistent between successive
       blocks with the same Content ID.  Htslib learns which compression methods work best, so a single
       Content ID may have multiple compression methods associated with it.  The methods used are listed per
       line with a single character code each; the size breakdown per method and a more verbose description
       can be shown using the -v option.  The compression codecs used in CRAM may have a variety of
       parameters, such as compression levels, inbuilt transformations, and choices of entropy encoding.  An
       attempt is made to distinguish between these different method parameterisations.

       The compression methods and their short and long (verbose) name are below:

                       Short   Long                 Description
                       ─────────────────────────────────────────────────────────────────────────
                       .       raw                  No compression (as shown in EXAMPLES)
                       g       gzip                 Gzip
                       _       gzip-min             Gzip -1
                       G       gzip-max             Gzip -9
                       b       bzip2                Bzip2
                       b       bzip2-1 to bzip2-8   Explicit bzip2 compression levels
                       B       bzip2-9              Bzip2 -9
                       l       lzma                 LZMA
                       r       r4x8-o0              rANS 4x8 Order-0
                       R       r4x8-o1              rANS 4x8 Order-1
                       0       r4x16-o0             rANS 4x16 Order-0
                       0       r4x16-o0R            rANS 4x16 Order-0 with RLE
                       0       r4x16-o0P            rANS 4x16 Order-0 with PACK
                       0       r4x16-o0PR           rANS 4x16 Order-0 with PACK and RLE
                       1       r4x16-o1             rANS 4x16 Order-1
                       1       r4x16-o1R            rANS 4x16 Order-1 with RLE
                       1       r4x16-o1P            rANS 4x16 Order-1 with PACK
                       1       r4x16-o1PR           rANS 4x16 Order-1 with PACK and RLE
                       4       r32x16-o0            rANS 32x16 Order-0
                       4       r32x16-o0R           rANS 32x16 Order-0 with RLE
                       4       r32x16-o0P           rANS 32x16 Order-0 with PACK
                       4       r32x16-o0PR          rANS 32x16 Order-0 with PACK and RLE
                       5       r32x16-o1            rANS 32x16 Order-1
                       5       r32x16-o1R           rANS 32x16 Order-1 with RLE
                       5       r32x16-o1P           rANS 32x16 Order-1 with PACK
                       5       r32x16-o1PR          rANS 32x16 Order-1 with PACK and RLE
                       8       rNx16-xo0            rANS Nx16 STRIPED mode
                       2       rNx16-cat            rANS Nx16 CAT mode
                       a       arith-o0             Arithmetic coding Order-0
                       a       arith-o0R            Arithmetic coding Order-0 with RLE
                       a       arith-o0P            Arithmetic coding Order-0 with PACK
                       a       arith-o0PR           Arithmetic coding Order-0 with PACK and RLE
                       A       arith-o1             Arithmetic coding Order-1
                       A       arith-o1R            Arithmetic coding Order-1 with RLE
                       A       arith-o1P            Arithmetic coding Order-1 with PACK
                       A       arith-o1PR           Arithmetic coding Order-1 with PACK and RLE
                       a       arith-xo0            Arithmetic coding STRIPED mode
                       a       arith-cat            Arithmetic coding CAT mode
                       f       fqzcomp              FQZComp quality codec
                       n       tok3-rans            Name tokeniser with rANS encoding
                       n       tok3-arith           Name tokeniser with Arithmetic encoding
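
       As an illustration of reading the Method column, the following Python sketch expands a string of
       short codes into descriptive names.  The dictionary is transcribed by hand from the table above,
       with several long names collapsed into one "family" label wherever a short code is shared.  The "."
       entry for raw blocks is taken from the EXAMPLES output.  The names METHOD_FAMILIES and
       expand_methods are hypothetical and are not part of samtools.

```python
# Short code -> descriptive label, transcribed from the table above.
# Several long names share one short code, so some labels name a family.
# "." (raw) is taken from the EXAMPLES output of this page.
METHOD_FAMILIES = {
    ".": "raw",
    "g": "gzip",
    "_": "gzip-min",
    "G": "gzip-max",
    "b": "bzip2 (default or levels 1-8)",
    "B": "bzip2-9",
    "l": "lzma",
    "r": "r4x8-o0",
    "R": "r4x8-o1",
    "0": "r4x16-o0 family",
    "1": "r4x16-o1 family",
    "4": "r32x16-o0 family",
    "5": "r32x16-o1 family",
    "8": "rNx16-xo0",
    "2": "rNx16-cat",
    "a": "arith-o0 family (including striped and cat)",
    "A": "arith-o1 family",
    "f": "fqzcomp",
    "n": "tok3 (rans or arith)",
}


def expand_methods(codes):
    """Expand a Method column such as "_r.g" into a list of labels."""
    return [METHOD_FAMILIES.get(c, "unknown(%s)" % c) for c in codes]


if __name__ == "__main__":
    for column in ("g", "_r.g", "Rrg"):
        print(column, "->", ", ".join(expand_methods(column)))
```

       Because a short code can stand for several parameterisations, this mapping only narrows a method
       down to a family; the -v option is needed to see the exact long name.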

OPTIONS

       -o FILE   Output size information to FILE.

       -v        Verbose mode.  This shows one line per combination of Content ID and compression method.

       -e, --encodings
                 CRAM  uses  an Encoding, which describes how the data is serialised into a data block.  This is
                 distinct from the CRAM compression method, which is then applied to  the  block  post-encoding.
                 The encoding methods are stored per CRAM Container.

                 This option lists the CRAM record encoding map and tag encoding map.  It shows each data
                 series, the associated CRAM encoding method, such as HUFFMAN, BETA or EXTERNAL, and any
                 parameters associated with that encoding.  The output may be large, as this information is
                 reported per container rather than as a single set of summary statistics at the end of
                 processing.

EXAMPLES

       -      The basic summary of block Content ID sizes for a CRAM file:
                $ samtools cram-size in.cram
                #   Content_ID  Uncomp.size    Comp.size   Ratio Method  Data_series
                BLOCK     CORE            0            0 100.00% .
                BLOCK       11    394734019     51023626  12.93% g       RN
                BLOCK       12   1504781763     99158495   6.59% R       QS
                BLOCK       13       330065        84195  25.51% _r.g    IN
                BLOCK       14     26625602      6803930  25.55% Rrg     SC
                ...
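
              The summary is whitespace-separated, so it can be post-processed with a short script.  The
              Python sketch below parses the sample output shown above and ranks blocks by compressed size.
              It assumes the column order seen in this example and treats everything after the Method
              column as Data Series names; parse_summary is a hypothetical helper, not part of samtools.

```python
SAMPLE = """\
#   Content_ID  Uncomp.size    Comp.size   Ratio Method  Data_series
BLOCK     CORE            0            0 100.00% .
BLOCK       11    394734019     51023626  12.93% g       RN
BLOCK       12   1504781763     99158495   6.59% R       QS
BLOCK       13       330065        84195  25.51% _r.g    IN
BLOCK       14     26625602      6803930  25.55% Rrg     SC
"""


def parse_summary(text):
    """Parse 'samtools cram-size' summary lines into a list of dicts.

    Assumes the column layout of the sample above; lines that do not
    start with BLOCK (the header, blank lines, "...") are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] != "BLOCK":
            continue
        rows.append({
            "content_id": fields[1],
            "uncomp": int(fields[2]),
            "comp": int(fields[3]),
            "methods": fields[5],
            "series": fields[6:],  # empty for the CORE block in the sample
        })
    return rows


rows = parse_summary(SAMPLE)
total_comp = sum(r["comp"] for r in rows)

if __name__ == "__main__":
    # Largest compressed blocks first, with their share of the total.
    for r in sorted(rows, key=lambda r: r["comp"], reverse=True):
        series = " ".join(r["series"]) or "-"
        share = 100.0 * r["comp"] / total_comp if total_comp else 0.0
        print("%-5s %-4s %12d %6.2f%%" % (r["content_id"], series, r["comp"], share))
```

              In practice the text would come from the file written with -o or from the command's standard
              output rather than from an embedded string.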

       -      Show the same file as above in verbose mode.  Here we see the distinct compression methods
              which have been used per block Content ID.
                $ samtools cram-size -v in.cram
                #   Content_ID  Uncomp.size    Comp.size   Ratio Method      Data_series
                BLOCK     CORE            0            0 100.00% raw
                BLOCK       11    394734019     51023626  12.93% gzip        RN
                BLOCK       12   1504781763     99158495   6.59% r4x8-o1     QS
                BLOCK       13       275033        64343  23.39% gzip-min    IN
                BLOCK       13        43327        15412  35.57% r4x8-o0     IN
                BLOCK       13         2452         2452 100.00% raw         IN
                BLOCK       13         9253         1988  21.49% gzip        IN
                BLOCK       14     23106404      5903351  25.55% r4x8-o1     SC
                BLOCK       14      1951616       513722  26.32% r4x8-o0     SC
                BLOCK       14      1567582       386857  24.68% gzip        SC
                ...
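
              The per-method lines for one Content ID add up to its single line in the basic summary.  The
              Python sketch below checks this for Content IDs 13 and 14 using the figures shown above.  It
              assumes the same column layout as the example; totals_by_content_id is a hypothetical helper.

```python
from collections import defaultdict

VERBOSE = """\
BLOCK       13       275033        64343  23.39% gzip-min    IN
BLOCK       13        43327        15412  35.57% r4x8-o0     IN
BLOCK       13         2452         2452 100.00% raw         IN
BLOCK       13         9253         1988  21.49% gzip        IN
BLOCK       14     23106404      5903351  25.55% r4x8-o1     SC
BLOCK       14      1951616       513722  26.32% r4x8-o0     SC
BLOCK       14      1567582       386857  24.68% gzip        SC
"""


def totals_by_content_id(text):
    """Sum uncompressed and compressed sizes over all methods per Content ID."""
    sums = defaultdict(lambda: [0, 0])
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] != "BLOCK":
            continue
        sums[fields[1]][0] += int(fields[2])
        sums[fields[1]][1] += int(fields[3])
    return {cid: (uncomp, comp) for cid, (uncomp, comp) in sums.items()}


totals = totals_by_content_id(VERBOSE)

if __name__ == "__main__":
    for cid, (uncomp, comp) in totals.items():
        # Guard against empty blocks such as CORE in the example above.
        ratio = 100.0 * comp / uncomp if uncomp else 100.0
        print("%s %d %d %.2f%%" % (cid, uncomp, comp, ratio))
```

              The totals for 13 (330065 and 84195) and 14 (26625602 and 6803930) match the corresponding
              lines of the non-verbose output.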

       -      List encoding methods per CRAM Data Series.  The two-letter series are the standard CRAM Data
              Series and the three-letter ones are the optional auxiliary tags, with the tag name and type
              combined.

                $ samtools cram-size -e in.cram
                Container encodings
                    RN      BYTE_ARRAY_STOP(stop=0,id=11)
                    QS      EXTERNAL(id=12)
                    IN      BYTE_ARRAY_STOP(stop=0,id=13)
                    SC      BYTE_ARRAY_STOP(stop=0,id=14)
                    BB      BYTE_ARRAY_LEN(len_codec={EXTERNAL(id=42)}, \
                                           val_codec={EXTERNAL(id=37)})
                    ...
                    XAZ     BYTE_ARRAY_STOP(stop=9,id=5783898)
                    MDZ     BYTE_ARRAY_STOP(stop=9,id=5063770)
                    ASC     BYTE_ARRAY_LEN(len_codec={HUFFMAN(codes={1},lengths={0})}, \
                                           val_codec={EXTERNAL(id=4281155)})
                    ...
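
              The large id values on the auxiliary tag lines in this sample are not arbitrary: each one
              equals the three ASCII characters of the tag name and type packed into a big-endian integer.
              This is an observation drawn from the values shown above, not a guarantee for every CRAM
              writer.  The Python sketch below converts in both directions; both function names are
              hypothetical.

```python
def tag_from_content_id(content_id):
    """Unpack a 24-bit Content ID into a 3-character tag name plus type."""
    return content_id.to_bytes(3, "big").decode("ascii")


def content_id_from_tag(tag):
    """Pack a 3-character tag name plus type (e.g. "MDZ") into an integer."""
    if len(tag) != 3:
        raise ValueError("expected two-letter tag name plus one type character")
    return int.from_bytes(tag.encode("ascii"), "big")


if __name__ == "__main__":
    # Content IDs copied from the sample output above.
    for cid in (5783898, 5063770, 4281155):
        print(cid, tag_from_content_id(cid))
```

              Applied to the sample, 5783898, 5063770 and 4281155 unpack to XAZ, MDZ and ASC, matching the
              series names printed on the same lines.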

AUTHOR

       Written by James Bonfield from the Sanger Institute.

SEE ALSO

       samtools(1),

       Samtools website: <http://www.htslib.org/>

samtools-1.19.2                                  24 January 2024                           samtools-cram-size(1)