Ubuntu Manpage: unikmer - Toolkit for nucleic acid k-mer analysis

NAME

       unikmer - Toolkit for nucleic acid k-mer analysis

DESCRIPTION

       unikmer - Toolkit for k-mers with taxonomic information

       unikmer is a toolkit for nucleic acid k-mer analysis. It provides functions including set operations on
       k-mers, optionally with TaxIds but without count information.

       K-mers are either encoded (k<=32) or hashed (arbitrary k) into 'uint64' values, and serialized in binary
       files with the extension '.unik'.
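
       As an illustration of how a k-mer with k<=32 can fit into a single 'uint64', the sketch below packs
       each base into two bits. The mapping A=0, C=1, G=2, T=3 is the conventional scheme and is assumed
       here; this page does not state which mapping unikmer itself uses, and the function names are
       hypothetical.

```python
# Assumed 2-bit code per base; conventional, but not confirmed by this page.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_kmer(kmer: str) -> int:
    """Pack a k-mer (k <= 32) into an integer that fits in 64 bits."""
    if not 0 < len(kmer) <= 32:
        raise ValueError("k must be in [1, 32] to fit in a uint64")
    code = 0
    for base in kmer.upper():
        # Shift the bases seen so far left by two bits and append the new one.
        # A non-ACGT base raises KeyError.
        code = (code << 2) | BASE_TO_BITS[base]
    return code


def decode_kmer(code: int, k: int) -> str:
    """Unpack an integer produced by encode_kmer back into k-mer text."""
    bases = []
    for _ in range(k):
        bases.append(BITS_TO_BASE[code & 3])  # lowest two bits hold the last base
        code >>= 2
    return "".join(reversed(bases))
```

       Because 32 bases times 2 bits is exactly 64 bits, longer k-mers cannot be packed this way, which is
       why arbitrary k requires hashing instead.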

       TaxIds can be assigned when counting k-mers from genome sequences, and the LCA (Lowest Common Ancestor)
       is computed during set operations, including computing union, intersection, set difference, unique and
       repeated k-mers.
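
       The idea of computing the LCA during a set operation can be sketched as follows. This is a minimal
       illustration, not unikmer's implementation: the toy taxonomy, the dictionary representation of a
       k-mer set and all function names are hypothetical.

```python
# Hypothetical toy taxonomy: child TaxId -> parent TaxId; the root (1) is its own parent.
PARENT = {1: 1, 2: 1, 10: 2, 11: 2, 20: 1}


def lineage(taxid, parent):
    """Return the path from taxid up to the root, inclusive."""
    path = [taxid]
    while parent[taxid] != taxid:
        taxid = parent[taxid]
        path.append(taxid)
    return path


def lca(a, b, parent):
    """Lowest common ancestor of two TaxIds."""
    ancestors = set(lineage(a, parent))
    # Walk upward from b; the first node also above a is the LCA.
    for node in lineage(b, parent):
        if node in ancestors:
            return node
    raise ValueError("TaxIds do not share a root")


def union(x, y, parent):
    """Union of two k-mer -> TaxId maps; a k-mer present in both gets the LCA."""
    out = dict(x)
    for kmer, taxid in y.items():
        out[kmer] = lca(out[kmer], taxid, parent) if kmer in out else taxid
    return out
```

       With this toy taxonomy, a k-mer carrying TaxId 10 in one input and TaxId 11 in the other ends up
       with TaxId 2, their shared parent.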

       Version: v0.19.0

       Author: Wei Shen <shenwei356@gmail.com>

       Documents  : https://bioinf.shenwei.me/unikmer
       Source code: https://github.com/shenwei356/unikmer

       Dataset (optional):

              Manipulating k-mers with TaxIds requires taxonomy files, e.g., from the NCBI Taxonomy database.
              Download ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz and extract "nodes.dmp", "names.dmp",
              "delnodes.dmp" and "merged.dmp" into ~/.unikmer/ or some other directory, which you can later
              refer to using the flag --data-dir or the environment variable UNIKMER_DB.
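
              A minimal sketch of locating the taxonomy directory and reading parent links from
              "nodes.dmp" is given below. The precedence shown (flag value, then UNIKMER_DB, then
              ~/.unikmer) is an assumption rather than something this page states, and the function
              names are hypothetical. The field separator is the tab-pipe-tab sequence used by NCBI
              taxdump files.

```python
import os


def resolve_data_dir(flag_value=None, environ=None):
    """Pick the taxonomy directory.

    Assumed precedence: --data-dir value, then UNIKMER_DB, then ~/.unikmer.
    """
    environ = os.environ if environ is None else environ
    return flag_value or environ.get("UNIKMER_DB") or os.path.expanduser("~/.unikmer")


def parse_nodes(lines):
    """Build a child -> parent TaxId map from the lines of nodes.dmp."""
    parent = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t|\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # Field 0 is the node's TaxId, field 1 is its parent's TaxId.
        parent[int(fields[0])] = int(fields[1])
    return parent
```

              For example, parse_nodes(open(os.path.join(resolve_data_dir(), "nodes.dmp"))) would load the
              full tree, provided the file has been extracted there.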

              For  GTDB,  use  'taxonkit  create-taxdump'  to create NCBI-style taxonomy dump files, or download
              from:

              https://github.com/shenwei356/gtdb-taxonomy

              Note that TaxIds are represented using uint32 and stored in 4 or fewer bytes, so all TaxIds
              should be in the range [1, 4294967295].
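
              The relationship between the largest TaxId and the bytes needed to store it can be worked
              out as below. This is only a sketch of the arithmetic: the function names are hypothetical,
              and the big-endian byte order is an assumption, since this page does not describe the
              on-disk layout.

```python
MAX_TAXID = (1 << 32) - 1  # 4294967295, the upper bound stated above


def taxid_byte_width(max_taxid: int) -> int:
    """Smallest number of bytes (1 to 4) that can hold every TaxId up to max_taxid."""
    if not 1 <= max_taxid <= MAX_TAXID:
        raise ValueError("TaxIds must be in [1, 4294967295]")
    # Round the bit length up to whole bytes.
    return (max_taxid.bit_length() + 7) // 8


def pack_taxid(taxid: int, width: int) -> bytes:
    """Serialize a TaxId into a fixed number of bytes (byte order assumed big-endian)."""
    return taxid.to_bytes(width, "big")
```

              If every TaxId in a data set is at most 65535, two bytes per TaxId suffice instead of four.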

   Usage:
              unikmer [command]

   Available Commands:
              autocompletion Generate shell autocompletion script (bash|zsh|fish|powershell)
              common         Find k-mers shared by most of multiple binary files
              concat         Concatenate multiple binary files without removing duplicates
              count          Generate k-mers (sketch) from FASTA/Q sequences
              decode         Decode encoded integer to k-mer text
              diff           Set difference of multiple binary files
              dump           Convert plain k-mer text to binary format
              encode         Encode plain k-mer text to integer
              filter         Filter out low-complexity k-mers (experimental)
              grep           Search k-mers from binary files
              head           Extract the first N k-mers
              info           Information of binary files
              inter          Intersection of multiple binary files
              locate         Locate k-mers in genome
              merge          Merge k-mers from sorted chunk files
              num            Quickly inspect number of k-mers in binary files
              rfilter        Filter k-mers by taxonomic rank
              sample         Sample k-mers from binary files
              sort           Sort k-mers in binary files to reduce file size
              split          Split k-mers into sorted chunk files
              tsplit         Split k-mers according to taxid
              union          Union of multiple binary files
              uniqs          Mapping k-mers back to genome and find unique subsequences
              version        Print version information and check for update
              view           Read and output binary format to plain text

   Flags:
       -c, --compact
              write compact binary file with little loss of speed

       --compression-level int
              compression level (default -1)

       --data-dir string
              directory containing NCBI Taxonomy files, including nodes.dmp, names.dmp, merged.dmp and
              delnodes.dmp (default "~/.unikmer")

       -h, --help
              help for unikmer

       -I, --ignore-taxid
              ignore taxonomy information

       -i, --infile-list string
              file listing input files (one file per line); if given, they are appended to the files from
              CLI arguments

       --max-taxid uint32
              maximum TaxId; with a smaller value, less space is needed to store TaxIds. The default value
              is 1<<32-1, which is enough for NCBI Taxonomy TaxIds (default 4294967295)

       -C, --no-compress
              do not compress binary file (not recommended)

       --nocheck-file
              do not check binary file; useful when reading from process substitution or a named pipe

       -j, --threads int
              number of CPUs to use (default 4)

       --verbose
              print verbose information

       Use "unikmer [command] --help" for more information about a command.

unikmer 0.19.0                                     August 2022                                        UNIKMER(1)