Provided by: ncbi-entrez-direct_24.0.20250523+dfsg-1_amd64

NAME

       xtract - NCBI Entrez Direct XML conversion and transformation tool

SYNOPSIS

       xtract  [-help]  [-strict]  [-mixed]  [-self]  [-accent]  [-ascii] [-compress] [-stops] [-input filename]
       [-transform filename]  [-aliases filename]  [-pattern expr]  [-group expr]  [-block expr]  [-subset expr]
       [-path path] [-if expr [constraint]] [-unless expr [constraint]] [-and condition] [-or condition] [-else]
       [-position pos]    [-equals str]    [-contains str]    [-mimics str]    [-excludes str]   [-includes str]
       [-is-within str]  [-starts-with str]  [-ends-with str]  [-is-not str]  [-is-before str]   [-is-after str]
       [-consists-of str]   [-matches str]  [-resembles str]  [-is-equal-to expr]  [-differs-from expr]  [-gt N]
       [-ge N] [-lt N] [-le N] [-eq N] [-ne N] [-ret str] [-tab str]  [-sep str]  [-pfx str]  [-sfx str]  [-rst]
       [-clr] [-pfc str] [-deq str] [-def str] [-lbl str] [-set tag] [-rec tag] [-wrp tag] [-enc tag] [-plg str]
       [-elg str]  [-pkg tag]  [-fwd str]  [-awd str] [-tag tag] [-att key str] [-atr key element] [-cls] [-slf]
       [-end tag] [-bkt] [-element element] [-first element]  [-last element]  [-even element]  [-odd element]
       [-backward element]   [-NAME]   [--STATS]  [-num element]  [-len element]  [-sum element]  [-acc element]
       [-min element] [-max element] [-inc element] [-dec element] [-sub element] [-avg element]  [-dev element]
       [-med element]  [-mul element] [-div element] [-mod element] [-geo element] [-hrm element] [-rms element]
       [-sqt element] [-lge element] [-lg2 element] [-log element] [-bin element] [-oct element]  [-hex element]
       [-bit element]   [-pad element]  [-encode element]  [-decode element]  [-upper element]  [-lower element]
       [-chain element] [-title element] [-mirror element]  [-alpha element]  [-alnum element]  [-basic element]
       [-plain element] [-simple element] [-author element] [-journal element] [-prose element] [-terms element]
       [-words element]    [-pairs element]   [-split element -with str]   [-order element]   [-reverse element]
       [-letters element]    [-clauses element]    [-pentamers element]     [-year element]     [-month element]
       [-date element]   [-page element]   [-auth element]  [-initials element]  [-trim element]  [-wct element]
       [-doi element] [-accession element] [-numeric element] [-translate element] [-classify element] [-replace
       -reg target    -exp replacement]     [-fasta]     [-revcomp]     [-nucleic]     [-ncbi2na]     [-ncbi4na]
       [-cds2prot [-gcode N] [-frame N]]    [-molwt]    [-molwt-m]    [-molwt-f]    [-pept]   [-0-based element]
       [-1-based element]  [-ucsc-based element]  [-insd arg ...]   [-insdx]   [-histogram]   [-indexer element]
       [-head str]  [-tail str]  [-hd str]  [-tl str]  [-select condition]  [-in filename] [-sort[-fwd] element]
       [-sort-rev element]   [-format fmt   [-unicode style]]   [-verify]   [-test]    [-outline]    [-synopsis]
       [-contour [delimiter]] [-examples] [-unix] [-version]

DESCRIPTION

       xtract converts an XML document into a table of data values according to user-specified rules.

OPTIONS

   Processing Flags
       -strict   Remove HTML and MathML tags.

       -mixed    Allow mixed content XML.

       -self     Allow detection of empty self-closing tags.

       -accent   Delete Unicode accents and diacritical marks.

       -ascii    Convert Unicode to numeric HTML character entities.

       -compress Compress runs of spaces.

       -stops    Retain stop words in selected phrases.

   Data Source
       -input filename     Read XML from file instead of standard input.

       -transform filename File of substitutions for -translate.

       -aliases filename   Mappings file for -classify operation.

   Exploration Argument Hierarchy
       -pattern expr
       -group expr
       -block expr
       -subset expr
              Name  of record within set.  Use of different argument names allows command-line control of nested
              looping.
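
       The nested looping described above can be pictured as an outer loop over -pattern records with an
       inner loop over -block objects.  The Python sketch below is only a rough model of that idea, not
       xtract's implementation; the sample XML, its element names, and the helper nested_rows are
       hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element names are illustrative only.
SAMPLE = """<Library>
  <Book><Title>First</Title>
    <Author><LastName>Ada</LastName></Author>
    <Author><LastName>Bo</LastName></Author>
  </Book>
  <Book><Title>Second</Title>
    <Author><LastName>Cy</LastName></Author>
  </Book>
</Library>"""


def nested_rows(xml_text, pattern, block, element):
    """Outer loop over `pattern` records, inner loop over `block` objects."""
    rows = []
    for record in ET.fromstring(xml_text).iter(pattern):  # one row per record
        fields = []
        for obj in record.iter(block):                    # nested loop
            fields.extend(e.text for e in obj.iter(element))
        rows.append("\t".join(fields))                    # tab between fields
    return rows


rows = nested_rows(SAMPLE, "Book", "Author", "LastName")
print("\n".join(rows))
```

       In this model each Book yields one output line and each Author inside it contributes one
       tab-separated field, which is roughly the shape expected from combining -pattern Book, -block
       Author, and -element LastName.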

   Path Navigation
       -path path     Explore by list of adjacent object names.

   Exploration Constructs
       Object         DateRevised
       Parent/Child   Book/AuthorList
       Path           MedlineCitation/Article/Journal/JournalIssue/PubDate
       Heterogeneous  "PubmedArticleSet/*"
       Exhaustive     "History/**"
       Nested         "*/Taxon"

   Conditional Execution
       -if expr [constraint]
              Element (or @attribute) must exist and satisfy any specified constraint.

       -unless expr [constraint]
              Skip if element matches.

       -and condition
              Preceding and following tests must both pass.

       -or condition
              Any passing test suffices.

       -else  Execute if conditional test failed.

       -position pos
              first/last/outer/inner/even/odd/all.

   String Constraints
       -equals str      String must match exactly.

       -contains str    Substring must be present.

       -mimics str      Containment test after converting punctuation to space.

       -excludes str    Substring must be absent.

       -includes str    Substring must match at word boundaries.

       -is-within str   String must be present.

       -starts-with str Substring must be at beginning.

       -ends-with str   Substring must be at end.

       -is-not str      String must not match.

       -is-before str   First string < second string.

       -is-after str    First string > second string.

       -consists-of str String must only contain specified characters.

       -matches str     Matches without commas or semicolons.

       -resembles str   Requires all words, but in any order.
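
       A few of the simpler string constraints can be modeled in Python as follows.  This is a sketch
       under the assumption that each test lowercases both sides (see NOTES); the function name satisfies
       is hypothetical, and the word-boundary and punctuation-aware constraints are not modeled.

```python
def satisfies(value, constraint, arg):
    """Return True if `value` passes the named string constraint."""
    v, a = value.lower(), arg.lower()   # comparisons ignore case (see NOTES)
    checks = {
        "-equals": lambda: v == a,
        "-is-not": lambda: v != a,
        "-contains": lambda: a in v,
        "-excludes": lambda: a not in v,
        "-starts-with": lambda: v.startswith(a),
        "-ends-with": lambda: v.endswith(a),
        "-consists-of": lambda: all(ch in a for ch in v),
    }
    return checks[constraint]()


print(satisfies("Homo sapiens", "-starts-with", "homo"))
print(satisfies("ACGTTGCA", "-consists-of", "ACGT"))
```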

   Object Constraints
       -is-equal-to expr   Object values must match.

       -differs-from expr  Object values must differ.

   Numeric Constraints
       -gt N  Greater than.

       -ge N  Greater than or equal to.

       -lt N  Less than.

       -le N  Less than or equal to.

       -eq N  Equal to.

       -ne N  Not equal to.
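
       The numeric constraints correspond to ordinary integer comparisons.  The following Python sketch
       assumes the element text parses as an integer; how xtract treats values that do not is not covered
       here, and the name passes is hypothetical.

```python
import operator

# Map each constraint name to the matching integer comparison.
COMPARE = {
    "-gt": operator.gt, "-ge": operator.ge,
    "-lt": operator.lt, "-le": operator.le,
    "-eq": operator.eq, "-ne": operator.ne,
}


def passes(value, constraint, n):
    """Compare an element's text with the integer N."""
    return COMPARE[constraint](int(value), int(n))


print(passes("12", "-gt", 5))
print(passes("12", "-le", 12))
```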

   Format Customization
       -ret str  Override line break between patterns.

       -tab str  Replace tab character between fields.

       -sep str  Separator between group members.

       -pfx str  Prefix to print before group.

       -sfx str  Suffix to print after group.

       -rst      Reset -sep through -elg.

       -clr      Clear queued tab separator.

       -pfc str  Preface combines -clr and -pfx.

       -deq str  Delete and replace queued tab separator.

       -def str  Default placeholder for missing fields.

       -lbl str  Insert arbitrary text.

   XML Generation
       -set tag  XML tag for entire set.

       -rec tag  XML tag for each record.

       -wrp tag  Wrap elements in XML object.

       -enc tag  Encase instance in XML object.

       -plg str  Prologue to print before instance.

       -elg str  Epilogue to print after instance.

       -pkg tag  Package subset in XML object.

       -fwd str  Foreword to print before subset.

       -awd str  Afterword to print after subset.

   Tag and Attribute Construction
       -tag tag            Start with <tag.

       -att key str        Attribute key and literal string.

       -atr key element    Attribute key and element name.

       -cls                Close with >.

       -slf                Self-close with />.

       -end tag            End contents with </tag>.

   FASTA Parsable Fields
       -bkt   Wrap elements in bracketed fields.

   Element Selection
       -element element    Print all items that match tag name.

       -first element      Only print value of first item.

       -last element       Only print value of last item.

       -even element       Only print value of even items.

       -odd element        Only print value of odd items.

       -backward element   Print values in reverse order.

       -NAME               Record value in named variable.

       --STATS             Accumulate values into variable.

   -element Constructs
       Tag            Caption
       Group          Initials,LastName
       Parent/Child   MedlineCitation/PMID
       Recursive      "**/Gene-commentary_accession"
       Unrestricted   PubDate/*
       Attribute      DescriptorName@MajorTopicYN
       Range          MedlineDate[1:4]
       Substring      "Title[phospholipase | rattlesnake]"
       Alternative    "[can contain ^ vertical bar]"
       Object Count   "#Author"
       Item Length    "%Title"
       Element Depth  "^PMID"
       Variable       "&NAME"
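
       The Range, Object Count, and Item Length constructs can be approximated in Python as shown below.
       The sketch assumes that range positions count from 1 and include both ends, which is what the
       MedlineDate[1:4] example suggests for a leading four-digit year; the helper names and sample values
       are hypothetical.

```python
def item_range(text, first, last):
    """Characters first..last, counting from 1 and including both ends."""
    return text[first - 1:last]


def object_count(values):
    """Number of matching objects, as for "#Author"."""
    return len(values)


def item_length(text):
    """Length of an item's text, as for "%Title"."""
    return len(text)


print(item_range("1998 Dec-1999 Jan", 1, 4))
print(object_count(["Ada", "Bo", "Cy"]))
print(item_length("A short title"))
```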

   Special -element Operations
       Parent Index   "+"
       Object Name    "?"
       Object Value   "~"
       XML Subtree    "*"
       Children       "$"
       Attributes     "@"
       ASN.1 Record   "."
       JSON Record    "%"

   Numeric (Integer) Processing
       -num element   Count.

       -len element   Length.

       -sum element   Sum.

       -acc element   Accumulator.

       -min element   Minimum.

       -max element   Maximum.

       -inc element   Increment.

       -dec element   Decrement.

       -sub element   Difference.

       -avg element   Arithmetic mean.

       -dev element   Deviation.

       -med element   Median.

       -mul element   Product.

       -div element   Quotient.

       -mod element   Remainder.

       -geo element   Geometric mean.

       -hrm element   Harmonic mean.

       -rms element   Root mean square.

       -sqt element   Square root.

       -lge element   Natural logarithm.

       -lg2 element   Logarithm base two.

       -log element   Logarithm base ten.

       -bin element   Binary.

       -oct element   Octal.

       -hex element   Hexadecimal.

       -bit element   Number of bits set.
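
       For reference, the three less common means named above follow their standard definitions, sketched
       here in Python.  The sketch returns floating-point results; because this section is labeled integer
       processing, xtract's own output may be rounded or truncated, which is not modeled.  The function
       names are hypothetical.

```python
import math


def geometric_mean(xs):
    """n-th root of the product of n values (compare -geo)."""
    return math.prod(xs) ** (1 / len(xs))


def harmonic_mean(xs):
    """n divided by the sum of reciprocals (compare -hrm)."""
    return len(xs) / sum(1 / x for x in xs)


def root_mean_square(xs):
    """Square root of the mean of the squares (compare -rms)."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))


print(geometric_mean([2, 8]))
print(harmonic_mean([2, 6]))
print(root_mean_square([3, 4]))
```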

   Leading Zero Padding
       -pad element   Zero-pad to eight digits.
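
       The radix conversions, bit count, and zero padding have direct Python equivalents, shown below as
       an illustration.  Digit case and the absence of a prefix such as 0x are assumptions of this sketch,
       not statements about xtract's output; the helper name render is hypothetical.

```python
def render(n):
    """Show one integer in the forms named by -bin, -oct, -hex, -bit, -pad."""
    return {
        "-bin": format(n, "b"),        # binary digits
        "-oct": format(n, "o"),        # octal digits
        "-hex": format(n, "x"),        # hexadecimal digits (lowercase here)
        "-bit": bin(n).count("1"),     # number of bits set
        "-pad": str(n).zfill(8),       # zero-padded to eight digits
    }


for name, text in render(202).items():
    print(name, text)
```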

   Character Processing
       -encode element     XML-encode <, >, &, ", and ' characters.

       -decode element     Base64-decode object embedded in XML.

       -upper element      Convert text to uppercase.

       -lower element      Convert text to lowercase.

       -chain element      Change spaces to underscores.

       -title element      Capitalize initial letters of words.

       -mirror element     Reverse order of letters.

       -alpha element      Non-alphabetic characters to space.

       -alnum element      Non-alphanumeric characters to space.
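
       Several of these character operations are simple enough to model directly.  The Python sketch
       below assumes ASCII letters and digits and replaces each other character with exactly one space;
       whether xtract treats non-ASCII letters differently or collapses the resulting spaces is not stated
       above.  The function names are hypothetical.

```python
import re


def chain(s):
    """Change spaces to underscores (compare -chain)."""
    return s.replace(" ", "_")


def mirror(s):
    """Reverse the order of letters (compare -mirror)."""
    return s[::-1]


def alpha(s):
    """Replace non-alphabetic characters with spaces (compare -alpha)."""
    return re.sub(r"[^A-Za-z]", " ", s)


def alnum(s):
    """Replace non-alphanumeric characters with spaces (compare -alnum)."""
    return re.sub(r"[^A-Za-z0-9]", " ", s)


print(chain("heat shock protein"))
print(mirror("abc"))
print(alpha("IL-6 receptor"))
print(alnum("IL-6 receptor"))
```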

   String Processing
       -basic element      Convert superscripts and subscripts.

       -plain element      Remove embedded mixed-content markup tags.

       -simple element     Normalize accented letters; spell Greek letters.

       -author element     Multi-step author cleanup.

       -jour element       Journal capitalization and punctuation.

       -prose element      Text conversion to ASCII.

   Text Processing
       -terms element      Partition text at spaces.

       -words element      Split at punctuation marks.

       -pairs element      Adjacent informative words.

       -split element      Split using -with for delimiter.

       -order element      Rearrange words in sorted order.

       -reverse element    Reverse words in string.

       -letters element    Separate individual letters.

       -clauses element    Break at phrase separators.

       -pentamers element  Sliding window of pentamers.
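
       The difference between splitting at spaces and splitting at punctuation can be illustrated with a
       small Python model.  It assumes that -words also splits at spaces and lowercases its output (see
       NOTES); stop-word handling, -pairs, -clauses, and -pentamers are not modeled, and the function names
       are hypothetical.

```python
import re


def terms(s):
    """Partition text at spaces, keeping punctuation (compare -terms)."""
    return s.split()


def words(s):
    """Split at punctuation and spaces, lowercased (compare -words)."""
    return [w for w in re.split(r"[^a-z0-9]+", s.lower()) if w]


def order(s):
    """Rearrange words in sorted order (compare -order)."""
    return " ".join(sorted(s.split()))


def reverse(s):
    """Reverse the words in a string (compare -reverse)."""
    return " ".join(reversed(s.split()))


print(terms("beta-lactamase, class A"))
print(words("beta-lactamase, class A"))
print(order("pear apple fig"))
print(reverse("one two three"))
```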

   Citation Functions
       -year element       Extract first 4-digit year from string.

       -month element      Match first month name and return a corresponding integer.

       -date element       YYYY/MM/DD from -unit "PubDate" -date "*"

       -page element       Get digits (and letters) of first page number.

       -auth element       Change GenBank authors to Medline form.

       -initials element   Parse initials from forename or given name.

       -trim element       Remove extra spaces and leading zeros.

       -wct element        Count number of -words in a string.

       -doi element        Add https://doi.org/ prefix, URL encode.

       -accession element  Allow indexing of full accession.version.

       -numeric element    Only accept items that are entirely digits.
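
       The year and month extraction described above can be approximated with regular expressions.  This
       Python sketch assumes that any run of four digits counts as a year and that months are recognized
       by their three-letter English prefix at the start of a word; xtract's actual matching rules may be
       stricter.  The names year, month, and MONTHS are hypothetical.

```python
import re

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
MONTH_RE = re.compile(r"\b(" + "|".join(MONTHS) + ")", re.IGNORECASE)


def year(s):
    """First four-digit run in the string, or None (compare -year)."""
    m = re.search(r"\d{4}", s)
    return m.group(0) if m else None


def month(s):
    """Integer 1..12 for the first month name found, or None (compare -month)."""
    m = MONTH_RE.search(s)
    return MONTHS.index(m.group(1).lower()) + 1 if m else None


print(year("1998 Dec-1999 Jan"))
print(month("1998 Dec-1999 Jan"))
```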

   Value Transformation
       -translate element  Substitute values with -transform table.

       -classify element   Substring word or phrase matches to -aliases table.

   Regular Expression
       -replace  Substitute text using regular expressions.
                 -reg target       Target expression.
                 -exp replacement  Replacement pattern.

   Sequence Processing
       -fasta Split sequence into blocks of 70 uppercase letters.

   Nucleotide Processing
       -revcomp  Reverse complement nucleotide sequence.

       -nucleic  Subrange determines forward or revcomp.

       -ncbi2na  Expand ncbi2na to IUPAC.  (May need to truncate result to actual sequence length.)

       -ncbi4na  Expand ncbi4na to IUPAC.  (May need to truncate result to actual sequence length.)

       -cds2prot [-gcode N] [-frame N]
                 Translate coding region using -gcode and (1-based) -frame (both 1 by default).
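
       Reverse complementation and translation of a coding region can be sketched in Python as follows.
       This is an illustration only: it covers genetic code 1 (the standard code) and a 1-based frame,
       writes X for any codon containing an ambiguous base, and keeps the stop codon as *.  How xtract
       handles ambiguity codes, other -gcode values, and trailing stops is not modeled, and the function
       names simply echo the option names.

```python
BASES = "TCAG"
# Standard genetic code (table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# IUPAC nucleotide codes and their complements.
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")


def revcomp(seq):
    """Complement every base, then reverse the sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def cds2prot(seq, frame=1):
    """Translate with the standard code, starting at 1-based `frame`."""
    seq = seq.upper()[frame - 1:]
    protein = []
    for i in range(0, len(seq) - 2, 3):       # complete codons only
        codon = seq[i:i + 3]
        if all(b in BASES for b in codon):
            idx = (16 * BASES.index(codon[0])
                   + 4 * BASES.index(codon[1])
                   + BASES.index(codon[2]))
            protein.append(AMINO[idx])
        else:
            protein.append("X")               # ambiguous codon
    return "".join(protein)


print(revcomp("ATGC"))
print(cds2prot("ATGGCCTAA"))
print(cds2prot("CATGGCCTAA", frame=2))
```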

   Protein Processing
       -molwt    Calculate molecular weight of peptide.

       -molwt-m  Molecular weight retaining initial methionine.

       -molwt-f  Keep initial M residue as formyl-methionine.

       -pept     Split amino acid runs at *, -, x, or X.

   Sequence Coordinates
       -0-based element    Zero-based.

       -1-based element    One-based.

       -ucsc-based element Half-open.

   Command Generator
       -insd arg ...
              Generate INSDSeq extraction commands.  Print them if invoked standalone; run them  if  invoked  as
              part of a pipeline.  Requires one or more arguments, which may appear in the following order:

              Descriptor(s)  INSDSeq_sequence/INSDSeq_definition/INSDSeq_division/... [...]

              Completeness   complete/partial

              Feature(s)     CDS/mRNA/...[,...]

              Qualifier(s)   INSDFeature_key/"#INSDInterval"/gene/product/feat_location/sub_sequence/... [...]

       -insdx Process -insd output table into XML.

   Frequency Table
       -histogram
              Collects data for sort-uniq-count(1) on entire set of records.

   Entrez Indexing
       -indexer element    Positional index using -wrp for field name.

   Output Organization
       -head str Print before everything else.

       -tail str Print after everything else.

       -hd str   Print before each record.

       -tl str   Print after each record.

   Record Selection
       -select condition   Select record subset by conditions.

       -in filename        File of identifiers to use for selection.

   Record Rearrangement
       -sort[-fwd] element Element to use as sort key.

       -sort-rev element   Sort records in reverse order.

   Reformatting
       -format fmt
              copy     Fast block copy (still applies processing flags).
              compact  Compress runs of spaces.
              flush    Suppress line indentation.
              indent   Indent according to nesting depth.
              expand   Place each attribute on a separate line.

   Validation
       -verify   Report XML data integrity problems.

       -test     Check field for visible combining accents and invisible Unicode.

   Summary
       -outline                 Display outline of XML structure.

       -synopsis                Display individual XML paths.

       -contour [delimiter]     Display XML paths to leaf nodes (delimited by / by default).

   Full Exploration Command Precedence
       -pattern
       -path
       -division
       -group
       -branch
       -block
       -section
       -subset
       -unit

   Documentation
       -help     Print usage information and some example argument combinations.

       -examples Complete usage examples, involving additional Entrez Direct tools.

       -unix     Illustrate common Unix command arguments.

       -version  Print version number.

NOTES

       String constraints use case-insensitive comparisons.

       Numeric constraints and selection arguments use integer values.

       -num and -len selections are synonyms for Object Count (#) and Item Length (%).

       -words, -pairs, and -indices convert to lower case.

SEE ALSO

       align-columns(1),     archive-nihocc(1),     archive-nlmnlp(1),     archive-nmcds(1),    archive-pids(1),
       archive-pmc(1), archive-pubmed(1), archive-taxonomy(1), asn2ref(1),  between-two-genes(1),  bsmp2info(1),
       csv2xml(1),  custom-index(1),  disambiguate-nucleotides(1),  download-flatfile(1), download-ncbi-data(1),
       ds2pme(1), efetch(1), esample(1), filter-columns(1), find-in-gene(1),  fuse-ranges(1),  fuse-segments(1),
       gbf2facds(1),  gbf2fsa(1),  gbf2info(1),  gbf2tbl(1), gene2range(1), gff2xml(1), gff-sort(1), gm2segs(1),
       hgvs2spdi(1), nquire(1), pm-collect(1), pm-refresh(1), pma2apa(1), pma2pme(1), pmc2bioc(1),  pmc2info(1),
       print-columns(1),  rchive(1),  refseq-nm-cds(1),  reorder-columns(1),  snp2hgvs(1),  snp2tbl(1),
       sort-table(1), sort-uniq-count(1), spdi2tbl(1), tbl2prod(1), transmute(1), uniq-table(1), xfetch(1), xfilter(1),
       xinfo(1), xlink(1), xml2fsa(1), xml2tbl(1), xsearch(1), xy-plot(1).

NCBI                                               2025-05-26                                          XTRACT(1)