Ubuntu Manpage: xtract - NCBI Entrez Direct XML conversion and transformation tool

Provided by: ncbi-entrez-direct_24.0.20250523+dfsg-1_amd64

NAME

       xtract - NCBI Entrez Direct XML conversion and transformation tool

SYNOPSIS

       xtract  [-help]  [-strict]  [-mixed]  [-self]  [-accent]  [-ascii] [-compress] [-stops] [-input filename]
       [-transform filename]  [-aliases filename]  [-pattern expr]  [-group expr]  [-block expr]  [-subset expr]
       [-path path] [-if expr [constraint]] [-unless expr [constraint]] [-and condition] [-or condition] [-else]
       [-position pos]    [-equals str]    [-contains str]    [-mimics str]    [-excludes str]   [-includes str]
       [-is-within str]  [-starts-with str]  [-ends-with str]  [-is-not str]  [-is-before str]   [-is-after str]
       [-consists-of str]   [-matches str]  [-resembles str]  [-is-equal-to expr]  [-differs-from expr]  [-gt N]
       [-ge N] [-lt N] [-le N] [-eq N] [-ne N] [-ret str] [-tab str]  [-sep str]  [-pfx str]  [-sfx str]  [-rst]
       [-clr] [-pfc str] [-deq str] [-def str] [-lbl str] [-set tag] [-rec tag] [-wrp tag] [-enc tag] [-plg str]
       [-elg str]  [-pkg tag]  [-fwd str]  [-awd str] [-tag tag] [-att key str] [-atr key element] [-cls] [-slf]
       [-end tag] [-bkt] [-element element] [-first element]  [-last element]  [-even element]  [-odd element]
       [-backward element]   [-NAME]   [--STATS]  [-num element]  [-len element]  [-sum element]  [-acc element]
       [-min element] [-max element] [-inc element] [-dec element] [-sub element] [-avg element]  [-dev element]
       [-med element]  [-mul element] [-div element] [-mod element] [-geo element] [-hrm element] [-rms element]
       [-sqt element] [-lge element] [-lg2 element] [-log element] [-bin element] [-oct element]  [-hex element]
       [-bit element]   [-pad element]  [-encode element]  [-decode element]  [-upper element]  [-lower element]
       [-chain element] [-title element] [-mirror element]  [-alpha element]  [-alnum element]  [-basic element]
       [-plain element] [-simple element] [-author element] [-journal element] [-prose element] [-terms element]
       [-words element]    [-pairs element]   [-split element -with str]   [-order element]   [-reverse element]
       [-letters element]    [-clauses element]    [-pentamers element]     [-year element]     [-month element]
       [-date element]   [-page element]   [-auth element]  [-initials element]  [-trim element]  [-wct element]
       [-doi element] [-accession element] [-numeric element] [-translate element] [-classify element] [-replace
       -reg target    -exp replacement]     [-fasta]     [-revcomp]     [-nucleic]     [-ncbi2na]     [-ncbi4na]
       [-cds2prot [-gcode N] [-frame N]]    [-molwt]    [-molwt-m]    [-molwt-f]    [-pept]   [-0-based element]
       [-1-based element]  [-ucsc-based element]  [-insd arg ...]   [-insdx]   [-histogram]   [-indexer element]
       [-head str]  [-tail str]  [-hd str]  [-tl str]  [-select condition]  [-in filename] [-sort[-fwd] element]
       [-sort-rev element]   [-format fmt   [-unicode style]]   [-verify]   [-test]    [-outline]    [-synopsis]
       [-contour [delimiter]] [-examples] [-unix] [-version]

DESCRIPTION

       xtract converts an XML document into a table of data values according to user-specified rules.
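As a rough illustration of that conversion, the Python sketch below imitates a command along the lines of `xtract -pattern PubmedArticle -element MedlineCitation/PMID ArticleTitle`. Every `PubmedArticle` record becomes one output line, and each requested field becomes one tab-separated column. The sample XML and the `to_table` helper are hypothetical. The sketch keeps only the first match per field, whereas -element prints every match.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-record input in PubMed style.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation><PMID>111</PMID>
      <Article><ArticleTitle>First title</ArticleTitle></Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation><PMID>222</PMID>
      <Article><ArticleTitle>Second title</ArticleTitle></Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def to_table(xml_text, pattern, fields):
    """Emit one tab-delimited row per <pattern> record (illustrative only)."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(pattern):
        values = []
        for field in fields:
            # Search anywhere below the record; only the first match is kept.
            node = record.find(".//" + field)
            values.append(node.text.strip() if node is not None and node.text else "")
        rows.append("\t".join(values))
    return rows


for row in to_table(SAMPLE, "PubmedArticle", ["MedlineCitation/PMID", "ArticleTitle"]):
    print(row)
```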

OPTIONS

Processing Flags
-strict Remove HTML and MathML tags.

-mixed Allow mixed content XML.

-self Allow detection of empty self-closing tags.

-accent Delete Unicode accents and diacritical marks.

-ascii Convert Unicode to numeric HTML character entities.

-compress Compress runs of spaces.

-stops Retain stop words in selected phrases.

Data Source
-input filename Read XML from file instead of standard input.

-transform filename File of substitutions for -translate.

-aliases filename Mappings file for -classify operation.

Exploration Argument Hierarchy
-pattern expr
-group expr
-block expr
-subset expr
Name of record within set. Use of different argument names allows command-line control of nested
looping.
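Conceptually, each exploration level is one more nested loop. The sketch below uses hypothetical `Rec`, `Id`, and `Author` elements. The outer loop plays the role of -pattern and the inner loop the role of -block, with each author's name parts joined into one cell. It illustrates the looping structure only, not xtract's exact output format.

```python
import xml.etree.ElementTree as ET

# Hypothetical records, each with a variable number of Author blocks.
SAMPLE = """<Set>
  <Rec><Id>1</Id>
    <Author><LastName>Smith</LastName><Initials>J</Initials></Author>
    <Author><LastName>Jones</LastName><Initials>AB</Initials></Author>
  </Rec>
  <Rec><Id>2</Id>
    <Author><LastName>Lee</LastName><Initials>K</Initials></Author>
  </Rec>
</Set>"""


def nested_rows(xml_text):
    root = ET.fromstring(xml_text)
    rows = []
    for rec in root.iter("Rec"):            # outer loop, like -pattern Rec
        cells = [rec.findtext("Id")]
        for author in rec.iter("Author"):   # inner loop, like -block Author
            cells.append(author.findtext("LastName") + " " + author.findtext("Initials"))
        rows.append("\t".join(cells))
    return rows


for row in nested_rows(SAMPLE):
    print(row)
```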

Path Navigation
-path path Explore by list of adjacent object names.

Exploration Constructs
Object DateRevised
Parent/Child Book/AuthorList
Path MedlineCitation/Article/Journal/JournalIssue/PubDate
Heterogeneous "PubmedArticleSet/*"
Exhaustive "History/**"
Nested "*/Taxon"

Conditional Execution
-if expr [constraint]
Element (or @attribute) must exist and satisfy any specified constraint.

-unless expr [constraint]
Skip if element matches.

-and condition
Preceding and following tests must both pass.

-or condition
Any passing test suffices.

-else Execute if conditional test failed.

-position pos
Restrict processing to instances at position first/last/outer/inner/even/odd/all.

String Constraints
-equals str String must match exactly.

-contains str Substring must be present.

-mimics str Containment test after converting punctuation to space.

-excludes str Substring must be absent.

-includes str Substring must match at word boundaries.

-is-within str Element value must be present within the string.

-starts-with str Substring must be at beginning.

-ends-with str Substring must be at end.

-is-not str String must not match.

-is-before str First string < second string.

-is-after str First string > second string.

-consists-of str String must only contain specified characters.

-matches str Matches without commas or semicolons.

-resembles str Requires all words, but in any order.
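The Python sketch below approximates several of these string tests under the case-insensitive rule stated in NOTES. The function names are hypothetical, the punctuation handling for -mimics is applied to both sides as an assumption, and xtract's own Unicode normalization is not reproduced.

```python
import re


def _fold(text):
    # NOTES: string constraints compare case-insensitively.
    return text.casefold()


def _depunct(text):
    # Replace punctuation with spaces (applied to both sides in this sketch).
    return re.sub(r"[^\w\s]", " ", _fold(text))


def equals(value, s):
    return _fold(value) == _fold(s)


def contains(value, s):
    return _fold(s) in _fold(value)


def excludes(value, s):
    return not contains(value, s)


def starts_with(value, s):
    return _fold(value).startswith(_fold(s))


def includes(value, s):
    # Substring must begin and end on word boundaries.
    return re.search(r"\b" + re.escape(_fold(s)) + r"\b", _fold(value)) is not None


def mimics(value, s):
    return _depunct(s) in _depunct(value)


def resembles(value, s):
    # All words of s must occur in value, in any order.
    return set(_fold(s).split()) <= set(_fold(value).split())


def consists_of(value, s):
    # Every character of value must come from s.
    return set(_fold(value)) <= set(_fold(s))
```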

Object Constraints
-is-equal-to expr Object values must match.

-differs-from expr Object values must differ.

Numeric Constraints
-gt N Greater than.

-ge N Greater than or equal to.

-lt N Less than.

-le N Less than or equal to.

-eq N Equal to.

-ne N Not equal to.

Format Customization
-ret str Override line break between patterns.

-tab str Replace tab character between fields.

-sep str Separator between group members.

-pfx str Prefix to print before group.

-sfx str Suffix to print after group.

-rst Reset -sep through -elg.

-clr Clear queued tab separator.

-pfc str Preface combines -clr and -pfx.

-deq str Delete and replace queued tab separator.

-def str Default placeholder for missing fields.

-lbl str Insert arbitrary text.
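A minimal sketch of how the prefix, separator, suffix, tab, and default options might combine when one output line is assembled. The helpers `render_field` and `render_row` are hypothetical, a tab is assumed as the default separator between group members, and the queued-separator behavior of -clr, -deq, and -pfc is not modeled.

```python
def render_field(values, pfx="", sep="\t", sfx="", default=None):
    """Format one -element group: prefix, members joined by sep, suffix."""
    if not values:
        # With -def a missing field gets a placeholder; otherwise this sketch
        # returns an empty string (xtract may instead omit the field).
        return default if default is not None else ""
    return pfx + sep.join(values) + sfx


def render_row(fields, tab="\t"):
    """Join formatted fields with the tab separator, which -tab can replace."""
    return tab.join(fields)


row = render_row([
    render_field(["111"]),
    render_field(["Smith", "Jones"], pfx="[", sep="|", sfx="]"),
    render_field([], default="-"),
])
print(row)
```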

XML Generation
-set tag XML tag for entire set.

-rec tag XML tag for each record.

-wrp tag Wrap elements in XML object.

-enc tag Encase instance in XML object.

-plg str Prologue to print before instance.

-elg str Epilogue to print after instance.

-pkg tag Package subset in XML object.

-fwd str Foreword to print before subset.

-awd str Afterword to print after subset.

Tag and Attribute Construction
-tag tag Start with <tag.

-att key str Attribute key and literal string.

-atr key element Attribute key and element name.

-cls Close with >.

-slf Self-close with />.

-end tag End contents with </tag>.

FASTA Parsable Fields
-bkt Wrap elements in bracketed fields.

Element Selection
-element element Print all items that match tag name.

-first element Only print value of first item.

-last element Only print value of last item.

-even element Only print value of even items.

-odd element Only print value of odd items.

-backward element Print values in reverse order.

-NAME Record value in named variable.

--STATS Accumulate values into variable.

-element Constructs
Tag Caption
Group Initials,LastName
Parent/Child MedlineCitation/PMID
Recursive "**/Gene-commentary_accession"
Unrestricted PubDate/*
Attribute DescriptorName@MajorTopicYN
Range MedlineDate[1:4]
Substring "Title[phospholipase | rattlesnake]"
Alternative "[can contain ^ vertical bar]"
Object Count "#Author"
Item Length "%Title"
Element Depth "^PMID"
Variable "&NAME"
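The Range, Substring, Object Count, and Item Length constructs can be approximated in Python as below. The sketch assumes that ranges are 1-based and inclusive (so `MedlineDate[1:4]` yields a four-character year) and that the Substring form returns the trimmed text between the two markers, matched case-insensitively. Both are inferences from the examples above rather than documented rules.

```python
import xml.etree.ElementTree as ET


def char_range(value, start, stop):
    # Range, e.g. MedlineDate[1:4]: positions assumed 1-based and inclusive.
    return value[start - 1:stop]


def between(value, after, before):
    # Substring, e.g. Title[phospholipase | rattlesnake]: text following the
    # first marker and preceding the second (assumed semantics).
    lower = value.lower()
    start = lower.find(after.lower())
    if start < 0:
        return ""
    start += len(after)
    stop = lower.find(before.lower(), start)
    return value[start:stop].strip() if stop >= 0 else ""


def object_count(record, tag):
    # Object Count, e.g. "#Author": number of matching objects in the record.
    return len(record.findall(".//" + tag))


def item_length(record, tag):
    # Item Length, e.g. "%Title": character length of each matching value.
    return [len(node.text or "") for node in record.iter(tag)]
```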

Special -element Operations
Parent Index "+"
Object Name "?"
Object Value "~"
XML Subtree "*"
Children "$"
Attributes "@"
ASN.1 Record "."
JSON Record "%"

Numeric (Integer) Processing
-num element Count.

-len element Length.

-sum element Sum.

-acc element Accumulator.

-min element Minimum.

-max element Maximum.

-inc element Increment.

-dec element Decrement.

-sub element Difference.

-avg element Arithmetic mean.

-dev element Deviation.

-med element Median.

-mul element Product.

-div element Quotient.

-mod element Remainder.

-geo element Geometric mean.

-hrm element Harmonic mean.

-rms element Root mean square.

-sqt element Square root.

-lge element Natural logarithm.

-lg2 element Logarithm base two.

-log element Logarithm base ten.

-bin element Binary.

-oct element Octal.

-hex element Hexadecimal.

-bit element Number of bits set.

Leading Zero Padding
-pad element Zero-pad to eight digits.
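For reference, the Python sketch below computes the textbook definitions behind several of these numeric and padding operations. It uses floating-point math, so rounding may differ from xtract, which works with integer values (see NOTES). The letter case of -hex output is not confirmed here, and the `summarize` and `convert` helpers are hypothetical.

```python
import math
import statistics


def summarize(values):
    """Textbook versions of several aggregate operations (illustrative only)."""
    return {
        "num": len(values),                                          # -num: count
        "sum": sum(values),                                          # -sum
        "min": min(values),                                          # -min
        "max": max(values),                                          # -max
        "avg": statistics.fmean(values),                             # -avg
        "med": statistics.median(values),                            # -med
        "geo": statistics.geometric_mean(values),                    # -geo
        "hrm": statistics.harmonic_mean(values),                     # -hrm
        "rms": math.sqrt(sum(v * v for v in values) / len(values)),  # -rms
    }


def convert(n):
    """Single-value conversions and zero padding."""
    return {
        "sqt": math.sqrt(n),        # -sqt
        "lge": math.log(n),         # -lge: natural logarithm
        "lg2": math.log2(n),        # -lg2
        "log": math.log10(n),       # -log: base ten
        "bin": format(n, "b"),      # -bin
        "oct": format(n, "o"),      # -oct
        "hex": format(n, "x"),      # -hex (letter case not confirmed)
        "bit": bin(n).count("1"),   # -bit: number of set bits
        "pad": f"{n:08d}",          # -pad: zero-pad to eight digits
    }


print(summarize([1, 2, 4]))
print(convert(10))
```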

Character Processing
-encode element XML-encode <, >, &, ", and ' characters.

-decode element Base64-decode object embedded in XML.

-upper element Convert text to uppercase.

-lower element Convert text to lowercase.

-chain element Change spaces to underscores.

-title element Capitalize initial letters of words.

-mirror element Reverse order of letters.

-alpha element Non-alphabetic characters to space.

-alnum element Non-alphanumeric characters to space.
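The character operations map onto ordinary string functions, as in the Python sketch below (-upper and -lower correspond directly to `str.upper` and `str.lower`). ASCII letter classes are assumed for -alpha and -alnum. Whether -title lowercases the remaining letters, and which entity -encode writes for an apostrophe, are not confirmed, so the comments flag those points.

```python
import base64
import html
import re


def chain(text):
    return text.replace(" ", "_")                  # -chain


def title_case(text):
    # -title: capitalize each word's first letter; the rest is left as-is here.
    return " ".join(w[:1].upper() + w[1:] for w in text.split(" "))


def mirror(text):
    return text[::-1]                              # -mirror


def alpha(text):
    return re.sub(r"[^A-Za-z]", " ", text)         # -alpha (ASCII assumed)


def alnum(text):
    return re.sub(r"[^A-Za-z0-9]", " ", text)      # -alnum (ASCII assumed)


def encode(text):
    # -encode: escape <, >, &, " and '; html.escape writes ' as &#x27;,
    # which may not be the entity xtract emits.
    return html.escape(text, quote=True)


def decode(text):
    return base64.b64decode(text).decode("utf-8")  # -decode
```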

String Processing
-basic element Convert superscripts and subscripts.

-plain element Remove embedded mixed-content markup tags.

-simple element Normalize accented letters; spell Greek letters.

-author element Multi-step author cleanup.

-journal element Journal capitalization and punctuation.

-prose element Text conversion to ASCII.

Text Processing
-terms element Partition text at spaces.

-words element Split at punctuation marks.

-pairs element Adjacent informative words.

-split element Split using -with for delimiter.

-order element Rearrange words in sorted order.

-reverse element Reverse words in string.

-letters element Separate individual letters.

-clauses element Break at phrase separators.

-pentamers element Sliding window of pentamers.
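Several of these text operations can be sketched in Python as follows. The word pattern used for -words is an assumption, lowercasing follows NOTES, and -pentamers is assumed to slide a five-character window over the text. -pairs and -clauses are omitted because they depend on stop-word and separator lists not given here.

```python
import re


def terms(text):
    return text.split()                            # -terms: split at spaces


def words(text):
    # -words: split at spaces and punctuation, lower-cased per NOTES.
    return re.findall(r"[^\W_]+", text.lower())


def order(text):
    return " ".join(sorted(text.split()))          # -order


def reverse_words(text):
    return " ".join(reversed(text.split()))        # -reverse


def letters(text):
    return list(text)                              # -letters


def split_with(text, delimiter):
    return text.split(delimiter)                   # -split element -with delimiter


def pentamers(text):
    # Sliding window of five characters (assumed to operate on characters).
    return [text[i:i + 5] for i in range(len(text) - 4)]
```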

Citation Functions
-year element Extract first 4-digit year from string.

-month element Match first month name and return a corresponding integer.

-date element Format date as YYYY/MM/DD, e.g., -unit "PubDate" -date "*".

-page element Get digits (and letters) of first page number.

-auth element Change GenBank authors to Medline form.

-initials element Parse initials from forename or given name.

-trim element Remove extra spaces and leading zeros.

-wct element Count number of -words in a string.

-doi element Add https://doi.org/ prefix, URL encode.

-accession element Allow indexing of full accession.version.

-numeric element Only accept items that are entirely digits.
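The Python sketch below approximates four of these helpers. The month table (full English names plus three-letter abbreviations), the word pattern behind -wct, and the percent-encoding rules for -doi are assumptions made for illustration, not xtract's documented behavior.

```python
import re
import urllib.parse

_NAMES = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
MONTHS = {name: i for i, name in enumerate(_NAMES, start=1)}
MONTHS.update({name[:3]: i for i, name in enumerate(_NAMES, start=1)})


def year(text):
    # -year: first run of exactly four digits.
    match = re.search(r"(?<!\d)\d{4}(?!\d)", text)
    return match.group(0) if match else None


def month(text):
    # -month: first recognized month name, returned as an integer.
    for word in re.findall(r"[A-Za-z]+", text):
        if word.lower() in MONTHS:
            return MONTHS[word.lower()]
    return None


def word_count(text):
    # -wct: number of -words in the string.
    return len(re.findall(r"[^\W_]+", text))


def doi_url(doi):
    # -doi: add the resolver prefix and percent-encode (encoding rules assumed).
    return "https://doi.org/" + urllib.parse.quote(doi.strip(), safe="/")
```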

Value Transformation
-translate element Substitute values with -transform table.

-classify element Match words or phrases within the value to the -aliases table.

Regular Expression
-replace Substitute text using regular expressions.
-reg target Target expression.
-exp replacement Replacement pattern.

Sequence Processing
-fasta Split sequence into blocks of 70 uppercase letters.
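A minimal Python sketch of the same line wrapping. It assumes whitespace in the input is discarded before the sequence is cut into 70-character uppercase lines.

```python
def fasta_lines(sequence, width=70):
    # Drop whitespace, uppercase, and cut into fixed-width lines.
    residues = "".join(sequence.split()).upper()
    return [residues[i:i + width] for i in range(0, len(residues), width)]


for line in fasta_lines("acgt" * 40):
    print(line)
```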

Nucleotide Processing
-revcomp Reverse complement nucleotide sequence.

-nucleic Subrange determines forward or revcomp.

-ncbi2na Expand ncbi2na to IUPAC. (May need to truncate result to actual sequence length.)

-ncbi4na Expand ncbi4na to IUPAC. (May need to truncate result to actual sequence length.)

-cds2prot [-gcode N] [-frame N]
Translate coding region using -gcode and (1-based) -frame (both 1 by default).
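The following Python sketch shows the sequence arithmetic behind -revcomp, -ncbi2na, -ncbi4na, and -cds2prot. It assumes the packed ncbi2na and ncbi4na data arrive as hexadecimal text, implements only genetic code 1 (the standard code) without alternative start codons, and keeps `*` for stop codons. The optional `length` argument shows why an expanded result may need truncating: the last byte can hold padding bases.

```python
COMPLEMENT = str.maketrans("ACGTURYKMBDHVNacgturykmbdhvn",
                           "TGCAAYRMKVHDBNtgcaayrmkvhdbn")
NCBI4NA = "-ACMGRSVTWYHKDBN"  # symbols for 4-bit codes 0..15
CODE1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
BASES = "TCAG"  # codon index order used by CODE1


def revcomp(seq):
    # Complement IUPAC symbols (S, W and unlisted symbols stay as-is), then reverse.
    return seq.translate(COMPLEMENT)[::-1]


def expand_ncbi2na(hex_text, length=None):
    # Four bases per byte, two bits each, most significant bits first.
    bases = []
    for byte in bytes.fromhex(hex_text):
        for shift in (6, 4, 2, 0):
            bases.append("ACGT"[(byte >> shift) & 0b11])
    seq = "".join(bases)
    return seq[:length] if length is not None else seq


def expand_ncbi4na(hex_text, length=None):
    # Two bases per byte, high nibble first.
    seq = "".join(NCBI4NA[b >> 4] + NCBI4NA[b & 0x0F] for b in bytes.fromhex(hex_text))
    return seq[:length] if length is not None else seq


def cds_to_protein(seq, frame=1):
    # Translate codon by codon with genetic code 1, starting at 1-based frame.
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(frame - 1, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if any(base not in BASES for base in codon):
            protein.append("X")  # ambiguous codon
            continue
        index = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
        protein.append(CODE1[index])
    return "".join(protein)
```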

Protein Processing
-molwt Calculate molecular weight of peptide.

-molwt-m Molecular weight retaining initial methionine.

-molwt-f Keep initial M residue as formyl-methionine.

-pept Split amino acid runs at *, -, x, or X.
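A minimal Python equivalent of the -pept split, assuming that empty runs between adjacent separators are discarded:

```python
import re


def peptides(sequence):
    # Break at *, -, x, or X and drop empty runs.
    return [run for run in re.split(r"[*\-xX]+", sequence) if run]
```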

Sequence Coordinates
-0-based element Zero-based.

-1-based element One-based.

-ucsc-based element Half-open.
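The three conventions differ only in where counting starts and whether the end position is included. The sketch below assumes a 1-based, closed input interval (which convention a given element actually uses depends on the data) and shows the equivalent zero-based and UCSC-style half-open forms. The `from_one_based` helper is hypothetical.

```python
def from_one_based(start, stop):
    """Express a 1-based closed interval [start, stop] in each convention."""
    return {
        "1-based": (start, stop),          # closed, counting from 1
        "0-based": (start - 1, stop - 1),  # closed, counting from 0
        "ucsc-based": (start - 1, stop),   # half-open, counting from 0
    }


print(from_one_based(100, 200))
```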

Command Generator
-insd arg ...
Generate INSDSeq extraction commands. Print them if invoked standalone; run them if invoked as
part of a pipeline. Requires one or more arguments, which may appear in the following order:

Descriptor(s) INSDSeq_sequence/INSDSeq_definition/INSDSeq_division/... [...]

Completeness complete/partial

Feature(s) CDS/mRNA/...[,...]

Qualifier(s) INSDFeature_key/"#INSDInterval"/gene/product/feat_location/sub_sequence/... [...]

-insdx Process -insd output table into XML.

Frequency Table
-histogram
Collects data for sort-uniq-count(1) on entire set of records.
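The counting step can be pictured with `collections.Counter`. This sketch only approximates sort-uniq-count: it emits the count and value separated by a tab, sorted by value, and compares case-sensitively, all of which are assumptions that may differ from the real script.

```python
from collections import Counter


def sort_uniq_count(values):
    # Tally identical values, then list "count<TAB>value" sorted by value.
    counts = Counter(values)
    return [f"{counts[v]}\t{v}" for v in sorted(counts)]
```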

Entrez Indexing
-indexer element Positional index using -wrp for field name.

Output Organization
-head str Print before everything else.

-tail str Print after everything else.

-hd str Print before each record.

-tl str Print after each record.

Record Selection
-select condition Select record subset by conditions.

-in filename File of identifiers to use for selection.

Record Rearrangement
-sort[-fwd] element Element to use as sort key.

-sort-rev element Element to use as sort key, in reverse order.

Reformatting
-format fmt
copy Fast block copy (still applies processing flags).
compact Compress runs of spaces.
flush Suppress line indentation.
indent Indent according to nesting depth.
expand Place each attribute on a separate line.

Validation
-verify Report XML data integrity problems.

-test Check field for visible combining accents and invisible Unicode.

Summary
-outline Display outline of XML structure.

-synopsis Display individual XML paths.

-contour [delimiter] Display XML paths to leaf nodes (delimited by / by default).
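To give a feel for what -contour reports, the Python sketch below walks a hypothetical document and lists the path to every leaf element, joined by a configurable delimiter that defaults to `/`. It keeps duplicate paths in document order; whether xtract collapses repeats is not stated here.

```python
import xml.etree.ElementTree as ET


def leaf_paths(xml_text, delimiter="/"):
    root = ET.fromstring(xml_text)
    paths = []

    def walk(node, prefix):
        path = prefix + [node.tag]
        children = list(node)
        if not children:
            # A node without child elements is a leaf.
            paths.append(delimiter.join(path))
        for child in children:
            walk(child, path)

    walk(root, [])
    return paths


print("\n".join(leaf_paths("<A><B><C>1</C></B><D>2</D></A>")))
```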

Full Exploration Command Precedence
-pattern
-path
-division
-group
-branch
-block
-section
-subset
-unit

Documentation
-help Print usage information and some example argument combinations.

-examples Complete usage examples, involving additional Entrez Direct tools.

-unix Illustrate common Unix command arguments.

-version Print version number.

NOTES

       String constraints use case-insensitive comparisons.

       Numeric constraints and selection arguments use integer values.

       -num and -len selections are synonyms for Object Count (#) and Item Length (%).

       -words, -pairs, and -indices convert to lower case.
