Provided by: ncbi-entrez-direct_24.0.20250523+dfsg-1_amd64 

NAME
xtract - NCBI Entrez Direct XML conversion and transformation tool
SYNOPSIS
xtract [-help] [-strict] [-mixed] [-self] [-accent] [-ascii] [-compress] [-stops] [-input filename]
[-transform filename] [-aliases filename] [-pattern expr] [-group expr] [-block expr] [-subset expr]
[-path path] [-if expr [constraint]] [-unless expr [constraint]] [-and condition] [-or condition] [-else]
[-position pos] [-equals str] [-contains str] [-mimics str] [-excludes str] [-includes str]
[-is-within str] [-starts-with str] [-ends-with str] [-is-not str] [-is-before str] [-is-after str]
[-consists-of str] [-matches str] [-resembles str] [-is-equal-to expr] [-differs-from expr] [-gt N]
[-ge N] [-lt N] [-le N] [-eq N] [-ne N] [-ret str] [-tab str] [-sep str] [-pfx str] [-sfx str] [-rst]
[-clr] [-pfc str] [-deq str] [-def str] [-lbl str] [-set tag] [-rec tag] [-wrp tag] [-enc tag] [-plg str]
[-elg str] [-pkg tag] [-fwd str] [-awd str] [-tag tag] [-att key str] [-atr key element] [-cls] [-slf]
[-end tag] [-bkt] [-element element] [-first element] [-last element] [-even element] [-odd element]
[-backward element] [-NAME] [--STATS] [-num element] [-len element] [-sum element] [-acc element]
[-min element] [-max element] [-inc element] [-dec element] [-sub element] [-avg element] [-dev element]
[-med element] [-mul element] [-div element] [-mod element] [-geo element] [-hrm element] [-rms element]
[-sqt element] [-lge element] [-lg2 element] [-log element] [-bin element] [-oct element] [-hex element]
[-bit element] [-pad element] [-encode element] [-decode element] [-upper element] [-lower element]
[-chain element] [-title element] [-mirror element] [-alpha element] [-alnum element] [-basic element]
[-plain element] [-simple element] [-author element] [-journal element] [-prose element] [-terms element]
[-words element] [-pairs element] [-split element -with str] [-order element] [-reverse element]
[-letters element] [-clauses element] [-pentamers element] [-year element] [-month element]
[-date element] [-page element] [-auth element] [-initials element] [-trim element] [-wct element]
[-doi element] [-accession element] [-numeric element] [-translate element] [-classify element] [-replace
-reg target -exp replacement] [-fasta] [-revcomp] [-nucleic] [-ncbi2na] [-ncbi4na]
[-cds2prot [-gcode N] [-frame N]] [-molwt] [-molwt-m] [-molwt-f] [-pept] [-0-based element]
[-1-based element] [-ucsc-based element] [-insd arg ...] [-insdx] [-histogram] [-indexer element]
[-head str] [-tail str] [-hd str] [-tl str] [-select condition] [-in filename] [-sort[-fwd] element]
[-sort-rev element] [-format fmt [-unicode style]] [-verify] [-test] [-outline] [-synopsis]
[-contour [delimiter]] [-examples] [-unix] [-version]
DESCRIPTION
xtract converts an XML document into a table of data values according to user-specified rules.
OPTIONS
Processing Flags
-strict Remove HTML and MathML tags.
-mixed Allow mixed content XML.
-self Allow detection of empty self-closing tags.
-accent Delete Unicode accents and diacritical marks.
-ascii Convert Unicode to numeric HTML character entities.
-compress Compress runs of spaces.
-stops Retain stop words in selected phrases.
Data Source
-input filename Read XML from file instead of standard input.
-transform filename File of substitutions for -translate.
-aliases filename Mappings file for -classify operation.
Exploration Argument Hierarchy
-pattern expr
-group expr
-block expr
-subset expr
Name of record within set. Use of different argument names allows command-line control of nested
looping.
Path Navigation
-path path Explore by list of adjacent object names.
Exploration Constructs
Object DateRevised
Parent/Child Book/AuthorList
Path MedlineCitation/Article/Journal/JournalIssue/PubDate
Heterogeneous "PubmedArticleSet/*"
Exhaustive "History/**"
Nested "*/Taxon"
Conditional Execution
-if expr [constraint]
Element (or @attribute) must exist and satisfy any specified constraint.
-unless expr [constraint]
Skip if element matches.
-and condition
Preceding and following tests must both pass.
-or condition
Any passing test suffices.
-else Execute if conditional test failed.
-position pos
One of first, last, outer, inner, even, odd, or all.
String Constraints
-equals str String must match exactly.
-contains str Substring must be present.
-mimics str Containment test after converting punctuation to space.
-excludes str Substring must be absent.
-includes str Substring must match at word boundaries.
-is-within str String must be present.
-starts-with str Substring must be at beginning.
-ends-with str Substring must be at end.
-is-not str String must not match.
-is-before str First string < second string.
-is-after str First string > second string.
-consists-of str String must only contain specified characters.
-matches str Matches without commas or semicolons.
-resembles str Requires all words, but in any order.
Object Constraints
-is-equal-to expr Object values must match.
-differs-from expr Object values must differ.
Numeric Constraints
-gt N Greater than.
-ge N Greater than or equal to.
-lt N Less than.
-le N Less than or equal to.
-eq N Equal to.
-ne N Not equal to.
Format Customization
-ret str Override line break between patterns.
-tab str Replace tab character between fields.
-sep str Separator between group members.
-pfx str Prefix to print before group.
-sfx str Suffix to print after group.
-rst Reset -sep through -elg.
-clr Clear queued tab separator.
-pfc str Preface combines -clr and -pfx.
-deq str Delete and replace queued tab separator.
-def str Default placeholder for missing fields.
-lbl str Insert arbitrary text.
XML Generation
-set tag XML tag for entire set.
-rec tag XML tag for each record.
-wrp tag Wrap elements in XML object.
-enc tag Encase instance in XML object.
-plg str Prologue to print before instance.
-elg str Epilogue to print after instance.
-pkg tag Package subset in XML object.
-fwd str Foreword to print before subset.
-awd str Afterword to print after subset.
Tag and Attribute Construction
-tag tag Start with <tag.
-att key str Attribute key and literal string.
-atr key element Attribute key and element name.
-cls Close with >.
-slf Self-close with />.
-end tag End contents with </tag>.
FASTA Parsable Fields
-bkt Wrap elements in bracketed fields.
Element Selection
-element element Print all items that match tag name.
-first element Only print value of first item.
-last element Only print value of last item.
-even element Only print value of even items.
-odd element Only print value of odd items.
-backward element Print values in reverse order.
-NAME Record value in named variable.
--STATS Accumulate values into variable.
-element Constructs
Tag Caption
Group Initials,LastName
Parent/Child MedlineCitation/PMID
Recursive "**/Gene-commentary_accession"
Unrestricted PubDate/*
Attribute DescriptorName@MajorTopicYN
Range MedlineDate[1:4]
Substring "Title[phospholipase | rattlesnake]"
Alternative "[can contain ^ vertical bar]"
Object Count "#Author"
Item Length "%Title"
Element Depth "^PMID"
Variable "&NAME"
Special -element Operations
Parent Index "+"
Object Name "?"
Object Value "~"
XML Subtree "*"
Children "$"
Attributes "@"
ASN.1 Record "."
JSON Record "%"
Numeric (Integer) Processing
-num element Count.
-len element Length.
-sum element Sum.
-acc element Accumulator.
-min element Minimum.
-max element Maximum.
-inc element Increment.
-dec element Decrement.
-sub element Difference.
-avg element Arithmetic mean.
-dev element Deviation.
-med element Median.
-mul element Product.
-div element Quotient.
-mod element Remainder.
-geo element Geometric mean.
-hrm element Harmonic mean.
-rms element Root mean square.
-sqt element Square root.
-lge element Natural logarithm.
-lg2 element Logarithm base two.
-log element Logarithm base ten.
-bin element Binary.
-oct element Octal.
-hex element Hexadecimal.
-bit element Number of bits set.
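As an illustration only, the less familiar statistics above (-geo, -hrm, -rms) and the bit count (-bit) can be sketched in Python. This is not xtract's implementation; the sample values are hypothetical, and because this section is headed "Integer", xtract may round or truncate results where this sketch keeps floating-point values.

```python
import math
import statistics

values = [2, 8, 32]  # hypothetical sample values, not taken from any record

# Geometric mean: the n-th root of the product of n values.
geometric = statistics.geometric_mean(values)

# Harmonic mean: n divided by the sum of reciprocals.
harmonic = statistics.harmonic_mean(values)

# Root mean square: square root of the mean of the squared values.
root_mean_square = math.sqrt(sum(v * v for v in values) / len(values))

# Number of bits set: count the 1 digits in the binary representation.
bits_set = bin(13).count("1")  # 13 is 0b1101

print(geometric, harmonic, root_mean_square, bits_set)
```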
Leading Zero Padding
-pad element Zero-pad to eight digits.
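A minimal Python sketch of zero-padding to eight digits, for illustration. The helper name zero_pad is hypothetical, and how xtract treats values that are already longer than eight digits or that are not numeric is not stated here.

```python
def zero_pad(value: str, width: int = 8) -> str:
    # Left-fill with zeros up to the given width; longer strings are returned unchanged.
    return value.zfill(width)


print(zero_pad("42"))
```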
Character Processing
-encode element XML-encode <, >, &, ", and ' characters.
-decode element Base64-decode object embedded in XML.
-upper element Convert text to uppercase.
-lower element Convert text to lowercase.
-chain element Change spaces to underscores.
-title element Capitalize initial letters of words.
-mirror element Reverse order of letters.
-alpha element Non-alphabetic characters to space.
-alnum element Non-alphanumeric characters to space.
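For illustration, the transformations described for -encode and -decode can be sketched in Python as follows. The function names are hypothetical and this is not xtract's code; in particular, the sketch assumes the decoded Base64 payload is UTF-8 text.

```python
import base64
from xml.sax.saxutils import escape


def xml_encode(text: str) -> str:
    # escape() handles &, <, and >; the two quote characters are mapped explicitly.
    return escape(text, {'"': "&quot;", "'": "&apos;"})


def base64_decode(text: str) -> str:
    # Assumes the embedded object decodes to UTF-8 text.
    return base64.b64decode(text).decode("utf-8")


print(xml_encode('a<b & "c"'))
print(base64_decode("SGVsbG8="))
```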
String Processing
-basic element Convert superscripts and subscripts.
-plain element Remove embedded mixed-content markup tags.
-simple element Normalize accented letters; spell Greek letters.
-author element Multi-step author cleanup.
-journal element Journal capitalization and punctuation.
-prose element Text conversion to ASCII.
Text Processing
-terms element Partition text at spaces.
-words element Split at punctuation marks.
-pairs element Adjacent informative words.
-split element Split using -with for delimiter.
-order element Rearrange words in sorted order.
-reverse element Reverse words in string.
-letters element Separate individual letters.
-clauses element Break at phrase separators.
-pentamers element Sliding window of pentamers.
Citation Functions
-year element Extract first 4-digit year from string.
-month element Match first month name and return a corresponding integer.
-date element YYYY/MM/DD from -unit "PubDate" -date "*"
-page element Get digits (and letters) of first page number.
-auth element Change GenBank authors to Medline form.
-initials element Parse initials from forename or given name.
-trim element Remove extra spaces and leading zeros.
-wct element Count number of -words in a string.
-doi element Add https://doi.org/ prefix, URL encode.
-accession element Allow indexing of full accession.version.
-numeric element Only accept items that are entirely digits.
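A rough Python sketch of two of the citation functions above, -year and -doi, for illustration. The helper names are hypothetical. Exactly which characters xtract URL-encodes, and whether it restricts years to a plausible range, is not stated on this page, so both behaviors here are assumptions.

```python
import re
from typing import Optional
from urllib.parse import quote


def first_year(text: str) -> Optional[str]:
    # First run of exactly four digits that is not part of a longer number.
    match = re.search(r"(?<!\d)\d{4}(?!\d)", text)
    return match.group(0) if match else None


def doi_url(doi: str) -> str:
    # Assumption: keep "/" literal and percent-encode other reserved characters.
    return "https://doi.org/" + quote(doi, safe="/")


print(first_year("2019 Jan-Feb"))
print(doi_url("10.1000/abc<1>"))
```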
Value Transformation
-translate element Substitute values with -transform table.
-classify element Substring word or phrase matches to -aliases table.
Regular Expression
-replace Substitute text using regular expressions.
-reg target Target expression.
-exp replacement Replacement pattern.
Sequence Processing
-fasta Split sequence into blocks of 70 uppercase letters.
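The block splitting described for -fasta can be sketched in Python as below. This is an illustration under the assumption that the final block simply holds the remaining letters; fasta_blocks is a hypothetical name, not part of Entrez Direct.

```python
def fasta_blocks(sequence: str, width: int = 70) -> list[str]:
    # Uppercase the sequence, then slice it into consecutive fixed-width blocks.
    seq = sequence.upper()
    return [seq[i:i + width] for i in range(0, len(seq), width)]


blocks = fasta_blocks("acgt" * 40)  # 160 letters
print([len(block) for block in blocks])
```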
Nucleotide Processing
-revcomp Reverse complement nucleotide sequence.
-nucleic Subrange determines forward or revcomp.
-ncbi2na Expand ncbi2na to IUPAC. (May need to truncate result to actual sequence length.)
-ncbi4na Expand ncbi4na to IUPAC. (May need to truncate result to actual sequence length.)
-cds2prot [-gcode N] [-frame N]
Translate coding region using -gcode and (1-based) -frame (both 1 by default).
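As an illustration of what -revcomp computes, a reverse complement can be sketched in Python as follows. The sketch handles only A, C, G, and T (either case) and leaves any other character unchanged; how xtract treats IUPAC ambiguity codes is not described on this page.

```python
# Translation table for the four unambiguous bases, preserving case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    # Complement each base, then reverse the whole string.
    return sequence.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))
```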
Protein Processing
-molwt Calculate molecular weight of peptide.
-molwt-m Molecular weight retaining initial methionine.
-molwt-f Keep initial M residue as formyl-methionine.
-pept Split amino acid runs at *, -, x, or X.
Sequence Coordinates
-0-based element Zero-based.
-1-based element One-based.
-ucsc-based element Half-open.
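The three coordinate conventions named above relate to each other as in the Python sketch below. It illustrates the conventions themselves, starting from a one-based closed interval; which input convention each xtract option expects is not stated on this page, and the function names are hypothetical.

```python
def one_based_to_zero_based(start: int, stop: int) -> tuple[int, int]:
    # Closed interval in both systems: shift both ends down by one.
    return start - 1, stop - 1


def one_based_to_ucsc(start: int, stop: int) -> tuple[int, int]:
    # Half-open, zero-based: the start shifts down, the stop is exclusive and unchanged.
    return start - 1, stop


# A 100-base feature at one-based positions 101..200.
print(one_based_to_zero_based(101, 200))
print(one_based_to_ucsc(101, 200))
```

In the half-open form the feature length is simply stop minus start, which is one reason that convention is common in interval arithmetic.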
Command Generator
-insd arg ...
Generate INSDSeq extraction commands. Print them if invoked standalone; run them if invoked as
part of a pipeline. Requires one or more arguments, which may appear in the following order:
Descriptor(s) INSDSeq_sequence/INSDSeq_definition/INSDSeq_division/... [...]
Completeness complete/partial
Feature(s) CDS/mRNA/...[,...]
Qualifier(s) INSDFeature_key/"#INSDInterval"/gene/product/feat_location/sub_sequence/... [...]
-insdx Process -insd output table into XML.
Frequency Table
-histogram
Collects data for sort-uniq-count(1) on entire set of records.
Entrez Indexing
-indexer element Positional index using -wrp for field name.
Output Organization
-head str Print before everything else.
-tail str Print after everything else.
-hd str Print before each record.
-tl str Print after each record.
Record Selection
-select condition Select record subset by conditions.
-in filename File of identifiers to use for selection.
Record Rearrangement
-sort[-fwd] element Element to use as sort key.
-sort-rev element Sort records in reverse order.
Reformatting
-format fmt
copy Fast block copy (still applies processing flags).
compact Compress runs of spaces.
flush Suppress line indentation.
indent Indent according to nesting depth.
expand Place each attribute on a separate line.
Validation
-verify Report XML data integrity problems.
-test Check field for visible combining accents and invisible Unicode.
Summary
-outline Display outline of XML structure.
-synopsis Display individual XML paths.
-contour [delimiter] Display XML paths to leaf nodes (delimited by / by default).
Full Exploration Command Precedence
-pattern
-path
-division
-group
-branch
-block
-section
-subset
-unit
Documentation
-help Print usage information and some example argument combinations.
-examples Complete usage examples, involving additional Entrez Direct tools.
-unix Illustrate common Unix command arguments.
-version Print version number.
NOTES
String constraints use case-insensitive comparisons.
Numeric constraints and selection arguments use integer values.
-num and -len selections are synonyms for Object Count (#) and Item Length (%).
-words, -pairs, and -indices convert to lower case.
SEE ALSO
align-columns(1), archive-nihocc(1), archive-nlmnlp(1), archive-nmcds(1), archive-pids(1),
archive-pmc(1), archive-pubmed(1), archive-taxonomy(1), asn2ref(1), between-two-genes(1), bsmp2info(1),
csv2xml(1), custom-index(1), disambiguate-nucleotides(1), download-flatfile(1), download-ncbi-data(1),
ds2pme(1), efetch(1), esample(1), filter-columns(1), find-in-gene(1), fuse-ranges(1), fuse-segments(1),
gbf2facds(1), gbf2fsa(1), gbf2info(1), gbf2tbl(1), gene2range(1), gff2xml(1), gff-sort(1), gm2segs(1),
hgvs2spdi(1), nquire(1), pm-collect(1), pm-refresh(1), pma2apa(1), pma2pme(1), pmc2bioc(1), pmc2info(1),
print-columns(1), rchive(1), refseq-nm-cds(1), reorder-columns(1), snp2hgvs(1), snp2tbl(1), sort-table(1),
sort-uniq-count(1), spdi2tbl(1), tbl2prod(1), transmute(1), uniq-table(1), xfetch(1), xfilter(1),
xinfo(1), xlink(1), xml2fsa(1), xml2tbl(1), xsearch(1), xy-plot(1).
NCBI 2025-05-26 XTRACT(1)