Ubuntu Manpage: dcm2xml - Convert DICOM file and data set to XML

NAME

       dcm2xml - Convert DICOM file and data set to XML

SYNOPSIS

       dcm2xml [options] dcmfile-in [xmlfile-out]

DESCRIPTION

       The  dcm2xml  utility  converts  the  contents  of  a  DICOM  file  (file  format or raw data set) to XML
       (Extensible Markup Language). There are two output formats. The first one is specific to DCMTK  with  its
       DTD  (Document  Type  Definition) described in the file dcm2xml.dtd. The second one refers to the 'Native
       DICOM Model' which is specified for the DICOM Application Hosting service found in DICOM part 19.

       If dcm2xml reads a raw data set (DICOM data without a file format meta-header) it will attempt  to  guess
       the  transfer syntax by examining the first few bytes of the file. It is not always possible to correctly
       guess the transfer syntax and it is better to convert a data set  to  a  file  format  whenever  possible
       (using  the  dcmconv  utility). It is also possible to use the -f and -t[ieb] options to force dcm2xml to
       read a data set with a particular transfer syntax.

PARAMETERS

       dcmfile-in   DICOM input filename to be converted ('-' for stdin)

       xmlfile-out  XML output filename (default: stdout)

OPTIONS

   general options
         -h    --help
                 print this help text and exit

               --version
                 print version information and exit

               --arguments
                 print expanded command line arguments

         -q    --quiet
                 quiet mode, print no warnings and errors

         -v    --verbose
                 verbose mode, print processing details

         -d    --debug
                 debug mode, print debug information

         -ll   --log-level  [l]evel: string constant
                 (fatal, error, warn, info, debug, trace)
                 use level l for the logger

         -lc   --log-config  [f]ilename: string
                 use config file f for the logger

   input options
       input file format:

         +f    --read-file
                 read file format or data set (default)

         +fo   --read-file-only
                 read file format only

         -f    --read-dataset
                 read data set without file meta information

       input transfer syntax:

         -t=   --read-xfer-auto
                 use TS recognition (default)

         -td   --read-xfer-detect
                 ignore TS specified in the file meta header

         -te   --read-xfer-little
                 read with explicit VR little endian TS

         -tb   --read-xfer-big
                 read with explicit VR big endian TS

         -ti   --read-xfer-implicit
                 read with implicit VR little endian TS

       long tag values:

         +M    --load-all
                 load very long tag values (e.g. pixel data)

         -M    --load-short
                 do not load very long values (default)

         +R    --max-read-length  [k]bytes: integer (4..4194302, default: 4)
                 set threshold for long values to k kbytes

   processing options
       specific character set:

         +Cr   --charset-require
                 require declaration of extended charset (default)

         +Ca   --charset-assume  [c]harset: string
                 assume charset c if no extended charset declared

         +Cc   --charset-check-all
                 check all data elements with string values
                 (default: only PN, LO, LT, SH, ST, UC and UT)

                 # this option is only used for the extended check whether
                 # the Specific Character Set (0008,0005) attribute should be
                 # present, but not for the conversion of unaffected element
                 # values to UTF-8 (e.g. element values with a VR of CS)

         +U8   --convert-to-utf8
                 convert all element values that are affected
                 by Specific Character Set (0008,0005) to UTF-8

                 # requires support from an underlying character encoding
                 # library (see output of --version on which one is available)

   output options
       general XML format:

         -dtk  --dcmtk-format
                 output in DCMTK-specific format (default)

         -nat  --native-format
                 output in Native DICOM Model format (part 19)

         +Xn   --use-xml-namespace
                 add XML namespace declaration to root element

       DCMTK-specific format (not with --native-format):

         +Xd   --add-dtd-reference
                 add reference to document type definition (DTD)

         +Xe   --embed-dtd-content
                 embed document type definition into XML document

         +Xf   --use-dtd-file  [f]ilename: string
                 use specified DTD file (only with +Xe)
                 (default: /usr/local/share/dcmtk-<VERSION>/dcm2xml.dtd)

         +Wn   --write-element-name
                 write name of the DICOM data elements (default)

         -Wn   --no-element-name
                 do not write name of the DICOM data elements

         +Wb   --write-binary-data
                 write binary data of OB and OW elements
                 (default: off, be careful with --load-all)

       encoding of binary data:

         +Eh   --encode-hex
                 encode binary data as hex numbers
                 (default for DCMTK-specific format)

         +Eu   --encode-uuid
                 encode binary data as a UUID reference
                 (default for Native DICOM Model)

         +Eb   --encode-base64
                 encode binary data as Base64 (RFC 2045, MIME)

DCMTK Format

       The basic structure of the DCMTK-specific XML output created from a DICOM file looks like the following:

       <?xml version='1.0' encoding='ISO-8859-1'?>
       <!DOCTYPE file-format SYSTEM 'dcm2xml.dtd'>
       <file-format xmlns='http://dicom.offis.de/dcmtk'>
         <meta-header xfer='1.2.840.10008.1.2.1' name='Little Endian Explicit'>
           <element tag='0002,0000' vr='UL' vm='1' len='4'
                    name='MetaElementGroupLength'>
             166
           </element>
           ...
           <element tag='0002,0013' vr='SH' vm='1' len='16'
                    name='ImplementationVersionName'>
             OFFIS_DCMTK_353
           </element>
         </meta-header>
         <data-set xfer='1.2.840.10008.1.2' name='Little Endian Implicit'>
           <element tag='0008,0005' vr='CS' vm='1' len='10'
                    name='SpecificCharacterSet'>
             ISO_IR 100
           </element>
           ...
           <sequence tag='0028,3010' vr='SQ' card='2' name='VOILUTSequence'>
             <item card='3'>
               <element tag='0028,3002' vr='xs' vm='3' len='6'
                        name='LUTDescriptor'>
                 256\0\8
               </element>
               ...
             </item>
             ...
           </sequence>
           ...
           <element tag='7fe0,0010' vr='OW' vm='1' len='262144'
                    name='PixelData' loaded='no' binary='hidden'>
           </element>
         </data-set>
       </file-format>

       The 'file-format' and 'meta-header' tags are absent for DICOM data sets.

   XML Encoding
       Attributes with very large value fields (e.g. pixel  data)  are  not  loaded  by  default.  They  can  be
       identified  by  the  additional  attribute 'loaded' with a value of 'no' (see example above). The command
       line option --load-all forces to load all value fields including the very long ones.

       Furthermore, binary data of OB and OW attributes are not written to the XML output file by default. These
       elements can be identified by the additional attribute 'binary' with a  value  of  'hidden'  (default  is
       'no').  The  command  line  option  --write-binary-data  causes  also  binary  value fields to be printed
       (attribute value is 'yes' or 'base64'). But, be careful when using this option together  with  --load-all
       because  of the large amounts of pixel data that might be printed to the output. Please note that in this
       context element values with a VR of OD, OF, OL and OV are not regarded as 'binary data'.

       Multiple values (i.e. where the DICOM value multiplicity is greater than 1) are separated by a  backslash
       '\'  (except  for  Base64  encoded  data).   The  'len'  attribute  indicates the number of bytes for the
       particular value field as stored in the DICOM data set, i.e. it might deviate from the XML encoded  value
       length  e.g.  because  of non-significant padding that has been removed.  If this attribute is missing in
       'sequence' or 'item' start tags, the corresponding DICOM element has been stored with undefined length.

       @section dcm2xml_native_format Native DICOM Model Format

       The description of the  Native  DICOM  Model  format  can  be  found  in  the  DICOM  standard,  part  19
       ('Application Hosting').

       @subsection dcm2xml_bulk_data Bulk Data

       Binary  data,  i.e.  DICOM element values with Value Representations (VR) of OB or OW, as well as OD, OF,
       OL, OV and UN values are by default not written to the XML output because of their  size.   Instead,  for
       each  element,  a new Universally Unique Identifier (UUID) is being generated and written as an attribute
       of a \<BulkData\> XML element.  So far, there is no possibility to write an additional file to  hold  the
       binary  data for each of the binary data chunks.  This is not required by the standard, however, it might
       be useful for implementing an Application Hosting interface; thus this feature may be available in future
       versions of \b dcm2xml.

       In addition, Supplement 163 (Store Over the Web by Representational State Transfer Services) introduces a
       new \<InlineBinary\> XML element that allows for encoding binary data as Base64.  Currently, the  command
       line  option  \e  --encode-base64 enables this encoding for the following VRs: OB, OD, OF, OL, OV, OW and
       UN.

       @subsection dcm2xml_known_issues Known Issues

       In addition to what is written in the above section on 'Bulk Data', there are further known  issues  with
       the current implementation of the Native DICOM Model format.  For example, large element values with a VR
       other  than  OB,  OD, OF, OL, OV, OW or UN are currently never written as bulk data, although it might be
       useful, e.g. for very long text elements (especially UT) or very long numeric fields (of various VRs).

       @section dcm2xml_notes NOTES

       @subsection dcm2xml_character_encoding Character Encoding

       The XML character encoding is determined automatically from the  DICOM  attribute  (0008,0005)  'Specific
       Character Set' using the following mapping:

       @verbatim  ASCII          (ISO_IR  6)     =>  'UTF-8' UTF-8         'ISO_IR 192'  =>  'UTF-8' ISO Latin 1
       'ISO_IR 100'  =>  'ISO-8859-1' ISO Latin 2   'ISO_IR 101'  =>  'ISO-8859-2' ISO Latin  3    'ISO_IR  109'
       =>   'ISO-8859-3'  ISO  Latin  4    'ISO_IR  110'   =>   'ISO-8859-4'  ISO  Latin  5    'ISO_IR  148'  =>
       'ISO-8859-9' ISO Latin 9   'ISO_IR 203'  =>  'ISO-8859-15' Cyrillic      'ISO_IR 144'   =>   'ISO-8859-5'
       Arabic         'ISO_IR  127'   =>   'ISO-8859-6'  Greek          'ISO_IR  126'   =>   'ISO-8859-7' Hebrew
       'ISO_IR 138'  =>  'ISO-8859-8' \endverbatim

       If this DICOM attribute is missing in the input file, although needed, option \e --charset-assume can  be
       used  to  specify  an  appropriate  character  set  manually (using one of the DICOM defined terms).  For
       reasons of backward compatibility with previous versions of this  tool,  the  following  terms  are  also
       supported  and  mapped  automatically  to  the associated DICOM defined terms: latin-1, latin-2, latin-3,
       latin-4, latin-5, latin-9, cyrillic, arabic, greek, hebrew.

       Multiple character sets using code  extension  techniques  are  not  supported.   If  needed,  option  \e
       --convert-to-utf8  can  be  used  to  convert  the  DICOM file or data set to UTF-8 encoding prior to the
       conversion to XML format.  This is also useful for DICOMDIR files where each directory record can have  a
       different character set.

       If  no  mapping  is  defined  and option \e --convert-to-utf8 is not used, non-ASCII characters and those
       below #32 are stored as '&#nnn;' where 'nnn' refers to the numeric character code.  This  might  lead  to
       invalid  character  entity references (such as '&#27;' for ESC) and will cause most XML parsers to reject
       the document.

       @section dcm2xml_logging LOGGING

       The level of logging output of the various command line tools and underlying libraries can  be  specified
       by  the  user.   By  default,  only  errors and warnings are written to the standard error stream.  Using
       option \e --verbose also informational messages like processing details are reported.  Option \e  --debug
       can  be  used  to  get more details on the internal activity, e.g. for debugging purposes.  Other logging
       levels can be selected using option \e --log-level.  In \e --quiet mode only fatal errors  are  reported.
       In  such  very  severe  error  events,  the  application will usually terminate.  For more details on the
       different logging levels, see documentation of module 'oflog'.

       In case the logging output should be written to file (optionally with logfile rotation), to syslog (Unix)
       or the event log (Windows) option \e --log-config can be used.  This configuration file also  allows  for
       directing only certain messages to a particular output stream and for filtering certain messages based on
       the  module  or  application  where  they  are  generated.   An example configuration file is provided in
       <em>\<etcdir\>/logger.cfg</em>.

       @section dcm2xml_command_line COMMAND LINE

       All command line tools use the following notation for parameters: square brackets enclose optional values
       (0-1), three trailing dots indicate that multiple values are allowed (1-n), a combination of both means 0
       to n values.

       Command line options are distinguished from parameters by  a  leading  '+'  or  '-'  sign,  respectively.
       Usually,  order  and  position  of  command  line  options are arbitrary (i.e. they can appear anywhere).
       However, if options are mutually exclusive the rightmost appearance is used.  This behavior  conforms  to
       the standard evaluation rules of common Unix shells.

       In  addition,  one  or  more command files can be specified using an '@' sign as a prefix to the filename
       (e.g. <em>\@command.txt</em>).  Such a command argument is replaced by the content of  the  corresponding
       text  file  (multiple  whitespaces  are  treated  as  a  single  separator unless they appear between two
       quotation marks) prior to any further evaluation.  Please note that a command file cannot contain another
       command file.  This simple but  effective  approach  allows  one  to  summarize  common  combinations  of
       options/parameters  and  avoids  longish  and  confusing  command  lines  (an example is provided in file
       <em>\<datadir\>/dumppat.txt</em>).

       @section dcm2xml_environment ENVIRONMENT

       The \b dcm2xml utility will attempt to load DICOM data  dictionaries  specified  in  the  \e  DCMDICTPATH
       environment  variable.   By default, i.e. if the \e DCMDICTPATH environment variable is not set, the file
       <em>\<datadir\>/dicom.dic</em> will be loaded  unless  the  dictionary  is  built  into  the  application
       (default for Windows).

       The  default  behavior  should  be  preferred  and the \e DCMDICTPATH environment variable only used when
       alternative data dictionaries are required.  The \e DCMDICTPATH environment variable has the same  format
       as  the  Unix  shell  \e  PATH  variable  in that a colon (':') separates entries.  On Windows systems, a
       semicolon (';") is used as a separator. The data dictionary code will attempt to load each file specified
       in the DCMDICTPATH environment variable. It is an error if no data dictionary can be loaded.

       Depending on the command line options specified, the dcm2xml utility will attempt to load  character  set
       mapping  tables. This happens when DCMTK was compiled with the oficonv library (which is the default) and
       the mapping tables are not built into the library (default when DCMTK uses shared libraries).

       The mapping table files are expected in DCMTK's <datadir>. The DCMICONVPATH environment variable  can  be
       used  to  specify  a  different location. If a different location is specified, those mapping tables also
       replace any built-in tables.

FILES

       <datadir>/dcm2xml.dtd - Document Type Definition (DTD) file

COPYRIGHT

       Copyright (C) 2002-2024 by OFFIS e.V., Escherweg 2, 26121 Oldenburg, Germany.

Version 3.6.9                               Wed Feb 19 2025 21:30:57                                  dcm2xml(1)

NAME

SYNOPSIS

DESCRIPTION

PARAMETERS

OPTIONS

DCMTK Format

FILES

SEE ALSO

COPYRIGHT