Provided by: datalad_1.1.0-1_all

NAME

       datalad foreach-dataset - run a command or Python code on the dataset and/or each of its sub-datasets.

SYNOPSIS


       datalad      foreach-dataset     [-h]     [--cmd-type     {auto|external|exec|eval}]     [-d     DATASET]
              [--state {present|absent|any}] [-r] [-R LEVELS] [--contains  PATH]  [--bottomup]  [-s]  [--output-
              streams  {capture|pass-through|relpath}]  [--chpwd  {ds|pwd}]  [--safe-to-consume {auto|all-subds-
              done|superds-done|always}] [-J NJOBS] [--version] ...

DESCRIPTION

       This command provides a convenience for cases where no dedicated DataLad command exists to operate
       across a hierarchy of datasets. It is very similar to the `git submodule foreach` command, with the
       following major differences:

       - by default (unless --subdatasets-only is given) it also operates on the original dataset itself,
       - subdatasets can be traversed in bottom-up order,
       - commands can be executed in parallel (see the --jobs option), while still accounting for ordering,
         e.g. in bottom-up order the command is executed in a super-dataset only after it has been executed
         in all of its subdatasets.

       Additional notes:

       - for execution of "external" commands we use the environment used to execute external git and  git-annex
       commands.

   Command format
       --cmd-type external: A few placeholders are supported in the command via Python format specification:

       - "{pwd}" will be replaced with the full path of the current working directory.
       - "{ds}" and "{refds}" will provide instances of the dataset currently operated on and the reference
         "context" dataset which was provided via the ``dataset`` argument.
       - "{tmpdir}" will be replaced with the full path of a temporary directory.
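       The substitution described above is plain Python string formatting; the following minimal sketch shows
       how such a command template might be expanded (the placeholder values below are hypothetical stand-ins
       for what DataLad would supply at run time, not actual DataLad output):

```python
# Sketch of --cmd-type external placeholder expansion via str.format().
# These values are hypothetical; DataLad supplies the real ones per dataset.
placeholders = {"pwd": "/data/study", "tmpdir": "/tmp/dl-tmp"}

cmd_template = "rsync -a {pwd}/ {tmpdir}/backup/"
expanded = cmd_template.format(**placeholders)
print(expanded)  # rsync -a /data/study/ /tmp/dl-tmp/backup/
```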

   Examples
        Aggressively git clean all datasets, running 5 parallel jobs::

        % datalad foreach-dataset -r -J 5 git clean -dfx
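
        A command whose own arguments start with dashes can be disambiguated from DataLad's options with a
        leading '--' (the git command below is only an illustration)::

         % datalad foreach-dataset -r -- git grep -l TODO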

OPTIONS

       COMMAND
              command  for execution. A leading '--' can be used to disambiguate this command from the preceding
              options to DataLad. For --cmd-type exec or eval only a single command argument  (Python  code)  is
              supported.

       -h, --help, --help-np
              show  this  help message. --help-np forcefully disables the use of a pager for displaying the help
              message

       --cmd-type {auto|external|exec|eval}
              type of the command. 'external': run in a child process using the dataset's runner; 'exec':
              Python source code to execute using 'exec()', no value returned; 'eval': Python source code to
              evaluate using 'eval()', the return value is placed into the 'result' field; 'auto': if used
              via the Python API and CMD is a Python function, 'eval' is used, otherwise 'external' is
              assumed. Constraints: value must be one of ('auto', 'external', 'exec', 'eval') [Default:
              'auto']

       -d DATASET, --dataset DATASET
              specify  the  dataset  to  operate  on. If no dataset is given, an attempt is made to identify the
              dataset based on the input and/or the current working directory.  Constraints:  Value  must  be  a
              Dataset or a valid identifier of a Dataset (e.g. a path) or value must be NONE

       --state {present|absent|any}
              indicate which (sub)datasets to consider: either only locally present, absent, or any of those two
              kinds. Constraints: value must be one of ('present', 'absent', 'any') [Default: 'present']

       -r, --recursive
              if set, recurse into potential subdatasets.

       -R LEVELS, --recursion-limit LEVELS
              limit  recursion  into  subdatasets to the given number of levels. Constraints: value must be con‐
              vertible to type 'int' or value must be NONE

       --contains PATH
              limit to the subdatasets containing the given path. If a root path of a subdataset is  given,  the
              last considered dataset will be the subdataset itself. This option can be given multiple times, in
              which  case  datasets  that  contain any of the given paths will be considered. Constraints: value
              must be a string or value must be NONE

       --bottomup
              whether to report subdatasets in bottom-up order along each branch in the dataset  tree,  and  not
              top-down.

       -s, --subdatasets-only
              whether to exclude top level dataset. It is implied if a non-empty CONTAINS is used.

       --output-streams {capture|pass-through|relpath}, --o-s {capture|pass-through|relpath}
              how to handle output streams. 'capture': capture outputs from CMD and return them in the result
              record ('stdout', 'stderr'); 'pass-through': pass outputs through to the screen (and thus they
              are absent from the returned record); 'relpath': prefix each line of captured output with a
              relative path (similar to what grep does) and write it to stdout and stderr. For 'relpath', the
              path is relative to the top of the dataset if DATASET is specified, and otherwise relative to
              the current directory. Constraints: value must be one of ('capture', 'pass-through', 'relpath')
              [Default: 'pass-through']

       --chpwd {ds|pwd}
              whether to change the working directory to the top of each dataset ('ds') before executing the
              command, or to keep the current working directory ('pwd').

       --safe-to-consume {auto|all-subds-done|superds-done|always}
              Important only in the case of parallel (--jobs greater than 1) execution. 'all-subds-done'
              instructs to not consider a superdataset until the command has finished execution in all of
              its subdatasets (this is the value 'auto' resolves to if traversal is bottom-up).
              'superds-done' instructs to not process subdatasets until the command has finished in the
              super-dataset (this is the value 'auto' resolves to if traversal is not bottom-up, which is the
              default). With 'always' there is no ordering constraint between sub- and super-datasets.
              Constraints: value must be one of ('auto', 'all-subds-done', 'superds-done', 'always')
              [Default: 'auto']

       -J NJOBS, --jobs NJOBS
              how many parallel jobs (where possible) to use. "auto" corresponds to the number defined by the
              'datalad.runtime.max-annex-jobs' configuration item. NOTE: This option can only parallelize
              input retrieval (get) and output recording (save). DataLad does NOT parallelize your scripts
              for you. Constraints: value must be convertible to type 'int' or value must be NONE or value
              must be one of ('auto',)

       --version
              show the module providing the command, and its version

AUTHORS

        datalad is developed by The DataLad Team and Contributors <team@datalad.org>.

datalad foreach-dataset 1.1.0                      2024-06-14                         datalad foreach-dataset(1)