Provided by: datalad_1.1.5-1_all 

NAME
datalad create-sibling-ria - creates a sibling to a dataset in a RIA store
SYNOPSIS
datalad create-sibling-ria [-h] -s NAME [-d DATASET] [--storage-name NAME] [--alias ALIAS]
[--post-update-hook] [--shared {false|true|umask|group|all|world|everybody|0xxx}] [--group GROUP]
[--storage-sibling MODE] [--existing MODE] [--new-store-ok] [--trust-level TRUST-LEVEL] [-r]
[-R LEVELS] [--no-storage-sibling] [--push-url ria+<ssh|file>://<host>[/path]] [--version]
ria+<ssh|file|http(s)>://<host>[/path]
DESCRIPTION
Communication with a dataset in a RIA store is implemented via two siblings: a regular Git remote
(repository sibling) and a git-annex special remote for data transfer (storage sibling), with the former
having a publication dependency on the latter. By default, the name of the storage sibling is derived from
the repository sibling's name by appending "-storage".
The store's base path must either not exist, be an empty directory, or be a valid RIA store.
Notes
*RIA URL format*
Interactions with new or existing RIA stores require RIA URLs to identify the store or specific datasets
inside of it.
The general structure of a RIA URL pointing to a store takes the form ``ria+[scheme]://<storelocation>``
(e.g., ``ria+ssh://[user@]hostname:/absolute/path/to/ria-store`` or
``ria+file:///absolute/path/to/ria-store``).
The general structure of a RIA URL pointing to a dataset in a store (for example for cloning) takes a
similar form, but appends either the dataset's UUID or a "~" symbol followed by the dataset's alias name:
``ria+[scheme]://<storelocation>#<dataset-UUID>`` or ``ria+[scheme]://<storelocation>#~<aliasname>``. In
addition, specific version identifiers can be appended to the URL with an additional "@" symbol:
``ria+[scheme]://<storelocation>#<dataset-UUID>@<dataset-version>``, where ``dataset-version`` refers to
a branch or tag.
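The URL forms above can be taken apart with a small parser. The following is a minimal sketch, not DataLad's own parsing code: the names ``RiaUrl`` and ``parse_ria_url`` and the regular expression are assumptions made for illustration, and the host name in the example is hypothetical.

```python
import re
from typing import NamedTuple, Optional


class RiaUrl(NamedTuple):
    """Components of a RIA URL (illustrative structure, not a DataLad type)."""
    scheme: str
    store: str
    dataset_id: Optional[str]
    alias: Optional[str]
    version: Optional[str]


# ria+<scheme>://<storelocation>[#<dataset-UUID>|#~<aliasname>][@<dataset-version>]
_RIA_RE = re.compile(
    r"^ria\+(?P<scheme>ssh|file|https?)://(?P<store>[^#]*)"
    r"(?:#(?P<ident>[^@]+)(?:@(?P<version>.+))?)?$"
)


def parse_ria_url(url: str) -> RiaUrl:
    """Split a RIA URL into store location, dataset identifier, and version."""
    match = _RIA_RE.match(url)
    if match is None:
        raise ValueError(f"not a RIA URL: {url!r}")
    ident = match.group("ident")
    # A leading "~" marks an alias; anything else is treated as a dataset UUID.
    alias = ident[1:] if ident and ident.startswith("~") else None
    dataset_id = ident if ident and alias is None else None
    return RiaUrl(match.group("scheme"), match.group("store"),
                  dataset_id, alias, match.group("version"))


print(parse_ria_url("ria+ssh://store.example.org:/data/ria-store#~study1@main"))
```

A URL without a "#" part yields ``None`` for the dataset ID, alias, and version, which corresponds to a URL that points at the store itself.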
*RIA store layout*
A RIA store is a directory tree with a dedicated subdirectory for each dataset in the store. The
subdirectory name is constructed from the DataLad dataset ID, e.g. ``124/68afe-59ec-11ea-93d7-f0d5bf7b5561``,
where the first three characters of the ID are used for an intermediate subdirectory in order to mitigate
file system limitations for stores containing a large number of datasets.
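The mapping from a dataset ID to its subdirectory can be sketched as follows. The helper name ``dataset_dir`` and the store path ``/data/ria-store`` are hypothetical; only the split after the first three characters is taken from the description above.

```python
from pathlib import PurePosixPath


def dataset_dir(store_base: str, dataset_id: str) -> PurePosixPath:
    """Return the dataset's location inside a store: <base>/<id[:3]>/<id[3:]>."""
    if len(dataset_id) <= 3:
        raise ValueError("dataset ID is too short to be split")
    # The first three characters form the intermediate subdirectory.
    return PurePosixPath(store_base) / dataset_id[:3] / dataset_id[3:]


print(dataset_dir("/data/ria-store", "12468afe-59ec-11ea-93d7-f0d5bf7b5561"))
# /data/ria-store/124/68afe-59ec-11ea-93d7-f0d5bf7b5561
```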
By default, a dataset in a RIA store consists of two components: a Git repository (for all dataset
contents stored in Git) and a storage sibling (for dataset content stored in git-annex).
It is possible to selectively disable either component. With ``--storage-sibling only``, the Git remote is
omitted and the dataset directory contains no bare Git repository; with ``--storage-sibling off``, the
storage sibling is omitted and there is no ``annex/`` subdirectory. If neither component is disabled, a
dataset's subdirectory in a RIA store contains a standard bare Git repository with an ``annex/``
subdirectory inside of it. The latter holds a git-annex object store and constitutes the storage sibling.
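The effect of the three storage-sibling modes on a dataset directory can be summarized in a small lookup table. This is only an illustrative sketch: the function name ``expected_components`` and the dictionary keys are made up for this example and are not part of DataLad.

```python
from typing import Dict


def expected_components(mode: str) -> Dict[str, bool]:
    """Return which parts a dataset directory holds for a storage-sibling mode."""
    layouts = {
        # default: bare Git repository with an annex/ subdirectory inside
        "on": {"bare_git_repo": True, "annex_dir": True},
        # storage sibling disabled: no annex/ subdirectory
        "off": {"bare_git_repo": True, "annex_dir": False},
        # Git remote disabled: no bare Git repository
        "only": {"bare_git_repo": False, "annex_dir": True},
    }
    if mode not in layouts:
        raise ValueError(f"unknown storage-sibling mode: {mode!r}")
    return layouts[mode]


for mode in ("on", "off", "only"):
    print(mode, expected_components(mode))
```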
Optionally, there can be a further subdirectory ``archives`` with (compressed) 7z archives of annex
objects. The storage remote is able to pull annex objects from these archives if it cannot find them in the
regular annex object store. This feature can be useful for storing large collections of rarely changing
data on systems that limit the number of files that can be stored.
Each dataset directory also contains a ``ria-layout-version`` file that identifies the data organization
(as, for example, described above).
Lastly, there is a global ``ria-layout-version`` file at the store's base path that identifies where
dataset subdirectories themselves are located. At present, this file must contain a single line stating
the version (currently "1"). This line MUST end with a newline character.
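Whether a base path is acceptable as a store (nonexistent, empty, or already a valid RIA store) can be checked along the following lines. This is a simplified sketch under stated assumptions, not the check DataLad performs: the function name ``classify_store_path`` and its return labels are hypothetical, and "valid" is reduced to the presence of a single-line ``ria-layout-version`` file stating version "1".

```python
import tempfile
from pathlib import Path


def classify_store_path(base: Path) -> str:
    """Classify a base path as 'missing', 'empty', 'store', or 'invalid'."""
    if not base.exists():
        return "missing"
    if base.is_dir() and not any(base.iterdir()):
        return "empty"
    version_file = base / "ria-layout-version"
    if version_file.is_file():
        content = version_file.read_text()
        # Exactly one line, terminated by a newline character.
        if content.endswith("\n") and content.count("\n") == 1:
            # Anything after a "|" is treated as flags, not as the version.
            if content[:-1].split("|", 1)[0] == "1":
                return "store"
    return "invalid"


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "ria-store"
    print(classify_store_path(base))   # missing
    base.mkdir()
    print(classify_store_path(base))   # empty
    (base / "ria-layout-version").write_text("1\n")
    print(classify_store_path(base))   # store
```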
It is possible to define an alias for an individual dataset in a store by placing a symlink to the
dataset location into an ``alias/`` directory in the root of the store. This enables dataset access via
URLs of format: ``ria+<protocol>://<storelocation>#~<aliasname>``.
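Normally the ``--alias`` option creates this symlink. For illustration, the manual equivalent might look like the sketch below. The helper name ``add_alias`` is hypothetical, and the use of a relative link target is an assumption of this example rather than documented behavior.

```python
import os
import tempfile
from pathlib import Path


def add_alias(store_base: Path, dataset_id: str, alias: str) -> Path:
    """Create <store>/alias/<alias> as a symlink to the dataset directory."""
    target = store_base / dataset_id[:3] / dataset_id[3:]
    if not target.is_dir():
        raise FileNotFoundError(f"no such dataset directory: {target}")
    alias_dir = store_base / "alias"
    alias_dir.mkdir(exist_ok=True)
    link = alias_dir / alias
    # Assumption: a relative target keeps the link valid if the store is moved.
    os.symlink(os.path.relpath(target, alias_dir), link)
    return link


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    dsid = "12468afe-59ec-11ea-93d7-f0d5bf7b5561"
    (store / dsid[:3] / dsid[3:]).mkdir(parents=True)
    link = add_alias(store, dsid, "study1")
    print(os.readlink(link))
```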
Compared to standard git-annex object stores, the ``annex/`` subdirectories used as storage siblings
follow a different layout naming scheme ('dirhashmixed' instead of 'dirhashlower'). This is mostly noted as
a technical detail, but also serves to remind git-annex power users to refrain from running git-annex
commands directly in-store, as doing so can cause severe damage due to the layout difference. Interactions
should be handled via the ORA special remote instead.
*Error logging*
To enable error logging at the remote end, append a pipe symbol and an "l" to the version number in
ria-layout-version (like so: ``1|l``).
When enabled, error logging creates a file in an "error_log" directory whenever the git-annex special
remote (storage sibling) raises an exception, and stores the Python traceback in it. The log files are
named according to the scheme ``<dataset id>.<annex uuid of the remote>.log``, showing "who" ran into an
issue with which dataset. Because logging can potentially leak personal data (local file paths, for
example), it can be disabled client-side by setting the configuration variable
``annex.ora-remote.<storage-sibling-name>.ignore-remote-config``.
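The logging flag and the log file naming scheme can be illustrated with two small helpers. This is a sketch only: ``parse_layout_version`` and ``error_log_name`` are hypothetical names, the remote UUID in the example is a placeholder, and treating everything after the pipe symbol as a set of single-letter flags is an assumption of this example.

```python
from typing import Tuple


def parse_layout_version(content: str) -> Tuple[str, bool]:
    """Return (version, logging_enabled) from ria-layout-version content."""
    line = content.rstrip("\n")
    version, _, flags = line.partition("|")
    # Assumption: an "l" after the pipe symbol switches error logging on.
    return version, "l" in flags


def error_log_name(dataset_id: str, remote_uuid: str) -> str:
    """Build a log file name: <dataset id>.<annex uuid of the remote>.log"""
    return f"{dataset_id}.{remote_uuid}.log"


print(parse_layout_version("1\n"))    # ('1', False)
print(parse_layout_version("1|l\n"))  # ('1', True)
print(error_log_name("12468afe-59ec-11ea-93d7-f0d5bf7b5561",
                     "00000000-0000-0000-0000-000000000001"))
```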
OPTIONS
ria+<ssh|file|http(s)>://<host>[/path]
URL identifying the target RIA store and access protocol. If ``--push-url`` is given in addition,
this URL is used for read access only. Otherwise it is used for write access too, and to create
the repository sibling in the RIA store. Note that HTTP(S) is currently valid for read access only
and therefore requires ``--push-url`` to be provided. Constraints: value must be a string or value
must be NONE
-h, --help, --help-np
show this help message. --help-np forcefully disables the use of a pager for displaying the help
message
-s NAME, --name NAME
Name of the sibling. With --recursive, the same name will be used to label all the subdatasets'
siblings. Constraints: value must be a string or value must be NONE
-d DATASET, --dataset DATASET
specify the dataset to process. If no dataset is given, an attempt is made to identify the dataset
based on the current working directory. Constraints: Value must be a Dataset or a valid identifier
of a Dataset (e.g. a path) or value must be NONE
--storage-name NAME
Name of the storage sibling (git-annex special remote). Must not be identical to the sibling name.
If not specified, defaults to the sibling name plus '-storage' suffix. If only a storage sibling
is created, this setting is ignored, and the primary sibling name is used. Constraints: value must
be a string or value must be NONE
--alias ALIAS
Alias for the dataset in the RIA store. Adds the necessary symlink so that this dataset can be
cloned from the RIA store using the given ALIAS instead of its ID. With --recursive, only the
top dataset will be aliased. Constraints: value must be a string or value must be NONE
--post-update-hook
Enable Git's default post-update-hook for the created sibling. This is useful when the sibling is
made accessible via a "dumb server" that requires running 'git update-server-info' to let Git
interact properly with it.
--shared {false|true|umask|group|all|world|everybody|0xxx}
If given, configures the permissions in the RIA store for multi-user access. Possible values for
this option are identical to those of `git init --shared` and are described in its documentation.
Constraints: value must be a string or value must be convertible to type bool or value must be
NONE
--group GROUP
Filesystem group for the repository. Specifying the group is crucial when --shared=group.
Constraints: value must be a string or value must be NONE
--storage-sibling MODE
By default, an ORA storage sibling and a Git repository sibling are created (on). Alternatively,
creation of the storage sibling can be disabled (off), or a storage sibling created only and no
Git sibling (only). In the latter mode, no Git installation is required on the target host.
Constraints: value must be one of ('only',) or value must be convertible to type bool or value must
be NONE [Default: True]
--existing MODE
Action to perform if a (storage) sibling is already configured under the given name and/or a
target already exists. In this case, a dataset can be skipped ('skip'), an existing target repository
can be forcefully re-initialized and the sibling (re-)configured ('reconfigure'), or the command can be
instructed to fail ('error'). Constraints: value must be one of ('skip', 'error', 'reconfigure')
[Default: 'error']
--new-store-ok
When set, a new store will be created, if necessary. Otherwise, a sibling will only be created if
the URL points to an existing RIA store.
--trust-level TRUST-LEVEL
specify a trust level for the storage sibling. If not specified, the default git-annex trust level
is used. 'trust' should be used with care (see the git-annex-trust man page). Constraints: value
must be one of ('trust', 'semitrust', 'untrust')
-r, --recursive
if set, recurse into potential subdatasets.
-R LEVELS, --recursion-limit LEVELS
limit recursion into subdatasets to the given number of levels. Constraints: value must be
convertible to type 'int' or value must be NONE
--no-storage-sibling
This option is deprecated. Use '--storage-sibling off' instead.
--push-url ria+<ssh|file>://<host>[/path]
URL identifying the target RIA store and access protocol for write access to the storage sibling.
If given, this will also be used for creation of the repository sibling in the RIA store.
Constraints: value must be a string or value must be NONE
--version
show the module and its version which provides the command
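As a sketch of how the options above combine, the following helper assembles a command line without running it. The function name ``build_command``, the sibling name, and the ``store.example.org`` URLs are hypothetical; only the option names are taken from this page. The helper refuses an HTTP(S) store URL without ``--push-url``, mirroring the read-only restriction described for the positional argument.

```python
import shlex
from typing import List, Optional


def build_command(name: str, url: str, push_url: Optional[str] = None,
                  alias: Optional[str] = None, new_store_ok: bool = False,
                  recursive: bool = False) -> List[str]:
    """Assemble an argument list for 'datalad create-sibling-ria'."""
    scheme = url.split("://", 1)[0]
    if scheme in ("ria+http", "ria+https") and push_url is None:
        raise ValueError("HTTP(S) store URLs are read-only; --push-url is required")
    cmd = ["datalad", "create-sibling-ria", "-s", name]
    if alias is not None:
        cmd += ["--alias", alias]
    if new_store_ok:
        cmd.append("--new-store-ok")
    if recursive:
        cmd.append("-r")
    if push_url is not None:
        cmd += ["--push-url", push_url]
    cmd.append(url)
    return cmd


cmd = build_command(
    "ria-backup",
    "ria+https://store.example.org/ria-store",
    push_url="ria+ssh://store.example.org:/data/ria-store",
    alias="study1",
    new_store_ok=True,
)
print(shlex.join(cmd))
```

The resulting list can be passed to ``subprocess.run`` when the command should actually be executed.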
AUTHORS
datalad is developed by The DataLad Team and Contributors <team@datalad.org>.
datalad create-sibling-ria 1.1.5 2025-03-03 datalad create-sibling-ria(1)