Provided by: rdfind_1.6.0-1build3_amd64 bug

NAME

       rdfind - finds duplicate files

SYNOPSIS

       rdfind [ options ] directory1 | file1 [ directory2 | file2 ] ...

DESCRIPTION

       rdfind  finds  duplicate  files  across and/or within several directories. It calculates checksum only if
       necessary.  rdfind runs in O(Nlog(N)) time with N being the number of files.

       If two (or more) equal files are found, the program decides which of them is the original  and  the  rest
       are  considered  duplicates.  This  is done by ranking the files to each other and deciding which has the
       highest rank. See section RANKING for details.

       By default, no action is taken besides creating a file with the detected files and showing  the  possible
       amount of saved space.

       If  you  need  better  control over the ranking than given, you can use some preprocessor which sorts the
       file names in desired order and then run the program using xargs. See examples below for how to use  find
       and xargs in conjunction with rdfind.

       To  include files or directories that have names starting with -, use rdfind ./- to not confuse them with
       options.

RANKING

       Given two or more equal files, the one with the highest rank is selected to be the original and the  rest
       are  duplicates.  The  rules of ranking are given below, where the rules are executed from start until an
       original has been found. Given two files A and B which have equal size and content,  the  ranking  is  as
       follows:

       If A was found while scanning an input argument earlier than B, A is higher ranked.

       If A was found at a directory depth lower than B, A is higher ranked (A is closer to the root).

       if  A  and B are found during scanning of the same input argument and share the same directory depth, the
       one that ranks highest depends on if deterministic operation is enabled.  This  is  on  by  default,  see
       option  -deterministic).  If  enabled,  which  one  ranks  highest  is  unspecified but deterministic. If
       disabled, the one that was reported first from the file system is highest ranked.

OPTIONS

       Searching options etc:

       -ignoreempty true|false
              Ignore empty files. Setting this to true (the default) is  equivalent  to  -minsize  1,  false  is
              equivalent to -minsize 0.

       -minsize N
              Ignores files with less than N bytes. Default is 1, meaning empty files are ignored.

       -maxsize N
              Ignores files with N bytes or more. Default is 0, which means this check is disabled.

       -followsymlinks true|false
              Follow symlinks. Default is false.

       -removeidentinode true|false
              Removes items found which have identical inode and device ID. Default is true.

       -checksum md5|sha1|sha256|sha512
              What  type  of checksum to be used: md5, sha1, sha256 or sha512. The default is sha1 since version
              1.4.0.

       -deterministic true|false
              If set (the default), sort files of equal rank in an unspecified  but  deterministic  order.  This
              makes the behaviour independent of in which order files are listed when querying the file system.

       Action options:

       -makesymlinks true|false
              Replace duplicate files with symbolic links. Default is false.

       -makehardlinks true|false
              Replace duplicate files with hard links. Default is false.

       -makeresultsfile true|false
              Make  a  results  file  in  the  current  directory.  Default  is  true. If the file exists, it is
              overwritten. This does not affect whether items are  deleted.  See  -dryrun  for  how  to  disable
              deletions.

       -outputname name
              Make the results file name to be "name" instead of the default results.txt.

       -deleteduplicates true|false
              Delete (unlink) files. Default is false.

       General options:

       -sleep Xms
              Sleeps  X  milliseconds  between  reading each file, to reduce load. Default is 0 (no sleep). Note
              that only a few values are supported at present: 0,1-5,10,25,50,100 milliseconds.

       -n, -dryrun true|false
              Displays what should have been done, don't actually delete or link anything. Default is false.

       -h, -help, --help
              Displays a brief help message.

       -v, -version, --version
              Displays the version number.

EXAMPLES

       Search for duplicate files in the home directory and a backup directory:
              rdfind ~ /mnt/backup

       Delete duplicates in a backup directory:
              rdfind -deleteduplicates true /mnt/backup

       Search for duplicate files in directories called foo:
              find . -type d -name foo -print0 |xargs -0 rdfind

FILES

       results.txt (the default name is results.txt and can be changed with option outputname,  see  above)  The
       results  file  results.txt  will  contain  one  row  per  duplicate  file  found, along with a header row
       explaining the columns.  A text describes why the file is considered a duplicate:

       DUPTYPE_UNKNOWN some internal error

       DUPTYPE_FIRST_OCCURRENCE the file that is considered to be the original.

       DUPTYPE_WITHIN_SAME_TREE files in the same tree (found when processing the directory in  the  same  input
       argument as the original)

       DUPTYPE_OUTSIDE_TREE the file is found during processing another input argument than the original.

ENVIRONMENT

DIAGNOSTICS

EXIT VALUES

       0 on success, nonzero otherwise.

BUGS/FEATURES

       When  specifying  the  same  directory  twice,  it  keeps  the  first  encountered  as the most important
       (original), and the rest as duplicates. This might not be what you want.

       The symlink creates absolute links. This might not be what you want. To create  relative  links  instead,
       you may use the symlink (2) command, which is able to convert absolute links to relative links.

       Older versions unfortunately contained a misspelling on the word occurrence. This is now corrected (since
       1.3), which might affect user scripts parsing the output file written by rdfind.

SECURITY CONSIDERATIONS

       Avoid  manipulating  the  directories  while  rdfind  is  reading.  rdfind is quite brittle in that case.
       Especially, when deleting or making links, rdfind can be subject to a symlink attack.  Use with care!

AUTHOR

       Paul   Dreik   2006-2018,    reachable    at    rdfind@pauldreik.se    Rdfind    can    be    found    at
       https://rdfind.pauldreik.se/

       Do  you  find rdfind useful? Drop me a line! It is always fun to hear from people who actually use it and
       what data collections they run it on.

THANKS

       Several persons have helped with suggestions and improvements: Niels Möller,  Carl  Payne  and  Salvatore
       Ansani. Thanks also to you who tested the program and sent me feedback.

VERSION

       1.6.0 (release date 2023-06-17)

COPYRIGHT

       This program is distributed under GPLv2 or later, at your option.

SEE ALSO

       md5sum, sha1sum, find, symlinks

Jun 2023                                              1.6.0                                            rdfind(1)