Provided by: percona-toolkit_3.2.1-1_all bug

NAME

       pt-table-sync - Synchronize MySQL table data efficiently.

SYNOPSIS

       Usage: pt-table-sync [OPTIONS] DSN [DSN]

       pt-table-sync synchronizes data efficiently between MySQL tables.

       This tool changes data, so for maximum safety, you should back up your data before using it.  When
       synchronizing a server that is a replication slave with the "--replicate" or "--sync-to-master" methods,
       it always makes the changes on the replication master, never the replication slave directly.  This is in
       general the only safe way to bring a replica back in sync with its master; changes to the replica are
       usually the source of the problems in the first place.  However, the changes it makes on the master
       should be no-op changes that set the data to their current values, and actually affect only the replica.

       Sync db.tbl on host1 to host2:

         pt-table-sync --execute h=host1,D=db,t=tbl h=host2

       Sync all tables on host1 to host2 and host3:

         pt-table-sync --execute host1 host2 host3

       Make slave1 have the same data as its replication master:

         pt-table-sync --execute --sync-to-master slave1

       Resolve differences that pt-table-checksum found on all slaves of master1:

         pt-table-sync --execute --replicate test.checksum master1

       Same as above but only resolve differences on slave1:

         pt-table-sync --execute --replicate test.checksum \
           --sync-to-master slave1

       Sync master2 in a master-master replication configuration, where master2's copy of db.tbl is known or
       suspected to be incorrect:

         pt-table-sync --execute --sync-to-master h=master2,D=db,t=tbl

       Note that in the master-master configuration, the following will NOT do what you want, because it will
       make changes directly on master2, which will then flow through replication and change master1's data:

         # Don't do this in a master-master setup!
         pt-table-sync --execute h=master1,D=db,t=tbl master2

RISKS

       WARNING: pt-table-sync changes data!  Before using this tool, please:

       •   Read the tool's documentation

       •   Review the tool's known "BUGS"

       •   Test the tool on a non-production server

       •   Backup your production server and verify the backups

           pt-table-sync  is  mature,  proven  in the real world, and well tested, but if used improperly it can
           have adverse consequences.  Always test syncing first with "--dry-run" and "--print".

DESCRIPTION

       pt-table-sync does one-way and bidirectional synchronization of table  data.   It  does  not  synchronize
       table structures, indexes, or any other schema objects.  The following describes one-way synchronization.
       "BIDIRECTIONAL SYNCING" is described later.

       This  tool  is  complex  and  functions in several different ways.  To use it safely and effectively, you
       should understand three things: the purpose of "--replicate", finding differences, and specifying  hosts.
       These  three  concepts  are  closely  related  and determine how the tool will run.  The following is the
       abbreviated logic:

          if DSN has a t part, sync only that table:
             if 1 DSN:
                if --sync-to-master:
                   The DSN is a slave.  Connect to its master and sync.
             if more than 1 DSN:
                The first DSN is the source.  Sync each DSN in turn.
          else if --replicate:
             if --sync-to-master:
                The DSN is a slave.  Connect to its master, find records
                of differences, and fix.
             else:
                The DSN is the master.  Find slaves and connect to each,
                find records of differences, and fix.
          else:
             if only 1 DSN and --sync-to-master:
                The DSN is a slave.  Connect to its master, find tables and
                filter with --databases etc, and sync each table to the master.
             else:
                find tables, filtering with --databases etc, and sync each
                DSN to the first.

       pt-table-sync can run in one of two ways: with "--replicate" or without.  The default is to  run  without
       "--replicate"  which  causes  pt-table-sync  to  automatically  find  differences efficiently with one of
       several algorithms (see "ALGORITHMS").  Alternatively, the value of "--replicate", if  specified,  causes
       pt-table-sync  to  use  the differences already found by having previously ran pt-table-checksum with its
       own "--replicate" option.  Strictly speaking, you don't need to use "--replicate"  because  pt-table-sync
       can  find  differences,  but many people use "--replicate" if, for example, they checksum regularly using
       pt-table-checksum then fix differences as needed with pt-table-sync.  If you're unsure, read each  tool's
       documentation carefully and decide for yourself, or consult with an expert.

       Regardless  of  whether "--replicate" is used or not, you need to specify which hosts to sync.  There are
       two ways: with "--sync-to-master" or without.  Specifying "--sync-to-master" makes  pt-table-sync  expect
       one  and only slave DSN on the command line.  The tool will automatically discover the slave's master and
       sync it so that its data is the same as its master.  This is accomplished by making changes on the master
       which then flow through replication and update the slave to resolve its differences.  Be careful  though:
       although  this  option  specifies and syncs a single slave, if there are other slaves on the same master,
       they will receive via replication the changes intended for the slave that you're trying to sync.

       Alternatively, if you do not specify "--sync-to-master", the first DSN given on the command line  is  the
       source  host.   There  is  only ever one source host.  If you do not also specify "--replicate", then you
       must specify at least one other DSN as the destination host.  There can be one or more destination hosts.
       Source and destination hosts must be independent; they cannot be in the same replication  topology.   pt-
       table-sync  will  die  with an error if it detects that a destination host is a slave because changes are
       written directly to destination hosts (and it's not safe to  write  directly  to  slaves).   Or,  if  you
       specify "--replicate" (but not "--sync-to-master") then pt-table-sync expects one and only one master DSN
       on  the  command line.  The tool will automatically discover all the master's slaves and sync them to the
       master.  This is the only way to sync several (all)  slaves  at  once  (because  "--sync-to-master"  only
       specifies one slave).

       Each  host  on  the  command  line  is  specified  as  a  DSN.  The first DSN (or only DSN for cases like
       "--sync-to-master") provides default values for other DSNs, whether those other DSNs are specified on the
       command line or auto-discovered by the tool.  So in this example,

         pt-table-sync --execute h=host1,u=msandbox,p=msandbox h=host2

       the host2 DSN inherits the "u" and "p" DSN parts from the host1 DSN.  Use the "--explain-hosts" option to
       see how pt-table-sync will interpret the DSNs given on the command line.

LIMITATIONS

       Replicas using row-based replication
           pt-table-sync  requires  statement-based  replication  when  used  with  the  "--sync-to-master"   or
           "--replicate"  option.  Therefore it will set "binlog_format=STATEMENT" on the master for its session
           if required.  To do this user must have "SUPER" privilege.

OUTPUT

       If you specify the "--verbose" option, you'll see information about the differences between  the  tables.
       There is one row per table.  Each server is printed separately.  For example,

         # Syncing h=host1,D=test,t=test1
         # DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
         #      0       0      3      0 Chunk     13:00:00 13:00:17 2    test.test1

       Table  test.test1  on host1 required 3 "INSERT" statements to synchronize and it used the Chunk algorithm
       (see "ALGORITHMS").  The sync operation for this table started at 13:00:00 and  ended  17  seconds  later
       (times taken from "NOW()" on the source host).  Because differences were found, its "EXIT STATUS" was 2.

       If  you  specify  the  "--print"  option,  you'll  see  the actual SQL statements that the script uses to
       synchronize the table if "--execute" is also specified.

       If you want to see the SQL statements that pt-table-sync is using to select chunks, nibbles, rows,  etc.,
       then  specify  "--print"  once  and  "--verbose"  twice.   Be careful though: this can print a lot of SQL
       statements.

       There are cases where no combination of "INSERT", "UPDATE" or "DELETE" statements can resolve differences
       without violating some unique key.  For example, suppose there's a primary key on column a and  a  unique
       key on column b.  Then there is no way to sync these two tables with straightforward UPDATE statements:

        +---+---+  +---+---+
        | a | b |  | a | b |
        +---+---+  +---+---+
        | 1 | 2 |  | 1 | 1 |
        | 2 | 1 |  | 2 | 2 |
        +---+---+  +---+---+

       The  tool  rewrites  queries to "DELETE" and "REPLACE" in this case.  This is automatically handled after
       the first index violation, so you don't have to worry about it.

       Be careful when using pt-table-sync in any master-master setup.  Master-master replication is  inherently
       tricky,  and it's easy to make mistakes.  You need to be sure you're using the tool correctly for master-
       master replication.  See the "SYNOPSIS" for the overview of the correct usage.

       Also be careful with tables that have foreign key constraints with "ON DELETE" or "ON UPDATE" definitions
       because these might cause unintended changes on the child tables.  See "--[no]check-child-tables".

       In general, this tool is best suited when your tables have a primary key or unique  index.   Although  it
       can  synchronize  data  in  tables lacking a primary key or unique index, it might be best to synchronize
       that data by another means.

REPLICATION SAFETY

       Synchronizing a replication master and slave safely is a non-trivial problem, in general.  There are  all
       sorts  of  issues  to  think  about,  such as other processes changing data, trying to change data on the
       slave, whether the destination and source are a master-master pair, and much more.

       In general, the safe way to do it is to change the data on the master, and let the changes  flow  through
       replication  to  the  slave like any other changes.  However, this works only if it's possible to REPLACE
       into the table on the master.  REPLACE works only if there's a unique index on the  table  (otherwise  it
       just acts like an ordinary INSERT).

       If your table has unique keys, you should use the "--sync-to-master" and/or "--replicate" options to sync
       a  slave  to  its  master.   This  will generally do the right thing.  When there is no unique key on the
       table, there is no choice but to change the data on the slave, and pt-table-sync will detect that  you're
       trying   to   do   so.    It   will   complain   and  die  unless  you  specify  "--no-check-slave"  (see
       "--[no]check-slave").

       If you're syncing a table without a primary or unique key on a master-master pair, you  must  change  the
       data  on  the  destination  server.   Therefore,  you  need  to  specify  "--no-bin-log"  for safety (see
       "--[no]bin-log").  If you don't, the changes you make on the destination server will  replicate  back  to
       the source server and change the data there!

       The  generally  safe  thing  to do on a master-master pair is to use the "--sync-to-master" option so you
       don't change the data on the destination server.  You will also need  to  specify  "--no-check-slave"  to
       keep pt-table-sync from complaining that it is changing data on a slave.

ALGORITHMS

       pt-table-sync  has  a generic data-syncing framework which uses different algorithms to find differences.
       The tool automatically chooses the best algorithm for each table based on indexes, column types, and  the
       algorithm  preferences  specified  by  "--algorithms".  The following algorithms are available, listed in
       their default order of preference:

       Chunk
           Finds an index whose first column is numeric  (including  date  and  time  types),  and  divides  the
           column's  range  of values into chunks of approximately "--chunk-size" rows.  Syncs a chunk at a time
           by checksumming the entire chunk.  If the chunk differs on the source and destination, checksums each
           chunk's rows individually to find the rows that differ.

           It is efficient when the column has sufficient cardinality to make the chunks end up about the  right
           size.

           The  initial  per-chunk  checksum  is  quite  small and results in minimal network traffic and memory
           consumption.  If a chunk's rows must be examined, only the primary key columns  and  a  checksum  are
           sent over the network, not the entire row.  If a row is found to be different, the entire row will be
           fetched, but not before.

           Note  that this algorithm will not work if chunking a char column where all the values start with the
           same character. In that case, the tool will exit and suggest picking a different algorithm.

       Nibble
           Finds an index and ascends the index in fixed-size nibbles  of  "--chunk-size"  rows,  using  a  non-
           backtracking  algorithm (see pt-archiver for more on this algorithm).  It is very similar to "Chunk",
           but instead of pre-calculating the boundaries of each piece of the table based on index  cardinality,
           it  uses "LIMIT" to define each nibble's upper limit, and the previous nibble's upper limit to define
           the lower limit.

           It works in steps: one query finds the row that will define the next nibble's upper boundary, and the
           next query checksums the entire nibble.  If the nibble differs between the source and destination, it
           examines the nibble row-by-row, just as "Chunk" does.

       GroupBy
           Selects the entire table grouped by all columns, with a COUNT(*) column added.  Compares all columns,
           and if they're the same, compares the COUNT(*) column's value to determine how many rows to insert or
           delete into the destination.  Works on tables with no primary key or unique index.

       Stream
           Selects the entire table in one big stream and compares all columns.  Selects all columns.  Much less
           efficient than the other algorithms, but works when there is no suitable index for them to use.

       Future Plans
           Possibilities for future algorithms are TempTable (what I  originally  called  bottom-up  in  earlier
           versions  of this tool), DrillDown (what I originally called top-down), and GroupByPrefix (similar to
           how SqlYOG Job Agent works).  Each  algorithm  has  strengths  and  weaknesses.   If  you'd  like  to
           implement  your  favorite  technique  for finding differences between two sources of data on possibly
           different servers, I'm willing to help.  The algorithms adhere to a simple interface  that  makes  it
           pretty easy to write your own.

BIDIRECTIONAL SYNCING

       Bidirectional  syncing  is  a  new, experimental feature.  To make it work reliably there are a number of
       strict limitations:

         * only works when syncing one server to other independent servers
         * does not work in any way with replication
         * requires that the table(s) are chunkable with the Chunk algorithm
         * is not N-way, only bidirectional between two servers at a time
         * does not handle DELETE changes

       For example, suppose we have three servers: c1, r1, r2.  c1 is the central server, a pseudo-master to the
       other servers (viz. r1 and r2 are not slaves to c1).  r1 and r2 are remote servers.  Rows  in  table  foo
       are  updated and inserted on all three servers and we want to synchronize all the changes between all the
       servers.  Table foo has columns:

         id    int PRIMARY KEY
         ts    timestamp auto updated
         name  varchar

       Auto-increment offsets are used so that new rows from any server do not create  conflicting  primary  key
       (id)  values.   In  general,  newer rows, as determined by the ts column, take precedence when a same but
       differing row is found during the bidirectional sync.  "Same but differing" means that two rows have  the
       same  primary  key  (id)  value  but different values for some other column, like the name column in this
       example.  Same but differing conflicts are resolved by a "conflict".  A conflict compares some column  of
       the  competing  rows to determine a "winner".  The winning row becomes the source and its values are used
       to update the other row.

       There are subtle differences between three columns used to achieve bidirectional syncing that you  should
       be  familiar  with:  chunk  column  ("--chunk-column"),  comparison column(s) ("--columns"), and conflict
       column ("--conflict-column").  The chunk column is only used to chunk the table; e.g. "WHERE id >= 5  AND
       id < 10".  Chunks are checksummed and when chunk checksums reveal a difference, the tool selects the rows
       in  that  chunk  and checksums the "--columns" for each row.  If a column checksum differs, the rows have
       one or more conflicting column values.  In a traditional unidirectional sync,  the  conflict  is  a  moot
       point  because  it  can  be  resolved simply by updating the entire destination row with the source row's
       values.   In  a  bidirectional  sync,  however,  the  "--conflict-column"  (in  accordance   with   other
       "--conflict-*"  options  list  below) is compared to determine which row is "correct" or "authoritative";
       this row becomes the "source".

       To sync all three servers completely, two runs of pt-table-sync are required.  The first run syncs c1 and
       r1, then syncs c1 and r2 including any changes from r1.  At this point c1 and r2 are completely in  sync,
       but  r1  is  missing any changes from r2 because c1 didn't have these changes when it and r1 were synced.
       So a second run is needed which syncs the servers in the same order, but this time when  c1  and  r1  are
       synced r1 gets r2's changes.

       The  tool  does not sync N-ways, only bidirectionally between the first DSN given on the command line and
       each subsequent DSN in turn.  So the tool in this example would be ran twice like:

         pt-table-sync --bidirectional h=c1 h=r1 h=r2

       The "--bidirectional" option enables this feature and causes various sanity checks to be performed.   You
       must  specify other options that tell pt-table-sync how to resolve conflicts for same but differing rows.
       These options are:

         * --conflict-column
         * --conflict-comparison
         * --conflict-value
         * --conflict-threshold
         * --conflict-error">  (optional)

       Use "--print" to test this option before "--execute".  The printed  SQL  statements  will  have  comments
       saying on which host the statement would be executed if you used "--execute".

       Technical  side note: the first DSN is always the "left" server and the other DSNs are always the "right"
       server.  Since either server can become the source or destination it's confusing  to  think  of  them  as
       "src"  and  "dst".   Therefore, they're generically referred to as left and right.  It's easy to remember
       this because the first DSN is always to the left of the other server DSNs on the command line.

EXIT STATUS

       The following are the exit statuses (also called return  values,  or  return  codes)  when  pt-table-sync
       finishes and exits.

          STATUS  MEANING
          ======  =======================================================
          0       Success.
          1       Internal error.
          2       At least one table differed on the destination.
          3       Combination of 1 and 2.

OPTIONS

       Specify at least one of "--print", "--execute", or "--dry-run".

       "--where" and "--replicate" are mutually exclusive.

       This  tool  accepts additional command-line arguments.  Refer to the "SYNOPSIS" and usage information for
       details.

       --algorithms
           type: string; default: Chunk,Nibble,GroupBy,Stream

           Algorithm to use when comparing the tables, in order of preference.

           For each table, pt-table-sync will check if the table can be synced with the given algorithms in  the
           order that they're given.  The first algorithm that can sync the table is used.  See "ALGORITHMS".

       --ask-pass
           Prompt for a password when connecting to MySQL.

       --bidirectional
           Enable bidirectional sync between first and subsequent hosts.

           See "BIDIRECTIONAL SYNCING" for more information.

       --[no]bin-log
           default: yes

           Log to the binary log ("SET SQL_LOG_BIN=1").

           Specifying "--no-bin-log" will "SET SQL_LOG_BIN=0".

       --buffer-in-mysql
           Instruct MySQL to buffer queries in its memory.

           This  option  adds  the  "SQL_BUFFER_RESULT"  option to the comparison queries.  This causes MySQL to
           execute the queries and place them in a temporary table internally before sending the results back to
           pt-table-sync.  The advantage of this strategy is  that  pt-table-sync  can  fetch  rows  as  desired
           without  using  a lot of memory inside the Perl process, while releasing locks on the MySQL table (to
           reduce contention with other queries).  The disadvantage is that it uses more  memory  on  the  MySQL
           server instead.

           You  probably want to leave "--[no]buffer-to-client" enabled too, because buffering into a temp table
           and then fetching it all into Perl's memory is probably a silly thing to do.   This  option  is  most
           useful for the GroupBy and Stream algorithms, which may fetch a lot of data from the server.

       --[no]buffer-to-client
           default: yes

           Fetch rows one-by-one from MySQL while comparing.

           This  option  enables  "mysql_use_result"  which causes MySQL to hold the selected rows on the server
           until the tool fetches them.  This allows the tool to use less memory but may keep the rows locked on
           the server longer.

           If this option is disabled by specifying "--no-buffer-to-client" then  "mysql_store_result"  is  used
           which  causes  MySQL  to  send all selected rows to the tool at once.  This may result in the results
           "cursor" being held open for a shorter time on the server, but if the tables are large, it could take
           a long time anyway, and use all your memory.

           For most non-trivial data sizes, you want to leave this option enabled.

           This option is disabled when "--bidirectional" is used.

       --channel
           type: string

           Channel name used when connected to a server  using  replication  channels.   Suppose  you  have  two
           masters,  master_a  at  port 12345, master_b at port 1236 and a slave connected to both masters using
           channels chan_master_a and chan_master_b.  If you want to run pt-table-sync to synchronize the  slave
           against master_a, pt-table-sync won't be able to determine what's the correct master since SHOW SLAVE
           STATUS  will  return 2 rows. In this case, you can use --channel=chan_master_a to specify the channel
           name to use in the SHOW SLAVE STATUS command.

       --charset
           short form: -A; type: string

           Default character set.  If the value is utf8, sets Perl's binmode  on  STDOUT  to  utf8,  passes  the
           mysql_enable_utf8 option to DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL.  Any other
           value sets binmode on STDOUT without the utf8 layer, and runs SET NAMES after connecting to MySQL.

       --[no]check-child-tables
           default: yes

           Check  if  "--execute"  will  adversely  affect  child  tables.   When "--replace", "--replicate", or
           "--sync-to-master" is specified, the tool may sync tables using "REPLACE"  statements.   If  a  table
           being synced has child tables with "ON DELETE CASCADE", "ON UPDATE CASCADE", or "ON UPDATE SET NULL",
           the tool prints an error and skips the table because "REPLACE" becomes "DELETE" then "INSERT", so the
           "DELETE" will cascade to the child table and delete its rows.  In the worst case, this can delete all
           rows in child tables!

           Specify "--no-check-child-tables" to disable this check.  To completely avoid affecting child tables,
           also  specify  "--no-foreign-key-checks"  so MySQL will not cascade any operations from the parent to
           child tables.

           This  check  is  only  preformed  if  "--execute"  and  one   of   "--replace",   "--replicate",   or
           "--sync-to-master" is specified.  "--print" does not check child tables.

           The  error  message  only  prints the first child table found with an "ON DELETE CASCADE", "ON UPDATE
           CASCADE", or "ON UPDATE SET NULL" foreign key  constraint.   There  could  be  other  affected  child
           tables.

       --[no]check-master
           default: yes

           With "--sync-to-master", try to verify that the detected master is the real master.

       --[no]check-slave
           default: yes

           Check whether the destination server is a slave.

           If  the  destination  server  is  a  slave,  it's  generally  unsafe to make changes on it.  However,
           sometimes you have to; "--replace" won't work unless there's a unique  index,  for  example,  so  you
           can't make changes on the master in that scenario.  By default pt-table-sync will complain if you try
           to  change  data  on  a slave.  Specify "--no-check-slave" to disable this check.  Use it at your own
           risk.

       --[no]check-triggers
           default: yes

           Check that no triggers are defined on the destination table.

           Triggers were introduced in MySQL v5.0.2, so for older versions this option  has  no  effect  because
           triggers will not be checked.

       --chunk-column
           type: string

           Chunk the table on this column.

       --chunk-index
           type: string

           Chunk the table using this index.

       --chunk-size
           type: string; default: 1000

           Number of rows or data size per chunk.

           The  size  of  each  chunk of rows for the "Chunk" and "Nibble" algorithms.  The size can be either a
           number of rows, or a data size.  Data sizes are specified with a suffix of k=kibibytes,  M=mebibytes,
           G=gibibytes.  Data sizes are converted to a number of rows by dividing by the average row length.

       --columns
           short form: -c; type: array

           Compare this comma-separated list of columns.

       --config
           type: Array

           Read  this  comma-separated  list of config files; if specified, this must be the first option on the
           command line.

       --conflict-column
           type: string

           Compare this column when rows conflict during a "--bidirectional" sync.

           When a same but differing row is found the value of this column from each row is  compared  according
           to  "--conflict-comparison", "--conflict-value" and "--conflict-threshold" to determine which row has
           the correct data and becomes the source.   The  column  can  be  any  type  for  which  there  is  an
           appropriate "--conflict-comparison" (this is almost all types except, for example, blobs).

           This option only works with "--bidirectional".  See "BIDIRECTIONAL SYNCING" for more information.

       --conflict-comparison
           type: string

           Choose the "--conflict-column" with this property as the source.

           The  option  affects  how  the  "--conflict-column"  values  from  the conflicting rows are compared.
           Possible comparisons are one of these MAGIC_comparisons:

             newest|oldest|greatest|least|equals|matches

             COMPARISON  CHOOSES ROW WITH
             ==========  =========================================================
             newest      Newest temporal --conflict-column value
             oldest      Oldest temporal --conflict-column value
             greatest    Greatest numerical "--conflict-column value
             least       Least numerical --conflict-column value
             equals      --conflict-column value equal to --conflict-value
             matches     --conflict-column value matching Perl regex pattern
                         --conflict-value

           This option only works with "--bidirectional".  See "BIDIRECTIONAL SYNCING" for more information.

       --conflict-error
           type: string; default: warn

           How to report unresolvable conflicts and conflict errors

           This option changes how the user is notified when a conflict cannot be resolved or causes  some  kind
           of error.  Possible values are:

             * warn: Print a warning to STDERR about the unresolvable conflict
             * die:  Die, stop syncing, and print a warning to STDERR

           This option only works with "--bidirectional".  See "BIDIRECTIONAL SYNCING" for more information.

       --conflict-threshold
           type: string

           Amount by which one "--conflict-column" must exceed the other.

           The "--conflict-threshold" prevents a conflict from being resolved if the absolute difference between
           the two "--conflict-column" values is less than this amount.  For example, if two "--conflict-column"
           have  timestamp  values  "2009-12-01 12:00:00" and "2009-12-01 12:05:00" the difference is 5 minutes.
           If "--conflict-threshold" is set to "5m" the conflict will be resolved, but if "--conflict-threshold"
           is set to "6m" the conflict will fail to resolve because the difference is not greater than or  equal
           to 6 minutes.  In this latter case, "--conflict-error" will report the failure.

           This option only works with "--bidirectional".  See "BIDIRECTIONAL SYNCING" for more information.

       --conflict-value
           type: string

           Use this value for certain "--conflict-comparison".

           This option gives the value for "equals" and "matches" "--conflict-comparison".

           This option only works with "--bidirectional".  See "BIDIRECTIONAL SYNCING" for more information.

       --databases
           short form: -d; type: hash

           Sync only this comma-separated list of databases.

           A common request is to sync tables from one database with tables from another database on the same or
           different  server.  This is not yet possible.  "--databases" will not do it, and you can't do it with
           the D part of the DSN either because in the absence of a table  name  it  assumes  the  whole  server
           should be synced and the D part controls only the connection's default database.

       --defaults-file
           short form: -F; type: string

           Only read mysql options from the given file.  You must give an absolute pathname.

       --dry-run
           Analyze, decide the sync algorithm to use, print and exit.

           Implies  "--verbose"  so  you  can  see  the results.  The results are in the same output format that
           you'll see from actually running the tool, but there will  be  zeros  for  rows  affected.   This  is
           because  the  tool  actually  executes, but stops before it compares any data and just returns zeros.
           The zeros do not mean there are no changes to be made.

       --engines
           short form: -e; type: hash

           Sync only this comma-separated list of storage engines.

       --execute
           Execute queries to make the tables have identical data.

           This option makes pt-table-sync actually sync table data by executing all the queries that it created
           to resolve table differences.  Therefore, the tables will be changed!  And unless  you  also  specify
           "--verbose",  the  changes  will  be  made  silently.  If this is not what you want, see "--print" or
           "--dry-run".

       --explain-hosts
           Print connection information and exit.

           Print out a list of hosts to which pt-table-sync  will  connect,  with  all  the  various  connection
           options, and exit.

       --float-precision
           type: int

           Precision for "FLOAT" and "DOUBLE" number-to-string conversion.  Causes FLOAT and DOUBLE values to be
           rounded  to  the  specified  number  of  digits after the decimal point, with the ROUND() function in
           MySQL.  This can help avoid checksum mismatches due to different  floating-point  representations  of
           the same values on different MySQL versions and hardware.  The default is no rounding; the values are
           converted  to  strings by the CONCAT() function, and MySQL chooses the string representation.  If you
           specify a value of 2, for example, then the values 1.008 and 1.009 will be rounded to 1.01, and  will
           checksum as equal.

       --[no]foreign-key-checks
           default: yes

           Enable foreign key checks ("SET FOREIGN_KEY_CHECKS=1").

           Specifying "--no-foreign-key-checks" will "SET FOREIGN_KEY_CHECKS=0".

       --function
           type: string

           Which hash function you'd like to use for checksums.

           The  default  is  "CRC32".   Other  good choices include "MD5" and "SHA1".  If you have installed the
           "FNV_64" user-defined function, "pt-table-sync" will detect it and prefer to use it,  because  it  is
           much  faster  than the built-ins.  You can also use MURMUR_HASH if you've installed that user-defined
           function.  Both of these are  distributed  with  Percona  Server.   See  pt-table-checksum  for  more
           information and benchmarks.

       --help
           Show help and exit.

       --[no]hex-blob
           default: yes

           "HEX()" "BLOB", "TEXT" and "BINARY" columns.

           When  row  data  from the source is fetched to create queries to sync the data (i.e. the queries seen
           with "--print" and executed by "--execute"), binary columns are wrapped in HEX() so the  binary  data
           does not produce an invalid SQL statement.  You can disable this option but you probably shouldn't.

       --host
           short form: -h; type: string

           Connect to host.

       --ignore-columns
           type: Hash

           Ignore this comma-separated list of column names in comparisons.

           This  option  causes  columns  not to be compared.  However, if a row is determined to differ between
           tables, all columns in that row will be synced, regardless.  (It is not currently possible to exclude
           columns from the sync process itself, only from the comparison.)

       --ignore-databases
           type: Hash

           Ignore this comma-separated list of databases.

           (system databases such as information_schema and performance_schema are ignored by default)

       --ignore-engines
           type: Hash; default: FEDERATED,MRG_MyISAM

           Ignore this comma-separated list of storage engines.

       --ignore-tables
           type: Hash

           Ignore this comma-separated list of tables.

           Table names may be qualified with the database name.

       --ignore-tables-regex
           type: string; group: Filter

           Ignore tables whose names match the Perl regex.

       --[no]index-hint
           default: yes

           Add FORCE/USE INDEX hints to the chunk and row queries.

           By default "pt-table-sync" adds a FORCE/USE INDEX hint to each SQL statement  to  coerce  MySQL  into
           using the index chosen by the sync algorithm or specified by "--chunk-index".  This is usually a good
           thing,  but  in  rare cases the index may not be the best for the query so you can suppress the index
           hint by specifying "--no-index-hint" and let MySQL choose the index.

           This does not affect the queries printed by "--print"; it only affects the chunk and row queries that
           "pt-table-sync" uses to select and compare rows.

       --lock
           type: int

           Lock tables: 0=none, 1=per sync cycle, 2=per table, or 3=globally.

           This uses "LOCK TABLES".  This can help prevent tables being changed  while  you're  examining  them.
           The possible values are as follows:

             VALUE  MEANING
             =====  =======================================================
             0      Never lock tables.
             1      Lock and unlock one time per sync cycle (as implemented
                    by the syncing algorithm).  This is the most granular
                    level of locking available.  For example, the Chunk
                    algorithm will lock each chunk of C<N> rows, and then
                    unlock them if they are the same on the source and the
                    destination, before moving on to the next chunk.
             2      Lock and unlock before and after each table.
             3      Lock and unlock once for every server (DSN) synced, with
                    C<FLUSH TABLES WITH READ LOCK>.

           A  replication  slave  is  never locked if "--replicate" or "--sync-to-master" is specified, since in
           theory locking the table on the master should prevent any changes from taking place.   (You  are  not
           changing  data  on  your slave, right?)  If "--wait" is given, the master (source) is locked and then
           the tool waits for the slave to catch up to the master before continuing.

           If "--transaction" is specified, "LOCK TABLES" is not used.  Instead, lock and unlock are implemented
           by beginning and committing transactions.  The exception is if "--lock" is 3.

           If "--no-transaction" is specified, then "LOCK TABLES"  is  used  for  any  value  of  "--lock".  See
           "--[no]transaction".

       --lock-and-rename
           Lock  the  source  and  destination  table, sync, then swap names.  This is useful as a less-blocking
           ALTER TABLE, once the tables are reasonably in  sync  with  each  other  (which  you  may  choose  to
           accomplish  via  any  number of means, including dump and reload or even something like pt-archiver).
           It requires exactly two DSNs and assumes they are on the same server,  so  it  does  no  waiting  for
           replication or the like.  Tables are locked with LOCK TABLES.

       --password
           short form: -p; type: string

           Password  to use when connecting.  If password contains commas they must be escaped with a backslash:
           "exam\,ple"

       --pid
           type: string

           Create the given PID file.  The tool won't start if the PID  file  already  exists  and  the  PID  it
           contains  is different than the current PID.  However, if the PID file exists and the PID it contains
           is no longer running, the tool will overwrite the PID file with the current PID.   The  PID  file  is
           removed automatically when the tool exits.

       --port
           short form: -P; type: int

           Port number to use for connection.

       --print
           Print queries that will resolve differences.

           If  you  don't  trust  "pt-table-sync", or just want to see what it will do, this is a good way to be
           safe.  These queries are valid SQL and you can run them yourself if  you  want  to  sync  the  tables
           manually.

       --recursion-method
           type: array; default: processlist,hosts

           Preferred recursion method used to find slaves.

           Possible methods are:

             METHOD       USES
             ===========  ==================
             processlist  SHOW PROCESSLIST
             hosts        SHOW SLAVE HOSTS
             none         Do not find slaves

           The  processlist  method  is  preferred because SHOW SLAVE HOSTS is not reliable.  However, the hosts
           method is required if the server uses a non-standard port (not 3306).  Usually pt-table-sync does the
           right thing and finds the slaves, but you may give a preferred method and it will be used first.   If
           it doesn't find any slaves, the other methods will be tried.

       --replace
           Write all "INSERT" and "UPDATE" statements as "REPLACE".

           This is automatically switched on as needed when there are unique index violations.

       --replicate
           type: string

           Sync tables listed as different in this table.

           Specifies  that  "pt-table-sync"  should  examine the specified table to find data that differs.  The
           table is exactly the same as the argument of  the  same  name  to  pt-table-checksum.   That  is,  it
           contains records of which tables (and ranges of values) differ between the master and slave.

           For  each  table  and  range  of  values  that  shows  differences  between  the  master  and  slave,
           "pt-table-checksum" will sync that table, with the appropriate "WHERE" clause, to its master.

           This automatically sets "--wait" to 60 and causes changes to be made on the  master  instead  of  the
           slave.

           If  "--sync-to-master"  is specified, the tool will assume the server you specified is the slave, and
           connect to the master as usual to sync.

           Otherwise, it will try to use "SHOW PROCESSLIST" to find slaves of the server you specified.   If  it
           is unable to find any slaves via "SHOW PROCESSLIST", it will inspect "SHOW SLAVE HOSTS" instead.  You
           must  configure  each  slave's "report-host", "report-port" and other options for this to work right.
           After finding slaves, it will inspect the specified table on each slave to find data that needs to be
           synced, and sync it.

           The tool examines the master's copy of the table first, assuming that the  master  is  potentially  a
           slave  as  well.   Any  table  that  shows differences there will NOT be synced on the slave(s).  For
           example, suppose your replication is set up as A->B, B->C, B->D.  Suppose you use this  argument  and
           specify  server  B.  The tool will examine server B's copy of the table.  If it looks like server B's
           data in table "test.tbl1" is different from server A's copy, the tool will not  sync  that  table  on
           servers C and D.

       --slave-user
           type: string

           Sets  the  user  to  be used to connect to the slaves.  This parameter allows you to have a different
           user with less privileges on the slaves but that user must exist on all slaves.

       --slave-password
           type: string

           Sets the password to be used to connect to the slaves.  It can be  used  with  --slave-user  and  the
           password for the user must be the same on all slaves.

       --set-vars
           type: Array

           Set the MySQL variables in this comma-separated list of "variable=value" pairs.

           By default, the tool sets:

              wait_timeout=10000

           Variables specified on the command line override these defaults.  For example, specifying "--set-vars
           wait_timeout=500" overrides the defaultvalue of 10000.

           The tool prints a warning and continues if a variable cannot be set.

       --socket
           short form: -S; type: string

           Socket file to use for connection.

       --sync-to-master
           Treat the DSN as a slave and sync it to its master.

           Treat  the  server  you  specified  as a slave.  Inspect "SHOW SLAVE STATUS", connect to the server's
           master, and treat the master as the source and the slave as the destination.  Causes  changes  to  be
           made  on  the  master.   Sets  "--wait" to 60 by default, sets "--lock" to 1 by default, and disables
           "--[no]transaction" by default.  See also "--replicate", which changes this option's behavior.

       --tables
           short form: -t; type: hash

           Sync only this comma-separated list of tables.

           Table names may be qualified with the database name.

       --timeout-ok
           Keep going if "--wait" fails.

           If you specify "--wait" and the slave doesn't catch up to the master's position before the wait times
           out, the default behavior is to abort.  This option makes the tool keep going  anyway.   Warning:  if
           you  are  trying  to  get a consistent comparison between the two servers, you probably don't want to
           keep going after a timeout.

       --[no]transaction
           Use transactions instead of "LOCK TABLES".

           The granularity of beginning and committing transactions is controlled by "--lock".  This is  enabled
           by default, but since "--lock" is disabled by default, it has no effect.

           Most  options  that  enable  locking  also  disable  transactions  by  default, so if you want to use
           transactional locking (via "LOCK IN SHARE MODE" and "FOR UPDATE", you  must  specify  "--transaction"
           explicitly.

           If  you  don't  specify  "--transaction"  explicitly "pt-table-sync" will decide on a per-table basis
           whether to use transactions or table locks.  It currently uses transactions  on  InnoDB  tables,  and
           table locks on all others.

           If  "--no-transaction"  is specified, then "pt-table-sync" will not use transactions at all (not even
           for InnoDB tables) and locking is controlled by "--lock".

           When enabled, either explicitly or implicitly, the transaction isolation  level  is  set  "REPEATABLE
           READ" and transactions are started "WITH CONSISTENT SNAPSHOT".

       --trim
           "TRIM()" "VARCHAR" columns in "BIT_XOR" and "ACCUM" modes.  Helps when comparing MySQL 4.1 to >= 5.0.

           This  is useful when you don't care about the trailing space differences between MySQL versions which
           vary in their handling of trailing spaces.  MySQL  5.0  and  later  all  retain  trailing  spaces  in
           "VARCHAR", while previous versions would remove them.

       --[no]unique-checks
           default: yes

           Enable unique key checks ("SET UNIQUE_CHECKS=1").

           Specifying "--no-unique-checks" will "SET UNIQUE_CHECKS=0".

       --user
           short form: -u; type: string

           User for login if not current user.

       --verbose
           short form: -v; cumulative: yes

           Print results of sync operations.

           See "OUTPUT" for more details about the output.

       --version
           Show version and exit.

       --[no]version-check
           default: yes

           Check for the latest version of Percona Toolkit, MySQL, and other programs.

           This  is  a standard "check for updates automatically" feature, with two additional features.  First,
           the tool checks its own version and also the versions of the following  software:  operating  system,
           Percona Monitoring and Management (PMM), MySQL, Perl, MySQL driver for Perl (DBD::mysql), and Percona
           Toolkit.  Second,  it  checks  for  and  warns about versions with known problems. For example, MySQL
           5.5.25 had a critical bug and was re-released as 5.5.25a.

           A secure connection to Percona’s Version Check database server is done to perform these checks.  Each
           request  is  logged  by  the  server, including software version numbers and unique ID of the checked
           system. The ID is generated by the Percona Toolkit installation script  or  when  the  Version  Check
           database call is done for the first time.

           Any  updates  or  known problems are printed to STDOUT before the tool's normal output.  This feature
           should never interfere with the normal operation of the tool.

           For more information, visit <https://www.percona.com/doc/percona-toolkit/LATEST/version-check.html>.

       --wait
           short form: -w; type: time

           How long to wait for slaves to catch up to their master.

           Make the master wait for the slave to catch up in replication before comparing the tables.  The value
           is the number of seconds to wait before timing out (see also "--timeout-ok").  Sets "--lock" to 1 and
           "--[no]transaction" to 0 by default.  If you see an error such as the following,

             MASTER_POS_WAIT returned -1

           It means the timeout was exceeded and you need to increase it.

           The default value of this option is influenced by other options.  To see what value is in effect, run
           with "--help".

           To disable waiting entirely (except for locks), specify "--wait" 0.  This helps  when  the  slave  is
           lagging on tables that are not being synced.

       --where
           type: string

           "WHERE" clause to restrict syncing to part of the table.

       --[no]zero-chunk
           default: yes

           Add a chunk for rows with zero or zero-equivalent values.  The only has an effect when "--chunk-size"
           is  specified.  The purpose of the zero chunk is to capture a potentially large number of zero values
           that would imbalance the size of the first chunk.  For example, if a lot  of  negative  numbers  were
           inserted  into  an unsigned integer column causing them to be stored as zeros, then these zero values
           are captured by the zero chunk instead of the first chunk and all its non-zero values.

DSN OPTIONS

       These DSN options are used to create a DSN.  Each option is given like "option=value".  The  options  are
       case-sensitive,  so  P and p are not the same option.  There cannot be whitespace before or after the "="
       and if the value contains whitespace it must be  quoted.   DSN  options  are  comma-separated.   See  the
       percona-toolkit manpage for full details.

       •   A

           dsn: charset; copy: yes

           Default character set.

       •   D

           dsn: database; copy: yes

           Database containing the table to be synced.

       •   F

           dsn: mysql_read_default_file; copy: yes

           Only read default options from the given file

       •   h

           dsn: host; copy: yes

           Connect to host.

       •   p

           dsn: password; copy: yes

           Password  to use when connecting.  If password contains commas they must be escaped with a backslash:
           "exam\,ple"

       •   P

           dsn: port; copy: yes

           Port number to use for connection.

       •   S

           dsn: mysql_socket; copy: yes

           Socket file to use for connection.

       •   t

           copy: yes

           Table to be synced.

       •   u

           dsn: user; copy: yes

           User for login if not current user.

ENVIRONMENT

       The environment variable "PTDEBUG" enables verbose debugging output to STDERR.  To enable  debugging  and
       capture all output to a file, run the tool like:

          PTDEBUG=1 pt-table-sync ... > FILE 2>&1

       Be careful: debugging output is voluminous and can generate several megabytes of output.

SYSTEM REQUIREMENTS

       You  need  Perl, DBI, DBD::mysql, and some core packages that ought to be installed in any reasonably new
       version of Perl.

BUGS

       For a list of known bugs, see <http://www.percona.com/bugs/pt-table-sync>.

       Please report bugs at <https://jira.percona.com/projects/PT>.  Include the following information in  your
       bug report:

       •   Complete command-line used to run the tool

       •   Tool "--version"

       •   MySQL version of all servers involved

       •   Output from the tool including STDERR

       •   Input files (log/dump/config files, etc.)

       If possible, include debugging output by running the tool with "PTDEBUG"; see "ENVIRONMENT".

DOWNLOADING

       Visit  <http://www.percona.com/software/percona-toolkit/>  to  download  the  latest  release  of Percona
       Toolkit.  Or, get the latest release from the command line:

          wget percona.com/get/percona-toolkit.tar.gz

          wget percona.com/get/percona-toolkit.rpm

          wget percona.com/get/percona-toolkit.deb

       You can also get individual tools from the latest release:

          wget percona.com/get/TOOL

       Replace "TOOL" with the name of any tool.

AUTHORS

       Baron Schwartz

ACKNOWLEDGMENTS

       My   work   is   based   in   part   on    Giuseppe    Maxia's    work    on    distributed    databases,
       <http://www.sysadminmag.com/articles/2004/0408/>  and  code  derived  from  that  article.  There is more
       explanation, and a link to the code, at <http://www.perlmonks.org/?node_id=381053>.

       Another programmer extended Maxia's work even further.  Fabien Coelho  changed  and  generalized  Maxia's
       technique,  introducing  symmetry and avoiding some problems that might have caused too-frequent checksum
       collisions.  This work grew  into  pg_comparator,  <http://www.coelho.net/pg_comparator/>.   Coelho  also
       explained   the   technique   further   in   a  paper  titled  "Remote  Comparison  of  Database  Tables"
       (<http://cri.ensmp.fr/classement/doc/A-375.pdf>).

       This existing literature mostly addressed how to find the differences between  the  tables,  not  how  to
       resolve  them  once  found.   I  needed  a tool that would not only find them efficiently, but would then
       resolve them.  I first began thinking about  how  to  improve  the  technique  further  with  my  article
       <http://tinyurl.com/mysql-data-diff-algorithm>,   where  I  discussed  a  number  of  problems  with  the
       Maxia/Coelho "bottom-up" algorithm.  After writing that article, I began to write this tool.  I wanted to
       actually implement their algorithm with some improvements so I was sure I understood  it  completely.   I
       discovered  it  is  not what I thought it was, and is considerably more complex than it appeared to me at
       first.  Fabien Coelho was kind enough to address some questions over email.

       The first versions of this tool implemented a version of  the  Coelho/Maxia  algorithm,  which  I  called
       "bottom-up",  and my own, which I called "top-down."  Those algorithms are considerably more complex than
       the current algorithms and I have removed them from  this  tool,  and  may  add  them  back  later.   The
       improvements  to  the  bottom-up  algorithm  are  my  original  work,  as is the top-down algorithm.  The
       techniques to actually resolve the differences are also my own work.

       Another tool that can synchronize tables is the SQLyog Job Agent from webyog.  Thanks to  Rohit  Nadhani,
       SJA's author, for the conversations about the general techniques.  There is a comparison of pt-table-sync
       and SJA at <http://tinyurl.com/maatkit-vs-sqlyog>

       Thanks to the following people and organizations for helping in many ways:

       The  Rimm-Kaufman  Group  <http://www.rimmkaufman.com/>,  MySQL  AB  <http://www.mysql.com/>,  Blue Ridge
       InternetWorks <http://www.briworks.com/>,  Percona  <http://www.percona.com/>,  Fabien  Coelho,  Giuseppe
       Maxia and others at MySQL AB, Kristian Koehntopp (MySQL AB), Rohit Nadhani (WebYog), The helpful monks at
       Perlmonks, And others too numerous to mention.

ABOUT PERCONA TOOLKIT

       This  tool is part of Percona Toolkit, a collection of advanced command-line tools for MySQL developed by
       Percona.  Percona Toolkit was forked from two  projects  in  June,  2011:  Maatkit  and  Aspersa.   Those
       projects  were  created  by  Baron  Schwartz  and  primarily  developed by him and Daniel Nichter.  Visit
       <http://www.percona.com/software/> to learn about other free, open-source software from Percona.

COPYRIGHT, LICENSE, AND WARRANTY

       This program is copyright 2011-2018 Percona LLC and/or its affiliates, 2007-2011 Baron Schwartz.

       THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT  ANY  EXPRESS  OR  IMPLIED  WARRANTIES,  INCLUDING,  WITHOUT
       LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

       This  program  is  free  software;  you  can  redistribute it and/or modify it under the terms of the GNU
       General Public License as published by the Free Software Foundation, version  2;  OR  the  Perl  Artistic
       License.   On  UNIX  and similar systems, you can issue `man perlgpl' or `man perlartistic' to read these
       licenses.

       You should have received a copy of the GNU General Public License along with this program; if not,  write
       to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA.

VERSION

       pt-table-sync 3.2.1

POD ERRORS

       Hey! The above document had some coding errors, which are explained below:

       Around line 12823:
           Non-ASCII character seen before =encoding in 'Percona’s'. Assuming UTF-8

perl v5.30.3                                       2020-08-30                                  PT-TABLE-SYNC(1p)