Ubuntu Manpage: goto-analyzer - Data-flow analysis for C programs and goto binaries

NAME

       goto-analyzer - Data-flow analysis for C programs and goto binaries

SYNOPSIS

       goto-analyzer [-?|-h|--help]

       goto-analyzer --version

       goto-analyzer [options] file.c|file.gb

DESCRIPTION

       goto-analyzer  is an abstract interpreter which uses the same front-end and GOTO binary representation as
       cbmc(1).

       The key difference is that cbmc(1) under-approximates the behavior of the program (execution traces  that
       are  too  long  or  require  too  many  loop  unwindings  are  not  considered) while goto-analyzer over-
       approximates the behavior of the program.  cbmc(1) can determine if a property is A. true for  a  bounded
       number  of iterations or B.  false and giving an error trace.  In contrast goto-analyzer can determine if
       a property is A. true for all iterations or B. possibly false.  In this sense,  each  tool  has  its  own
       strengths and weaknesses.

       To use goto-analyzer you need to give options for:

              What task to perform after the abstract interpreter has run.

              How to format the output.

              Which abstract interpreter is used.

              Which domain is used to describe the state of the program at a point during execution.

              How the history of the control flow of the program determines the number of points of execution.

              The storage that links points of execution and domains.

OPTIONS

Task options:
goto-analyzer first runs the abstract interpreter until it reaches a fix-point, then it will perform the
task the user has chosen.

--show Displays a domain for every instruction in the GOTO binary. The format and information will
depend on the domain that has been selected. If there are multiple domains corresponding to the
same location (see history below) these will be merged before they are displayed.

--show-on-source
The source code of the program is displayed line-by-line with the abstract domains corresponding
to each location displayed between them. As the analysis is done at the level of instructions in
the GOTO binary, some domains may not be displayed. Also if parts of the GOTO binary have been
generated or manipulated by other tools, these may not be displayed as there is no corresponding
source. --show-on-source is the more user-friendly output, but --show gives a better picture of
exactly what is computed.

--verify
Every property in the program is checked to see whether it is true (it always holds), unreachable,
false if it is reachable (due to the over-approximate analysis, it is not clear if locations are
reachable or if it is an overapproximation, so this is the best that can be achieved) or unknown.
If there are multiple points of execution that reach the same location, each will be checked and
the answers combined, with unknown taking precedence.

--simplify file_name
Writes a new version of the input program to file_name in which the program has been simplified
using information from the abstract interpreter. The exact simplification will depend on the
domain that is used but typically this might be replacing any expression that has a constant
value. If this makes instructions unreachable (for example if GOTO can be shown to never be
taken) they will be removed. Removal can be deactivated by passing --no-simplify-slicing. In the
ideal world simplify would be idempotent (i.e. running it a second time would not simplify
anything more than the first). However there are edge cases which are difficult or prohibitively
expensive to handle in the domain which can result in a second (or more) runs giving
simplification. Submitting bug reports for these is helpful but they may not be viable to fix.

--no-simplify-slicing
Do not remove instructions from which no property can be reached (use with --simplify).

--unreachable-instructions
Lists which instructions have a domain which is bottom (i.e. unreachable). If --function has been
used to set the program entry point then this can flag things like the main function as
unreachable.

--unreachable-functions
Similar to --unreachable-instructions, but reports which functions are definitely unreachable
rather than just instructions.

--reachable-functions
The negation of --unreachable-functions, reports which functions may be reachable. Note that
because the analysis is over-approximate, it is possible this will mark functions as reachable
when a more precise analysis (possibly using cbmc(1)) will show that there are no execution traces
that reach them.

Abstract interpreter options:
These options control which abstract interpreter is used and how the analysis is performed. In principle
this can significantly change the accuracy and performance of goto-analyzer but the current options are
reasonably similar.

If --verbosity is set above 8 the abstract interpreter will log what it is doing. This is intended to
aid developers in understanding how the algorithms work, where time is being spent, etc. but can be
generally quite instructive.

--legacy-ait
This is the default option. Abstract interpretation is performed eagerly from the start of the
program until fixed-point is reached. Functions are analyzed as needed and in the order of that
they are reached. This option also fixes the history and storage options to their defaults.

--recursive-interprocedural
This extends --legacy-ait by allowing the history and storage to be configured. As the name
implies, function calls are handled by recursion within the interpreter. This is a good all-round
choice and will likely become the default at some point in the future.

--three-way-merge
This extends --recursive-interprocedural by performing a "modification aware" merge after function
calls. At the time of writing only --vsd supports the necessary differential reasoning. If you
are using --vsd this is recommended as it is more accurate with little extra cost.

--legacy-concurrent
This extends --legacy-ait with very restricted and special purpose handling of threads. This
needs the domain to have certain unusual properties for it to give a correct answer. At the time
of writing only --dependence-graph is compatible with it.

--location-sensitive
Use location-sensitive abstract interpreter.

History options:
To over-approximate what a program does, it is necessary to consider all of the paths of execution
through the program. As there are a potentially infinite set of traces (and they can be potentially
infinitely long) it is necessary to merge some of them. The common approach (the "collecting
abstraction") is to merge all paths that reach the same instruction. The abstract interpretation is then
done between instructions without thinking about execution paths. This ensures termination but means
that it is not possible to distinguish different call sites, loop iterations or paths through a program.

Note that --legacy-ait, the default abstract interpreter fixes the history to --ahistorical so you will
need to choose another abstract interpreter to make use of these options.

--ahistorical
This is the default and the coarsest abstraction. No history information is kept, so all traces
that reach an instruction are merged. This is the collecting abstraction that is used in most
abstract interpreters.

--call-stack n
This is an inter-procedural abstraction; it tracks the call stack and only merges traces that
reach the same location and have the same call stack. The effect of this is equivalent to
inlining all functions and then using --ahistorical. In larger programs this can be very
expensive in terms of both time and memory but can give much more accurate results. Recursive
functions create a challenge as the call stack will be different each time. To prevent non-
termination, the parameter n limits how many times a loop of recursive functions can be called.
When n is reached all later ones will be merged. Setting this to 0 will disable the limit.

--loop-unwind n
This tracks the backwards jumps that are taken in the current function. Traces that reach the
same location are merged if their history of backwards jumps is the same. At most n traces are
kept for each location, after that they are merged regardless of whether their histories match.
This gives a similar effect to unrolling the loops n times and then using --ahistorical. In the
case of nested loops, the behavior can be a little different to unrolling as the limit is the
number of times a location is reached, so a loop with x iterations containing a loop with y
iterations will require n = x*y. The time and memory taken by this option will rise (at worst)
linearly in terms of n. If n is 0 then there is no limit. Be warned that if there are loops that
can execute an unbounded number of iterations or if the domain is not sufficiently precise to
identify the termination conditions then the analysis will not terminate.

--branching n
This works in a similar way to --loop-unwind but tracking forwards jumps (if, switch, goto, etc.)
rather than backwards ones. This gives per-path analysis but limiting the number of times each
location is visited. There is not a direct form of program transformation that matches this but
it is similar to the per-path analysis that symbolic execution does. The scalability and the risk
of non-termination if n is 0 remain the same. Note that the goto-programs generated by various
language front-ends have a conditional forwards jump to exit the loop if the condition fails at
the start and an unconditional backwards jump at the end. This means that --branching can wind up
distinguishing different loop iterations — "has not exited for the last 3 iterations" rather than
"has jumped back to the top 3 times".

--loop-unwind-and-branching n
Again, this is similar to --loop-unwind but tracks both forwards and backwards jumps. This is
only a very small amount more expensive than --branching and is probably the best option for
detailed analysis of each function.

Domain options:
These control how the possible states at a given execution point are represented and manipulated.

--dependence-graph
Tracks data flow and information flow dependencies between instructions and produces a graph.
This includes doing points-to analysis and tracking reaching definitions (i.e. use-def chains).
This is one of the most extensive, correct and feature complete domains.

--vsd, --variable-sensitivity
This is the Variable Sensitivity Domain (VSD). It is a non-relational domain that stores an
abstract object for each live variable. Which kind of abstract objects are used depends on the
type of the variable and the run-time configuration. This means that sensitivity of the domain
can be chosen — for example, do you want to track every element of an array independently, or just
a few of them or simply ignore arrays all together. A set of options to configure VSD are given
below. This domain is extensive and does not have any known architectural limits on correctness.
As such it is a good choice for many kinds of analysis.

--dependence-graph-vs
This is a variant of the dependence graph domain that uses VSD to do the foundational pointer and
reaching definitions analysis. This means it can be configured using the VSD options and give
more precise analysis (for example, field aware) of the dependencies.

--constants
The default option, this stores one constant value per variable. This means it is fast but will
only find things that can be resolved by constant propagation. The domain has some handling of
arrays but limited support for pointers which means that in can potentially give unsound behavior.
--vsd --vsd-values constants is probably a better choice for this kind of analysis.

--intervals
A domain that stores an interval for each integer and float variable. At the time of writing not
all operations are supported so the results can be quite over-approximate at points. It also has
limitations in the handling of pointers so can give unsound results. --vsd --vsd-values intervals
is probably a better choice for this kind of analysis.

--non-null
This domain is intended to find which pointers are not null. Its implementation is very limited
and it is not recommended.

Variable sensitivity domain (VSD) options:
VSD has a wide range of options that allow you to choose what kind of abstract objects (and thus
abstractions) are used to represent variables of each type.

--vsd-values [constants|intervals|set-of-constants]
This controls the abstraction used for values, both int and float. The default option is
constants which tracks if the variable has a constant value. This is fast but not very precise so
it may well be unable to prove very much. intervals uses an interval that contains all of the
possible values the variable can take. It is more precise than constants in all cases but a bit
slower. It is good for numerical code. set-of-constants uses a set of up to 10 (currently fixed)
constants. This is more general than using a single constant but can make analysis up to 10 times
(or in rare cases 100 times) slower. It is good for control code with flags and modes.

--vsd-structs [top-bottom|every-field]
This controls how structures are handled. The default is top-bottom which uses an abstract object
with just two states (top and bottom). In effect writes to structures are ignored and reads from
them will always return top (any value). The other alternative is every-field which stores an
abstract object for each field. Depending on how many structures are live at any one time and how
many fields they have this may increase the amount of memory used by goto-analyzer by a reasonable
amount. But this means that the analysis will be field-sensitive.

--vsd-arrays [top-bottom|smash|up-to-n-elements|every-element]
This controls how arrays are handled. As with structures, the default is top-bottom which
effectively ignores writes to the array and returns top on a read. More precise than this is
smash which stores one abstract element for all of the values. This is relatively cheap but a lot
more precise, particularly if used with intervals or set-of-constants. up-to-n-elements
generalizes smash by storing abstract objects for the first n elements of each array (n defaults
to 10 and can be controlled by --vsd-array-max-elements) and then condensing all other elements
down to a single abstract object. This allows reasonably fine-grained control over the amount of
memory used and can give much more precise results for small arrays. every-element is the most
precise, but most expensive option where an abstract element is stored for every entry in the
array.

--vsd-array-max-elements
Configures the value of n in --vsd-arrays up-to-n-elements and defaults to 10 if not given.

--vsd-pointers [top-bottom|constants|value-set]
This controls the handling of pointers. The default, top-bottom effectively ignores pointers,
this is OK if they are just read (all reads return top) but if they are written then there is the
problem that we know that a variable is changed but we don't know which one, so we have to set the
whole domain to top. constants is somewhat misleadingly named as it uses an abstract object that
tracks a pointer to a single variable. This includes the offset within the variable; a stack of
field names for structs and abstract objects for offsets in arrays. Offsets are tracked even if
the abstract object for the variable itself does not distinguish different fields or indexes.
value-set is the most precise option; it stores a set of pointers to single variables as described
above.

--vsd-unions top-bottom
At the time of writing there is only one option for unions which is top-bottom, discarding writes
and returning top for all reads from a variable of union type.

--vsd-data-dependencies
Wraps each abstract object with a set of locations where the variable was last modified. The set
is reset when the variable is written and takes the union of the two sides' sets on merge. This
was originally intended for --dependence-graph-vs but has proved useful for --vsd as well. This
is not strictly necessary for --three-way-merge as the mechanism it uses to work out which
variables have changed is independent of this option.

--vsd-liveness
Wraps each abstract object with the location of the last assignment or merge. This is more basic
and limited than --vsd-data-dependencies and is intended to track SSA-like regions of variable
liveness.

--vsd-flow-insensitive
This does not alter the abstract objects used or their configuration. It disables the reduction
of the domain when a branch is taken or an assumption is reached. This normally gives a small
saving in time but at the cost of a large amount of precision. This is why the default is to do
the reduction. It can be useful for debugging issues with the reduction.

Storage options:
The histories described above are used to keep track of where in the computation needs to be explored.
The most precise option is to keep one domain for every history but in some cases, to save memory and
time, it may be desirable to share domains between histories. The storage options allow this kind of
sharing.

--one-domain-per-location
This is the default option. All histories that reach the same location will use the same domain.
Setting this means that the results of other histories will be similar to setting --ahistorical.
One difference is how and when widening occurs: --one-domain-per-location --loop-unwind n will
wait until n iterations of a loop have been completed and then will start to widen.

--one-domain-per-history
This is the best option to use if you are using a history other than --ahistorical. It stores one
domain per history which can result in a significant increase in the amount of memory used.

Output options:
These options control how the result of the task is output. The default is text to the standard output.
In the case of tasks that produce goto-programs (--simplify for example), the output options only affect
the logging and not the final form of the program.

--text file_name
Output results in plain text to given file.

--json file_name
Writes the output as a JSON object to file_name.

--xml file_name
Output results in XML format to file_name.

--dot file_name
Writes the output in dot(1) format to file_name. This is only supported by some domains and tasks
(for example --show --dependence-graph).

Specific analyses:
--taint file_name
perform taint analysis using rules in given file

--show-taint
print taint analysis results on stdout

--show-local-may-alias
perform procedure-local may alias analysis

C/C++ frontend options:
-I path
set include path (C/C++)

--include file
set include file (C/C++)

-D macro
define preprocessor macro (C/C++)

--c89, --c99, --c11
set C language standard (default: c11)

--cpp98, --cpp03, --cpp11
set C++ language standard (default: cpp98)

--unsigned-char
make "char" unsigned by default

--round-to-nearest, --round-to-even
rounding towards nearest even (default)

--round-to-plus-inf
rounding towards plus infinity

--round-to-minus-inf
rounding towards minus infinity

--round-to-zero
rounding towards zero

--no-library
disable built-in abstract C library

--function name
set main function name

Platform options:
--arch arch
Set analysis architecture, which defaults to the host architecture. Use one of: alpha, arm, arm64,
armel, armhf, hppa, i386, ia64, mips, mips64, mips64el, mipsel, mipsn32, mipsn32el, powerpc,
ppc64, ppc64le, riscv64, s390, s390x, sh4, sparc, sparc64, v850, x32, x86_64, or none.

--os os
Set analysis operating system, which defaults to the host operating system. Use one of: freebsd,
linux, macos, solaris, or windows.

--i386-linux, --i386-win32, --i386-macos, --ppc-macos, --win32, --winx64
Set analysis architecture and operating system.

--LP64, --ILP64, --LLP64, --ILP32, --LP32
Set width of int, long and pointers, but don't override default architecture and operating system.

--16, --32, --64
Equivalent to --LP32, --ILP32, --LP64 (on Windows: --LLP64).

--little-endian
allow little-endian word-byte conversions

--big-endian
allow big-endian word-byte conversions

--gcc use GCC as preprocessor

Program representations:
--show-parse-tree
show parse tree

--show-symbol-table
show loaded symbol table

--show-goto-functions
show loaded goto program

--list-goto-functions
list loaded goto functions

--show-properties
show the properties, but don't run analysis

Program instrumentation options:
--property id
enable selected properties only

--bounds-check
enable array bounds checks

--pointer-check
enable pointer checks

--memory-leak-check
enable memory leak checks

--memory-cleanup-check
Enable memory cleanup checks: assert that all dynamically allocated memory is explicitly freed
before terminating the program.

--div-by-zero-check
enable division by zero checks

--signed-overflow-check
enable signed arithmetic over- and underflow checks

--unsigned-overflow-check
enable arithmetic over- and underflow checks

--pointer-overflow-check
enable pointer arithmetic over- and underflow checks

--conversion-check
check whether values can be represented after type cast

--undefined-shift-check
check shift greater than bit-width

--float-overflow-check
check floating-point for +/-Inf

--nan-check
check floating-point for NaN

--enum-range-check
checks that all enum type expressions have values in the enum range

--pointer-primitive-check
checks that all pointers in pointer primitives are valid or null

--retain-trivial-checks
include checks that are trivially true

--error-label label
check that label is unreachable

--no-built-in-assertions
ignore assertions in built-in library

--no-assertions
ignore user assertions

--no-assumptions
ignore user assumptions

--assert-to-assume
convert user assertions to assumptions

--malloc-may-fail
allow malloc calls to return a null pointer

--malloc-fail-assert
set malloc failure mode to assert-then-assume

--malloc-fail-null
set malloc failure mode to return null

--string-abstraction
track C string lengths and zero-termination

Other options:
--validate-goto-model
enable additional well-formedness checks on the goto program

--validate-ssa-equation
enable additional well-formedness checks on the SSA representation

--version
show version and exit

--flush
flush every line of output

--verbosity #
verbosity level

--timestamp [monotonic|wall]
Print microsecond-precision timestamps. monotonic: stamps increase monotonically. wall: ISO-8601
wall clock timestamps.

ENVIRONMENT

       All tools honor the TMPDIR environment variable when generating temporary files and directories.

BUGS

       If you encounter a problem please create an issue at https://github.com/diffblue/cbmc/issues

COPYRIGHT

       2017-2018, Daniel Kroening, Diffblue

goto-analyzer-5.59.0                                June 2022                                   GOTO-ANALYZER(1)