Provided by: futhark_0.25.15-2build1_amd64

NAME

       futhark-cuda - compile Futhark to CUDA

SYNOPSIS

       futhark cuda [options…] <program.fut>

DESCRIPTION

       futhark  cuda  translates  a  Futhark program to C code invoking CUDA kernels, and either compiles that C
       code with a C compiler to an executable binary program, or produces a .h and .c file that can  be  linked
       with other code. The standard Futhark optimisation pipeline is used.

       futhark cuda uses -lcuda -lcudart -lnvrtc to link.  If using --library, you will need to do the same when
       linking the final binary.
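
       As a minimal sketch of the --library workflow described above: the file names prog.fut and main.c are
       hypothetical (main.c stands for your own C code that includes the generated prog.h), and the extra -lm
       flag is an assumption that is usually needed for the maths functions in the generated C code. The build
       is skipped when futhark or the input files are missing.

```shell
#!/bin/sh
# Hypothetical names: prog.fut is the Futhark source, main.c is your own C code.
PROG=prog
# The first three flags come from the text above; -lm is an assumption.
LINK_FLAGS="-lcuda -lcudart -lnvrtc -lm"

if command -v futhark >/dev/null 2>&1 && [ -f "$PROG.fut" ] && [ -f main.c ]; then
    # Produce prog.c and prog.h instead of an executable.
    futhark cuda --library "$PROG.fut"
    # Link the generated code with your own program, repeating the CUDA flags.
    ${CC:-cc} -o main main.c "$PROG.c" $LINK_FLAGS
else
    echo "skipping build: futhark, $PROG.fut, or main.c not found" >&2
fi
```

       If the CUDA libraries are not on the default search path, the link step fails unless the variables
       listed under ENVIRONMENT are set first.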

       The  generated CUDA code can be called from multiple CPU threads, as it brackets every API operation with
       cuCtxPushCurrent() and cuCtxPopCurrent().

OPTIONS

       Accepts the same options as futhark-c.

ENVIRONMENT VARIABLES

       CC
          The C compiler used to compile the program.  Defaults to cc if unset.

       CFLAGS
          Space-separated list of options passed to the C compiler.  Defaults to -O -std=c99 if unset.
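
       The defaulting rules above can be mimicked in a shell to see which values would apply in the current
       environment. This is only an illustration: the helper names effective_cc and effective_cflags are
       hypothetical and are not part of Futhark.

```shell
#!/bin/sh
# Hypothetical helpers that mirror the documented defaults.
# ${VAR-word} substitutes word only when VAR is unset.
effective_cc() { echo "${CC-cc}"; }
effective_cflags() { echo "${CFLAGS--O -std=c99}"; }

echo "C compiler: $(effective_cc)"
echo "C flags:    $(effective_cflags)"

# To override both for a single invocation (prog.fut is a hypothetical file):
#   CC=clang CFLAGS="-O2 -std=c99" futhark cuda prog.fut
```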

EXECUTABLE OPTIONS

       Generated executables accept the same options as those generated by futhark-c. The -t option  behaves  as
       with futhark-opencl.

       The following additional options are accepted.

       -h, --help
              Print help text to standard output and exit.

       --default-thread-block-size=INT
              The default size of thread blocks that are launched.  Capped to the hardware limit if necessary.

       --default-num-thread-blocks=INT
              The default number of thread blocks that are launched.

       --default-threshold=INT
              The  default  parallelism  threshold  used  for  comparisons  when selecting between code versions
              generated by incremental flattening.  Intuitively, the amount of parallelism  needed  to  saturate
              the GPU.

       --default-tile-size=INT
              The default tile size used when performing two-dimensional tiling (the thread block size will be
              the square of the tile size).

       --dump-cuda=FILE
              Don’t run the program, but instead dump the embedded CUDA kernels to the indicated  file.   Useful
              if you want to see what is actually being executed.

       --dump-ptx=FILE
              Don’t  run  the  program, but instead dump the PTX-compiled version of the embedded kernels to the
              indicated file.

       --load-cuda=FILE
              Instead of using the embedded CUDA kernels, load them from the indicated file.

       --load-ptx=FILE
              Load PTX code from the indicated file.

       --nvrtc-option=OPT
              Add an additional build option to the string passed to NVRTC.  Refer to the CUDA documentation for
              which options are supported.  Be careful: some options can easily produce invalid results.
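
       A possible way to combine these options is sketched below. It assumes a hypothetical executable ./prog
       built with futhark cuda whose entry point takes no input, and a hypothetical file name kernels.cu. The
       tile size of 16 is an arbitrary choice; its square is 256.

```shell
#!/bin/sh
EXE=./prog            # hypothetical executable built with `futhark cuda prog.fut`
KERNELS=kernels.cu    # hypothetical file for the dumped CUDA source
TILE=16               # arbitrary tile size for illustration
BLOCK=$((TILE * TILE))   # square of the tile size: 256
RUN_OPTS="--load-cuda=$KERNELS --default-tile-size=$TILE"

echo "tile size $TILE gives $BLOCK threads per tile"

if [ -x "$EXE" ]; then
    # Write the embedded CUDA kernels to a file; the program itself is not run.
    "$EXE" --dump-cuda="$KERNELS"
    # After inspecting or editing the file, run with the kernels loaded from it.
    "$EXE" $RUN_OPTS
else
    echo "skipping: $EXE not found" >&2
fi
```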

ENVIRONMENT

       If run without --library, futhark cuda will invoke a C compiler to compile the generated C program into a
       binary.  This only works if the C compiler can find the necessary CUDA libraries.  On most systems,  CUDA
       is  installed  in /usr/local/cuda, which is usually not part of the default compiler search path. You may
       need to set the following environment variables before running futhark cuda:

          LIBRARY_PATH=/usr/local/cuda/lib64
          LD_LIBRARY_PATH=/usr/local/cuda/lib64/
          CPATH=/usr/local/cuda/include

       At runtime the generated program must be able to find the CUDA installation directory, which is  normally
       located  at  /usr/local/cuda.  If you have CUDA installed elsewhere, set any of the CUDA_HOME, CUDA_ROOT,
       or CUDA_PATH environment variables to the proper directory.
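
       The settings above could be collected in a small script to source before compiling or running. This is
       a sketch under assumptions: the order in which CUDA_HOME, CUDA_ROOT, and CUDA_PATH are consulted here is
       a choice made for the example, and the lib64 and include subdirectories are taken from the paths shown
       above.

```shell
#!/bin/sh
# Pick the CUDA directory from the first variable that is set, falling back to
# /usr/local/cuda. The precedence order is an assumption made for this sketch.
CUDA_DIR="${CUDA_HOME:-${CUDA_ROOT:-${CUDA_PATH:-/usr/local/cuda}}}"

# Prepend the CUDA directories, keeping any entries that are already present.
export LIBRARY_PATH="$CUDA_DIR/lib64${LIBRARY_PATH:+:$LIBRARY_PATH}"
export LD_LIBRARY_PATH="$CUDA_DIR/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export CPATH="$CUDA_DIR/include${CPATH:+:$CPATH}"

# Lets the generated program locate the installation at runtime.
export CUDA_HOME="$CUDA_DIR"
```

       Source the script (for example with the . command) rather than executing it, so that the exported
       variables remain set in the calling shell.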

SEE ALSO

       futhark-opencl

COPYRIGHT

       2013-2024, DIKU, University of Copenhagen

0.25.15                                           May 15, 2024                                   FUTHARK-CUDA(1)