Provided by: rsem_1.3.3+dfsg-3build1_amd64

NAME

       rsem-generate-ngvector - Create Ng vector for EBSeq based only on transcript sequences.

SYNOPSIS

       rsem-generate-ngvector [options] input_fasta_file output_name

ARGUMENTS

       input_fasta_file
           The FASTA file containing all reference transcripts. The transcripts must be in the same order as
           those in the expression value files, so the 'reference_name.transcripts.fa' file generated by
           'rsem-prepare-reference' should be used.

       output_name
           The base name for all output files. The Ng vector will be stored as 'output_name.ngvec'.

OPTIONS

       -k <int>
           k-mer length. See the DESCRIPTION section. (Default: 25)

       -h/--help
           Show help information.

DESCRIPTION

       This program generates the Ng vector required by EBSeq for isoform-level differential expression
       analysis, based on reference sequences only. EBSeq can take variance due to read mapping ambiguity into
       account by grouping isoforms according to the number of isoforms of their parent gene. However, for a
       de novo assembled transcriptome, it is hard to obtain an accurate gene-isoform relationship. Instead,
       this program groups isoforms using a direct measure of read mapping ambiguity. First, it calculates the
       'unmappability' of each transcript: the ratio between the number of k-mers with at least one perfect
       match to another transcript and the total number of k-mers in this transcript, where k is a parameter
       (set with -k). Then, the Ng vector is generated by applying the k-means algorithm to the
       'unmappability' values with the number of clusters set to 3. 'rsem-generate-ngvector' ensures that the
       clusters are numbered in ascending order of mean 'unmappability' score. All transcripts shorter than k
       are assigned to cluster 3.
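
       The procedure described above can be sketched in a few lines of Python. This is an illustration
       only, not RSEM's implementation: the function names, the toy sequences, the choice to count every
       k-mer position (rather than distinct k-mers), the forward-strand-only matching, and the evenly
       spaced k-means initialisation are all assumptions made for the example.

```python
from collections import defaultdict


def unmappability(transcripts, k):
    """Return {name: shared k-mers / total k-mers}; None if shorter than k."""
    # Record which transcripts contain each k-mer.
    owners = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)

    scores = {}
    for name, seq in transcripts.items():
        total = len(seq) - k + 1
        if total <= 0:
            scores[name] = None  # no k-mers at all
            continue
        # A k-mer is "shared" if some other transcript also contains it.
        shared = sum(1 for i in range(total) if len(owners[seq[i:i + k]]) > 1)
        scores[name] = shared / total
    return scores


def kmeans_1d(values, n_clusters=3, n_iter=100):
    """Lloyd's k-means on scalars; 1-based labels, ascending by cluster mean."""
    lo, hi = min(values), max(values)
    # Assumption: evenly spaced starting centres.
    centres = [lo + (hi - lo) * (j + 0.5) / n_clusters for j in range(n_clusters)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [min(range(n_clusters), key=lambda j: abs(v - centres[j]))
                  for v in values]
        new_centres = []
        for j in range(n_clusters):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centres.append(sum(members) / len(members) if members else centres[j])
        if new_centres == centres:
            break
        centres = new_centres
    # Renumber clusters so that label 1 has the lowest mean score.
    order = sorted(range(n_clusters), key=lambda j: centres[j])
    rank = {j: r + 1 for r, j in enumerate(order)}
    return [rank[lab] for lab in labels]


def ng_vector(transcripts, k=25):
    """One cluster label per transcript, in input order."""
    scores = unmappability(transcripts, k)
    scored = [n for n in transcripts if scores[n] is not None]
    labels = dict(zip(scored, kmeans_1d([scores[n] for n in scored]))) if scored else {}
    # Transcripts shorter than k go to cluster 3, as described in the text.
    return [labels.get(n, 3) for n in transcripts]


# Made-up toy sequences; k=4 keeps the example small.
toy = {"t1": "ACGTACGTAA", "t2": "ACGTACGTCC", "t3": "GGGGCCCCTT", "t4": "ACG"}
print(unmappability(toy, k=4))
print(ng_vector(toy, k=4))
```

       In the toy data, t1 and t2 share most of their 4-mers and so score high, t3 shares none and scores
       0, and t4 is shorter than k and falls into cluster 3 by rule rather than by clustering.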

       If your reference is a de novo assembled transcript set, you should run 'rsem-generate-ngvector' first.
       Then load the resulting 'output_name.ngvec' into R. For example, you can use:

        NgVec <- scan(file="output_name.ngvec", what=0, sep="\n")

       After that, replace 'IsoNgTrun' with 'NgVec' in the second line of section 3.2.5 (Page 10) of EBSeq's
       vignette:

        IsoEBres=EBTest(Data=IsoMat, NgVector=NgVec, ...)

       This program only needs to run once per RSEM reference.

OUTPUT

       output_name.ump
           The 'unmappability' score of each transcript. This file contains two columns: the first is the
           transcript name and the second is its 'unmappability' score.

       output_name.ngvec
           Ng vector generated by this program.
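
       As a rough sanity check, the two output files can be read together to confirm that the mean
       'unmappability' rises with the cluster number. The Python sketch below is not part of RSEM. It
       assumes that the '.ump' columns are whitespace-separated with a numeric score last, that the
       '.ngvec' file holds one integer per line, and that both files list transcripts in the same order;
       the function names are hypothetical.

```python
from collections import defaultdict


def read_ump(path):
    """Read an '.ump' file as a list of (transcript_name, score) pairs."""
    rows = []
    with open(path) as fh:
        for line in fh:
            if line.strip():
                # Assumption: whitespace-separated columns, score in the last one.
                name, score = line.rsplit(None, 1)
                rows.append((name, float(score)))
    return rows


def read_ngvec(path):
    """Read an '.ngvec' file as a list of integer cluster labels."""
    with open(path) as fh:
        # Assumption: one integer label per line.
        return [int(line) for line in fh if line.strip()]


def mean_score_by_group(rows, groups):
    """Return {cluster label: mean score}, with labels in ascending order."""
    if len(rows) != len(groups):
        raise ValueError("the two files describe different numbers of transcripts")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (_, score), group in zip(rows, groups):
        sums[group] += score
        counts[group] += 1
    return {g: sums[g] / counts[g] for g in sorted(counts)}
```

       For instance, mean_score_by_group(read_ump("output_name.ump"), read_ngvec("output_name.ngvec"))
       returns one mean per cluster. Because transcripts shorter than k are placed in cluster 3 by rule,
       the mean for that cluster also reflects whatever score the '.ump' file records for them.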

EXAMPLES

       Suppose the reference sequence file is '/ref/mouse_125/mouse_125.transcripts.fa' and we set
       output_name to 'mouse_125':

        rsem-generate-ngvector /ref/mouse_125/mouse_125.transcripts.fa mouse_125

perl v5.38.2                                       2024-04-14                          RSEM-GENERATE-NGVECTOR(1)