ClusterSets
Cluster sequences by group
usage: ClusterSets [--version] [-h] ...
--version
    show program’s version number and exit
-h, --help
    show this help message and exit
output files:
    cluster-pass
        clustered reads.
    cluster-fail
        raw reads failing clustering.
output annotation fields:
    CLUSTER
        a numeric cluster identifier defining the within-group cluster.
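As an illustration of how the CLUSTER annotation might be read back from an output file, the sketch below splits a sequence header into its identifier and annotation fields. The header layout (`ID|FIELD=value`) and the delimiters `|`, `=` and `,` are assumptions made for this example; the helper `parse_header` is hypothetical and is not part of ClusterSets.

```python
def parse_header(header, delims=("|", "=", ",")):
    """Split a header into its sequence ID and a dict of annotation fields.

    delims holds the three separators: between annotation blocks, between
    a field name and its value, and between values within a field.
    The defaults here are an assumption, not taken from this page.
    """
    block_sep, field_sep, value_sep = delims
    seq_id, *blocks = header.split(block_sep)
    fields = {}
    for block in blocks:
        name, _, value = block.partition(field_sep)
        fields[name] = value.split(value_sep)
    return seq_id, fields


# Hypothetical header as it might look after clustering.
header = "SEQ1|BARCODE=ACGTACGT|CLUSTER=3"
seq_id, fields = parse_header(header)
print(seq_id, fields["CLUSTER"])  # prints: SEQ1 ['3']
```

If a different annotation field name is chosen with -k, the same lookup would use that name instead of CLUSTER.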
ClusterSets all
Cluster all sequences regardless of annotation.
usage: ClusterSets all [--version] [-h] -s SEQ_FILES [SEQ_FILES ...]
                       [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR]
                       [--outname OUT_NAME] [--fasta]
                       [--delim DELIMITER DELIMITER DELIMITER] [--nproc NPROC]
                       [-k CLUSTER_FIELD] [--ident IDENT]
                       [--length LENGTH_RATIO] [--prefix CLUSTER_PREFIX]
                       [--cluster {usearch,vsearch,cd-hit-est}]
                       [--exec CLUSTER_EXEC] [--start SEQ_START]
                       [--end SEQ_END]
--version
    show program’s version number and exit
-h, --help
    show this help message and exit
-s <seq_files>
    A list of FASTA/FASTQ files containing sequences to process.
-o <out_files>
    Explicit output file name(s). Note, this argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, then the output filename will be based on the input filename(s).
--outdir <out_dir>
    Changes the output directory to the location specified. The input file directory is used if this is not specified.
--outname <out_name>
    Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.
--fasta
    Specify to force output as FASTA rather than FASTQ.
--delim <delimiter>
    A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.
--nproc <nproc>
    The number of simultaneous computational processes to execute (CPU cores to utilize).
-k <cluster_field>
    The name of the output annotation field to add with the cluster information for each sequence.
--ident <ident>
    The sequence identity threshold to use for clustering. Note, how identity is calculated is specific to the clustering application used.
--length <length_ratio>
    The minimum shorter/longer sequence length ratio allowed within a cluster. Setting this value to 1.0 will require identical length matches within clusters. A value of 0.0 will allow clusters containing any length of substring.
--prefix <cluster_prefix>
    A string to use as the prefix for each cluster identifier. By default, cluster identifiers will be numeric values only.
--cluster {usearch,vsearch,cd-hit-est}
    The clustering tool to use for assigning clusters. Must be one of usearch, vsearch or cd-hit-est. Note, for cd-hit-est the maximum memory limit is set to 3GB.
--exec <cluster_exec>
    The name or path of the usearch, vsearch or cd-hit-est executable.
--start <seq_start>
    The start of the region to be used for clustering. Together with --end, this parameter can be used to specify a subsequence of each read to use in the clustering algorithm.
--end <seq_end>
    The end of the region to be used for clustering.
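To make the --start/--end and --length descriptions concrete, here is a minimal sketch of the two ideas: taking a subsequence of each read for clustering, and checking the shorter/longer length ratio of two sequences against a threshold. It assumes 0-based, end-exclusive coordinates (Python slice semantics), which this page does not specify, and the function names are hypothetical. It is not how ClusterSets or the underlying clustering tools implement these options.

```python
def clustering_region(seq, start=0, end=None):
    """Return the part of a read that would be passed to clustering.

    Assumes 0-based, end-exclusive positions (an assumption for this sketch).
    """
    return seq[start:end]


def length_ratio(a, b):
    """Shorter/longer length ratio of two sequences; 1.0 means equal length."""
    shorter, longer = sorted((len(a), len(b)))
    return shorter / longer if longer else 1.0


def passes_length(a, b, min_ratio):
    """True if the pair satisfies the minimum length ratio."""
    return length_ratio(a, b) >= min_ratio


read_a = "ACGTACGTACGTACGTACGT"  # 20 nt
read_b = "ACGTACGTACGTACGT"      # 16 nt

# Restrict both reads to the same window before comparing them.
region_a = clustering_region(read_a, 4, 16)
region_b = clustering_region(read_b, 4, 16)

print(length_ratio(read_a, read_b))             # 16 / 20
print(passes_length(read_a, read_b, 1.0))       # full reads differ in length
print(passes_length(region_a, region_b, 1.0))   # 12 nt windows are equal length
```

With a threshold of 1.0 only equal-length sequences pass, while 0.0 accepts any pair, matching the two extremes described for --length.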
ClusterSets barcode
Cluster reads by clustering barcode sequences.
usage: ClusterSets barcode [--version] [-h] -s SEQ_FILES [SEQ_FILES ...]
                           [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR]
                           [--outname OUT_NAME] [--fasta]
                           [--delim DELIMITER DELIMITER DELIMITER]
                           [--nproc NPROC] [-k CLUSTER_FIELD] [--ident IDENT]
                           [--length LENGTH_RATIO] [--prefix CLUSTER_PREFIX]
                           [--cluster {usearch,vsearch,cd-hit-est}]
                           [--exec CLUSTER_EXEC] [-f BARCODE_FIELD]
--version
    show program’s version number and exit
-h, --help
    show this help message and exit
-s <seq_files>
    A list of FASTA/FASTQ files containing sequences to process.
-o <out_files>
    Explicit output file name(s). Note, this argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, then the output filename will be based on the input filename(s).
--outdir <out_dir>
    Changes the output directory to the location specified. The input file directory is used if this is not specified.
--outname <out_name>
    Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.
--fasta
    Specify to force output as FASTA rather than FASTQ.
--delim <delimiter>
    A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.
--nproc <nproc>
    The number of simultaneous computational processes to execute (CPU cores to utilize).
-k <cluster_field>
    The name of the output annotation field to add with the cluster information for each sequence.
--ident <ident>
    The sequence identity threshold to use for clustering. Note, how identity is calculated is specific to the clustering application used.
--length <length_ratio>
    The minimum shorter/longer sequence length ratio allowed within a cluster. Setting this value to 1.0 will require identical length matches within clusters. A value of 0.0 will allow clusters containing any length of substring.
--prefix <cluster_prefix>
    A string to use as the prefix for each cluster identifier. By default, cluster identifiers will be numeric values only.
--cluster {usearch,vsearch,cd-hit-est}
    The clustering tool to use for assigning clusters. Must be one of usearch, vsearch or cd-hit-est. Note, for cd-hit-est the maximum memory limit is set to 3GB.
--exec <cluster_exec>
    The name or path of the usearch, vsearch or cd-hit-est executable.
-f <barcode_field>
    The annotation field containing barcode sequences.
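The idea behind the barcode subcommand, that reads inherit the cluster of their barcode, can be sketched with a toy example. The code below clusters the unique barcode strings with a simple greedy scheme and then labels each read with its barcode's cluster number. The identity measure (fraction of matching positions between equal-length strings) and the greedy procedure are stand-ins chosen for this illustration; usearch, vsearch and cd-hit-est each calculate identity in their own way, and all names here are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions; only defined here for equal lengths."""
    if len(a) != len(b):
        return 0.0
    if not a:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_barcodes(barcodes, ident=0.9):
    """Greedy toy clustering: a barcode joins the first centroid it matches."""
    centroids = []    # one representative barcode per cluster
    assignment = {}   # barcode -> 1-based cluster number
    for bc in sorted(set(barcodes)):
        for number, centroid in enumerate(centroids, start=1):
            if identity(bc, centroid) >= ident:
                assignment[bc] = number
                break
        else:
            # No centroid was similar enough, so start a new cluster.
            centroids.append(bc)
            assignment[bc] = len(centroids)
    return assignment


# Hypothetical (read ID, barcode) pairs taken from the barcode field.
reads = [
    ("read1", "AAAAAAAAAA"),
    ("read2", "AAAAAAAAAT"),  # one mismatch to read1's barcode
    ("read3", "CCCCCCCCCC"),
]

clusters = cluster_barcodes([bc for _, bc in reads], ident=0.9)
annotated = {read_id: clusters[bc] for read_id, bc in reads}
print(annotated)  # prints: {'read1': 1, 'read2': 1, 'read3': 2}
```

Here read1 and read2 end up in the same cluster because their barcodes differ at a single position, even though the read sequences themselves were never compared.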
ClusterSets set
Cluster sequences within annotation sets.
usage: ClusterSets set [--version] [-h] -s SEQ_FILES [SEQ_FILES ...]
                       [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR]
                       [--outname OUT_NAME] [--log LOG_FILE] [--failed]
                       [--fasta] [--delim DELIMITER DELIMITER DELIMITER]
                       [--nproc NPROC] [-k CLUSTER_FIELD] [--ident IDENT]
                       [--length LENGTH_RATIO] [--prefix CLUSTER_PREFIX]
                       [--cluster {usearch,vsearch,cd-hit-est}]
                       [--exec CLUSTER_EXEC] [-f SET_FIELD]
                       [--start SEQ_START] [--end SEQ_END]
--version
    show program’s version number and exit
-h, --help
    show this help message and exit
-s <seq_files>
    A list of FASTA/FASTQ files containing sequences to process.
-o <out_files>
    Explicit output file name(s). Note, this argument cannot be used with the --failed, --outdir, or --outname arguments. If unspecified, then the output filename will be based on the input filename(s).
--outdir <out_dir>
    Changes the output directory to the location specified. The input file directory is used if this is not specified.
--outname <out_name>
    Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.
--log <log_file>
    Specify to write verbose logging to a file. May not be specified with multiple input files.
--failed
    If specified, create files containing records that fail processing.
--fasta
    Specify to force output as FASTA rather than FASTQ.
--delim <delimiter>
    A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.
--nproc <nproc>
    The number of simultaneous computational processes to execute (CPU cores to utilize).
-k <cluster_field>
    The name of the output annotation field to add with the cluster information for each sequence.
--ident <ident>
    The sequence identity threshold to use for clustering. Note, how identity is calculated is specific to the clustering application used.
--length <length_ratio>
    The minimum shorter/longer sequence length ratio allowed within a cluster. Setting this value to 1.0 will require identical length matches within clusters. A value of 0.0 will allow clusters containing any length of substring.
--prefix <cluster_prefix>
    A string to use as the prefix for each cluster identifier. By default, cluster identifiers will be numeric values only.
--cluster {usearch,vsearch,cd-hit-est}
    The clustering tool to use for assigning clusters. Must be one of usearch, vsearch or cd-hit-est. Note, for cd-hit-est the maximum memory limit is set to 3GB.
--exec <cluster_exec>
    The name or path of the usearch, vsearch or cd-hit-est executable.
-f <set_field>
    The annotation field containing annotations, such as UMI barcode, for sequence grouping.
--start <seq_start>
    The start of the region to be used for clustering. Together with --end, this parameter can be used to specify a subsequence of each read to use in the clustering algorithm.
--end <seq_end>
    The end of the region to be used for clustering.
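Because the set subcommand clusters sequences separately within each annotation set, a cluster number is only meaningful together with the value of the set field. The sketch below illustrates this under stated assumptions: records are grouped by a set field, a stand-in clustering function (exact sequence match, not one of the supported tools) is applied to each group, and cluster numbers are assumed to restart from 1 in every group. All function names and the record layout are hypothetical.

```python
from collections import defaultdict


def group_by_field(records, field):
    """Group (seq_id, annotations, seq) records by one annotation value."""
    groups = defaultdict(list)
    for seq_id, annotations, seq in records:
        groups[annotations[field]].append((seq_id, seq))
    return dict(groups)


def exact_match_clusters(seqs):
    """Stand-in clustering: identical sequences share a number, from 1."""
    numbers = {}
    return [numbers.setdefault(s, len(numbers) + 1) for s in seqs]


def cluster_within_sets(records, field, cluster_func):
    """Cluster each set independently; return {seq_id: (set_value, cluster)}."""
    result = {}
    for value, members in group_by_field(records, field).items():
        labels = cluster_func([seq for _, seq in members])
        for (seq_id, _), label in zip(members, labels):
            result[seq_id] = (value, label)
    return result


# Hypothetical records: two reads share a barcode, one has a different one.
records = [
    ("r1", {"BARCODE": "AAAA"}, "ACGT"),
    ("r2", {"BARCODE": "AAAA"}, "TTTT"),
    ("r3", {"BARCODE": "CCCC"}, "ACGT"),
]

result = cluster_within_sets(records, "BARCODE", exact_match_clusters)
print(result)
```

In this toy run r1 and r3 both receive cluster number 1, but they belong to different sets, so the pair of set value and cluster number is what identifies a group of reads.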