presto.Annotation
Annotation functions
-
presto.Annotation.addHeader(header, fields, values, delimiter=('|', '=', ', '))
Adds fields and values to a sequence header
Parameters: - header – an annotation dictionary returned by parseAnnotation.
- fields – the list of fields to add or append to.
- values – the list of annotation values to add for each field.
- delimiter – a tuple of delimiters for (fields, values, value lists).
Returns: modified header dictionary.
Return type:
-
presto.Annotation.annotationConsensus(seq_iter, field, delimiter=('|', '=', ', '))
Calculate a consensus annotation for a set of sequences
Parameters: - seq_iter – an iterator or list of SeqRecord objects
- field – the annotation field to take a consensus of
- delimiter – a tuple of delimiters for (annotations, field/values, value lists)
Returns: a dictionary with the keys set (a list of unique annotation values), count (the annotation counts), cons (the consensus annotation), and freq (the majority annotation frequency).
Return type:
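The four documented keys can be illustrated with a minimal sketch. This is not pRESTO's implementation: it takes a plain list of annotation values instead of SeqRecord objects, the name consensus_sketch is hypothetical, and the container types used for set and count are assumptions.

```python
from collections import Counter

def consensus_sketch(values):
    """Summarize annotation values using the four documented keys."""
    counts = Counter(values)
    # most_common(1) yields [(value, count)] for the majority annotation
    cons, cons_count = counts.most_common(1)[0]
    return {
        'set': sorted(counts),             # unique annotation values
        'count': dict(counts),             # occurrences of each value
        'cons': cons,                      # majority (consensus) annotation
        'freq': cons_count / len(values),  # fraction of values equal to cons
    }

result = consensus_sketch(['IGHM', 'IGHM', 'IGHG', 'IGHM'])
print(result['cons'], result['freq'])
```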
-
presto.Annotation.collapseAnnotation(ann_dict, action, fields=None, delimiter=('|', '=', ', '))
Collapses multiple annotations into new single annotations for each field
Parameters: - ann_dict – dictionary of field/value pairs
- action – collapse action to take; one of {min, max, sum, first, last, set, cat}
- fields – subset of ann_dict to collapse; if None, collapse all but the ID field
- delimiter – Tuple of delimiters for (fields, values, value lists)
Returns: Modified field dictionary
Return type: OrderedDict
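A rough sketch of what the seven collapse actions could mean for one delimited value follows. It is an illustration, not pRESTO's code: the name collapse_value_sketch, the comma separator, the numeric formatting, and the ordering used for set are all assumptions.

```python
def collapse_value_sketch(value, action, sep=','):
    """Collapse a delimited annotation value with one documented action."""
    values = value.split(sep)
    if action in ('min', 'max', 'sum'):
        # Numeric actions; formatting the result with %g is an assumption
        func = {'min': min, 'max': max, 'sum': sum}[action]
        return '%g' % func(float(v) for v in values)
    if action == 'first':
        return values[0]
    if action == 'last':
        return values[-1]
    if action == 'set':
        # Unique values in first-seen order (the ordering is an assumption)
        return sep.join(dict.fromkeys(values))
    if action == 'cat':
        # Concatenate the values with no separator (an assumption)
        return ''.join(values)
    raise ValueError('unknown action: %r' % action)

print(collapse_value_sketch('2,5,3', 'sum'))
print(collapse_value_sketch('A,B,A', 'set'))
```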
-
presto.Annotation.collapseHeader(header, fields, actions, delimiter=('|', '=', ', '))
Collapses a sequence header
Parameters: - header – an annotation dictionary returned by parseAnnotation.
- fields – the list of fields to collapse.
- actions – the list of collapse actions to take; one of (max, min, sum, first, last, set, cat) for each field.
- delimiter – a tuple of delimiters for (fields, values, value lists).
Returns: modified header dictionary.
Return type:
-
presto.Annotation.convert454Header(desc)
Parses 454 headers into the pRESTO format
Parameters: desc (str) – a sequence description string.
Returns: a dictionary of header field and value pairs.
Return type: dict
Examples
New style 454 header:
@<accession> <length=##>
@GXGJ56Z01AE06X length=222
Old style 454 header:
@<rank_x_y> <length=##> <uaccno=accession>
@000034_0199_0169 length=437 uaccno=GNDG01201ARRCR
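The two 454 layouts above can be split with a single regular expression. This is only a sketch of the header structure: the name split_454_sketch and the group names are hypothetical, and the keys of the dictionary that convert454Header actually returns are not shown here.

```python
import re

# One pattern covers both layouts; the uaccno part is optional
HEADER_454 = re.compile(
    r'^@?(?P<name>\S+)\s+length=(?P<length>\d+)'
    r'(?:\s+uaccno=(?P<uaccno>\S+))?$'
)

def split_454_sketch(desc):
    """Split a 454 description into its raw components."""
    match = HEADER_454.match(desc.strip())
    if match is None:
        raise ValueError('not a recognized 454 header: %r' % desc)
    # Groups that did not take part in the match are returned as None
    return match.groupdict()

new_style = split_454_sketch('@GXGJ56Z01AE06X length=222')
old_style = split_454_sketch(
    '@000034_0199_0169 length=437 uaccno=GNDG01201ARRCR')
print(new_style)
print(old_style)
```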
-
presto.Annotation.
convertGenbankHeader(desc, delimiter=('|', '=', ', '))
Converts GenBank and RefSeq headers into the pRESTO format
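Parameters: - desc – a sequence description string.
- delimiter – a tuple of delimiters for (fields, values, value lists).
Returns: a dictionary of header field and value pairs.
Return type: dict
Examples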
New style GenBank header:
<accession>.<version> <description>
>CM000663.2 Homo sapiens chromosome 1, GRCh38 reference primary assembly
Old style GenBank header:
gi|<GI record number>|<dbsrc>|<accession>.<version>|<description>
>gi|568336023|gb|CM000663.2| Homo sapiens chromosome 1, GRCh38 reference primary assembly
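As a sketch of how the two GenBank layouts differ, the patterns below split each example header into the components named in the templates. The function name split_genbank_sketch and the group names are hypothetical, and this is not how convertGenbankHeader itself is implemented or what keys it returns.

```python
import re

# Old style: gi|<GI record number>|<dbsrc>|<accession>.<version>|<description>
OLD_STYLE = re.compile(
    r'^>?gi\|(?P<gi>\d+)\|(?P<dbsrc>\w+)\|'
    r'(?P<accession>[^\s|.]+)\.(?P<version>\d+)\|\s*(?P<description>.+)$'
)
# New style: <accession>.<version> <description>
NEW_STYLE = re.compile(
    r'^>?(?P<accession>[^\s|.]+)\.(?P<version>\d+)\s+(?P<description>.+)$'
)

def split_genbank_sketch(desc):
    """Try the old layout first, then the new one."""
    for pattern in (OLD_STYLE, NEW_STYLE):
        match = pattern.match(desc.strip())
        if match:
            return match.groupdict()
    raise ValueError('not a recognized GenBank header: %r' % desc)

new = split_genbank_sketch(
    '>CM000663.2 Homo sapiens chromosome 1, '
    'GRCh38 reference primary assembly')
old = split_genbank_sketch(
    '>gi|568336023|gb|CM000663.2| Homo sapiens chromosome 1, '
    'GRCh38 reference primary assembly')
print(new['accession'], old['gi'])
```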
-
presto.Annotation.convertGenericHeader(desc, delimiter=('|', '=', ', '))
Converts any header to the pRESTO format
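Parameters: - desc – a sequence description string.
- delimiter – a tuple of delimiters for (fields, values, value lists).
Returns: a dictionary of header field and value pairs.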
Return type:
-
presto.Annotation.convertIMGTHeader(desc, simple=False)
Converts germline headers from IMGT/GENE-DB into the pRESTO format
Parameters:
Returns: a dictionary of header field and value pairs.
Return type:
Examples
IMGT header:
>X60503|IGHV1-18*02|Homo sapiens|F|V-REGION|142..417|276 nt|1| | | | |276+24=300|partial in 3'| |
Header contains 15 fields separated by | (http://imgt.org/genedb):
- IMGT/LIGM-DB accession number(s).
- Gene and allele name.
- Species.
- Functionality.
- Exon(s), region name(s), or extracted label(s).
- Start and end positions in the IMGT/LIGM-DB accession number(s).
- Number of nucleotides in the IMGT/LIGM-DB accession number(s).
- Codon start, or ‘NR’ (not relevant) for non coding labels and out-of-frame pseudogenes.
- Number of nucleotides added in 5' compared to the corresponding label extracted from IMGT/LIGM-DB.
- Number of nucleotides added or removed in 3' compared to the corresponding label extracted from IMGT/LIGM-DB.
- Number of added, deleted, and/or substituted nucleotides to correct sequencing errors, or ‘not corrected’ if non corrected sequencing errors.
- Number of amino acids (AA). This field indicates that the sequence is in amino acids.
- Number of characters in the sequence. Nucleotides (or AA) plus IMGT gaps.
- Partial (if it is).
- Reverse complementary (if it is).
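The 15-field layout can be checked against the example header with a short sketch. The field labels in IMGT_FIELDS and the name split_imgt_sketch are hypothetical names chosen for illustration; they are not the keys that convertIMGTHeader returns.

```python
# Hypothetical labels for the 15 positional fields, in the order listed above
IMGT_FIELDS = [
    'accession', 'allele', 'species', 'functionality', 'label',
    'positions', 'nt_length', 'codon_start', 'nt_added_5prime',
    'nt_changed_3prime', 'corrections', 'aa_count', 'char_count',
    'partial', 'revcomp',
]

def split_imgt_sketch(desc):
    """Map the first 15 pipe-separated fields onto the labels above."""
    parts = [p.strip() for p in desc.lstrip('>').split('|')]
    # The example header ends with a trailing '|', so drop anything past 15
    return dict(zip(IMGT_FIELDS, parts[:15]))

header = ("X60503|IGHV1-18*02|Homo sapiens|F|V-REGION|142..417|276 nt|1"
          "| | | | |276+24=300|partial in 3'| |")
fields = split_imgt_sketch('>' + header)
print(fields['allele'], fields['char_count'])
```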
-
presto.Annotation.convertIlluminaHeader(desc)
Converts Illumina headers into the pRESTO format
Parameters: desc (str) – a sequence description string.
Returns: a dictionary of header field and value pairs.
Return type: dict
Examples
New style Illumina header:
@<instrument>:<run number>:<flowcell ID>:<lane>:<tile>:<x-pos>:<y-pos> <read number>:<is filtered>:<control number>:<index sequence>
@MISEQ:132:000000000-A2F3U:1:1101:14340:1555 2:N:0:ATCACG
Old style Illumina header:
@<instrument>:<flowcell lane>:<tile>:<x-pos>:<y-pos>#<index sequence>/<read number>
@HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1
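The following sketch splits both Illumina layouts into the components named in the templates above. It assumes that only the new style contains a space; split_illumina_sketch and the dictionary keys are hypothetical and do not reflect the field names that convertIlluminaHeader produces.

```python
NEW_COORD = ['instrument', 'run', 'flowcell', 'lane', 'tile', 'x', 'y']
NEW_READ = ['read', 'filtered', 'control', 'index']
OLD_COORD = ['instrument', 'lane', 'tile', 'x', 'y']

def split_illumina_sketch(desc):
    """Split an Illumina description into its named components."""
    desc = desc.strip().lstrip('@')
    if ' ' in desc:
        # New style: coordinates and read information separated by a space
        coord, read_info = desc.split(' ', 1)
        fields = dict(zip(NEW_COORD, coord.split(':')))
        fields.update(zip(NEW_READ, read_info.split(':')))
    else:
        # Old style: coordinates, then '#<index sequence>/<read number>'
        coord, rest = desc.split('#', 1)
        index, read = rest.split('/', 1)
        fields = dict(zip(OLD_COORD, coord.split(':')))
        fields.update({'index': index, 'read': read})
    return fields

new = split_illumina_sketch(
    '@MISEQ:132:000000000-A2F3U:1:1101:14340:1555 2:N:0:ATCACG')
old = split_illumina_sketch(
    '@HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1')
print(new['read'], old['read'])
```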
-
presto.Annotation.convertMIGECHeader(desc)
Parses headers from the MIGEC tool into the pRESTO format
Parameters: desc (str) – a sequence description string.
Returns: a dictionary of header field and value pairs.
Return type: dict
Examples
MIGEC header:
@MIG UMI:<UMI sequence>:<consensus read count>
@MIG UMI:TCGGCCAACAAA:8
-
presto.Annotation.convertSRAHeader(desc)
Parses NCBI SRA or EMBL-EBI ENA headers into the pRESTO format
Parameters: desc (str) – a sequence description string.
Returns: a dictionary of header field and value pairs.
Return type: dict
Examples
Header from fastq-dump --split-files:
@<accession>.<spot> <original sequence description> <length=#>
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
@SRR1383326.1 1 length=250
Header from fastq-dump --split-files -I:
@<accession>.<spot>.<read number> <original sequence description> <length=#>
@SRR1383326.1.1 1 length=250
Header from ENA:
@<accession>.<spot> <original sequence description>
@ERR220397.1 HKSQ1MM01DXT2W/3
@ERR346596.1 BS-DSFCONTROL04:4:000000000-A3F0Y:1:1101:12758:1640/1
@ERR346596.1 BS-DSFCONTROL04:4:000000000-A3F0Y:1:1101:12758:1640/2
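The three SRA and ENA layouts differ only in the optional read number and the optional length suffix, which a single pattern can capture. This is a sketch of the header structure under that assumption; split_sra_sketch and the group names are hypothetical, and the accession pattern (three capital letters followed by digits) is inferred from the examples rather than from a specification.

```python
import re

SRA_HEADER = re.compile(
    r'^@?(?P<accession>[A-Z]{3}\d+)\.(?P<spot>\d+)'
    r'(?:\.(?P<read>\d+))?'                 # present only with -I
    r'\s+(?P<description>.*?)'              # original sequence description
    r'(?:\s+length=(?P<length>\d+))?$'      # absent in ENA headers
)

def split_sra_sketch(desc):
    """Split an SRA or ENA description into its raw components."""
    match = SRA_HEADER.match(desc.strip())
    if match is None:
        raise ValueError('not a recognized SRA/ENA header: %r' % desc)
    return match.groupdict()

plain = split_sra_sketch(
    '@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36')
with_read = split_sra_sketch('@SRR1383326.1.1 1 length=250')
ena = split_sra_sketch('@ERR220397.1 HKSQ1MM01DXT2W/3')
print(plain['accession'], with_read['read'], ena['description'])
```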
-
presto.Annotation.copyHeader(header, fields, names, actions=None, delimiter=('|', '=', ', '))
Copies fields in a sequence header
Parameters: - header – an annotation dictionary returned by parseAnnotation.
- fields – a list of the field names to copy.
- names – a list of the new field names.
- actions – the list of collapse actions to take after the copy; one of (max, min, sum, first, last, set, cat) for each field.
- delimiter – a tuple of delimiters for (fields, values, value lists).
Returns: modified header dictionary.
Return type:
-
presto.Annotation.deleteHeader(header, fields, delimiter=('|', '=', ', '))
Deletes fields from a sequence header
Parameters: - header – an annotation dictionary returned by parseAnnotation.
- fields – the list of fields to delete.
- delimiter – a tuple of delimiters for (fields, values, value lists).
Returns: modified header dictionary
Return type:
-
presto.Annotation.expandHeader(header, fields, separator=', ', delimiter=('|', '=', ', '))
Splits an annotation value into separate fields in a sequence header
Parameters: - header – an annotation dictionary returned by parseAnnotation.
- fields – the field to split.
- separator – the delimiter to split the values by.
- delimiter – a tuple of delimiters for (fields, values, value lists).
Returns: modified header dictionary.
Return type:
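One way to picture the split is the sketch below, which turns a delimited value into several fields. It is not pRESTO's implementation: the name expand_field_sketch is hypothetical, and both the comma separator and the numbered-suffix naming of the new fields are assumptions.

```python
def expand_field_sketch(ann, field, separator=','):
    """Split one field's value into numbered fields, keeping field order."""
    expanded = {}
    for key, value in ann.items():
        if key == field:
            # Assumption: new fields are named <field>1, <field>2, ...
            for i, part in enumerate(value.split(separator), start=1):
                expanded['%s%d' % (key, i)] = part
        else:
            expanded[key] = value
    return expanded

header = {'ID': 'SEQ1', 'PRIMER': 'IGHJ,IGHC', 'COUNT': '3'}
expanded = expand_field_sketch(header, 'PRIMER')
print(expanded)
```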
-
presto.Annotation.flattenAnnotation(ann_dict, delimiter=('|', '=', ', '))
Converts annotations from a dictionary to a FASTA/FASTQ sequence description
Parameters: - ann_dict – Dictionary of field/value pairs
- delimiter – Tuple of delimiters for (fields, values, value lists)
Returns: Formatted sequence description string
Return type:
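The role of the three delimiters can be shown with a small sketch that joins a dictionary into a description string. The name flatten_sketch is hypothetical, the delimiter tuple is passed explicitly with a plain comma for value lists, and treating ID as the leading, unlabeled element is an assumption of the sketch rather than documented behavior.

```python
def flatten_sketch(ann_dict, delimiter=('|', '=', ',')):
    """Join field/value pairs into a single description string."""
    # Assumption: the 'ID' value is written first, without a field name
    parts = [str(ann_dict['ID'])]
    for key, value in ann_dict.items():
        if key == 'ID':
            continue
        if isinstance(value, (list, tuple)):
            # Value lists are joined with the third delimiter
            value = delimiter[2].join(str(v) for v in value)
        parts.append('%s%s%s' % (key, delimiter[1], value))
    return delimiter[0].join(parts)

desc = flatten_sketch(
    {'ID': 'SEQ1', 'PRIMER': ['IGHJ', 'IGHC'], 'COUNT': 3})
print(desc)
```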
-
presto.Annotation.getAnnotationValues(seq_iter, field, unique=False, delimiter=('|', '=', ', '))
Gets the annotation values for a field across a set of sequences
Parameters: - seq_iter – Iterator or list of SeqRecord objects
- field – Annotation field to retrieve values for
- unique – If True return a list of only the unique values; if False return a list of all values
- delimiter – Tuple of delimiters for (fields, values, value lists)
Returns: List of values for the field
Return type:
-
presto.Annotation.getCoordKey(header, coord_type='presto', delimiter=('|', '=', ', '))
Returns the coordinate identifier for a sequence description
Parameters: - header – Sequence header string
- coord_type – Sequence header format; one of [‘illumina’, ‘solexa’, ‘sra’, ‘454’, ‘presto’]; if the type is unrecognized or None, the sequence ID is returned.
- delimiter – Tuple of delimiters for (fields, values, value lists)
Returns: Coordinate identifier as a string
Return type:
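A coordinate identifier is the part of a header that both mates of a read pair share. The sketch below illustrates the idea for three of the listed types only; the splitting rules are assumptions based on the header layouts, the name coord_key_sketch is hypothetical, and the 'sra' and '454' types are deliberately left out because their rules are not described here.

```python
def coord_key_sketch(header, coord_type='presto'):
    """Return the portion of a header shared by paired reads."""
    if coord_type == 'illumina':
        # New style Illumina: the read number follows the first space
        return header.split(' ')[0]
    if coord_type == 'solexa':
        # Old style Illumina: the read number follows the '/'
        return header.split('/')[0]
    if coord_type == 'presto':
        # pRESTO style: the sequence ID precedes the first '|'
        return header.split('|')[0]
    # Assumption: any other type falls back to the unmodified header
    return header

read_1 = 'MISEQ:132:000000000-A2F3U:1:1101:14340:1555 1:N:0:ATCACG'
read_2 = 'MISEQ:132:000000000-A2F3U:1:1101:14340:1555 2:N:0:ATCACG'
key_1 = coord_key_sketch(read_1, 'illumina')
key_2 = coord_key_sketch(read_2, 'illumina')
print(key_1 == key_2)
```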
-
presto.Annotation.mergeAnnotation(ann_dict_1, ann_dict_2, prepend=False, delimiter=('|', '=', ', '))
Merges non-ID field annotations from one field dictionary into another
Parameters: - ann_dict_1 – Dictionary of field/value pairs to append to
- ann_dict_2 – Dictionary of field/value pairs to merge with ann_dict_1
- prepend – If True then add ann_dict_2 values to the front of any ann_dict_1 values that are already present, rather than the default behavior of appending ann_dict_2 values.
- delimiter – Tuple of delimiters for (fields, values, value lists)
Returns: Modified ann_dict_1 dictionary of field/value pairs
Return type: OrderedDict
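The effect of prepend is easiest to see on a small example. The sketch below is a stand-in written for illustration, not pRESTO's code: merge_sketch is a hypothetical name, it returns a new dictionary instead of modifying its input, and joining values with a plain comma is an assumption.

```python
def merge_sketch(ann_1, ann_2, prepend=False, sep=','):
    """Merge the non-ID fields of ann_2 into a copy of ann_1."""
    merged = dict(ann_1)
    for key, value in ann_2.items():
        if key == 'ID':
            # Only non-ID fields are merged
            continue
        if key in merged:
            pair = (value, merged[key]) if prepend else (merged[key], value)
            merged[key] = sep.join(pair)
        else:
            merged[key] = value
    return merged

first = {'ID': 'SEQ1', 'PRIMER': 'IGHJ'}
second = {'ID': 'SEQ2', 'PRIMER': 'IGHC', 'COUNT': '2'}
appended = merge_sketch(first, second)
prepended = merge_sketch(first, second, prepend=True)
print(appended['PRIMER'], prepended['PRIMER'])
```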
-
presto.Annotation.mergeHeader(header, fields, name, action=None, delete=False, delimiter=('|', '=', ', '))
Merges fields in a sequence header
Parameters: - header – an annotation dictionary returned by parseAnnotation.
- fields – a list of the field names to merge.
- name – the name of the new field.
- delete – if True delete the merged fields.
- action – the collapse action to take after the merge; one of (max, min, sum, first, last, set, cat).
- delimiter – a tuple of delimiters for (fields, values, value lists).
Returns: modified header dictionary.
Return type:
-
presto.Annotation.parseAnnotation(record, fields=None, delimiter=('|', '=', ', '))
Extracts annotations from a FASTA/FASTQ sequence description
Parameters: - record – Description string to extract annotations from
- fields – List of fields to subset the return dictionary to; if None return all fields
- delimiter – a tuple of delimiters for (fields, values, value lists)
Returns: An OrderedDict of field/value pairs
Return type: OrderedDict
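The sketch below shows the general shape of this parsing step: split the description on the field delimiter, then split each piece on the field/value delimiter. It is an illustration only. The name parse_sketch is hypothetical, the delimiter tuple is passed explicitly, and storing the leading element under an ID key is an assumption.

```python
from collections import OrderedDict

def parse_sketch(record, fields=None, delimiter=('|', '=', ',')):
    """Split a description string into ordered field/value pairs."""
    parts = record.split(delimiter[0])
    # Assumption: the first element is the sequence ID
    ann = OrderedDict([('ID', parts[0])])
    for part in parts[1:]:
        # partition splits on the first field/value delimiter only
        key, _, value = part.partition(delimiter[1])
        ann[key] = value
    if fields is not None:
        # Keep only the requested fields, preserving their original order
        ann = OrderedDict((k, v) for k, v in ann.items() if k in fields)
    return ann

ann = parse_sketch('SEQ1|PRIMER=IGHJ|COUNT=3')
subset = parse_sketch('SEQ1|PRIMER=IGHJ|COUNT=3', fields=['COUNT'])
print(list(ann.items()))
```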
-
presto.Annotation.parseLog(record)
Parses a pRESTO log record
Parameters: record (str) – a string of lines representing a log record, including newline characters.
Returns: the parsed log as a dictionary of field and value pairs.
Return type: collections.OrderedDict
-
presto.Annotation.renameAnnotation(ann_dict, old_field, new_field, delimiter=('|', '=', ', '))
Renames an annotation and merges annotations if the new name already exists
Parameters: - ann_dict – Dictionary of field/value pairs
- old_field – Old field name
- new_field – New field name
- delimiter – Tuple of delimiters for (fields, values, value lists)
Returns: Modified fields dictionary
Return type: OrderedDict
-
presto.Annotation.renameHeader(header, fields, names, actions=None, delimiter=('|', '=', ', '))
Renames fields in a sequence header
Parameters: - header – an annotation dictionary returned by parseAnnotation.
- fields – a list of the current field names.
- names – a list of the new field names.
- actions – the list of collapse actions to take after the rename; one of (max, min, sum, first, last, set, cat) for each field.
- delimiter – a tuple of delimiters for (fields, values, value lists).
Returns: modified header dictionary.
Return type: