trviz.utils
Module Contents
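Functions
get_sample_and_sequence_from_fasta | Read a FASTA file and output the headers and sequences |
get_motif_counter | Return a counter for each motif |
is_emitting_state | Check whether the given state is an emitting state, that is, an insertion or matching state |
get_repeating_pattern_lengths | |
get_motifs_from_visited_states_and_region | |
is_valid_sequence | Check whether the given sequence is a DNA sequence |
sort_by_manually | Sort the aligned and encoded tandem repeats based on the given order |
sort | Sort the aligned and encoded tandem repeats |
get_levenshtein_distance | This function takes two strings and returns the Levenshtein distance between them. |
_calculate_cost | |
calculate_cost_with_dist_matrix | |
calculate_cost | |
get_distance_matrix | Stores the edit distance between a motif and another motif. |
get_score_matrix | |
calculate_total_cost | |
sort_by_simulated_annealing_optimized | |
add_padding | This function takes a list of encoded traces as input and returns a list of padded traces. |
print_progress_bar | Call in a loop to create a terminal progress bar |
get_motif_marks | Parse the region prediction file and store the result in a dictionary |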
Attributes
- trviz.utils.LOWERCASE_LETTERS
- trviz.utils.UPPERCASE_LETTERS
- trviz.utils.DIGITS
- trviz.utils.skipping_characters = ['(', '=', '<', '>', '?', '-']
- trviz.utils.PRIVATE_MOTIF_LABEL = '?'
- trviz.utils.INDEX_TO_CHR
- trviz.utils.DNA_CHARACTERS
- trviz.utils.get_sample_and_sequence_from_fasta(fasta_file)
Read a FASTA file and output the headers and sequences
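As a rough illustration of what such a reader does, the sketch below parses FASTA-formatted text into parallel lists of headers and sequences. The name `read_fasta_text`, the text-based input, and the two-list return value are assumptions made for this example; they are not taken from the trviz source.

```python
from typing import List, Tuple


def read_fasta_text(text: str) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split FASTA-formatted text into headers and sequences."""
    headers: List[str] = []
    sequences: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            headers.append(line[1:])  # drop the leading '>'
            sequences.append("")      # start a new, empty record
        elif sequences:
            # Sequence lines before the first header are ignored;
            # multi-line sequences are concatenated.
            sequences[-1] += line
    return headers, sequences


headers, sequences = read_fasta_text(">sample_1\nACGT\nACGT\n>sample_2\nGGCC\n")
print(headers)    # ['sample_1', 'sample_2']
print(sequences)  # ['ACGTACGT', 'GGCC']
```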
- trviz.utils.get_motif_counter(decomposed_vntrs)
Return a counter for each motif
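A minimal sketch of motif counting, assuming `decomposed_vntrs` is a list with one list of motif strings per sample (the shape suggested by the `List[List[str]]` annotation on `get_motif_marks`). The helper name `count_motifs` is hypothetical and the real function may differ.

```python
from collections import Counter
from typing import List


def count_motifs(decomposed_vntrs: List[List[str]]) -> Counter:
    """Hypothetical helper: count how often each motif occurs across all samples."""
    counter: Counter = Counter()
    for motifs in decomposed_vntrs:
        counter.update(motifs)  # add every motif of this sample to the tally
    return counter


counts = count_motifs([["ACG", "ACG", "ACT"], ["ACG"]])
print(counts.most_common())  # [('ACG', 3), ('ACT', 1)]
```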
- trviz.utils.is_emitting_state(state_name)
Check whether the given state is an emitting state, that is, an insertion or matching state
- trviz.utils.get_repeating_pattern_lengths(visited_states)
- trviz.utils.get_motifs_from_visited_states_and_region(visited_states, region)
- trviz.utils.is_valid_sequence(sequence)
Check whether the given sequence is a DNA sequence
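One possible form of such a check is sketched below. The contents of `DNA_CHARACTERS` are not shown on this page, so the alphabet `A`, `C`, `G`, `T`, the case-insensitive comparison, and the rejection of empty strings are all assumptions; `looks_like_dna` is a hypothetical name.

```python
# Assumed alphabet; the actual DNA_CHARACTERS value is not documented here.
DNA_ALPHABET = frozenset("ACGT")


def looks_like_dna(sequence: str) -> bool:
    """Hypothetical helper: True if the sequence is non-empty and uses only A, C, G, T."""
    return len(sequence) > 0 and all(base in DNA_ALPHABET for base in sequence.upper())


print(looks_like_dna("ACGTTGCA"))  # True
print(looks_like_dna("ACGU"))      # False
```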
- trviz.utils.sort_by_manually(aligned_vntrs, sample_ids, sample_order_file)
Sort the aligned and encoded tandem repeats based on the given order
- trviz.utils.sort(aligned_vntrs, sample_ids, symbol_to_motif, sample_order_file, method='motif_count')
Sort the aligned and encoded tandem repeats
- trviz.utils.get_levenshtein_distance(s1, s2)
This function takes two strings and returns the Levenshtein distance between them. The Levenshtein distance is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one string into the other. For example, the Levenshtein distance between “kitten” and “sitting” is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:
kitten → sitten (substitution of “s” for “k”)
sitten → sittin (substitution of “i” for “e”)
sittin → sitting (insertion of “g” at the end)
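The definition above can be sketched with the standard dynamic-programming recurrence. This is a generic textbook implementation under the name `levenshtein` (hypothetical), not necessarily how trviz computes it.

```python
def levenshtein(s1: str, s2: str) -> int:
    """Generic edit distance using two rows of the DP table."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1  # keep the DP row as short as possible
    previous = list(range(len(s2) + 1))  # distance from "" to each prefix of s2
    for i, c1 in enumerate(s1, start=1):
        current = [i]  # distance from s1[:i] to ""
        for j, c2 in enumerate(s2, start=1):
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            substitution = previous[j - 1] + (c1 != c2)
            current.append(min(insertion, deletion, substitution))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```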
- trviz.utils._calculate_cost(seq1, seq2, alphabet_to_motif)
- trviz.utils.calculate_cost_with_dist_matrix(aligned_encoded_vntr1, aligned_encoded_vntr2, dist_matrix, allow_copy_change=False)
- trviz.utils.calculate_cost(alinged_vntrs, alphabet_to_motif)
- trviz.utils.get_distance_matrix(symbol_to_motif, score=False)
Stores the edit distance between a motif and another motif. If two motifs are the same (e.g. dist_matrix[motif_x][motif_x]), it stores the length of the motif.
- Parameters
symbol_to_motif – a dictionary mapping symbols to motifs
score – if True, output a score matrix (1 - distance/max_dist) instead
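The distance variant described above might look like the following sketch. It assumes the matrix is a nested dictionary keyed by motif strings, as the `dist_matrix[motif_x][motif_x]` notation suggests; `build_distance_matrix` and `edit_distance` are hypothetical names. The `score=True` variant is left out because `max_dist` is not defined on this page.

```python
from typing import Dict


def edit_distance(a: str, b: str) -> int:
    """Generic Levenshtein distance (two-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(current[j - 1] + 1,             # insertion
                               previous[j] + 1,                # deletion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def build_distance_matrix(symbol_to_motif: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Hypothetical helper: pairwise motif distances, with motif length on the diagonal."""
    motifs = list(symbol_to_motif.values())
    matrix: Dict[str, Dict[str, int]] = {}
    for m1 in motifs:
        matrix[m1] = {}
        for m2 in motifs:
            # Identical motifs store the motif length rather than 0.
            matrix[m1][m2] = len(m1) if m1 == m2 else edit_distance(m1, m2)
    return matrix


dist = build_distance_matrix({"a": "ACG", "b": "ACT"})
print(dist["ACG"]["ACG"])  # 3
print(dist["ACG"]["ACT"])  # 1
```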
- trviz.utils.get_score_matrix(symbol_to_motif, match_score=2, mismatch_score_for_edit_dist_of_1=-1, mismatch_score_for_edit_dist_greater_than_1=-2, gap_open_penalty=1.5, gap_extension_penalty=0.6)
- trviz.utils.calculate_total_cost(alinged_vntrs, dist_matrix)
- trviz.utils.sort_by_simulated_annealing_optimized(seq_list, sample_ids, symbol_to_motif)
- trviz.utils.add_padding(encoded_trs)
This function takes a list of encoded traces as input and returns a list of padded traces. The padding is done by adding ‘-’ to the end of each trace. The number of ‘-’ added to each trace is equal to the difference between the length of the longest trace and the length of the trace.
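The padding rule can be sketched in a few lines, assuming each encoded trace is a string of symbols. The name `pad_to_longest` is hypothetical.

```python
from typing import List


def pad_to_longest(encoded_trs: List[str], pad_char: str = "-") -> List[str]:
    """Hypothetical helper: right-pad every trace to the length of the longest one."""
    if not encoded_trs:
        return []
    longest = max(len(tr) for tr in encoded_trs)
    return [tr + pad_char * (longest - len(tr)) for tr in encoded_trs]


print(pad_to_longest(["abc", "a", "ab"]))  # ['abc', 'a--', 'ab-']
```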
- trviz.utils.print_progress_bar(iteration, total, prefix='', suffix='', decimals=1, length=100, fill='█', print_end='\r')
Call in a loop to create a terminal progress bar.
- Parameters
iteration – required: current iteration (int)
total – required: total iterations (int)
prefix – optional: prefix string (str)
suffix – optional: suffix string (str)
decimals – optional: positive number of decimals in percent complete (int)
length – optional: character length of bar (int)
fill – optional: bar fill character (str)
print_end – optional: end character (e.g. "\r", "\r\n") (str)
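To show how these parameters interact, the sketch below builds the bar as a string and prints it in a loop. `render_progress_bar` is a hypothetical stand-in written for this example, not the trviz function itself, and it assumes `total` is greater than zero.

```python
def render_progress_bar(iteration: int, total: int, prefix: str = "", suffix: str = "",
                        decimals: int = 1, length: int = 100, fill: str = "█") -> str:
    """Hypothetical helper: return one line of a text progress bar (total must be > 0)."""
    percent = f"{100 * iteration / total:.{decimals}f}"
    filled = length * iteration // total           # number of filled cells
    bar = fill * filled + "-" * (length - filled)  # unfilled cells shown as '-'
    return f"{prefix} |{bar}| {percent}% {suffix}"


total = 3
for i in range(total + 1):
    # '\r' returns the cursor to the line start so the next call overwrites the bar.
    print(render_progress_bar(i, total, prefix="Progress:", suffix="Complete", length=20), end="\r")
print()  # move to a fresh line once the loop is done
```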
- trviz.utils.get_motif_marks(sample_ids: List[str], decomposed_trs: List[List[str]], region_prediction_file: str) → Dict[str, str]
Parse the region prediction file and store the result in a dictionary. The format of the file is:
>sample_id
region1,start,end
region2,start,end
e.g.
>sample_1
match,2,4258
cds,2,2000
intron,2001,3554
cds,3555,4258
- Parameters
sample_ids –
decomposed_trs –
region_prediction_file –
- Returns
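A minimal parser for the file format described above is sketched here, assuming each region sits on its own line under a `>sample_id` header. The helper `parse_region_predictions` is hypothetical and returns raw `(region, start, end)` tuples per sample. How the real function turns these into the `Dict[str, str]` of motif marks is not documented on this page.

```python
from typing import Dict, List, Optional, Tuple

Region = Tuple[str, int, int]  # (region name, start, end)


def parse_region_predictions(text: str) -> Dict[str, List[Region]]:
    """Hypothetical helper: map each sample ID to its list of predicted regions."""
    regions: Dict[str, List[Region]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            current = line[1:]     # new sample header
            regions[current] = []
        elif current is not None:
            name, start, end = line.split(",")
            regions[current].append((name, int(start), int(end)))
    return regions


example = ">sample_1\nmatch,2,4258\ncds,2,2000\nintron,2001,3554\ncds,3555,4258\n"
print(parse_region_predictions(example)["sample_1"][1])  # ('cds', 2, 2000)
```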