trviz.utils

Module Contents

Functions

get_sample_and_sequence_from_fasta(fasta_file)

Read fasta file and output headers and sequences

get_motif_counter(decomposed_vntrs)

Return a counter for each motif

is_emitting_state(state_name)

Check whether the given state is an emitting state, that is, an insertion or matching state

get_repeating_pattern_lengths(visited_states)

get_motifs_from_visited_states_and_region(...)

is_valid_sequence(sequence)

Check whether the given sequence is a DNA sequence

sort_by_manually(aligned_vntrs, sample_ids, ...)

Sort the aligned and encoded tandem repeats based on the given order

sort(aligned_vntrs, sample_ids, symbol_to_motif, ...)

Sort the aligned and encoded tandem repeats

get_levenshtein_distance(s1, s2)

This function takes two strings and returns the Levenshtein distance between them.

_calculate_cost(seq1, seq2, alphabet_to_motif)

calculate_cost_with_dist_matrix(aligned_encoded_vntr1, ...)

calculate_cost(alinged_vntrs, alphabet_to_motif)

get_distance_matrix(symbol_to_motif[, score])

Stores the edit distance between a motif and another motif.

get_score_matrix(symbol_to_motif[, match_score, ...])

calculate_total_cost(alinged_vntrs, dist_matrix)

sort_by_simulated_annealing_optimized(seq_list, ...)

add_padding(encoded_trs)

This function takes a list of encoded traces as input and returns a list of padded traces.

print_progress_bar(iteration, total[, prefix, suffix, ...])

Call in a loop to create a terminal progress bar

get_motif_marks(...) → Dict[str, str]

Parse the region prediction file and store the result in a dictionary

Attributes

LOWERCASE_LETTERS

UPPERCASE_LETTERS

DIGITS

skipping_characters

PRIVATE_MOTIF_LABEL

INDEX_TO_CHR

DNA_CHARACTERS

trviz.utils.LOWERCASE_LETTERS
trviz.utils.UPPERCASE_LETTERS
trviz.utils.DIGITS
trviz.utils.skipping_characters = ['(', '=', '<', '>', '?', '-']
trviz.utils.PRIVATE_MOTIF_LABEL = ?
trviz.utils.INDEX_TO_CHR
trviz.utils.DNA_CHARACTERS
trviz.utils.get_sample_and_sequence_from_fasta(fasta_file)

Read fasta file and output headers and sequences
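
As a rough illustration of what such a reader does, the sketch below collects FASTA headers and sequences into two parallel lists. The helper name `read_fasta_records`, its line-based input, and its return shape are assumptions made for this example; the actual `trviz.utils` implementation may differ.

```python
from typing import Iterable, List, Tuple


def read_fasta_records(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: collect FASTA headers and their sequences."""
    headers: List[str] = []
    sequences: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            headers.append(line[1:])  # drop the leading '>'
            sequences.append("")
        elif headers:
            sequences[-1] += line  # multi-line sequences are concatenated
    return headers, sequences


example = [">sample_1", "ACGT", "ACGA", ">sample_2", "ACGTACGT"]
headers, sequences = read_fasta_records(example)
print(headers, sequences)
```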

trviz.utils.get_motif_counter(decomposed_vntrs)

Return a counter for each motif
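
A minimal sketch of motif counting, assuming `decomposed_vntrs` is a list of tandem repeats where each repeat is a list of motif strings. The helper name `count_motifs` is hypothetical and is not part of `trviz.utils`.

```python
from collections import Counter
from typing import List


def count_motifs(decomposed_vntrs: List[List[str]]) -> Counter:
    """Hypothetical helper: count how often each motif occurs overall."""
    counter: Counter = Counter()
    for motifs in decomposed_vntrs:
        counter.update(motifs)  # adds one count per motif occurrence
    return counter


decomposed = [["ACG", "ACG", "ACT"], ["ACG", "ACT"]]
motif_counts = count_motifs(decomposed)
print(motif_counts.most_common())
```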

trviz.utils.is_emitting_state(state_name)

Check whether the given state is an emitting state, that is, an insertion or matching state

trviz.utils.get_repeating_pattern_lengths(visited_states)
trviz.utils.get_motifs_from_visited_states_and_region(visited_states, region)
trviz.utils.is_valid_sequence(sequence)

Check whether the given sequence is a DNA sequence
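
One way such a check could look is sketched below. The alphabet `ACGT`, the case-insensitive comparison, and the rejection of empty strings are assumptions for this example; the contents of `DNA_CHARACTERS` are not shown on this page, so the real function may accept a different character set.

```python
# Assumption: the real DNA_CHARACTERS may contain other symbols (e.g. N).
ASSUMED_DNA_ALPHABET = frozenset("ACGT")


def looks_like_dna(sequence: str) -> bool:
    """Hypothetical helper: True if every character is in the assumed alphabet."""
    return len(sequence) > 0 and all(
        base in ASSUMED_DNA_ALPHABET for base in sequence.upper()
    )


checks = {seq: looks_like_dna(seq) for seq in ["ACGTTGCA", "acgt", "ACGU", ""]}
print(checks)
```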

trviz.utils.sort_by_manually(aligned_vntrs, sample_ids, sample_order_file)

Sort the aligned and encoded tandem repeats based on the given order

trviz.utils.sort(aligned_vntrs, sample_ids, symbol_to_motif, sample_order_file, method='motif_count')

Sort the aligned and encoded tandem repeats

trviz.utils.get_levenshtein_distance(s1, s2)

This function takes two strings and returns the Levenshtein distance between them. The Levenshtein distance is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one string into the other. For example, the Levenshtein distance between “kitten” and “sitting” is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:

  • kitten → sitten (substitution of “s” for “k”)
  • sitten → sittin (substitution of “i” for “e”)
  • sittin → sitting (insertion of “g” at the end)
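
The definition above can be sketched with the standard dynamic-programming recurrence. This is an illustrative implementation under the hypothetical name `levenshtein`, not necessarily how `get_levenshtein_distance` is written.

```python
def levenshtein(s1: str, s2: str) -> int:
    """Illustrative edit distance that keeps only two rows of the DP table."""
    previous = list(range(len(s2) + 1))  # distance from "" to each prefix of s2
    for i, c1 in enumerate(s1, start=1):
        current = [i]  # distance from s1[:i] to ""
        for j, c2 in enumerate(s2, start=1):
            substitution = previous[j - 1] + (c1 != c2)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```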

trviz.utils._calculate_cost(seq1, seq2, alphabet_to_motif)
trviz.utils.calculate_cost_with_dist_matrix(aligned_encoded_vntr1, aligned_encoded_vntr2, dist_matrix, allow_copy_change=False)
trviz.utils.calculate_cost(alinged_vntrs, alphabet_to_motif)
trviz.utils.get_distance_matrix(symbol_to_motif, score=False)

Stores the edit distance between each pair of motifs. If two motifs are the same (e.g. dist_matrix[motif_x][motif_x]), it stores the length of the motif.

Parameters
  • symbol_to_motif – a dictionary mapping symbols to motifs

  • score – if True, output a score matrix (1 - distance/max_dist) instead
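
The description above can be sketched as a nested dictionary keyed by motif. The names `edit_distance` and `build_distance_matrix` are hypothetical, the keying by motif string is an assumption taken from the `dist_matrix[motif_x][motif_x]` example, and the `score=True` variant is left out because `max_dist` is not defined on this page.

```python
from typing import Dict


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, included to keep the sketch self-contained."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j - 1] + (ca != cb),
                               current[j - 1] + 1,
                               previous[j] + 1))
        previous = current
    return previous[-1]


def build_distance_matrix(symbol_to_motif: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Hypothetical helper: pairwise distances, with motif length on the diagonal."""
    motifs = list(symbol_to_motif.values())
    matrix: Dict[str, Dict[str, int]] = {}
    for m1 in motifs:
        matrix[m1] = {}
        for m2 in motifs:
            # Identical motifs store the motif length rather than 0.
            matrix[m1][m2] = len(m1) if m1 == m2 else edit_distance(m1, m2)
    return matrix


dist = build_distance_matrix({"a": "ACGT", "b": "ACGA", "c": "AC"})
print(dist["ACGT"])
```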

trviz.utils.get_score_matrix(symbol_to_motif, match_score=2, mismatch_score_for_edit_dist_of_1=-1, mismatch_score_for_edit_dist_greater_than_1=-2, gap_open_penalty=1.5, gap_extension_penalty=0.6)
trviz.utils.calculate_total_cost(alinged_vntrs, dist_matrix)
trviz.utils.sort_by_simulated_annealing_optimized(seq_list, sample_ids, symbol_to_motif)
trviz.utils.add_padding(encoded_trs)

This function takes a list of encoded traces as input and returns a list of padded traces. The padding is done by adding ‘-’ to the end of each trace. The number of ‘-’ added to each trace is equal to the difference between the length of the longest trace and the length of the trace.
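
A minimal sketch of the padding rule described above, assuming each encoded entry is a string of one-character symbols. The helper name `pad_to_longest` is hypothetical.

```python
from typing import List


def pad_to_longest(encoded_trs: List[str]) -> List[str]:
    """Hypothetical helper: right-pad every entry with '-' to the longest length."""
    max_len = max((len(tr) for tr in encoded_trs), default=0)
    return [tr + "-" * (max_len - len(tr)) for tr in encoded_trs]


padded = pad_to_longest(["abc", "a", "ab"])
print(padded)  # ['abc', 'a--', 'ab-']
```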

trviz.utils.print_progress_bar(iteration, total, prefix='', suffix='', decimals=1, length=100, fill='█', print_end='\r')

Call in a loop to create a terminal progress bar.

Parameters
  • iteration – required: current iteration (Int)

  • total – required: total iterations (Int)

  • prefix – optional: prefix string (Str)

  • suffix – optional: suffix string (Str)

  • decimals – optional: positive number of decimals in percent complete (Int)

  • length – optional: character length of bar (Int)

  • fill – optional: bar fill character (Str)

  • print_end – optional: end character (e.g. "\r", "\r\n") (Str)
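
To show how these parameters typically combine, the sketch below builds the bar as a string and prints it in a loop. The helper `render_progress` and its exact output layout are assumptions for illustration; the real `print_progress_bar` may format its line differently.

```python
def render_progress(iteration: int, total: int, prefix: str = "", suffix: str = "",
                    decimals: int = 1, length: int = 100, fill: str = "█") -> str:
    """Hypothetical helper: return one line of a text progress bar."""
    percent = f"{100 * iteration / total:.{decimals}f}"
    filled = length * iteration // total  # number of filled cells
    bar = fill * filled + "-" * (length - filled)
    return f"{prefix} |{bar}| {percent}% {suffix}"


steps = 5
for i in range(steps):
    # end="\r" returns the cursor to the line start so the bar redraws in place
    print(render_progress(i + 1, steps, prefix="Progress:", suffix="Complete",
                          length=20), end="\r")
print()  # move to a new line once the loop finishes
```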

trviz.utils.get_motif_marks(sample_ids: List[str], decomposed_trs: List[List[str]], region_prediction_file: str) → Dict[str, str]

Parse the region prediction file and store the result in a dictionary. The file format is: >sample_id region1,start,end region2,start,end

For example: >sample_1 match,2,4258 cds,2,2000 intron,2001,3554 cds,3555,4258
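
The file format above could be read along the following lines. This sketch only covers the parsing step and returns raw `(region, start, end)` tuples per sample. The helper name `parse_region_predictions`, the tuple output, and the tolerance for entries separated by either spaces or newlines are all assumptions; how `get_motif_marks` turns regions into its `Dict[str, str]` result is not described on this page.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Region = Tuple[str, int, int]


def parse_region_predictions(lines: Iterable[str]) -> Dict[str, List[Region]]:
    """Hypothetical helper: map each sample ID to its (region, start, end) entries."""
    regions: Dict[str, List[Region]] = {}
    current: Optional[str] = None
    for raw in lines:
        # Splitting on whitespace accepts one entry per line or several per line.
        for token in raw.split():
            if token.startswith(">"):
                current = token[1:]
                regions[current] = []
            elif current is not None:
                name, start, end = token.split(",")
                regions[current].append((name, int(start), int(end)))
    return regions


example_lines = [">sample_1", "match,2,4258", "cds,2,2000",
                 "intron,2001,3554", "cds,3555,4258"]
parsed = parse_region_predictions(example_lines)
print(parsed["sample_1"])
```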

Parameters
  • sample_ids

  • decomposed_trs

  • region_prediction_file

Returns