trviz.motif_encoder

Module Contents

Classes

MotifEncoder

class trviz.motif_encoder.MotifEncoder(private_motif_threshold=0)
static _divide_motifs_into_normal_and_private(motif_counter, private_motif_threshold)

Givne a list of decomposed VNTRs, divide motifs into two groups: normal and private. If a motif occurred less than the private_motif_threshold, it is regarded as private motif. Otherwise, normal motifs.

:param decomposed_vntrs :param private_motif_threshold :return: normal motifs, private motifs

static find_private_motif_threshold(decomposed_vntrs, label_count=None)

Find the frequency threshold for private motifs.

Parameters
  • decomposed_vntrs – decomposed tandem repeat sequences

  • label_count – if label_count is given, use only label_count number of characters to encode the motifs

Return min_private_motif_threshold

the frequency threshold for private motifs.

static write_motif_map(output_file, motif_to_alphabet, motif_counter)

Write the mapping motif to characters to the specified file

static _encode_decomposed_tr(decomposed_vntrs, motif_to_symbol)
encode(decomposed_vntrs: List[List], motif_map_file: str, label_count: int = None, auto: bool = True) List[str]

Encode decomposed tandem repeat sequences using ASCII characters. By default, the map between motifs and characters are written as a file.

Parameters
  • decomposed_vntrs – a list of decomposed tandem repeat sequences

  • motif_map_file – the output file name for the mapping between motifs and characters

  • label_count – the number of label (encoding) to represent the motifs.

  • auto – if True, find the minimum threshold to encode everything using 90 ASCII characters.

Returns

encoded_vntrs