evcouplings.complex package
evcouplings.complex.protocol module
Protocols for matching putatively interacting sequences in protein complexes to create a concatenated sequence alignment.
Authors: Anna G. Green, Thomas A. Hopf
evcouplings.complex.protocol.best_hit(**kwargs)
Protocol:
Concatenate alignments based on the best hit to the focus sequence in each species.
Parameters: kwargs arguments (Mandatory) – See the list in the source code where check_required is called.
Returns: outcfg – Output configuration of the pipeline, including the following fields:
- alignment_file
- raw_alignment_file
- focus_mode
- focus_sequence
- segments
- frequencies_file
- identities_file
- num_sequences
- num_sites
- raw_focus_alignment_file
- statistics_file
Return type: dict
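The best-hit strategy can be illustrated with a minimal sketch. It assumes each monomer alignment has already been reduced to hypothetical `(sequence_id, species, identity_to_focus)` tuples; the real protocol reads this information from its annotation and identity files, and its filtering and tie-breaking rules may differ. The function names below are illustrative, not part of the library.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (sequence_id, species, identity to the focus sequence)
Hit = Tuple[str, str, float]


def best_hit_per_species(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep only the hit with the highest identity to the focus sequence in each species."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        _, species, identity = hit
        # Replace the stored hit only if this one is strictly more similar
        if species not in best or identity > best[species][2]:
            best[species] = hit
    return best


def pair_best_hits(hits_1: Iterable[Hit], hits_2: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Pair the best hits of both monomer alignments for every species present in both."""
    best_1 = best_hit_per_species(hits_1)
    best_2 = best_hit_per_species(hits_2)
    shared = sorted(set(best_1) & set(best_2))
    # Each pair of sequence IDs becomes one row of the concatenated alignment
    return [(best_1[s][0], best_2[s][0]) for s in shared]


hits_a = [("A1", "E. coli", 0.9), ("A2", "E. coli", 0.7), ("A3", "B. subtilis", 0.6)]
hits_b = [("B1", "E. coli", 0.5), ("B2", "S. enterica", 0.8)]
print(pair_best_hits(hits_a, hits_b))  # [('A1', 'B1')]
```

Species found in only one monomer alignment (B. subtilis and S. enterica here) contribute no row, which is why the number of concatenated sequences is bounded by the species overlap.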
evcouplings.complex.protocol.describe_concatenation(annotation_file_1, annotation_file_2, genome_location_filename_1, genome_location_filename_2, outfile)
Describes properties of the concatenated alignment.
Writes a CSV file with the following columns:
- num_seqs_1 : number of sequences in the first monomer alignment
- num_seqs_2 : number of sequences in the second monomer alignment
- num_nonred_species_1 : number of unique species annotations in the first monomer alignment
- num_nonred_species_2 : number of unique species annotations in the second monomer alignment
- num_species_overlap : number of unique species found in both alignments
- median_num_per_species_1 : median number of paralogs per species in the first monomer alignment
- median_num_per_species_2 : median number of paralogs per species in the second monomer alignment
- num_with_embl_cds_1 : number of IDs for which an EMBL CDS was found in the first monomer alignment (relevant to distance concatenation only)
- num_with_embl_cds_2 : number of IDs for which an EMBL CDS was found in the second monomer alignment (relevant to distance concatenation only)
Parameters:
- annotation_file_1 (str) – Path to annotation.csv file for first monomer alignment
- annotation_file_2 (str) – Path to annotation.csv file for second monomer alignment
- genome_location_filename_1 (str) – Path to genome location mapping file for first alignment
- genome_location_filename_2 (str) – Path to genome location mapping file for second alignment
- outfile (str) – Path to output file
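The species-based columns above can be computed from per-sequence species labels, as in this minimal sketch. It assumes the species annotation of every sequence has already been extracted from the two annotation files (their column layout is not described here), treats every sequence beyond the first in a species as a paralog, omits the EMBL CDS columns, and writes to an in-memory buffer instead of `outfile`. `summarize_species` and `write_summary` are hypothetical helpers.

```python
import csv
import io
from collections import Counter
from statistics import median
from typing import Dict, List


def summarize_species(species_1: List[str], species_2: List[str]) -> Dict[str, float]:
    """Compute the species-based summary columns; both input lists must be non-empty."""
    counts_1 = Counter(species_1)  # sequences per species, first monomer
    counts_2 = Counter(species_2)  # sequences per species, second monomer
    return {
        "num_seqs_1": len(species_1),
        "num_seqs_2": len(species_2),
        "num_nonred_species_1": len(counts_1),
        "num_nonred_species_2": len(counts_2),
        "num_species_overlap": len(set(counts_1) & set(counts_2)),
        "median_num_per_species_1": median(counts_1.values()),
        "median_num_per_species_2": median(counts_2.values()),
    }


def write_summary(row: Dict[str, float]) -> str:
    """Serialize one summary row as a two-line CSV (header plus values)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)
    return buffer.getvalue()


stats = summarize_species(
    ["E. coli", "E. coli", "B. subtilis"],
    ["E. coli", "S. enterica", "S. enterica", "S. enterica"],
)
print(write_summary(stats))
```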
evcouplings.complex.protocol.genome_distance(**kwargs)
Protocol:
Concatenate alignments based on genomic distance.
Parameters: kwargs arguments (Mandatory) – See the list in the source code where check_required is called.
Returns: outcfg – Output configuration of the pipeline, including the following fields:
- alignment_file
- raw_alignment_file
- focus_mode
- focus_sequence
- segments
- frequencies_file
- identities_file
- num_sequences
- num_sites
- raw_focus_alignment_file
- statistics_file
Return type: dict
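Pairing by genomic distance can be sketched as follows. The sketch assumes each sequence has already been mapped to a hypothetical `(genome_id, position_in_bp)` location (the protocol derives locations from the genome location mapping files and EMBL CDS records), uses an illustrative `max_distance` cutoff, and swaps in a simple greedy closest-first matching; the library's actual matching rule may differ.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical genome location: (genome or contig ID, gene position in bp)
Location = Tuple[str, int]


def pair_by_genome_distance(
    locations_1: Dict[str, Location],
    locations_2: Dict[str, Location],
    max_distance: int = 10_000,  # illustrative cutoff, not a library default
) -> List[Tuple[str, str, int]]:
    """Greedily pair sequences from two monomer alignments that lie closest on the same genome."""
    candidates = []
    for (id_1, (genome_1, pos_1)), (id_2, (genome_2, pos_2)) in product(
        locations_1.items(), locations_2.items()
    ):
        if genome_1 != genome_2:
            continue  # only genes on the same genome/contig can be compared
        distance = abs(pos_1 - pos_2)
        if distance <= max_distance:
            candidates.append((distance, id_1, id_2))

    used_1: Set[str] = set()
    used_2: Set[str] = set()
    pairs: List[Tuple[str, str, int]] = []
    # Closest candidates first; each sequence may appear in at most one pair
    for distance, id_1, id_2 in sorted(candidates):
        if id_1 not in used_1 and id_2 not in used_2:
            used_1.add(id_1)
            used_2.add(id_2)
            pairs.append((id_1, id_2, distance))
    return pairs


locs_a = {"A1": ("g1", 1000), "A2": ("g1", 50000), "A3": ("g2", 200)}
locs_b = {"B1": ("g1", 2500), "B2": ("g1", 51000), "B3": ("g3", 200)}
print(pair_by_genome_distance(locs_a, locs_b))  # [('A2', 'B2', 1000), ('A1', 'B1', 1500)]
```

Sequences on genomes that appear in only one alignment (A3 and B3 here) remain unpaired, mirroring how sequences without usable genome locations cannot contribute to a distance-based concatenation.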
evcouplings.complex.protocol.modify_complex_segments(outcfg, **kwargs)
Modifies the output configuration so that the segments are correct for a concatenated alignment.
Parameters: outcfg (dict) – The output configuration
Returns: outcfg – The output configuration, with a new field called “segments”
Return type: dict
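One plausible reading of "segments are correct for a concatenated alignment" is that the segments of both monomers must carry distinct IDs and be listed in concatenation order. The sketch below assumes that reading, a hypothetical dict-based segment representation, and the illustrative ID prefixes `A` and `B`; `relabel_segments` and `modify_segments_sketch` are not library functions.

```python
import copy
from typing import Any, Dict, List

# Hypothetical segment representation with at least a "segment_id" key
Segment = Dict[str, Any]


def relabel_segments(segments: List[Segment], prefix: str) -> List[Segment]:
    """Return copies of the segments with IDs of the form <prefix>_<n>, numbered from 1."""
    relabeled = []
    for index, segment in enumerate(segments, start=1):
        new_segment = copy.deepcopy(segment)  # leave the monomer's segments untouched
        new_segment["segment_id"] = f"{prefix}_{index}"
        relabeled.append(new_segment)
    return relabeled


def modify_segments_sketch(
    outcfg: Dict[str, Any], first_segments: List[Segment], second_segments: List[Segment]
) -> Dict[str, Any]:
    """Return a copy of outcfg whose "segments" field lists first-monomer segments, then second."""
    outcfg = dict(outcfg)  # avoid mutating the caller's dictionary
    outcfg["segments"] = relabel_segments(first_segments, "A") + relabel_segments(
        second_segments, "B"
    )
    return outcfg


# If both monomer runs labeled their segment "A_1", the IDs would clash after concatenation
first = [{"segment_id": "A_1", "sequence_id": "P1", "region_start": 1, "region_end": 100}]
second = [{"segment_id": "A_1", "sequence_id": "P2", "region_start": 1, "region_end": 80}]
out = modify_segments_sketch({"prefix": "output/complex"}, first, second)
print([s["segment_id"] for s in out["segments"]])  # ['A_1', 'B_1']
```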
evcouplings.complex.protocol.run(**kwargs)
Run alignment concatenation protocol.
Parameters: kwargs arguments (Mandatory) –
- protocol: concatenation protocol to run
- prefix: output prefix for all generated files
Returns: outcfg – Output configuration of the concatenation stage, a dictionary with results in the following fields (in brackets: not mandatory):
- alignment_file
- raw_alignment_file
- focus_mode
- focus_sequence
- segments
- frequencies_file
- identities_file
- num_sequences
- num_sites
- raw_focus_alignment_file
- statistics_file
Return type: dict
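Dispatching on the `protocol` argument can be sketched with a lookup table keyed by protocol name. The stub functions below stand in for `best_hit` and `genome_distance`, the `.a2m` output name is a made-up placeholder, and `ValueError` is used for illustration; the library may raise its own exception types and check additional arguments.

```python
from typing import Any, Callable, Dict


def best_hit_stub(**kwargs: Any) -> Dict[str, Any]:
    # Placeholder standing in for the real best_hit protocol
    return {"alignment_file": kwargs["prefix"] + ".a2m"}


def genome_distance_stub(**kwargs: Any) -> Dict[str, Any]:
    # Placeholder standing in for the real genome_distance protocol
    return {"alignment_file": kwargs["prefix"] + ".a2m"}


# Hypothetical registry mapping protocol names to protocol functions
PROTOCOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "best_hit": best_hit_stub,
    "genome_distance": genome_distance_stub,
}


def run_sketch(**kwargs: Any) -> Dict[str, Any]:
    """Check mandatory arguments, then dispatch to the protocol named by kwargs["protocol"]."""
    missing = [key for key in ("protocol", "prefix") if key not in kwargs]
    if missing:
        raise ValueError(f"Missing mandatory arguments: {missing}")
    protocol = kwargs["protocol"]
    if protocol not in PROTOCOLS:
        raise ValueError(f"Invalid protocol {protocol!r}; choose from {sorted(PROTOCOLS)}")
    return PROTOCOLS[protocol](**kwargs)


print(run_sketch(protocol="best_hit", prefix="output/complex"))
# {'alignment_file': 'output/complex.a2m'}
```

A registry keeps `run` independent of the individual protocols: adding a new concatenation strategy only requires a new entry in the table.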