evcouplings.complex package

evcouplings.complex.protocol module

Protocols for matching putatively interacting sequences in protein complexes to create a concatenated sequence alignment

Authors: Anna G. Green, Thomas A. Hopf

evcouplings.complex.protocol.best_hit(**kwargs)

Protocol:

Concatenate alignments based on the best hit to the focus sequence in each species

Parameters:	kwargs (mandatory) – Keyword arguments. The required keys are the ones passed to check_required in the function's source code.
Returns:	outcfg – Output configuration of the pipeline, including the following fields:

alignment_file
raw_alignment_file
focus_mode
focus_sequence
segments
frequencies_file
identities_file
num_sequences
num_sites
raw_focus_alignment_file
statistics_file

Return type:	dict
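
The best-hit idea can be illustrated with a minimal sketch. This is not the evcouplings implementation: the function name pair_best_hits and the (sequence_id, species, identity) tuple layout are assumptions made for illustration only. For each species, the sketch keeps the sequence most identical to the focus sequence in each monomer alignment and pairs the two winners.

```python
def pair_best_hits(hits_1, hits_2):
    """Pair the best hit per species from two monomer hit lists.

    Each hit is a hypothetical (sequence_id, species, identity_to_focus) tuple.
    """
    def best_per_species(hits):
        best = {}
        for seq_id, species, identity in hits:
            # Keep only the hit with the highest identity to the focus sequence
            if species not in best or identity > best[species][1]:
                best[species] = (seq_id, identity)
        return best

    best_1 = best_per_species(hits_1)
    best_2 = best_per_species(hits_2)

    # Only species present in both alignments can be concatenated
    shared = sorted(set(best_1) & set(best_2))
    return [(species, best_1[species][0], best_2[species][0]) for species in shared]


hits_a = [("a1", "E. coli", 0.9), ("a2", "E. coli", 0.7), ("a3", "B. subtilis", 0.6)]
hits_b = [("b1", "E. coli", 0.5), ("b2", "E. coli", 0.8), ("b3", "H. sapiens", 0.9)]
pairs = pair_best_hits(hits_a, hits_b)
```

In this toy input only E. coli occurs in both lists, so a single pair is produced; species found in just one alignment are dropped.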

evcouplings.complex.protocol.describe_concatenation(annotation_file_1, annotation_file_2, genome_location_filename_1, genome_location_filename_2, outfile)

Describes properties of concatenated alignment.

Writes a CSV file with the following columns:

num_seqs_1: number of sequences in the first monomer alignment
num_seqs_2: number of sequences in the second monomer alignment
num_nonred_species_1: number of unique species annotations in the first monomer alignment
num_nonred_species_2: number of unique species annotations in the second monomer alignment
num_species_overlap: number of unique species found in both alignments
median_num_per_species_1: median number of paralogs per species in the first monomer alignment
median_num_per_species_2: median number of paralogs per species in the second monomer alignment
num_with_embl_cds_1: number of IDs for which an EMBL CDS was found in the first monomer alignment (relevant to distance concatenation only)
num_with_embl_cds_2: number of IDs for which an EMBL CDS was found in the second monomer alignment (relevant to distance concatenation only)

Parameters:

annotation_file_1 (str) – Path to annotation.csv file for first monomer alignment
annotation_file_2 (str) – Path to annotation.csv file for second monomer alignment
genome_location_filename_1 (str) – Path to genome location mapping file for first alignment
genome_location_filename_2 (str) – Path to genome location mapping file for second alignment
outfile (str) – Path to output file
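
As a rough illustration of how the species-level columns relate to the input, the sketch below computes them from two plain lists of species labels, one label per sequence. The helper summarize_species is hypothetical and works on in-memory lists rather than on the annotation files; the EMBL CDS columns are omitted because they depend on the genome location files.

```python
from collections import Counter
from statistics import median


def summarize_species(species_1, species_2):
    """Compute species statistics for two monomer alignments.

    species_1 and species_2 hold one species label per sequence
    (a simplifying assumption, not the real file format).
    """
    # Number of sequences (paralogs) observed per species
    counts_1 = Counter(species_1)
    counts_2 = Counter(species_2)
    return {
        "num_seqs_1": len(species_1),
        "num_seqs_2": len(species_2),
        "num_nonred_species_1": len(counts_1),
        "num_nonred_species_2": len(counts_2),
        "num_species_overlap": len(set(counts_1) & set(counts_2)),
        "median_num_per_species_1": median(counts_1.values()),
        "median_num_per_species_2": median(counts_2.values()),
    }


stats = summarize_species(
    ["E. coli", "E. coli", "B. subtilis"],
    ["E. coli", "H. sapiens", "H. sapiens", "H. sapiens"],
)
```

Note that statistics.median raises StatisticsError on empty input, so a real implementation would need to handle alignments without species annotations.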

evcouplings.complex.protocol.genome_distance(**kwargs)

Protocol:

Concatenate alignments based on genomic distance

Parameters:	kwargs (mandatory) – Keyword arguments. The required keys are the ones passed to check_required in the function's source code.
Returns:	outcfg – Output configuration of the pipeline, including the following fields:

alignment_file
raw_alignment_file
focus_mode
focus_sequence
segments
frequencies_file
identities_file
num_sequences
num_sites
raw_focus_alignment_file
statistics_file

Return type:	dict
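
To make the notion of pairing by genomic distance concrete, here is a simplified sketch. It is an assumption-laden illustration, not the library's algorithm: the function closest_pairs, the (genome_id, start, end) location tuples, and the max_distance cutoff are all hypothetical. For each gene of the first monomer it picks the nearest gene of the second monomer on the same genome, provided the gap is within the cutoff.

```python
def closest_pairs(locations_1, locations_2, max_distance):
    """Pair each monomer-1 gene with its nearest monomer-2 gene.

    locations_* map a sequence ID to a hypothetical
    (genome_id, start, end) tuple.
    """
    pairs = []
    for id_1, (genome_1, start_1, end_1) in locations_1.items():
        best = None
        for id_2, (genome_2, start_2, end_2) in locations_2.items():
            # Distances are only meaningful within the same genome
            if genome_1 != genome_2:
                continue
            # Gap between the two genes; 0 if they overlap
            distance = max(start_1 - end_2, start_2 - end_1, 0)
            if distance <= max_distance and (best is None or distance < best[1]):
                best = (id_2, distance)
        if best is not None:
            pairs.append((id_1, best[0], best[1]))
    return pairs


loc_a = {"a1": ("genome_X", 100, 400), "a2": ("genome_Y", 5000, 5300)}
loc_b = {
    "b1": ("genome_X", 450, 900),
    "b2": ("genome_X", 20000, 20500),
    "b3": ("genome_Y", 90000, 90400),
}
distance_pairs = closest_pairs(loc_a, loc_b, max_distance=10000)
```

This sketch does not enforce one-to-one matching: the same monomer-2 gene could be chosen for several monomer-1 genes, which a real pairing procedure would likely need to resolve.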

evcouplings.complex.protocol.modify_complex_segments(outcfg, **kwargs)

Modifies the output configuration so that the segments are correct for a concatenated alignment

Parameters:	outcfg (dict) – The output configuration
Returns:	outcfg – The output configuration, with a new field called “segments”
Return type:	dict
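
The effect on the configuration can be pictured with a small sketch. Everything here is assumed for illustration: the helper add_complex_segments, the plain-dict segment layout, and the segment identifiers "A_1" and "B_1" are not taken from the documentation above. The point is only that a concatenated alignment needs one segment entry per monomer, stored under the "segments" key.

```python
def add_complex_segments(outcfg, length_1, length_2):
    """Return a copy of outcfg with one segment entry per monomer.

    Segment dictionaries and their IDs are hypothetical placeholders.
    """
    updated = dict(outcfg)  # leave the caller's dictionary untouched
    updated["segments"] = [
        {"segment_id": "A_1", "region_start": 1, "region_end": length_1},
        {"segment_id": "B_1", "region_start": 1, "region_end": length_2},
    ]
    return updated


cfg = {"alignment_file": "complex.a2m"}
new_cfg = add_complex_segments(cfg, length_1=120, length_2=85)
```

Copying the dictionary keeps the original configuration unchanged, which makes the step easier to test in isolation.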

evcouplings.complex.protocol.run(**kwargs)

Run alignment concatenation protocol

Parameters:	kwargs (mandatory) – Keyword arguments:

protocol: Concatenation protocol to run
prefix: Output prefix for all generated files

Returns:	outcfg – Output configuration of the concatenation stage: a dictionary with results in the following fields (fields in brackets are not mandatory):

alignment_file
raw_alignment_file
focus_mode
focus_sequence
segments
frequencies_file
identities_file
num_sequences
num_sites
raw_focus_alignment_file
statistics_file

Return type:	dict
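
A dispatcher of this kind is commonly written as a lookup from protocol name to function. The sketch below shows that pattern under stated assumptions: the stub protocol functions, the PROTOCOLS mapping, and run_sketch are hypothetical stand-ins, and the use of ValueError is a choice made here rather than the library's documented behavior.

```python
def best_hit_stub(**kwargs):
    # Placeholder standing in for a real concatenation protocol
    return {"protocol_used": "best_hit", "prefix": kwargs["prefix"]}


def genome_distance_stub(**kwargs):
    # Placeholder standing in for a real concatenation protocol
    return {"protocol_used": "genome_distance", "prefix": kwargs["prefix"]}


# Hypothetical mapping from protocol name to implementation
PROTOCOLS = {
    "best_hit": best_hit_stub,
    "genome_distance": genome_distance_stub,
}


def run_sketch(**kwargs):
    """Validate the mandatory keys, then dispatch to the chosen protocol."""
    for key in ("protocol", "prefix"):
        if key not in kwargs:
            raise ValueError("Missing mandatory argument: " + key)

    name = kwargs["protocol"]
    if name not in PROTOCOLS:
        raise ValueError("Unknown protocol: " + name)

    # Forward all keyword arguments to the selected protocol
    return PROTOCOLS[name](**kwargs)


outcfg = run_sketch(protocol="best_hit", prefix="output/complex")
```

Keeping the mapping in one place means a new protocol only needs a new dictionary entry, and unknown names fail early with a clear message.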