evcouplings.couplings package

evcouplings.couplings.mapping module

Mapping indices for complexes / multi-domain sequences to internal model numbering.

Authors:
Thomas A. Hopf, Anna G. Green (MultiSegmentCouplingsModel)
class evcouplings.couplings.mapping.MultiSegmentCouplingsModel(filename, *segments, precision='float32', file_format='plmc_v2', **kwargs)[source]

Bases: evcouplings.couplings.model.CouplingsModel

Complex-specific couplings model that handles segments and provides the option to convert the model into an inter-segment-only model.

to_inter_segment_model()[source]

Convert the model to inter-segment parameters only, i.e. the J_ij terms that correspond to inter-protein or inter-domain residue pairs. All other parameters are set to 0.

Returns:Copy of object turned into inter-segment-only epistatic model
Return type:CouplingsModel
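
The effect of keeping only inter-segment couplings can be illustrated with a minimal sketch. It is not the library's implementation: the real J_ij parameters also carry two symbol axes, whereas this toy version uses one scalar per position pair, and the names inter_segment_only, toy_couplings and toy_segments are hypothetical.

```python
def inter_segment_only(couplings, segment_of):
    """Return a copy in which every intra-segment coupling is set to 0.

    couplings:  L x L nested list of coupling values (toy stand-in for J_ij)
    segment_of: list of length L giving the segment id of each position
    """
    L = len(segment_of)
    return [
        [
            couplings[i][j] if segment_of[i] != segment_of[j] else 0.0
            for j in range(L)
        ]
        for i in range(L)
    ]


# Hypothetical toy data: positions 0 and 1 belong to A_1, position 2 to B_1
toy_couplings = [
    [0.0, 0.8, 0.3],
    [0.8, 0.0, 0.5],
    [0.3, 0.5, 0.0],
]
toy_segments = ["A_1", "A_1", "B_1"]

inter_only = inter_segment_only(toy_couplings, toy_segments)
```

Only the pairs that cross the A_1/B_1 boundary keep their values; the input matrix is left untouched, mirroring the "copy of object" behaviour described above.
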
class evcouplings.couplings.mapping.Segment(segment_type, sequence_id, region_start, region_end, positions=None, segment_id='A')[source]

Bases: object

Represents a continuous stretch of sequence within an alignment used to infer evolutionary couplings (e.g. one of multiple domains, or one monomer in a concatenated complex alignment).

default_chain_name()[source]

Retrieve default PDB chain identifier the segment will be mapped to in 3D structures (by convention, segments in the pipeline are named A_1, A_2, …, B_1, B_2, …; the default chain identifier is anything before the underscore).

Returns:chain – Default PDB chain identifier the segment maps to
Return type:str
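
The naming convention described above can be sketched as a standalone function. This is an illustration of the stated rule, not the library's code; it assumes "before the underscore" means before the first underscore.

```python
def default_chain_name(segment_id):
    """Return the part of a segment id that precedes the first underscore."""
    return segment_id.split("_")[0]


chain_a = default_chain_name("A_1")
chain_b = default_chain_name("B_2")
# A segment id without an underscore is returned unchanged
chain_plain = default_chain_name("A")
```
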
classmethod from_list(segment)[source]

Create a segment object from list representation (e.g. from config).

Parameters:segment (list) – List representation of segment, with the following items: segment_id (str), segment_type (str), sequence_id (str), region_start (int), region_end (int), positions (list(int))
Returns:New Segment instance from list
Return type:Segment
to_list()[source]

Represent segment as list (for storing in configs)

Returns:List representation of segment, with the following items: segment_id (str), segment_type (str), sequence_id (str), region_start (int), region_end (int), positions (list(int))
Return type:list
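
Note that the list representation starts with segment_id, whereas the constructor takes segment_id as its last argument. A minimal sketch of the round trip, using a hypothetical stand-in class (ToySegment) and made-up values rather than the real Segment:

```python
class ToySegment:
    """Hypothetical stand-in for Segment, only to illustrate the list layout."""

    def __init__(self, segment_type, sequence_id, region_start, region_end,
                 positions=None, segment_id="A"):
        self.segment_type = segment_type
        self.sequence_id = sequence_id
        self.region_start = region_start
        self.region_end = region_end
        self.positions = positions
        self.segment_id = segment_id

    @classmethod
    def from_list(cls, segment):
        # List order: segment_id comes first
        (segment_id, segment_type, sequence_id,
         region_start, region_end, positions) = segment
        return cls(segment_type, sequence_id, region_start, region_end,
                   positions, segment_id)

    def to_list(self):
        return [self.segment_id, self.segment_type, self.sequence_id,
                self.region_start, self.region_end, self.positions]


# Made-up example values (the type string "aa" and the id "SEQ_A" are assumptions)
as_list = ["A_1", "aa", "SEQ_A", 10, 14, [10, 11, 12, 13, 14]]
segment = ToySegment.from_list(as_list)
round_trip = segment.to_list()
```
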
class evcouplings.couplings.mapping.SegmentIndexMapper(focus_mode, first_index, *segments)[source]

Bases: object

Map indices of one or more sequence segments into CouplingsModel internal numbering space. Can also be used to (trivially) remap indices for a single sequence.

patch_model(model, inplace=True)[source]

Change numbering of CouplingsModel object so that it uses segment-based numbering

Parameters:
  • model (CouplingsModel) – Model that will be updated to segment-based numbering
  • inplace (bool, optional (default: True)) – If True, change passed model; otherwise return new object
Returns:

Model with updated numbering (if inplace is True, this will point to the original model)

Return type:

CouplingsModel

Raises:

ValueError – If segment mapping does not match internal model numbering

to_model(x)[source]

Map target index to model index

Parameters:x ((str, int), or list of (str, int)) – Indices in target indexing (segment_id, index_in_segment)
Returns:Monomer indices mapped into couplings object numbering
Return type:int, or list of int
to_target(x)[source]

Map model index to target index

Parameters:x (int, or list of ints) – Indices in model numbering
Returns:Indices mapped into target numbering. Tuples are (segment_id, index_in_segment)
Return type:(str, int), or list of (str, int)
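
The two mapping directions can be illustrated with a small sketch. It is not the library's implementation: it omits focus_mode entirely, and it assumes that model indices are assigned consecutively from first_index in the order the segments are passed. ToyIndexMapper and the segment data are hypothetical.

```python
class ToyIndexMapper:
    """Hypothetical two-way lookup between (segment_id, index) and model index."""

    def __init__(self, first_index, *segments):
        # segments: (segment_id, positions) pairs, in model order
        self.target_to_model = {}
        self.model_to_target = {}
        model_index = first_index
        for segment_id, positions in segments:
            for pos in positions:
                self.target_to_model[(segment_id, pos)] = model_index
                self.model_to_target[model_index] = (segment_id, pos)
                model_index += 1

    def to_model(self, x):
        # Accept a single (segment_id, index) tuple or a list of them
        if isinstance(x, list):
            return [self.target_to_model[item] for item in x]
        return self.target_to_model[x]

    def to_target(self, x):
        # Accept a single model index or a list of them
        if isinstance(x, list):
            return [self.model_to_target[item] for item in x]
        return self.model_to_target[x]


mapper = ToyIndexMapper(1, ("A_1", [5, 6, 7]), ("B_1", [1, 2]))
single = mapper.to_model(("B_1", 1))
several = mapper.to_model([("A_1", 5), ("B_1", 2)])
back = mapper.to_target(4)
```

Here the three positions of A_1 occupy model indices 1 to 3, so the first position of B_1 lands on model index 4.
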
evcouplings.couplings.mapping.segment_map_ecs(ecs, mapper)[source]

Map EC dataframe in model numbering into segment numbering

Parameters:
  • ecs (pandas.DataFrame) – EC table (with columns i and j)
  • mapper (SegmentIndexMapper) – Mapper used to translate model numbering into segment numbering
Returns:Mapped EC table (with columns i and j mapped, and additional columns segment_i and segment_j)
Return type:pandas.DataFrame
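
The shape of the mapped table can be sketched without pandas by treating each EC row as a dict. This is only an illustration of the column layout described above; map_ec_records, the plain dict used in place of a mapper, and the cn values are all hypothetical.

```python
def map_ec_records(ecs, model_to_target):
    """Map columns i and j and add segment_i / segment_j to each row.

    ecs:             list of dicts with at least the keys "i" and "j"
    model_to_target: dict from model index to (segment_id, index_in_segment)
    """
    mapped = []
    for row in ecs:
        segment_i, i = model_to_target[row["i"]]
        segment_j, j = model_to_target[row["j"]]
        new_row = dict(row)  # keep all other columns, leave input untouched
        new_row.update(
            {"i": i, "j": j, "segment_i": segment_i, "segment_j": segment_j}
        )
        mapped.append(new_row)
    return mapped


toy_mapping = {1: ("A_1", 5), 2: ("A_1", 6), 3: ("B_1", 1)}
toy_ecs = [{"i": 1, "j": 3, "cn": 0.9}, {"i": 1, "j": 2, "cn": 0.4}]
mapped_ecs = map_ec_records(toy_ecs, toy_mapping)
```
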

evcouplings.couplings.mean_field module

evcouplings.couplings.model module

Class to store parameters of undirected graphical model of sequences and perform calculations using the model (statistical energies, coupling scores).

Authors:
Thomas A. Hopf
class evcouplings.couplings.model.CouplingsModel(model_file, precision='float32', file_format='plmc_v2', **kwargs)[source]

Bases: object

Class to store parameters of pairwise undirected graphical model of sequences and compute evolutionary couplings, sequence statistical energies, etc.

Jij(i=None, j=None, A_i=None, A_j=None)[source]

Quick access to J_ij matrix with automatic index mapping. See __4d_access for explanation of parameters.

classmethod apc(matrix)[source]

Apply average product correction (Dunn et al., Bioinformatics, 2008) to matrix

Parameters:matrix (np.array) – Symmetric L x L matrix which should be corrected by APC
Returns:Symmetric L x L matrix with APC correction applied
Return type:np.array
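
As an illustration of the correction itself, the sketch below subtracts, for each pair (i, j), the product of the two row means divided by the overall mean. It uses plain nested lists instead of np.array, and it assumes that means are taken over off-diagonal entries only and that the diagonal of the result is set to 0; the library's handling of these details may differ.

```python
def apc(matrix):
    """Average product correction for a symmetric L x L nested list."""
    L = len(matrix)
    # Mean of each row over off-diagonal entries; by symmetry these are
    # also the column means
    row_means = [
        sum(matrix[i][j] for j in range(L) if j != i) / (L - 1)
        for i in range(L)
    ]
    overall_mean = sum(row_means) / L
    return [
        [
            0.0 if i == j
            else matrix[i][j] - row_means[i] * row_means[j] / overall_mean
            for j in range(L)
        ]
        for i in range(L)
    ]


raw_scores = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 3.0],
    [2.0, 3.0, 0.0],
]
corrected = apc(raw_scores)
```

For this toy matrix the row means are 1.5, 2.0 and 2.5 with an overall mean of 2.0, so the entry for pair (0, 1) becomes 1.0 - 1.5 * 2.0 / 2.0.
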
cn(i=None, j=None)[source]

Quick access to cn_scores matrix with automatic index mapping. See __2d_access_score_matrix for explanation of parameters.

cn_scores

L x L numpy matrix with CN (corrected norm) scores

convert_sequences(sequences)[source]

Converts sequences in string format into internal symbol representation according to alphabet of model

Parameters:sequences (list of str) – List of sequences (must have same length and correspond to model states)
Returns:Matrix of size len(sequences) x L of sequences converted to integer symbols
Return type:np.array
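
A minimal sketch of such a conversion, using nested lists instead of np.array. The alphabet and its ordering are assumptions made for this example; a real model defines its own alphabet.

```python
# Assumed alphabet: gap symbol followed by the 20 amino acids
ALPHABET = "-ACDEFGHIKLMNPQRSTVWY"
SYMBOL_TO_INDEX = {symbol: index for index, symbol in enumerate(ALPHABET)}


def convert_sequences(sequences):
    """Convert equal-length strings into rows of integer symbols."""
    lengths = {len(seq) for seq in sequences}
    if len(lengths) > 1:
        raise ValueError("All sequences must have the same length")
    return [[SYMBOL_TO_INDEX[symbol] for symbol in seq] for seq in sequences]


encoded = convert_sequences(["AC-", "YWA"])
```

The result has one row per sequence and one column per position.
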
delta_hamiltonian(substitutions, verify_mutants=True)[source]

Calculate difference in statistical energy relative to self.target_seq by changing sequence according to list of substitutions

Parameters:
  • substitutions (list of tuple(pos, subs_from, subs_to)) – Substitutions to be applied to target sequence
  • verify_mutants (bool, optional) – Test if subs_from is consistent with self.target_seq
Returns:

Vector of length 3 with 1) total delta Hamiltonian, 2) delta J_ij, 3) delta h_i

Return type:

np.array
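
The calculation can be sketched on a toy model with three positions and a two-letter alphabet. Everything below (TARGET_SEQ, H_FIELDS, J_COUPLINGS, the energy helper) is hypothetical, positions are assumed to be numbered from 1, and the result is a tuple rather than np.array; it only illustrates how the three returned components relate to each other.

```python
TARGET_SEQ = "AAB"

# h_i(a): position -> symbol -> field (made-up values)
H_FIELDS = {
    1: {"A": 0.5, "B": -0.5},
    2: {"A": 0.25, "B": 0.0},
    3: {"A": 0.0, "B": 1.0},
}

# J_ij(a, b) for i < j; symbol pairs that are not listed count as 0
J_COUPLINGS = {
    (1, 2): {("A", "A"): 1.0, ("B", "A"): -1.0},
    (2, 3): {("A", "B"): 0.5},
}


def energy(seq):
    """Return (total, J_ij sum, h_i sum) for one sequence."""
    h_sum = sum(H_FIELDS[i][a] for i, a in enumerate(seq, start=1))
    j_sum = sum(
        table.get((seq[i - 1], seq[j - 1]), 0.0)
        for (i, j), table in J_COUPLINGS.items()
    )
    return (j_sum + h_sum, j_sum, h_sum)


def delta_hamiltonian(substitutions, verify_mutants=True):
    """Energy of the mutated sequence minus energy of TARGET_SEQ."""
    mutant = list(TARGET_SEQ)
    for pos, subs_from, subs_to in substitutions:
        if verify_mutants and TARGET_SEQ[pos - 1] != subs_from:
            raise ValueError(
                "Position {} is {}, not {}".format(
                    pos, TARGET_SEQ[pos - 1], subs_from
                )
            )
        mutant[pos - 1] = subs_to
    mutated = energy("".join(mutant))
    reference = energy(TARGET_SEQ)
    return tuple(m - r for m, r in zip(mutated, reference))


delta = delta_hamiltonian([(1, "A", "B")])
```

In this toy case the total change is the sum of the coupling and field contributions, matching the three-component layout of the return value.
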

dmm(i=None, j=None, A_i=None, A_j=None)[source]

Access delta_Hamiltonian matrix of double mutants of target sequence

Parameters:
  • i (Iterable(int) or int) – Position(s) of first substitution(s)
  • j (Iterable(int) or int) – Position(s) of second substitution(s)
  • A_i (Iterable(char) or char) – Substitution(s) to first position
  • A_j (Iterable(char) or char) – Substitution(s) to second position
Returns:

4D matrix containing energy differences for slices along all axes of the double mutation matrix (axes 1/2: positions, axes 3/4: substitutions).

Return type:

np.array(float)

double_mut_mat

Hamiltonian difference for all possible double mutant variants

L x L x num_symbol x num_symbol matrix containing delta Hamiltonians for all possible double mutants of target sequence

ecs

DataFrame with evolutionary couplings, sorted by CN score (all scores: CN, FN, MI)

fi(i=None, A_i=None)[source]

Quick access to f_i matrix with automatic index mapping. See __2d_access for explanation of parameters.

fij(i=None, j=None, A_i=None, A_j=None)[source]

Quick access to f_ij matrix with automatic index mapping. See __4d_access for explanation of parameters.

fn(i=None, j=None)[source]

Quick access to fn_scores matrix with automatic index mapping. See __2d_access_score_matrix for explanation of parameters.

fn_scores

L x L numpy matrix with FN (Frobenius norm) scores

hamiltonians(sequences)[source]

Calculates the Hamiltonians of the global probability distribution P(A_1, …, A_L) for the given sequences A_1,…,A_L from J_ij and h_i parameters

Parameters:sequences (list of str) – List of sequences for which Hamiltonian will be computed, or converted np.array obtained using convert_sequences method
Returns:Float matrix of size len(sequences) x 3, where each row corresponds to the 1) total Hamiltonian of sequence and the 2) J_ij and 3) h_i sub-sums
Return type:np.array
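
Written out, the three columns returned per sequence can be read as follows. This assumes that each position pair is counted once (i < j); the names H_J and H_h are labels introduced here for the two sub-sums.

```latex
H(A_1, \dots, A_L) = H_J + H_h,
\qquad
H_J = \sum_{1 \le i < j \le L} J_{ij}(A_i, A_j),
\qquad
H_h = \sum_{i=1}^{L} h_i(A_i)
```
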
hi(i=None, A_i=None)[source]

Quick access to h_i matrix with automatic index mapping. See __2d_access for explanation of parameters.

index_list

List of sequence positions (numbering) corresponding to the positions of the model; this is the numbering that index-mapped accessors translate into internal numbering

itu(i=None)[source]

Legacy method for backwards compatibility. See self.sn for explanation.

mi_apc(i=None, j=None)[source]

Quick access to mi_scores_apc matrix with automatic index mapping. See __2d_access_score_matrix for explanation of parameters.

mi_raw(i=None, j=None)[source]

Quick access to mi_scores_raw matrix with automatic index mapping. See __2d_access_score_matrix for explanation of parameters.

mi_scores_apc

L x L numpy matrix with MI (mutual information) scores with APC correction

mi_scores_raw

L x L numpy matrix with MI (mutual information) scores without APC correction

mn(i=None)[source]

Map model numbering to internal numbering

Parameters:i (Iterable(int) or int) – Position(s) to be mapped from model numbering space into internal numbering space
Returns:Remapped position(s)
Return type:Iterable(int) or int
mui(i=None)[source]

Legacy method for backwards compatibility. See self.mn for explanation.

seq(i=None)[source]

Access target sequence of model

Parameters:i (Iterable(int) or int) – Position(s) for which symbol should be retrieved
Returns:Sequence symbols
Return type:Iterable(char) or char
single_mut_mat

Hamiltonian difference for all possible single-site variants

L x num_symbol matrix (np.array) containing delta Hamiltonians for all possible single mutants of target sequence.

single_mut_mat_full

Hamiltonian difference for all possible single-site variants

L x num_symbol x 3 matrix (np.array) containing delta Hamiltonians for all possible single mutants of target sequence. Third dimension: 1) full Hamiltonian, 2) J_ij, 3) h_i

smm(i=None, A_i=None)[source]

Access delta_Hamiltonian matrix of single mutants of target sequence

Parameters:
  • i (Iterable(int) or int) – Position(s) for which energy difference should be retrieved
  • A_i (Iterable(char) or char) – Substitutions for which energy difference should be retrieved
Returns:

2D matrix containing energy differences for slices along both axes of single mutation matrix (first axis: position, second axis: substitution).

Return type:

np.array(float)

sn(i=None)[source]

Map internal numbering to sequence numbering

Parameters:i (Iterable(int) or int) – Position(s) to be mapped from internal numbering space into sequence numbering space.
Returns:Remapped position(s)
Return type:Iterable(int) or int
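
The relationship between the two numbering spaces (sn here, and mn in the opposite direction) can be sketched as a pair of lookups. The example assumes that internal numbering is the 0-based array index of each model position and that the sequence numbering is given by an index list; the concrete index_list values are made up.

```python
# Hypothetical sequence numbering of five model positions (note the gap)
index_list = [10, 11, 12, 15, 16]
index_map = {seq_pos: internal for internal, seq_pos in enumerate(index_list)}


def mn(i):
    """Sequence numbering -> internal numbering (single int or iterable)."""
    if isinstance(i, int):
        return index_map[i]
    return [index_map[pos] for pos in i]


def sn(i):
    """Internal numbering -> sequence numbering (single int or iterable)."""
    if isinstance(i, int):
        return index_list[i]
    return [index_list[pos] for pos in i]


internal_single = mn(15)
internal_many = mn([10, 16])
sequence_single = sn(3)
```

Applying sn to the result of mn returns the original positions, which is the property the index-mapped accessors rely on.
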
target_seq

Target/Focus sequence of model used for delta_hamiltonian calculations (including single and double mutation matrices)

to_file(out_file, precision='float32', file_format='plmc_v2')[source]

Write the (possibly modified) model back to a binary file

Parameters:
  • out_file (str) – A string specifying the path to a file
  • precision ({"float16", "float32", "float64"}, optional (default: "float32")) – Numerical NumPy data type specifying the precision used to write numerical values to file
  • file_format ({"plmc_v1", "plmc_v2"}, optional (default: "plmc_v2")) – Available file formats
to_independent_model()[source]

Estimate parameters of a single-site model using Gaussian prior/L2 regularization.

Returns:Copy of object turned into independent model
Return type:CouplingsModel

evcouplings.couplings.pairs module

evcouplings.couplings.protocol module

evcouplings.couplings.tools module