evcouplings.couplings package
evcouplings.couplings.mapping module
Mapping indices for complexes / multi-domain sequences to internal model numbering.
- Authors:
- Thomas A. Hopf, Anna G. Green (MultiSegmentCouplingsModel)
-
class evcouplings.couplings.mapping.MultiSegmentCouplingsModel(filename, *segments, precision='float32', file_format='plmc_v2', **kwargs)
Bases: evcouplings.couplings.model.CouplingsModel
Complex-specific couplings model that handles segments and provides the option to convert the model into an inter-segment-only model.
-
to_inter_segment_model()
Convert model to inter-segment only parameters, i.e. the J_ij that correspond to inter-protein or inter-domain residue pairs. All other parameters are set to 0.
Returns: Copy of object turned into inter-only epistatic model
Return type: CouplingsModel
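As a rough illustration of what "inter-segment only" means, the sketch below zeroes every coupling whose two positions lie in the same segment. It is a minimal stand-in that uses plain nested lists with one scalar per position pair; `keep_inter_segment` and `segment_of` are hypothetical names and not part of evcouplings, whose model stores a full matrix of couplings per pair.

```python
def keep_inter_segment(couplings, segment_of):
    """Return a copy of a square coupling matrix with intra-segment entries set to 0.

    couplings  -- L x L nested list, one (hypothetical) scalar coupling per pair
    segment_of -- list of length L giving the segment label of each position
    """
    size = len(couplings)
    return [
        [
            couplings[i][j] if segment_of[i] != segment_of[j] else 0.0
            for j in range(size)
        ]
        for i in range(size)
    ]


# Two segments: positions 0-1 belong to "A_1", position 2 to "B_1"
couplings = [
    [0.0, 0.5, 0.2],
    [0.5, 0.0, 0.7],
    [0.2, 0.7, 0.0],
]
inter_only = keep_inter_segment(couplings, ["A_1", "A_1", "B_1"])
```

The input matrix is left untouched, which mirrors the documented behaviour of returning a copy.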
-
-
class evcouplings.couplings.mapping.Segment(segment_type, sequence_id, region_start, region_end, positions=None, segment_id='A')
Bases: object
Represents a continuous stretch of sequence in a sequence alignment used to infer evolutionary couplings (e.g. multiple domains, or monomers in a concatenated complex alignment).
-
default_chain_name()
Retrieve the default PDB chain identifier the segment will be mapped to in 3D structures. By convention, segments in the pipeline are named A_1, A_2, …, B_1, B_2, …; the default chain identifier is everything before the underscore.
Returns: chain – Default PDB chain identifier the segment maps to
Return type: str
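The naming convention can be sketched in one line. The helper name `default_chain_from_segment_id` is hypothetical; it only illustrates the "everything before the underscore" rule and is not the library's own code.

```python
def default_chain_from_segment_id(segment_id):
    """Return everything before the first underscore of a segment identifier."""
    return segment_id.split("_")[0]


chains = [default_chain_from_segment_id(s) for s in ["A_1", "A_2", "B_1"]]
```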
-
classmethod from_list(segment)
Create a segment object from list representation (e.g. from config).
Parameters: segment (list) – List representation of segment, with the following items: segment_id (str), segment_type (str), sequence_id (str), region_start (int), region_end (int), positions (list(int))
Returns: New Segment instance from list
Return type: Segment
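Note that the list form starts with segment_id, while the constructor takes it as the last argument. The sketch below mirrors that reordering with a hypothetical stand-in class, `SegmentSketch`; it is not the library class, and all values are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SegmentSketch:
    """Hypothetical stand-in mirroring the documented Segment constructor arguments."""
    segment_type: str
    sequence_id: str
    region_start: int
    region_end: int
    positions: Optional[List[int]] = None
    segment_id: str = "A"

    @classmethod
    def from_list(cls, segment):
        # The list form starts with segment_id, whereas the constructor takes it last
        segment_id, segment_type, sequence_id, region_start, region_end, positions = segment
        return cls(segment_type, sequence_id, region_start, region_end, positions, segment_id)


# All values below are made up for illustration
seg = SegmentSketch.from_list(["B_1", "aa", "EXAMPLE_ID", 10, 14, [10, 11, 12, 13, 14]])
```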
-
-
class evcouplings.couplings.mapping.SegmentIndexMapper(focus_mode, first_index, *segments)
Bases: object
Map indices of one or more sequence segments into CouplingsModel internal numbering space. Can also be used to (trivially) remap indices for a single sequence.
-
patch_model(model, inplace=True)
Change numbering of CouplingsModel object so that it uses segment-based numbering.
Parameters: - model (CouplingsModel) – Model that will be updated to segment-based numbering
- inplace (bool, optional (default: True)) – If True, change passed model; otherwise return new object
Returns: Model with updated numbering (if inplace is True, this will point to the original model)
Return type: CouplingsModel
Raises: ValueError – If segment mapping does not match internal model numbering
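The idea behind a segment index mapper can be pictured as a lookup between consecutive model indices and (segment identifier, position) pairs. The sketch below assumes that model positions are simply the concatenated segment positions, counted upwards from first_index, and it ignores focus_mode entirely; `build_segment_lookup` is a hypothetical helper, not the library's implementation.

```python
def build_segment_lookup(first_index, segments):
    """Map consecutive model indices to (segment_id, position) pairs and back.

    segments -- list of (segment_id, positions) tuples in model order
    """
    to_segment = {}
    model_index = first_index
    for segment_id, positions in segments:
        for pos in positions:
            to_segment[model_index] = (segment_id, pos)
            model_index += 1
    # Invert the table for lookups in the opposite direction
    to_model = {seg_pos: idx for idx, seg_pos in to_segment.items()}
    return to_segment, to_model


to_segment, to_model = build_segment_lookup(
    1, [("A_1", [5, 6, 7]), ("B_1", [20, 21])]
)
```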
-
-
evcouplings.couplings.mapping.segment_map_ecs(ecs, mapper)
Map EC dataframe in model numbering into segment numbering.
Parameters: - ecs (pandas.DataFrame) – EC table (with columns i and j)
- mapper (SegmentIndexMapper) – Mapper used to translate model numbering into segment numbering
Returns: Mapped EC table (with columns i and j mapped, and additional columns segment_i and segment_j)
Return type: pandas.DataFrame
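The shape of the mapped table can be illustrated without pandas by treating each EC row as a dictionary. This is only a sketch of the documented column layout: `map_ec_rows` and the `to_segment` lookup are hypothetical, the `cn` values are made up, and the real function operates on a pandas.DataFrame.

```python
def map_ec_rows(rows, to_segment):
    """Return new EC rows with i/j remapped and segment_i/segment_j added."""
    mapped = []
    for row in rows:
        segment_i, pos_i = to_segment[row["i"]]
        segment_j, pos_j = to_segment[row["j"]]
        mapped.append(
            {**row, "i": pos_i, "j": pos_j, "segment_i": segment_i, "segment_j": segment_j}
        )
    return mapped


# Hypothetical lookup: model index -> (segment identifier, segment position)
to_segment = {1: ("A_1", 5), 2: ("A_1", 6), 3: ("B_1", 20)}
ecs = [{"i": 1, "j": 3, "cn": 0.8}, {"i": 1, "j": 2, "cn": 0.3}]
mapped_ecs = map_ec_rows(ecs, to_segment)
```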
evcouplings.couplings.mean_field module
evcouplings.couplings.model module
Class to store parameters of undirected graphical model of sequences and perform calculations using the model (statistical energies, coupling scores).
- Authors:
- Thomas A. Hopf
-
class evcouplings.couplings.model.CouplingsModel(model_file, precision='float32', file_format='plmc_v2', **kwargs)
Bases: object
Class to store parameters of pairwise undirected graphical model of sequences and compute evolutionary couplings, sequence statistical energies, etc.
-
Jij(i=None, j=None, A_i=None, A_j=None)
Quick access to J_ij matrix with automatic index mapping. See __4d_access for explanation of parameters.
-
classmethod apc(matrix)
Apply average product correction (Dunn et al., Bioinformatics, 2008) to matrix.
Parameters: matrix (np.array) – Symmetric L x L matrix which should be corrected by APC
Returns: Symmetric L x L matrix with APC correction applied
Return type: np.array
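The average product correction subtracts, from each entry, the product of the two position means divided by the overall mean. The sketch below implements that general formula on nested lists; taking the means over off-diagonal entries only and zeroing the diagonal are assumptions of this sketch, and the library method may differ in such details. `apc_sketch` is a hypothetical name.

```python
def apc_sketch(matrix):
    """Apply average product correction to a symmetric L x L nested list.

    Means are taken over off-diagonal entries only; the diagonal of the
    result is set to 0. Assumes L >= 2 and a non-zero overall mean.
    """
    size = len(matrix)
    # Mean of each row, excluding the diagonal entry
    row_means = [
        sum(matrix[i][j] for j in range(size) if j != i) / (size - 1)
        for i in range(size)
    ]
    # Mean over all off-diagonal entries
    overall_mean = sum(row_means) / size
    return [
        [
            0.0 if i == j else matrix[i][j] - row_means[i] * row_means[j] / overall_mean
            for j in range(size)
        ]
        for i in range(size)
    ]


scores = [
    [0.0, 2.0, 4.0],
    [2.0, 0.0, 6.0],
    [4.0, 6.0, 0.0],
]
corrected = apc_sketch(scores)
```

Because the input is symmetric, row means and column means coincide, so a single vector of means is enough.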
-
cn(i=None, j=None)
Quick access to cn_scores matrix with automatic index mapping. See __2d_access_score_matrix for explanation of parameters.
-
cn_scores
L x L numpy matrix with CN (corrected norm) scores
-
convert_sequences(sequences)
Converts sequences in string format into internal symbol representation according to alphabet of model.
Parameters: sequences (list of str) – List of sequences (must have same length and correspond to model states)
Returns: Matrix of size len(sequences) x L of sequences converted to integer symbols
Return type: np.array
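Conceptually, the conversion replaces every character by its index in the model alphabet. A minimal sketch follows, assuming a 21-letter alphabet with the gap character first; the actual alphabet and its ordering are stored with the model, and `convert_sequences_sketch` is a hypothetical helper that returns nested lists rather than an np.array.

```python
def convert_sequences_sketch(sequences, alphabet):
    """Map equal-length sequences to rows of integer symbols (index into alphabet)."""
    symbol_index = {char: idx for idx, char in enumerate(alphabet)}
    length = len(sequences[0])
    converted = []
    for seq in sequences:
        if len(seq) != length:
            raise ValueError("All sequences must have the same length")
        # A KeyError here means the sequence has a character outside the alphabet
        converted.append([symbol_index[char] for char in seq])
    return converted


# Assumed example alphabet: gap character followed by the 20 amino acids
ALPHABET = "-ACDEFGHIKLMNPQRSTVWY"
encoded = convert_sequences_sketch(["AC-", "YAA"], ALPHABET)
```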
-
delta_hamiltonian(substitutions, verify_mutants=True)
Calculate difference in statistical energy relative to self.target_seq by changing sequence according to list of substitutions.
Parameters: - substitutions (list of tuple(pos, subs_from, subs_to)) – Substitutions to be applied to target sequence
- verify_mutants (bool, optional) – Test if subs_from is consistent with self.target_seq
Returns: Vector of length 3 with 1) total delta Hamiltonian, 2) delta J_ij, 3) delta h_i
Return type: np.array
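The three returned numbers can be illustrated on a toy pairwise model by computing the energy of the mutant and of the target and taking the difference. This is a sketch under simplifying assumptions: positions are 0-based indices, parameters live in plain dictionaries, and the full energy is recomputed instead of updating only the affected terms. `energy_terms` and `delta_hamiltonian_sketch` are hypothetical names, and all numbers are made up.

```python
def energy_terms(seq, h, J):
    """Return (coupling sum, field sum) for a sequence.

    h[i][a]          -- field for symbol a at position i
    J[(i, j)][a][b]  -- coupling for symbols a, b at positions i < j
    """
    field_sum = sum(h[i][a] for i, a in enumerate(seq))
    coupling_sum = sum(J[(i, j)][seq[i]][seq[j]] for (i, j) in J)
    return coupling_sum, field_sum


def delta_hamiltonian_sketch(target, substitutions, h, J, verify_mutants=True):
    """Return [total, coupling part, field part] of the energy change."""
    mutant = list(target)
    for pos, subs_from, subs_to in substitutions:
        if verify_mutants and target[pos] != subs_from:
            raise ValueError("Position {} is {}, not {}".format(pos, target[pos], subs_from))
        mutant[pos] = subs_to
    j_mut, h_mut = energy_terms(mutant, h, J)
    j_ref, h_ref = energy_terms(target, h, J)
    delta_j, delta_h = j_mut - j_ref, h_mut - h_ref
    return [delta_j + delta_h, delta_j, delta_h]


# Toy two-position model over symbols "A" and "B" (all numbers made up)
h = [{"A": 0.5, "B": 0.0}, {"A": 0.0, "B": 0.25}]
J = {(0, 1): {"A": {"A": 1.0, "B": 0.0}, "B": {"A": 0.0, "B": 2.0}}}
delta = delta_hamiltonian_sketch("AA", [(0, "A", "B")], h, J)
```

With verify_mutants enabled, a substitution whose subs_from does not match the target sequence raises an error instead of silently producing a misleading score.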
-
dmm(i=None, j=None, A_i=None, A_j=None)
Access delta_Hamiltonian matrix of double mutants of target sequence.
Returns: 4D matrix containing energy differences for slices along both axes of double mutation matrix (axes 1/2: positions, axes 3/4: substitutions).
Return type: np.array(float)
-
double_mut_mat
Hamiltonian difference for all possible double mutant variants.
L x L x num_symbol x num_symbol matrix containing delta Hamiltonians for all possible double mutants of target sequence
-
ecs
DataFrame with evolutionary couplings, sorted by CN score (all scores: CN, FN, MI)
-
fi(i=None, A_i=None)
Quick access to f_i matrix with automatic index mapping. See __2d_access for explanation of parameters.
-
fij(i=None, j=None, A_i=None, A_j=None)
Quick access to f_ij matrix with automatic index mapping. See __4d_access for explanation of parameters.
-
fn(i=None, j=None)
Quick access to fn_scores matrix with automatic index mapping. See __2d_access_score_matrix for explanation of parameters.
-
fn_scores
L x L numpy matrix with FN (Frobenius norm) scores
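The Frobenius norm of a matrix is the square root of the sum of its squared entries. The sketch below applies that definition to a single, made-up coupling block for one position pair; whether the library first transforms the block (for example by excluding states or re-centering) is not stated here, so this shows only the bare norm. `frobenius_norm` is a hypothetical helper.

```python
import math


def frobenius_norm(block):
    """Return the square root of the sum of squared entries of a 2D nested list."""
    return math.sqrt(sum(value * value for row in block for value in row))


# Made-up 2 x 2 coupling block for a single position pair
fn_score = frobenius_norm([[3.0, 0.0], [0.0, 4.0]])
```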
-
hamiltonians(sequences)
Calculates the Hamiltonians of the global probability distribution P(A_1, …, A_L) for the given sequences A_1,…,A_L from J_ij and h_i parameters.
Parameters: sequences (list of str) – List of sequences for which Hamiltonian will be computed, or converted np.array obtained using convert_sequences method
Returns: Float matrix of size len(sequences) x 3, where each row corresponds to the 1) total Hamiltonian of sequence and the 2) J_ij and 3) h_i sub-sums
Return type: np.array
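Written out, the three columns correspond to the usual decomposition of a pairwise model into a coupling term and a single-site term. The form below is the standard one for such models; that the library sums over pairs i < j and uses this sign convention is an assumption here.

```latex
H(A_1, \ldots, A_L)
  = \underbrace{\sum_{i<j} J_{ij}(A_i, A_j)}_{\text{coupling sub-sum}}
  + \underbrace{\sum_{i} h_i(A_i)}_{\text{single-site sub-sum}}
```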
-
hi(i=None, A_i=None)
Quick access to h_i matrix with automatic index mapping. See __2d_access for explanation of parameters.
-
index_list
Sequence indices of the positions in the model (one entry per model position), used for automatic index mapping
-
mi_apc(i=None, j=None)
Quick access to mi_scores_apc matrix with automatic index mapping. See __2d_access_score_matrix for explanation of parameters.
-
mi_raw(i=None, j=None)
Quick access to mi_scores_raw matrix with automatic index mapping. See __2d_access_score_matrix for explanation of parameters.
-
mi_scores_apc
L x L numpy matrix with MI (mutual information) scores with APC correction
-
mi_scores_raw
L x L numpy matrix with MI (mutual information) scores without APC correction
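Mutual information between two alignment columns can be computed from single-site and pair frequencies. The sketch below uses the textbook definition in nats and skips zero pair frequencies; it does not claim to reproduce the library's handling of pseudocounts, gaps or sequence weighting. `mutual_information` is a hypothetical helper, and the frequencies are toy values.

```python
import math


def mutual_information(f_i, f_j, f_ij):
    """Return MI in nats from single-site and pair frequencies.

    f_i[a], f_j[b] -- single-site frequencies
    f_ij[a][b]     -- pair frequencies; zero entries contribute nothing
    """
    total = 0.0
    for a, p_a in enumerate(f_i):
        for b, p_b in enumerate(f_j):
            p_ab = f_ij[a][b]
            if p_ab > 0:
                total += p_ab * math.log(p_ab / (p_a * p_b))
    return total


# Two-state toy columns: perfectly co-varying vs. independent
mi_coupled = mutual_information([0.5, 0.5], [0.5, 0.5], [[0.5, 0.0], [0.0, 0.5]])
mi_independent = mutual_information([0.5, 0.5], [0.5, 0.5], [[0.25, 0.25], [0.25, 0.25]])
```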
-
mn(i=None)
Map model numbering to internal numbering.
Parameters: i (Iterable(int) or int) – Position(s) to be mapped from model numbering space into internal numbering space
Returns: Remapped position(s)
Return type: Iterable(int) or int
-
seq(i=None)
Access target sequence of model.
Parameters: i (Iterable(int) or int) – Position(s) for which symbol should be retrieved
Returns: Sequence symbols
Return type: Iterable(char) or char
-
single_mut_mat
Hamiltonian difference for all possible single-site variants.
L x num_symbol matrix (np.array) containing delta Hamiltonians for all possible single mutants of target sequence.
-
single_mut_mat_full
Hamiltonian difference for all possible single-site variants.
L x num_symbol x 3 matrix (np.array) containing delta Hamiltonians for all possible single mutants of target sequence. Third dimension: 1) full Hamiltonian, 2) J_ij, 3) h_i
-
smm(i=None, A_i=None)
Access delta_Hamiltonian matrix of single mutants of target sequence.
Returns: 2D matrix containing energy differences for slices along both axes of single mutation matrix (first axis: position, second axis: substitution).
Return type: np.array(float)
-
sn(i=None)
Map internal numbering to sequence numbering.
Parameters: i (Iterable(int) or int) – Position(s) to be mapped from internal numbering space into sequence numbering space.
Returns: Remapped position(s)
Return type: Iterable(int) or int
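The two numbering spaces handled by mn and sn can be pictured as a pair of lookup tables. The sketch below assumes that internal numbering is the 0-based array index and that a list of sequence positions (one per model column) defines the mapping; `make_index_maps` is a hypothetical helper, not the library's code.

```python
def make_index_maps(index_list):
    """Build lookups between sequence numbering and 0-based internal indices."""
    to_internal = {seq_pos: idx for idx, seq_pos in enumerate(index_list)}
    to_sequence = dict(enumerate(index_list))
    return to_internal, to_sequence


# Hypothetical model covering sequence positions 10-13 with position 12 missing
to_internal, to_sequence = make_index_maps([10, 11, 13])
internal = [to_internal[p] for p in (10, 13)]
sequence = [to_sequence[k] for k in internal]
```

Mapping into internal numbering and back returns the original positions, which is the property the two methods provide for each other.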
-
target_seq
Target/Focus sequence of model used for delta_hamiltonian calculations (including single and double mutation matrices)
-
to_file(out_file, precision='float32', file_format='plmc_v2')
Write the (potentially modified) model back to a binary file.
Parameters: - out_file (str) – A string specifying the path to a file
- precision ({"float16", "float32", "float64"}, optional (default: "float32")) – Numerical NumPy data type specifying the precision used to write numerical values to file
- file_format ({"plmc_v1", "plmc_v2"}, optional (default: "plmc_v2")) – Available file formats
-
to_independent_model()
Estimate parameters of a single-site model using Gaussian prior/L2 regularization.
Returns: Copy of object turned into independent model
Return type: CouplingsModel
-