evcouplings.couplings package

evcouplings.couplings.mapping module

Mapping indices for complexes / multi-domain sequences to internal model numbering.

Authors: Thomas A. Hopf, Anna G. Green (MultiSegmentCouplingsModel)

class evcouplings.couplings.mapping.MultiSegmentCouplingsModel(filename, *segments, precision='float32', file_format='plmc_v2', **kwargs)

Bases: evcouplings.couplings.model.CouplingsModel

Complex-specific CouplingsModel that handles segments and provides the option to convert the model into an inter-segment-only model.

to_inter_segment_model()

Convert the model to inter-segment-only parameters, i.e. the J_ij that correspond to inter-protein or inter-domain residue pairs. All other parameters are set to 0.

Returns:	Copy of the object turned into an inter-segment-only epistatic model
Return type:	CouplingsModel
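To illustrate what "inter-segment only" means, the following minimal pure-Python sketch zeroes every coupling between two positions of the same segment, as well as all fields h_i. It is not the library's implementation: it assumes one scalar coupling per position pair (the real model stores a num_symbol x num_symbol block per pair), and the helper name to_inter_segment and the segment_of list (segment label of each model position) are hypothetical.

```python
from typing import List, Sequence, Tuple


def to_inter_segment(
    J: List[List[float]], h: List[float], segment_of: Sequence[str]
) -> Tuple[List[List[float]], List[float]]:
    """Keep J[i][j] only if positions i and j belong to different segments.

    Hypothetical helper: one scalar coupling per pair, one scalar field per
    position. Intra-segment couplings and all fields are set to 0.
    """
    L = len(segment_of)
    J_inter = [
        [J[i][j] if segment_of[i] != segment_of[j] else 0.0 for j in range(L)]
        for i in range(L)
    ]
    h_inter = [0.0] * len(h)
    return J_inter, h_inter


J = [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]]
h = [0.5, -0.2, 0.1]
J_inter, h_inter = to_inter_segment(J, h, ["A_1", "A_1", "B_1"])
print(J_inter)  # [[0.0, 0.0, 2.0], [0.0, 0.0, 3.0], [2.0, 3.0, 0.0]]
print(h_inter)  # [0.0, 0.0, 0.0]
```

Only the couplings between the two A_1 positions and the B_1 position survive.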

class evcouplings.couplings.mapping.Segment(segment_type, sequence_id, region_start, region_end, positions=None, segment_id='A')

Bases: object

Represents a contiguous stretch of sequence in a sequence alignment used to infer evolutionary couplings (e.g. one of multiple domains, or a monomer in a concatenated complex alignment).

default_chain_name()

Retrieve default PDB chain identifier the segment will be mapped to in 3D structures (by convention, segments in the pipeline are named A_1, A_2, …, B_1, B_2, …; the default chain identifier is anything before the underscore).

Returns:	chain – Default PDB chain identifier the segment maps to
Return type:	str
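The naming convention described above can be sketched as a plain string operation. This is a hypothetical free function operating on a segment identifier string; the method itself presumably reads the identifier from the Segment object, which is not shown here.

```python
def default_chain_name(segment_id: str) -> str:
    """Return everything before the first underscore of a segment identifier."""
    return segment_id.split("_", 1)[0]


for segment_id in ["A_1", "A_2", "B_1"]:
    print(segment_id, "->", default_chain_name(segment_id))
# A_1 -> A
# A_2 -> A
# B_1 -> B
```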

classmethod from_list(segment)

Create a segment object from list representation (e.g. from config).

Parameters:	segment (list) – List representation of segment, with the following items: segment_id (str), segment_type (str), sequence_id (str), region_start (int), region_end (int), positions (list(int))
Returns:	New Segment instance from list
Return type:	Segment

to_list()

Represent segment as list (for storing in configs)

Returns:	List representation of segment, with the following items: segment_id (str), segment_type (str), sequence_id (str), region_start (int), region_end (int), positions (list(int))
Return type:	list
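Note that the list order (segment_id first) differs from the constructor's argument order (segment_type, sequence_id, region_start, region_end, positions, segment_id). The sketch below shows a list round-trip with a hypothetical stand-in dataclass, SegmentSketch, that follows the documented list layout; the example values ("aa", "PROTEIN_X", the region) are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SegmentSketch:
    """Hypothetical stand-in for Segment, used only to show the list layout."""

    segment_id: str
    segment_type: str
    sequence_id: str
    region_start: int
    region_end: int
    positions: Optional[List[int]] = None

    @classmethod
    def from_list(cls, segment: list) -> "SegmentSketch":
        # Unpack items in the documented order
        segment_id, segment_type, sequence_id, region_start, region_end, positions = segment
        return cls(segment_id, segment_type, sequence_id, region_start, region_end, positions)

    def to_list(self) -> list:
        # Same order as from_list, so the two methods round-trip
        return [
            self.segment_id,
            self.segment_type,
            self.sequence_id,
            self.region_start,
            self.region_end,
            self.positions,
        ]


config_entry = ["A_1", "aa", "PROTEIN_X", 10, 14, [10, 11, 12, 13, 14]]
segment = SegmentSketch.from_list(config_entry)
print(segment.region_start, segment.region_end)  # 10 14
assert segment.to_list() == config_entry
```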

class evcouplings.couplings.mapping.SegmentIndexMapper(focus_mode, first_index, *segments)

Bases: object

Map indices of one or more sequence segments into CouplingsModel internal numbering space. Can also be used to (trivially) remap indices for a single sequence.

patch_model(model, inplace=True)

Change numbering of CouplingsModel object so that it uses segment-based numbering

Parameters:	model (CouplingsModel) – Model that will be updated to segment-based numbering
	inplace (bool, optional (default: True)) – If True, change passed model; otherwise return new object
Returns:	Model with updated numbering (if inplace is True, this will point to the original model)
Return type:	CouplingsModel
Raises:	`ValueError` – If segment mapping does not match internal model numbering

to_model(x)

Map target index to model index

Parameters:	x ((str, int), or list of (str, int)) – Indices in target indexing (segment_id, index_in_segment)
Returns:	Monomer indices mapped into couplings object numbering
Return type:	int, or list of int

to_target(x)

Map model index to target index

Parameters:	x (int, or list of ints) – Indices in model numbering
Returns:	Indices mapped into target numbering. Tuples are (segment_id, index_in_segment)
Return type:	(str, int), or list of (str, int)
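A minimal sketch of the two mapping directions, assuming that model indices are consecutive integers starting at first_index and assigned to the segments' positions in the order the segments are given. The class SegmentIndexMapperSketch is hypothetical: it omits focus_mode and takes (segment_id, positions) pairs instead of Segment objects.

```python
from typing import Dict, List, Sequence, Tuple, Union

Target = Tuple[str, int]


class SegmentIndexMapperSketch:
    """Hypothetical mapper between (segment_id, index_in_segment) and model indices.

    Assumes model indices are consecutive integers starting at first_index,
    assigned to the segments' positions in the order the segments are given.
    """

    def __init__(self, first_index: int, *segments: Tuple[str, Sequence[int]]):
        targets = [(seg_id, pos) for seg_id, positions in segments for pos in positions]
        models = range(first_index, first_index + len(targets))
        self._to_model: Dict[Target, int] = dict(zip(targets, models))
        self._to_target: Dict[int, Target] = dict(zip(models, targets))

    def to_model(self, x: Union[Target, List[Target]]) -> Union[int, List[int]]:
        if isinstance(x, tuple):  # single (segment_id, index) pair
            return self._to_model[x]
        return [self._to_model[t] for t in x]

    def to_target(self, x: Union[int, List[int]]) -> Union[Target, List[Target]]:
        if isinstance(x, int):  # single model index
            return self._to_target[x]
        return [self._to_target[m] for m in x]


mapper = SegmentIndexMapperSketch(1, ("A_1", [10, 11, 12]), ("B_1", [5, 6]))
print(mapper.to_model(("B_1", 5)))  # 4
print(mapper.to_target([1, 5]))     # [('A_1', 10), ('B_1', 6)]
```

Both directions are backed by dictionaries, so each lookup is constant time and the two mappings are exact inverses of each other.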

evcouplings.couplings.mapping.segment_map_ecs(ecs, mapper)

Map EC dataframe in model numbering into segment numbering

Parameters:	ecs (pandas.DataFrame) – EC table (with columns i and j)
	mapper (SegmentIndexMapper) – Mapper used to translate model indices into segment-based indices
Returns:	Mapped EC table (with columns i and j mapped, and additional columns segment_i and segment_j)
Return type:	pandas.DataFrame
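The transformation can be sketched without pandas by treating the EC table as a list of row dictionaries. This is an illustration, not the library code: the list-of-dicts layout, the "cn" column, and the lookup table are hypothetical, and to_target stands in for something like a mapper's to_target method.

```python
from typing import Callable, Dict, List, Tuple


def segment_map_ecs_sketch(
    ecs: List[Dict], to_target: Callable[[int], Tuple[str, int]]
) -> List[Dict]:
    """Replace model indices in i and j by segment indices; add segment_i/segment_j.

    ecs is a list of row dicts standing in for the EC DataFrame (hypothetical layout).
    """
    mapped = []
    for row in ecs:
        segment_i, i = to_target(row["i"])
        segment_j, j = to_target(row["j"])
        # Copy the row so the input is not modified; other columns stay unchanged
        mapped.append({**row, "i": i, "j": j, "segment_i": segment_i, "segment_j": segment_j})
    return mapped


model_to_target = {1: ("A_1", 10), 2: ("A_1", 11), 4: ("B_1", 5)}
ecs = [{"i": 1, "j": 4, "cn": 0.9}, {"i": 1, "j": 2, "cn": 0.5}]
for row in segment_map_ecs_sketch(ecs, model_to_target.__getitem__):
    print(row)
# {'i': 10, 'j': 5, 'cn': 0.9, 'segment_i': 'A_1', 'segment_j': 'B_1'}
# {'i': 10, 'j': 11, 'cn': 0.5, 'segment_i': 'A_1', 'segment_j': 'A_1'}
```

With the added segment columns, inter-segment ECs can be selected simply by comparing segment_i and segment_j.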

evcouplings.couplings.mean_field module

evcouplings.couplings.model module

Class to store parameters of undirected graphical model of sequences and perform calculations using the model (statistical energies, coupling scores).

Authors: Thomas A. Hopf

class evcouplings.couplings.model.CouplingsModel(model_file, precision='float32', file_format='plmc_v2', **kwargs)

Bases: object

Class to store parameters of pairwise undirected graphical model of sequences and compute evolutionary couplings, sequence statistical energies, etc.

Jij(i=None, j=None, A_i=None, A_j=None)

Quick access to J_ij matrix with automatic index mapping. See __4d_access for explanation of parameters.

classmethod apc(matrix)

Apply average product correction (Dunn et al., Bioinformatics, 2008) to matrix

Parameters:	matrix (np.array) – Symmetric L x L matrix which should be corrected by APC
Returns:	Symmetric L x L matrix with APC correction applied
Return type:	np.array
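The correction subtracts, from each entry, the product of its row and column means divided by the overall mean: M'_ij = M_ij - (mean_i * mean_j) / mean_all. The pure-Python sketch below implements this formula. It assumes the means are taken over full rows and the whole matrix, diagonal included; whether the library excludes the diagonal is not stated here. Because the input is symmetric, column means equal row means.

```python
from typing import List


def apc(matrix: List[List[float]]) -> List[List[float]]:
    """Average product correction of a symmetric L x L matrix.

    corrected[i][j] = matrix[i][j] - mean_i * mean_j / mean_all, where mean_i is
    the mean of row i (equal to column i for a symmetric matrix) and mean_all the
    mean of all entries. Including the diagonal is an assumption of this sketch.
    """
    L = len(matrix)
    row_means = [sum(row) / L for row in matrix]
    mean_all = sum(row_means) / L
    return [
        [matrix[i][j] - row_means[i] * row_means[j] / mean_all for j in range(L)]
        for i in range(L)
    ]


print(apc([[0.0, 1.0], [1.0, 0.0]]))  # [[-0.5, 0.5], [0.5, -0.5]]
```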

cn(i=None, j=None)

Quick access to cn_scores matrix with automatic index mapping. See __2d_access_score_matrix for explanation of parameters.

cn_scores

L x L numpy matrix with CN (corrected norm) scores

convert_sequences(sequences)

Convert sequences in string format into the internal symbol representation, according to the alphabet of the model

Parameters:	sequences (list of str) – List of sequences (must have same length and correspond to model states)
Returns:	Matrix of size len(sequences) x L of sequences converted to integer symbols
Return type:	np.array
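The conversion amounts to looking up each character's index in the model alphabet. In this minimal sketch, the alphabet (gap symbol followed by the 20 standard amino acids) is an assumption; the model uses whatever alphabet it was trained with. Nested lists stand in for the returned np.array.

```python
from typing import List

# Assumed alphabet: gap symbol followed by the 20 standard amino acids
ALPHABET = "-ACDEFGHIKLMNPQRSTVWY"


def convert_sequences(sequences: List[str], alphabet: str = ALPHABET) -> List[List[int]]:
    """Map each character to its index in the alphabet (one row per sequence)."""
    if len({len(seq) for seq in sequences}) > 1:
        raise ValueError("All sequences must have the same length")
    symbol_index = {symbol: k for k, symbol in enumerate(alphabet)}
    # A KeyError is raised for symbols that are not part of the alphabet
    return [[symbol_index[c] for c in seq] for seq in sequences]


print(convert_sequences(["AC-", "DE-"]))  # [[1, 2, 0], [3, 4, 0]]
```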

delta_hamiltonian(substitutions, verify_mutants=True)

Calculate the difference in statistical energy relative to self.target_seq after changing the sequence according to a list of substitutions

Parameters:	substitutions (list of tuple(pos, subs_from, subs_to)) – Substitutions to be applied to the target sequence
	verify_mutants (bool, optional) – Test if subs_from is consistent with self.target_seq
Returns:	Vector of length 3 with 1) total delta Hamiltonian, 2) delta J_ij, 3) delta h_i
Return type:	np.array
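Only terms that involve a substituted position change, so the energy difference can be computed locally: the field differences at the substituted sites plus the coupling differences of every pair containing at least one substituted site. The sketch below takes explicit J, h and target arguments instead of using a model object; 0-based positions and integer-encoded symbols are assumptions (the documented method refers to self.target_seq).

```python
from typing import List, Sequence, Tuple


def delta_hamiltonian(
    J: Sequence, h: Sequence, target: Sequence[int],
    substitutions: List[Tuple[int, int, int]], verify_mutants: bool = True,
) -> List[float]:
    """Return [total, delta J_ij, delta h_i] for applying substitutions to target.

    J[i][j][a][b] are couplings (only i < j is read), h[i][a] fields; positions are
    0-based and symbols integer-encoded (assumptions of this sketch).
    """
    mutant = list(target)
    for pos, subs_from, subs_to in substitutions:
        if verify_mutants and target[pos] != subs_from:
            raise ValueError(f"target has symbol {target[pos]} at {pos}, not {subs_from}")
        mutant[pos] = subs_to

    changed = {pos for pos, _, _ in substitutions}
    delta_h = sum(h[i][mutant[i]] - h[i][target[i]] for i in changed)

    # Only pairs that contain at least one substituted position change their term
    delta_J = 0.0
    L = len(target)
    for i in range(L):
        for j in range(i + 1, L):
            if i in changed or j in changed:
                delta_J += J[i][j][mutant[i]][mutant[j]] - J[i][j][target[i]][target[j]]

    return [delta_J + delta_h, delta_J, delta_h]


zero = [[0.0, 0.0], [0.0, 0.0]]
J = [[zero, [[0.0, 1.0], [2.0, 3.0]]], [[[0.0, 2.0], [1.0, 3.0]], zero]]
h = [[0.0, 0.5], [0.0, -1.0]]
print(delta_hamiltonian(J, h, [0, 0], [(0, 0, 1)]))  # [2.5, 2.0, 0.5]
```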

dmm(i=None, j=None, A_i=None, A_j=None)

Access delta_Hamiltonian matrix of double mutants of target sequence

Parameters:	i (Iterable(int) or int) – Position(s) of first substitution(s)
	j (Iterable(int) or int) – Position(s) of second substitution(s)
	A_i (Iterable(char) or char) – Substitution(s) to first position
	A_j (Iterable(char) or char) – Substitution(s) to second position
Returns:	4D matrix containing energy differences for slices along both axes of the double mutation matrix (axes 1/2: positions, axes 3/4: substitutions).
Return type:	np.array(float)

double_mut_mat

Hamiltonian difference for all possible double mutant variants

L x L x num_symbol x num_symbol matrix containing delta Hamiltonians for all possible double mutants of target sequence

ecs

DataFrame with evolutionary couplings, sorted by CN score (all scores: CN, FN, MI)

fi(i=None, A_i=None)

Quick access to f_i matrix with automatic index mapping. See __2d_access for explanation of parameters.

fij(i=None, j=None, A_i=None, A_j=None)

Quick access to f_ij matrix with automatic index mapping. See __4d_access for explanation of parameters.

fn(i=None, j=None)

Quick access to fn_scores matrix with automatic index mapping. See __2d_access_score_matrix for explanation of parameters.

fn_scores

L x L numpy matrix with FN (Frobenius norm) scores
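The FN score of a pair (i, j) is based on the Frobenius norm of its num_symbol x num_symbol coupling block, and the CN score is presumably the APC-corrected version of this matrix. The sketch below shows only the norm itself. Preprocessing that implementations commonly apply first (e.g. transforming the couplings to a zero-sum gauge or leaving out the gap symbol) is not shown, and whether and how the library does this is not stated here. Leaving the diagonal at 0 is likewise an assumption.

```python
import math
from typing import List


def frobenius_norm(block: List[List[float]]) -> float:
    """Frobenius norm of one num_symbol x num_symbol coupling block J_ij(a, b)."""
    return math.sqrt(sum(value * value for row in block for value in row))


def fn_matrix(J: List[List[List[List[float]]]]) -> List[List[float]]:
    """L x L matrix of block norms; the diagonal is left at 0 (assumption)."""
    L = len(J)
    return [[frobenius_norm(J[i][j]) if i != j else 0.0 for j in range(L)] for i in range(L)]


print(frobenius_norm([[3.0, 0.0], [0.0, 4.0]]))  # 5.0
```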

hamiltonians(sequences)

Calculates the Hamiltonians of the global probability distribution P(A_1, …, A_L) for the given sequences A_1,…,A_L from J_ij and h_i parameters

Parameters:	sequences (list of str) – List of sequences for which the Hamiltonian will be computed, or an np.array obtained by converting them with the convert_sequences method
Returns:	Float matrix of size len(sequences) x 3, where each row corresponds to the 1) total Hamiltonian of sequence and the 2) J_ij and 3) h_i sub-sums
Return type:	np.array
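For an integer-encoded sequence A, the two sub-sums are the pair term sum over i < j of J_ij(A_i, A_j) and the site term sum over i of h_i(A_i). In this sketch the total is taken to be the plain sum of the two, as the three returned columns suggest. Nested lists stand in for the model's arrays, and J and h are passed explicitly rather than read from a model object.

```python
from typing import List, Sequence


def hamiltonians(J: Sequence, h: Sequence, sequences: List[List[int]]) -> List[List[float]]:
    """Return one [total, J_ij sub-sum, h_i sub-sum] row per integer-encoded sequence.

    J[i][j][a][b] are couplings (only i < j is read) and h[i][a] fields.
    """
    rows = []
    for seq in sequences:
        L = len(seq)
        J_sum = sum(J[i][j][seq[i]][seq[j]] for i in range(L) for j in range(i + 1, L))
        h_sum = sum(h[i][seq[i]] for i in range(L))
        rows.append([J_sum + h_sum, J_sum, h_sum])
    return rows


zero = [[0.0, 0.0], [0.0, 0.0]]
J = [[zero, [[0.0, 1.0], [2.0, 3.0]]], [[[0.0, 2.0], [1.0, 3.0]], zero]]
h = [[0.0, 0.5], [0.0, -1.0]]
print(hamiltonians(J, h, [[0, 0], [1, 1]]))  # [[0.0, 0.0, 0.0], [2.5, 3.0, -0.5]]
```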

hi(i=None, A_i=None)

Quick access to h_i matrix with automatic index mapping. See __2d_access for explanation of parameters.

index_list

Sequence numbering of the model: for each internal model position, the corresponding position in the target sequence (used to map between internal and sequence numbering, see sn and mn)

itu(i=None)

Legacy method for backwards compatibility. See self.sn for explanation.

mi_apc(i=None, j=None)

Quick access to mi_scores_apc matrix with automatic index mapping. See __2d_access_score_matrix for explanation of parameters.

mi_raw(i=None, j=None)

Quick access to mi_scores_raw matrix with automatic index mapping. See __2d_access_score_matrix for explanation of parameters.

mi_scores_apc

L x L numpy matrix with MI (mutual information) scores with APC correction

mi_scores_raw

L x L numpy matrix with MI (mutual information) scores without APC correction
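Mutual information between two alignment columns can be computed from single-site frequencies f_i, f_j and pair frequencies f_ij (cf. fi and fij) using the standard formula. This sketch covers one pair of columns. The natural logarithm and the omission of zero-frequency terms are assumptions, since the text does not specify them. Applying APC to the resulting L x L matrix gives the corrected variant.

```python
import math
from typing import List


def mutual_information(f_i: List[float], f_j: List[float], f_ij: List[List[float]]) -> float:
    """MI_ij = sum over a, b of f_ij(a, b) * log(f_ij(a, b) / (f_i(a) * f_j(b))).

    Terms with f_ij(a, b) == 0 contribute nothing and are skipped; natural log assumed.
    """
    mi = 0.0
    for a, fa in enumerate(f_i):
        for b, fb in enumerate(f_j):
            p = f_ij[a][b]
            if p > 0:
                mi += p * math.log(p / (fa * fb))
    return mi


independent = mutual_information([0.5, 0.5], [0.5, 0.5], [[0.25, 0.25], [0.25, 0.25]])
coupled = mutual_information([0.5, 0.5], [0.5, 0.5], [[0.5, 0.0], [0.0, 0.5]])
print(independent)                          # 0.0
print(math.isclose(coupled, math.log(2)))   # True
```

Independent columns give zero, and two perfectly correlated binary columns give log 2.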

mn(i=None)

Map model numbering to internal numbering

Parameters:	i (Iterable(int) or int) – Position(s) to be mapped from model numbering space into internal numbering space
Returns:	Remapped position(s)
Return type:	Iterable(int) or int

mui(i=None)

Legacy method for backwards compatibility. See self.mn for explanation.

seq(i=None)

Access target sequence of model

Parameters:	i (Iterable(int) or int) – Position(s) for which symbol should be retrieved
Returns:	Sequence symbols
Return type:	Iterable(char) or char

single_mut_mat

Hamiltonian difference for all possible single-site variants

L x num_symbol matrix (np.array) containing delta Hamiltonians for all possible single mutants of target sequence.

single_mut_mat_full

Hamiltonian difference for all possible single-site variants

L x num_symbol x 3 matrix (np.array) containing delta Hamiltonians for all possible single mutants of target sequence. Third dimension: 1) full Hamiltonian, 2) J_ij, 3) h_i

smm(i=None, A_i=None)

Access delta_Hamiltonian matrix of single mutants of target sequence

Parameters:	i (Iterable(int) or int) – Position(s) for which energy difference should be retrieved
	A_i (Iterable(char) or char) – Substitutions for which energy difference should be retrieved
Returns:	2D matrix containing energy differences for slices along both axes of single mutation matrix (first axis: position, second axis: substitution).
Return type:	np.array(float)

sn(i=None)

Map internal numbering to sequence numbering

Parameters:	i (Iterable(int) or int) – Position(s) to be mapped from internal numbering space into sequence numbering space.
Returns:	Remapped position(s)
Return type:	Iterable(int) or int
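The two mapping directions (sn here and mn above) can be sketched with a list that stores the sequence position of each internal position, together with its inverse dictionary. The concrete positions 25 to 28 and the module-level variables are hypothetical. The sketch also assumes that the "model numbering" of mn and the "sequence numbering" of sn refer to the same space, as the two methods being described as inverses suggests.

```python
from typing import Dict, Iterable, List, Union

# Hypothetical model covering sequence positions 25-28 (cf. index_list)
index_list: List[int] = [25, 26, 27, 28]
index_map: Dict[int, int] = {pos: k for k, pos in enumerate(index_list)}


def sn(i: Union[int, Iterable[int]]) -> Union[int, List[int]]:
    """Internal index -> sequence numbering."""
    if isinstance(i, int):
        return index_list[i]
    return [index_list[k] for k in i]


def mn(i: Union[int, Iterable[int]]) -> Union[int, List[int]]:
    """Sequence numbering -> internal index."""
    if isinstance(i, int):
        return index_map[i]
    return [index_map[pos] for pos in i]


print(sn(0), mn(27))   # 25 2
print(mn(sn([1, 3])))  # [1, 3]
```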

target_seq

Target/Focus sequence of model used for delta_hamiltonian calculations (including single and double mutation matrices)

to_file(out_file, precision='float32', file_format='plmc_v2')

Write the (potentially modified) model back to a binary file

Parameters:	out_file (str) – A string specifying the path to a file
	precision ({"float16", "float32", "float64"}, optional (default: "float32")) – Numerical NumPy data type specifying the precision used to write numerical values to file
	file_format ({"plmc_v1", "plmc_v2"}, optional (default: "plmc_v2")) – File format to write

to_independent_model()

Estimate parameters of a single-site model using Gaussian prior/L2 regularization.

Returns:	Copy of object turned into independent model
Return type:	CouplingsModel
