evcouplings.couplings package

evcouplings.couplings.mapping module

Mapping indices for complexes / multi-domain sequences to internal model numbering.

Authors:
Thomas A. Hopf Anna G. Green (MultiSegmentCouplingsModel)
class evcouplings.couplings.mapping.MultiSegmentCouplingsModel(filename, *segments, precision='float32', file_format='plmc_v2', **kwargs)[source]

Bases: evcouplings.couplings.model.CouplingsModel

Complex specific Couplings Model that handles segments and provides the option to convert model into inter-segment only.

to_inter_segment_model()[source]

Convert model to inter-segment only parameters, ie the J_ijs that correspond to inter-protein or inter-domain residue pairs. All other parameters are set to 0.

Returns:Copy of object turned into inter-only Epistatic model
Return type:CouplingsModel
class evcouplings.couplings.mapping.Segment(segment_type, sequence_id, region_start, region_end, positions=None, segment_id='A')[source]

Bases: object

Represents a continuous stretch of sequence in a sequence alignment to infer evolutionary couplings (e.g. multiple domains, or monomers in a concatenated complex alignment)

default_chain_name()[source]

Retrieve default PDB chain identifier the segment will be mapped to in 3D structures (by convention, segments in the pipeline are named A_1, A_2, …, B_1, B_2, …; the default chain identifier is anything before the underscore).

Returns:chain – Default PDB chain identifier the segment maps to
Return type:str
classmethod from_list(segment)[source]

Create a segment object from list representation (e.g. from config).

Parameters:segment (list) – List representation of segment, with the following items: segment_id (str), segment_type (str), sequence_id (str), region_start (int), region_end (int), positions (list(int))
Returns:New Segment instance from list
Return type:Segment
to_list()[source]

Represent segment as list (for storing in configs)

Returns:List representation of segment, with the following items: segment_id (str), segment_type (str), sequence_id (str), region_start (int), region_end (int), positions (list(int))
Return type:list
class evcouplings.couplings.mapping.SegmentIndexMapper(focus_mode, first_index, *segments)[source]

Bases: object

Map indices of one or more sequence segments into CouplingsModel internal numbering space. Can also be used to (trivially) remap indices for a single sequence.

patch_model(model, inplace=True)[source]

Change numbering of CouplingModel object so that it uses segment-based numbering

Parameters:
  • model (CouplingsModel) – Model that will be updated to segment- based numbering
  • inplace (bool, optional (default: True)) – If True, change passed model; otherwise returnnew object
Returns:

Model with updated numbering (if inplace is False, this will point to original model)

Return type:

CouplingsModel

Raises:

ValueError – If segment mapping does not match internal model numbering

to_model(x)[source]

Map target index to model index

Parameters:x ((str, int), or list of (str, int)) – Indices in target indexing (segment_id, index_in_segment)
Returns:Monomer indices mapped into couplings object numbering
Return type:int, or list of int
to_target(x)[source]

Map model index to target index

Parameters:x (int, or list of ints) – Indices in model numbering
Returns:Indices mapped into target numbering. Tuples are (segment_id, index_in_segment)
Return type:(str, int), or list of (str, int)
evcouplings.couplings.mapping.segment_map_ecs(ecs, mapper)[source]

Map EC dataframe in model numbering into segment numbering

Parameters:ecs (pandas.DataFrame) – EC table (with columns i and j)
Returns:Mapped EC table (with columns i and j mapped, and additional columns segment_i and segment_j)
Return type:pandas.DataFrame

evcouplings.couplings.mean_field module

Inference of evolutionary couplings from sequence alignments using mean field approximation.

Authors:
Sophia F. Mersmann
class evcouplings.couplings.mean_field.MeanFieldCouplingsModel(alignment, index_list, regularized_f_i, regularized_f_ij, h_i, J_ij, theta, pseudo_count)[source]

Bases: evcouplings.couplings.model.CouplingsModel

Mean field DCA specific model class that stores the parameters inferred using mean field approximation and calculates mutual and direct information as well as fn and cn scores.

di_scores

L x L numpy matrix with DI (direct information) scores

regularize_f_i()[source]

Returns single-site frequencies regularized by a pseudo-count of symbols in alignment.

This method sets the attribute self.regularized_f_i and returns a reference to it.

Returns:Matrix of size L x num_symbols containing relative column frequencies of all symbols regularized by a pseudo-count.
Return type:np.array
regularize_f_ij()[source]

Returns pairwise frequencies regularized by a pseudo-count of symbols in alignment.

This method sets the attribute self.regularized_f_ij and returns a reference to it.

Returns:Matrix of size L x L x num_symbols x num_symbols containing relative pairwise frequencies of all symbols regularized by a pseudo-count.
Return type:np.array
tilde_fields(i, j)[source]

Compute h_tilde fields of the two-site model.

Parameters:
  • i (int) – First position.
  • j (int) – Second position.
Returns:

h_tilde fields of position i and j - both arrays of size 1 x num_symbols

Return type:

np.array, np.array

to_file(out_file, precision='float32', file_format='plmc_v2')[source]

Writes the model to binary file.

This method overrides its respective parent method in CouplingsModel.

Parameters:
  • out_file (str) – A string specifying the path to a file
  • precision ({"float16", "float32", "float64"}, optional (default: "float32")) – Numerical NumPy data type specifying the precision used to write numerical values to file
  • file_format ({"plmc_v2"}, optional (default: "plmc_v2")) – For now, a mean-field model can only be written to a file in plmc_v2 format. Writing to a plmc_v1 file is not permitted since there is no functionality provided to read a mean-field model in plmc_v1 format.
to_independent_model()[source]

Compute a single-site model.

This method overrides its respective parent method in CouplingsModel.

Returns:Copy of object turned into independent model
Return type:MeanFieldCouplingsModel
to_raw_ec_file(couplings_file)[source]

Write mutual and direct information to the EC file.

Parameters:couplings_file (str) – Output path for file with evolutionary couplings.
transform_from_plmc_model()[source]

Adaptions that allow to read a mean-field couplings model from file where __read_plmc_v2 in CouplingsModel does the heavy lifting.

This includes: - Manage unused plmc-specific fields as well as the pseudo count field - Modify pair frequencies - Regularize column and pair frequencies (i.e. fill the fields regularized_f_i and regularized_f_ij)

class evcouplings.couplings.mean_field.MeanFieldDCA(alignment)[source]

Bases: object

Class that provides the functionality to infer evolutionary couplings from a given sequence alignment using mean field approximation.

Important: The input alignment should be an a2m alignment with lower / upper columns and the target sequence as the first record.

_raw_alignment

Alignment – The input alignment. This should be an a2m alignment with lower / upper columns and the target sequence as first record.

index_list

np.array – List of UniProt numbers of the target sequence (only upper case characters).

alignment

Alignment – A processed version of the given alignment (_raw_alignment) that is then used to infer evolutionary couplings using DCA.

N

int – The number of sequences (of the processed alignment).

L

int – The width of the alignment (again, this refers to the processed alignment).

num_symbols

int – The number of symbols of the alphabet used.

covariance_matrix

np.array – Matrix of size (L * (num_symbols-1)) x (L * (num_symbols-1)) containing the co-variation of each character pair in any positions.

covariance_matrix_inv

np.array – Inverse of covariance_matrix.

compute_covariance_matrix()[source]

Compute the covariance matrix.

This method sets the attribute self.covariance_matrix and returns a reference to it.

Returns:Reference to attribute self.convariance_matrix
Return type:np.array
fields()[source]

Compute fields.

Returns:Matrix of size L x num_symbols containing single-site fields.
Return type:np.array
fit(theta=0.8, pseudo_count=0.5)[source]

Run mean field direct couplings analysis.

Parameters:
  • theta (float, optional (default: 0.8)) – Sequences with pairwise identity >= theta will be clustered and their sequence weights downweighted as 1 / num_cluster_members.
  • pseudo_count (float, optional (default: 0.5)) – Applied to frequency counts to regularize in the case of insufficient data availability.
Returns:

Model object that holds the inferred fields (h_i) and couplings (J_ij).

Return type:

MeanFieldCouplingsModel

regularize_frequencies(pseudo_count=0.5)[source]

Returns single-site frequencies regularized by a pseudo-count of symbols in alignment.

This method sets the attribute self.regularized_frequencies and returns a reference to it.

Parameters:pseudo_count (float, optional (default: 0.5)) – The value to be added as pseudo-count.
Returns:Matrix of size L x num_symbols containing relative column frequencies of all symbols regularized by a pseudo-count.
Return type:np.array
regularize_pair_frequencies(pseudo_count=0.5)[source]

Add pseudo-count to pairwise frequencies to regularize in the case of insufficient data availability.

This method sets the attribute self.regularized_pair_frequencies and returns a reference to it.

Parameters:pseudo_count (float, optional (default: 0.5)) – The value to be added as pseudo-count.
Returns:Matrix of size L x L x num_symbols x num_symbols containing relative pairwise frequencies of all symbols regularized by a pseudo-count.
Return type:np.array
reshape_invC_to_4d()[source]

“Un-flatten” inverse of the covariance matrix to allow easy access to couplings using position and symbol indices.

Returns:Matrix of size L x L x num_symbols x num_symbols.
Return type:np.array
evcouplings.couplings.mean_field.regularize_frequencies(f_i, pseudo_count=0.5)[source]

Returns/calculates single-site frequencies regularized by a pseudo-count of symbols in alignment.

Parameters:
  • f_i (np.array) – Matrix of size L x num_symbols containing column frequencies.
  • pseudo_count (float, optional (default: 0.5)) – The value to be added as pseudo-count.
Returns:

Matrix of size L x num_symbols containing relative column frequencies of all symbols regularized by a pseudo-count.

Return type:

np.array

evcouplings.couplings.mean_field.regularize_pair_frequencies(f_ij, pseudo_count=0.5)[source]

Add pseudo-count to pairwise frequencies to regularize in the case of insufficient data availability.

Parameters:
  • f_ij (np.array) – Matrix of size L x L x num_symbols x num_symbols containing pair frequencies.
  • pseudo_count (float, optional (default: 0.5)) – The value to be added as pseudo-count.
Returns:

Matrix of size L x L x num_symbols x num_symbols containing relative pairwise frequencies of all symbols regularized by a pseudo-count.

Return type:

np.array

evcouplings.couplings.model module

Class to store parameters of undirected graphical model of sequences and perform calculations using the model (statistical energies, coupling scores).

Authors:
Thomas A. Hopf
class evcouplings.couplings.model.CouplingsModel(filename, precision='float32', file_format='plmc_v2', **kwargs)[source]

Bases: object

Class to store parameters of pairwise undirected graphical model of sequences and compute evolutionary couplings, sequence statistical energies, etc.

Jij(i=None, j=None, A_i=None, A_j=None)[source]

Quick access to J_ij matrix with automatic index mapping. See __4d_access for explanation of parameters.

classmethod apc(matrix)[source]

Apply average product correction (Dunn et al., Bioinformatics, 2008) to matrix

Parameters:matrix (np.array) – Symmetric L x L matrix which should be corrected by APC
Returns:Symmetric L x L matrix with APC correction applied
Return type:np.array
cn(i=None, j=None)[source]

Quick access to cn_scores matrix with automatic index mapping. See __2d_access_score_matrix for explanation of parameters.

cn_scores

L x L numpy matrix with CN (corrected norm) scores

convert_sequences(sequences)[source]

Converts sequences in string format into internal symbol representation according to alphabet of model

Parameters:sequences (list of str) – List of sequences (must have same length and correspond to model states)
Returns:Matrix of size len(sequences) x L of sequences converted to integer symbols
Return type:np.array
delta_hamiltonian(substitutions, verify_mutants=True)[source]

Calculate difference in statistical energy relative to self.target_seq by changing sequence according to list of substitutions

Parameters:
  • substitutions (list of tuple(pos, subs_from, subs_to)) – Substitutions to be applied to target sequence
  • verify_mutants (bool, optional) – Test if subs_from is consistent with self.target_seq
Returns:

Vector of length 3 with 1) total delta Hamiltonian, 2) delta J_ij, 3) delta h_i

Return type:

np.array

dmm(i=None, j=None, A_i=None, A_j=None)[source]

Access delta_Hamiltonian matrix of double mutants of target sequence

Parameters:
  • i (Iterable(int) or int) – Position(s) of first substitution(s)
  • j (Iterable(int) or int) – Position(s) of second substitution(s)
  • A_i (Iterable(char) or char) – Substitution(s) to first position
  • A_j (Iterable(char) or char) – Substitution(s) to second position
Returns:

4D matrix containing energy differences for slices along both axes of double mutation matrix (axes 1/2: position, axis 3/4: substitutions).

Return type:

np.array(float)

double_mut_mat

Hamiltonian difference for all possible double mutant variants

L x L x num_symbol x num_symbol matrix containing delta Hamiltonians for all possible double mutants of target sequence

ecs

DataFrame with evolutionary couplings, sorted by CN score (all scores: CN, FN, MI)

fi(i=None, A_i=None)[source]

Quick access to f_i matrix with automatic index mapping. See __2d_access for explanation of parameters.

fij(i=None, j=None, A_i=None, A_j=None)[source]

Quick access to f_ij matrix with automatic index mapping. See __4d_access for explanation of parameters.

fn(i=None, j=None)[source]

Quick access to fn_scores matrix with automatic index mapping. See __2d_access_score_matrix for explanation of parameters.

fn_scores

L x L numpy matrix with FN (Frobenius norm) scores

hamiltonians(sequences)[source]

Calculates the Hamiltonians of the global probability distribution P(A_1, …, A_L) for the given sequences A_1,…,A_L from J_ij and h_i parameters

Parameters:sequences (list of str) – List of sequences for which Hamiltonian will be computed, or converted np.array obtained using convert_sequences method
Returns:Float matrix of size len(sequences) x 3, where each row corresponds to the 1) total Hamiltonian of sequence and the 2) J_ij and 3) h_i sub-sums
Return type:np.array
hi(i=None, A_i=None)[source]

Quick access to h_i matrix with automatic index mapping. See __2d_access for explanation of parameters.

index_list

Target/Focus sequence of model used for delta_hamiltonian calculations (including single and double mutation matrices)

itu(i=None)[source]

Legacy method for backwards compatibility. See self.sn for explanation.

mi_apc(i=None, j=None)[source]

Quick access to mi_scores_apc matrix with automatic index mapping. See __2d_access_score_matrix for explanation of parameters.

mi_raw(i=None, j=None)[source]

Quick access to mi_scores_raw matrix with automatic index mapping. See __2d_access_score_matrix for explanation of parameters.

mi_scores_apc

L x L numpy matrix with MI (mutual information) scores with APC correction

mi_scores_raw

L x L numpy matrix with MI (mutual information) scores without APC correction

mn(i=None)[source]

Map model numbering to internal numbering

Parameters:i (Iterable(int) or int) – Position(s) to be mapped from model numbering space into internal numbering space
Returns:Remapped position(s)
Return type:Iterable(int) or int
mui(i=None)[source]

Legacy method for backwards compatibility. See self.mn for explanation.

seq(i=None)[source]

Access target sequence of model

Parameters:i (Iterable(int) or int) – Position(s) for which symbol should be retrieved
Returns:Sequence symbols
Return type:Iterable(char) or char
single_mut_mat

Hamiltonian difference for all possible single-site variants

L x num_symbol matrix (np.array) containing delta Hamiltonians for all possible single mutants of target sequence.

single_mut_mat_full

Hamiltonian difference for all possible single-site variants

L x num_symbol x 3 matrix (np.array) containing delta Hamiltonians for all possible single mutants of target sequence. Third dimension: 1) full Hamiltonian, 2) J_ij, 3) h_i

smm(i=None, A_i=None)[source]

Access delta_Hamiltonian matrix of single mutants of target sequence

Parameters:
  • i (Iterable(int) or int) – Position(s) for which energy difference should be retrieved
  • A_i (Iterable(char) or char) – Substitutions for which energy difference should be retrieved
Returns:

2D matrix containing energy differences for slices along both axes of single mutation matrix (first axis: position, second axis: substitution).

Return type:

np.array(float)

sn(i=None)[source]

Map internal numbering to sequence numbering

Parameters:i (Iterable(int) or int) – Position(s) to be mapped from internal numbering space into sequence numbering space.
Returns:Remapped position(s)
Return type:Iterable(int) or int
target_seq

Target/Focus sequence of model used for delta_hamiltonian calculations (including single and double mutation matrices)

to_file(out_file, precision='float32', file_format='plmc_v2')[source]

Writes the potentially modified model again to binary file

Parameters:
  • out_file (str) – A string specifying the path to a file
  • precision ({"float16", "float32", "float64"}, optional (default: "float32")) – Numerical NumPy data type specifying the precision used to write numerical values to file
  • file_format ({"plmc_v1", "plmc_v2"}, optional (default: "plmc_v2")) – Available file formats
to_independent_model()[source]

Estimate parameters of a single-site model using Gaussian prior/L2 regularization.

Returns:Copy of object turned into independent model
Return type:CouplingsModel

evcouplings.couplings.pairs module

Functions for handling evolutionary couplings data.

Todo

  1. clean up
  2. add Pompom score
  3. add mapping tools (multidomain, complexes)
  4. ECs to matrix
  5. APC on subsets of positions (e.g. for complexes)
Authors:
Thomas A. Hopf Agnes Toth-Petroczy (original mixture model code) John Ingraham (skew normal mixture model) Anna G. Green (EVComplex Score code)
class evcouplings.couplings.pairs.EVComplexScoreModel(x)[source]

Bases: object

Assign to each EC score a (unnormalized) EVcomplex score as described in Hopf, Schärfe et al. (2014).

TODO: this implementation currently does not take into account score normalization for the number of sequences and length of the model

probability(x, plot=False)[source]

Calculates evcomplex score as cn_score / min_cn_score. TODO: plotting functionality not yet implemented

Parameters:
  • x (np.array (or list-like)) – List of scores
  • plot (bool, optional (default: False)) – Plot score distribution
Returns:

probability – EVcomplex score

Return type:

np.array(float)

class evcouplings.couplings.pairs.LegacyScoreMixtureModel(x, clamp_mu=False, max_fun=10000, max_iter=1000)[source]

Bases: object

Assign to each EC score the probability of being in the lognormal tail of a normal-lognormal mixture model.

Note

this is the original version of the score mixture model with a normal distribution noise component, this has been superseded by a model using a skew normal distribution

probability(x, plot=False)[source]

Calculate posterior probability of EC pair to be located in positive (lognormal) tail of the distribution.

Parameters:
  • x (np.array (or list-like)) – List of scores
  • plot (bool, optional (default: False)) – Plot score distribution and probabilities
Returns:

posterior – Posterior probability of being in signal component of mixture model

Return type:

np.array(float)

class evcouplings.couplings.pairs.ScoreMixtureModel(x)[source]

Bases: object

Assign to each EC score the probability of being in the lognormal tail of a skew normal-lognormal mixture model.

classmethod lognorm_pdf(x, logmu, logsig)[source]

Probability density of lognormal distribution (signal component)

Parameters:
  • x (np.array(float)) – Data for which probability density should be calculated
  • logmu (float) – Location of lognormal distribution (signal component)
  • logsig (float) – Scale parameter of lognormal distribution (signal component)
Returns:

density – Probability density for input array x

Return type:

np.array(float)

classmethod mixture_pdf(x, p, scale, skew, logmu, logsig)[source]

Compute mixture probability

Parameters:
  • x (np.array(float)) – Data for which probability density should be calculated
  • p (float) – Mixing fraction between components for noise component (signal component will be 1-p)
  • scale (float) – Scale parameter of skew normal distribution (noise component)
  • skew (float) – Skew parameter of skew normal distribution (noise component)
  • logmu (float) – Location of lognormal distribution (signal component)
  • logsig (float) – Scale parameter of lognormal distribution (signal component)
Returns:

density – Probability density for input array x

Return type:

np.array(float)

classmethod posterior_signal(x, p, scale, skew, logmu, logsig)[source]

Compute posterior probability of being in signal component

Parameters:
  • x (np.array(float)) – Data for which probability density should be calculated
  • p (float) – Mixing fraction between components for noise component (signal component will be 1-p)
  • scale (float) – Scale parameter of skew normal distribution (noise component)
  • skew (float) – Skew parameter of skew normal distribution (noise component)
  • logmu (float) – Location of lognormal distribution (signal component)
  • logsig (float) – Scale parameter of lognormal distribution (signal component)
Returns:

posterior – Posterior probability of being in signal component for input array x

Return type:

np.array(float)

probability(x, plot=False)[source]

Calculate posterior probability of EC pair to be located in positive (lognormal) tail of the distribution.

Parameters:x (np.array (or list-like)) – List of scores
Returns:posterior – Posterior probability of being in signal component of mixture model
Return type:np.array(float)
classmethod skewnorm_constraint(scale, skew)[source]

Given scale and skew, returns location parameter to yield mean zero

Parameters:
  • scale (float) – Scale parameter of skew normal distribution
  • skew (float) – Skew parameter of skew normal distribution
Returns:

location – Location parameter of skew normal distribution s.t. mean of distribution is equal to 0

Return type:

float

classmethod skewnorm_pdf(x, location, scale, skew)[source]

Probability density of skew normal distribution (noise component)

Parameters:
  • x (np.array(float)) – Data for which probability density should be calculated
  • location (float) – Location parameter of skew normal distribution
  • scale (float) – Scale parameter of skew normal distribution
  • skew (float) – Skew parameter of skew normal distribution
Returns:

density – Probability density for input array x

Return type:

np.array(float)

evcouplings.couplings.pairs.add_mixture_probability(ecs, model='skewnormal', score='cn', clamp_mu=False, plot=False)[source]

Add lognormal mixture model probability to EC table.

Parameters:
  • ecs (pd.DataFrame) – EC table with scores
  • model ({"skewnormal", "normal"}, optional (default: skewnormal)) – Use model with skew-normal or normal distribution for the noise component of mixture model
  • score (str, optional (default: "cn")) – Score on which mixture model will be based
  • clamp_mu (bool, optional (default: False)) – Fix mean of Gaussian component to 0 instead of fitting it based on data
  • plot (bool, optional (default: False)) – Plot score distribution and probabilities
Returns:

ec_prob – EC table with additional column “probability” that for each EC contains the posterior probability of belonging to the lognormal tail of the distribution.

Return type:

pd.DataFrame

evcouplings.couplings.pairs.enrichment(ecs, num_pairs=1.0, score='cn', min_seqdist=6)[source]

Calculate EC “enrichment” as first described in Hopf et al., Cell, 2012.

Todo

Make this handle segments if they are in EC table

Parameters:
  • ecs (pd.DataFrame) – Dataframe containing couplings
  • num_pairs (int or float, optional (default: 1.0)) – Number of ECs to use for enrichment calculation. - If float, will be interpreted as fraction of the length of the sequence (e.g. 1.0*L) - If int, will be interpreted as absolute number of pairs
  • score (str, optional (default: cn)) – Pair coupling score used for calculation
  • min_seqdist (int, optional (default: 6)) – Minimum sequence distance of couplings that will be included in the calculation
Returns:

enrichment_table – Sorted table with enrichment values for each position in the sequence

Return type:

pd.DataFrame

evcouplings.couplings.pairs.read_raw_ec_file(filename, sort=True, score='cn')[source]

Read a raw EC file (e.g. from plmc) and sort by scores

Parameters:
  • filename (str) – File containing evolutionary couplings
  • sort (bool, optional (default: True)) – If True, sort pairs by coupling score in descending order
  • score_column (str, optional (default: True)) – Score column to be used for sorting
Returns:

ecs – Table of evolutionary couplings

Return type:

pd.DataFrame

evcouplings.couplings.protocol module

evcouplings.couplings.tools module

Wrappers for tools for calculation of evolutionary couplings from sequence alignments.

Authors:
Thomas A. Hopf
class evcouplings.couplings.tools.PlmcResult(couplings_file, param_file, iteration_table, focus_seq_index, num_valid_seqs, num_total_seqs, num_valid_sites, num_total_sites, region_start, effective_samples, optimization_status)

Bases: tuple

couplings_file

Alias for field number 0

effective_samples

Alias for field number 9

focus_seq_index

Alias for field number 3

iteration_table

Alias for field number 2

num_total_seqs

Alias for field number 5

num_total_sites

Alias for field number 7

num_valid_seqs

Alias for field number 4

num_valid_sites

Alias for field number 6

optimization_status

Alias for field number 10

param_file

Alias for field number 1

region_start

Alias for field number 8

evcouplings.couplings.tools.parse_plmc_log(log)[source]

Parse plmc stderr text output into structured data

Parameters:log (str) – stderr output from plmc
Returns:
  • iter_df (pd.DataFrame) – Table with iteration statistics
  • focus_index (int) – Index of focus sequence in alignment
  • valid_seqs (int) – Number of valid sequences in alignment
  • total_seqs (int) – Number of total sequences in alignment
  • valid_sites (int) – Analyzed number of sites in alignment/focus sequence
  • total_sites (int) – Total number of sites in alignment/focus sequence
  • region_start (int) – Index of first position in aligment
  • eff_samples (float) – Effective number of samples in alignment
  • opt_status (str) – End status of iterative optimization
evcouplings.couplings.tools.run_plmc(alignment, couplings_file, param_file=None, focus_seq=None, alphabet=None, theta=None, scale=None, ignore_gaps=False, iterations=None, lambda_h=None, lambda_J=None, lambda_g=None, cpu=None, binary='plmc')[source]

Run plmc on sequence alignment and store files with model parameters and pair couplings.

Parameters:
  • alignment (str) – Path to input sequence alignment
  • couplings_file (str) – Output path for file with evolutionary couplings (folder will be created)
  • param_file (str) – Output path for binary file containing model parameters (folder will be created)
  • focus_seq (str, optional (default: None)) – Name of focus sequence, if None, non-focus mode will be used
  • alphabet (str, optional (default: None)) – Alphabet for model inference. If None, standard amino acid alphabet including gap will be used. First character in string corresponds to gap character (relevant for ignore_gaps).
  • theta (float, optional (default: None)) – Sequences with pairwise identity >= theta will be clustered and their sequence weights downweighted as 1 / num_cluster_members. Important: Note that plmc will be parametrized using 1 - theta. If None, default value in plmc will be used, which corresponds to theta=0.8 (plmc setting 0.2).
  • scale (float, optional (default: None)) – Scale weights of clusters by this value. If None, default value in plmc (1.0) will be used
  • ignore_gaps (bool, optional (default: False)) – Exclude gaps from parameter inference. Gap character is first character of alphabet parameter.
  • iterations (int, optional (default: None)) – Maximum iterations for optimization.
  • lambda_h (float, optional (default: None)) – l2 regularization strength on fields. If None, plmc default will be used.
  • lambda_J (float, optional (default: None)) – l2-regularization strength on couplings. If None, plmc default will be used
  • lambda_g (float, optional (default: None)) – group l1-regularization strength on couplings If None, plmc default will be used.
  • cpu (Number of cores to use for running plmc.) – Note that plmc has to be compiled in openmp mode to runnable with multiple cores. Can also be set to “max”.
  • binary (str, optional (default: "plmc")) – Path to plmc binary
Returns:

namedtuple containing output files and parsed fields from console output of plmc

Return type:

PlmcResult

Raises:

ExternalToolError