evcouplings.mutate package

evcouplings.mutate.calculations module

High-level mutation calculation functions for EVmutation

Todo

implement segment handling

Authors:
Thomas A. Hopf, Anna G. Green (generalization for multiple segments)
evcouplings.mutate.calculations.extract_mutations(mutation_string, offset=0, sep=', ')

Turns a string containing mutations of the format I100V into a list of tuples of the form (index, from, to), e.g. (100, ‘I’, ‘V’).

Parameters:
  • mutation_string (str) – Comma-separated list of one or more mutations (e.g. “K50R,I100V”)
  • offset (int, default: 0) – Offset to be added to the index/position of each mutation
  • sep (str, default: ",") – String used to separate multiple mutations
Returns:

List of tuples of the form (index+offset, from, to)

Return type:

list of tuples
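The parsing rule above (first character = wild-type residue, last character = substitution, digits in between = position, plus the optional offset) can be illustrated with a minimal stand-alone sketch. `parse_mutations` is a hypothetical helper written for illustration only, not the library's implementation; it assumes single-letter residue codes and a plain "," separator.

```python
def parse_mutations(mutation_string, offset=0, sep=","):
    """Hypothetical helper: 'K50R,I100V' -> [(50, 'K', 'R'), (100, 'I', 'V')]."""
    mutations = []
    for token in mutation_string.split(sep):
        token = token.strip()  # tolerate stray whitespace around separators
        if not token:
            continue
        # first char: wild-type residue, last char: substitution, middle: position
        wt, pos, subs = token[0], int(token[1:-1]), token[-1]
        mutations.append((pos + offset, wt, subs))
    return mutations


print(parse_mutations("K50R,I100V"))             # [(50, 'K', 'R'), (100, 'I', 'V')]
print(parse_mutations("K50R,I100V", offset=10))  # [(60, 'K', 'R'), (110, 'I', 'V')]
```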

evcouplings.mutate.calculations.predict_mutation_table(model, table, output_column='prediction_epistatic', mutant_column='mutant', hamiltonian='full', segment=None)

Predicts all mutants in a dataframe and adds predictions as a new column.

If mutant_column is None, mutants are read from the DataFrame index; otherwise they are read from the given column.

Mutations whose effects cannot be calculated with the model (e.g. positions not covered by the alignment, or invalid substitutions) are set to NaN.

Parameters:
  • model (CouplingsModel) – CouplingsModel instance used to compute mutation effects
  • table (pandas.DataFrame) – DataFrame with mutants to which delta of statistical energy will be added
  • mutant_column (str, default: "mutant") – Name of the column in table that contains mutants; set to None to use the DataFrame index
  • output_column (str, default: "prediction_epistatic") – Name of the column in the returned DataFrame that will contain computed effects
  • hamiltonian ({"full", "couplings", "fields"}, default: "full") – Use the full Hamiltonian of the exponential model (default), or only the couplings or only the fields, for the statistical energy calculation.
  • segment (str, default: None) – Specify a segment identifier to use for the positions in the mutation table. This is only used if the mutation table does not already have a segments column.
Returns:

DataFrame with an added column (output_column) that contains the computed mutation effects

Return type:

pandas.DataFrame
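The quantity added to the table is a difference in statistical energy, ΔE = E(mutant) − E(target), where in the EVmutation model E(σ) = Σ_i h_i(σ_i) + Σ_{i<j} J_ij(σ_i, σ_j). The toy sketch below shows what the hamiltonian options select and how uncomputable mutants end up as NaN. It uses random parameters on a 3-residue target; `statistical_energy`, `delta_energy`, the 20-letter alphabet and the NaN rules (position outside the model, wrong wild-type residue, unknown substitution) are assumptions for illustration and are not the CouplingsModel implementation.

```python
import numpy as np
import pandas as pd

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # assumption: 20 amino acids, no gap symbol
AA_INDEX = {aa: k for k, aa in enumerate(ALPHABET)}


def statistical_energy(seq_idx, h, J, hamiltonian="full"):
    """E(s) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j) for an integer-encoded sequence."""
    L = len(seq_idx)
    fields = sum(h[i, seq_idx[i]] for i in range(L))
    couplings = sum(
        J[i, j, seq_idx[i], seq_idx[j]] for i in range(L) for j in range(i + 1, L)
    )
    if hamiltonian == "full":
        return fields + couplings
    if hamiltonian == "fields":
        return fields
    if hamiltonian == "couplings":
        return couplings
    raise ValueError(f"unknown hamiltonian: {hamiltonian!r}")


def delta_energy(target_seq, first_index, mutant, h, J, hamiltonian="full"):
    """E(mutant) - E(target); NaN if a mutation cannot be evaluated by the toy model."""
    wt_idx = [AA_INDEX[aa] for aa in target_seq]
    mut_idx = list(wt_idx)
    for token in mutant.split(","):
        wt, pos, subs = token[0], int(token[1:-1]), token[-1]
        i = pos - first_index
        # position outside the model, wrong wild type, or unknown target symbol
        if not 0 <= i < len(target_seq) or target_seq[i] != wt or subs not in AA_INDEX:
            return np.nan
        mut_idx[i] = AA_INDEX[subs]
    return (statistical_energy(mut_idx, h, J, hamiltonian)
            - statistical_energy(wt_idx, h, J, hamiltonian))


# Random toy parameters for target sequence "MKV" (positions 1-3)
rng = np.random.default_rng(0)
target, first_index = "MKV", 1
h = rng.normal(size=(len(target), len(ALPHABET)))
J = rng.normal(size=(len(target), len(target), len(ALPHABET), len(ALPHABET)))

table = pd.DataFrame({"mutant": ["K2R", "M1A,V3I", "A10G", "W2R"]})
table["prediction_epistatic"] = [
    delta_energy(target, first_index, m, h, J) for m in table["mutant"]
]
print(table)  # A10G (outside the model) and W2R (wrong wild type) get NaN
```

Recomputing both full sums is the simplest correct form; since all terms not involving a mutated position cancel in ΔE, a practical implementation would usually evaluate only those terms.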

evcouplings.mutate.calculations.single_mutant_matrix(model, output_column='prediction_epistatic', exclude_self_subs=True)

Creates a table with all possible single substitutions of the target sequence in a CouplingsModel object.

Parameters:
  • model (CouplingsModel) – Model that will be used to predict single mutants
  • output_column (str, default: "prediction_epistatic") – Name of the column in the DataFrame that will contain predictions
  • exclude_self_subs (bool, default: True) – Exclude self-substitutions (e.g. A100A) from results
Returns:

DataFrame with predictions for all single mutants

Return type:

pandas.DataFrame
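The enumeration step behind this table (every position of the target sequence times every symbol of the alphabet, optionally skipping self-substitutions such as A100A) can be sketched as follows. `enumerate_single_mutants`, its `first_index` argument and the 20-letter amino acid alphabet are assumptions for illustration; the real function takes the sequence, numbering and alphabet from the CouplingsModel and additionally fills in predictions.

```python
import pandas as pd

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # assumption: 20 standard amino acids


def enumerate_single_mutants(target_seq, first_index=1, exclude_self_subs=True,
                             alphabet=ALPHABET):
    """Hypothetical helper: one row per single substitution of target_seq."""
    rows = []
    for offset, wt in enumerate(target_seq):
        pos = first_index + offset
        for subs in alphabet:
            if exclude_self_subs and subs == wt:
                continue  # skip self-substitutions, e.g. M1M
            rows.append({"mutant": f"{wt}{pos}{subs}", "pos": pos, "wt": wt, "subs": subs})
    return pd.DataFrame(rows)


table = enumerate_single_mutants("MKV")
print(len(table))  # 3 positions x 19 substitutions = 57
```

Each row of such a table could then be scored with the model, for instance in the way predict_mutation_table scores a mutant column.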

evcouplings.mutate.calculations.split_mutants(x, mutant_column='mutant')

Splits mutation strings into individual columns in a DataFrame (wild-type symbol(s), position(s), substitution(s), number of mutations). This is helpful, for example, when computing average effects per position with pandas groupby() operations.

Parameters:
  • x (pandas.DataFrame) – Table with mutants
  • mutant_column (str, default: "mutant") – Column which contains mutants, set to None to use index of DataFrame
Returns:

DataFrame with added columns “num_subs”, “pos”, “wt” and “subs” that contain the number of mutations and the split mutation strings (for higher-order mutants, symbols/numbers are comma-separated)

Return type:

pandas.DataFrame
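The splitting and the per-position groupby use case can be sketched with plain pandas. `split_mutants_sketch` and `split_mutant_string` are hypothetical stand-ins that mimic the documented columns; the input values are made up, and positions are kept as strings here (whether the library converts single-mutant positions to integers is not assumed).

```python
import pandas as pd


def split_mutant_string(mutant):
    """'K50R,I100V' -> (2, '50,100', 'K,I', 'R,V')."""
    tokens = mutant.split(",")
    pos = ",".join(t[1:-1] for t in tokens)
    wt = ",".join(t[0] for t in tokens)
    subs = ",".join(t[-1] for t in tokens)
    return len(tokens), pos, wt, subs


def split_mutants_sketch(x, mutant_column="mutant"):
    """Hypothetical stand-in: adds num_subs, pos, wt, subs columns to a copy of x."""
    x = x.copy()
    mutants = x.index if mutant_column is None else x[mutant_column]
    parts = [split_mutant_string(m) for m in mutants]
    for k, col in enumerate(["num_subs", "pos", "wt", "subs"]):
        x[col] = [p[k] for p in parts]
    return x


df = pd.DataFrame({
    "mutant": ["K50R", "K50A", "I100V", "K50R,I100V"],
    "prediction_epistatic": [-0.5, -1.5, 0.2, -0.3],  # made-up values
})
split = split_mutants_sketch(df)

# Average effect per position, restricted to single mutants
singles = split[split["num_subs"] == 1]
per_pos = singles.groupby("pos")["prediction_epistatic"].mean()
print(per_pos)
```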

evcouplings.mutate.protocol module

Sequence statistical energy and mutation effect computation protocols

Authors:
Thomas A. Hopf, Anna G. Green (complex)
evcouplings.mutate.protocol.complex(**kwargs)

Protocol: Mutation effect prediction and visualization for protein complexes

Parameters: kwargs arguments (Mandatory) – See the list of required keys passed to check_required in the source code
Returns: outcfg – Output configuration of the pipeline, including the following fields:
  • mutation_matrix_file
  • [mutation_dataset_predicted_file]
Return type: dict
evcouplings.mutate.protocol.run(**kwargs)

Run mutation protocol

Parameters: kwargs arguments (Mandatory):
  • protocol – Mutation protocol to run (e.g. "standard" or "complex")
  • prefix – Output prefix for all generated files
Returns: outcfg – Output configuration of the stage (see the individual protocol for its fields)
Return type: dict
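run selects a protocol by name, passes all keyword arguments through, and returns that protocol's output configuration. A minimal sketch of this dispatch pattern follows; `run_sketch`, the `PROTOCOLS` mapping, the two stub functions standing in for the standard and complex protocols, and the dummy file names they return are all hypothetical and are not the library's code.

```python
def standard_stub(**kwargs):
    """Placeholder for the monomer protocol; returns a dummy output configuration."""
    # dummy file name, not the library's naming scheme
    return {"mutation_matrix_file": kwargs["prefix"] + "_dummy_matrix.csv"}


def complex_stub(**kwargs):
    """Placeholder for the complex protocol; returns a dummy output configuration."""
    return {"mutation_matrix_file": kwargs["prefix"] + "_dummy_complex_matrix.csv"}


PROTOCOLS = {"standard": standard_stub, "complex": complex_stub}


def run_sketch(**kwargs):
    """Validate mandatory keys, then dispatch to the selected protocol."""
    missing = [key for key in ("protocol", "prefix") if key not in kwargs]
    if missing:
        raise ValueError(f"missing mandatory parameters: {missing}")
    protocol = kwargs["protocol"]
    if protocol not in PROTOCOLS:
        raise ValueError(f"invalid protocol {protocol!r}, choose from {sorted(PROTOCOLS)}")
    return PROTOCOLS[protocol](**kwargs)


outcfg = run_sketch(protocol="standard", prefix="output/example")
print(outcfg)
```

Keeping the protocols in a dictionary means adding a new protocol only requires registering one more entry, without touching the dispatcher.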
evcouplings.mutate.protocol.standard(**kwargs)

Protocol: Mutation effect calculation and visualization for protein monomers

TODO: eventually merge with the complex protocol to obtain a protocol that is agnostic to the number of segments

Parameters: kwargs arguments (Mandatory) – See the list of required keys passed to check_required in the source code
Returns: outcfg – Output configuration of the pipeline, including the following fields:
  • mutation_matrix_file
  • [mutation_dataset_predicted_file]
Return type: dict