evcouplings.fold package

evcouplings.fold.cns module

evcouplings.fold.filter module

Functions for detecting ECs that should not be included in 3D structure prediction

Most functions in this module are rewritten from older pipeline code in choose_CNS_constraint_set.m

Authors:
Thomas A. Hopf

evcouplings.fold.filter.detect_secstruct_clash(i, j, secstruct)

Detect if an EC pair (i, j) is geometrically impossible given a predicted secondary structure

Direct port of the logic implemented in choose_CNS_constraint_set.m from the original pipeline (lines 351-407).

Use secstruct_clashes() to annotate an entire table of ECs.

Parameters:
  • i (int) – Index of first position
  • j (int) – Index of second position
  • secstruct (dict) – Mapping from position (int) to secondary structure (“H”, “E”, “C”)
Returns:

clashes – True if (i, j) clashes with secondary structure

Return type:

bool
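
To make the expected inputs concrete, the sketch below builds the secstruct mapping from a 3-state string and applies a deliberately simplified, hypothetical clash rule: both positions lie inside the same contiguous helix or strand and are at least min_separation residues apart. This is not the rule ported from choose_CNS_constraint_set.m; the helper names and the threshold are assumptions for illustration only.

```python
def secstruct_from_string(ss, first_index=1):
    """Map sequence positions to 3-state secondary structure ("H", "E", "C")."""
    return {first_index + k: state for k, state in enumerate(ss)}


def toy_secstruct_clash(i, j, secstruct, min_separation=5):
    """Hypothetical rule (not the pipeline's): flag (i, j) if both positions
    lie in one contiguous H or E segment and are >= min_separation apart."""
    lo, hi = min(i, j), max(i, j)
    state = secstruct.get(lo)
    if state not in ("H", "E") or hi - lo < min_separation:
        return False
    # every position between i and j must share the same state
    return all(secstruct.get(k) == state for k in range(lo, hi + 1))


ss = secstruct_from_string("CCHHHHHHHHCCEEEEECC")
print(toy_secstruct_clash(3, 10, ss))  # True: same helix, 7 residues apart
print(toy_secstruct_clash(3, 6, ss))   # False: same helix, only 3 apart
print(toy_secstruct_clash(3, 14, ss))  # False: helix vs. strand
```

In the package, secstruct_clashes() is the intended way to apply the real rule to a whole EC table.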

evcouplings.fold.filter.disulfide_clashes(ec_pairs, output_column='cys_clash')

Add disulfide bridge clashes to the EC table, i.e. detect cysteine residues that are coupled to more than one other cysteine. This flag is necessary if disulfide bridges are created during folding, since each cysteine can form only one bridge.

Parameters:
  • ec_pairs (pandas.DataFrame) – Table with EC pairs that will be tested for the occurrence of multiple cys-cys pairings (with columns i, j, A_i, A_j)
  • output_column (str, optional (default: "cys_clash")) – Target column indicating if pair is in a clash or not
Returns:

Annotated EC table with clashes

Return type:

pandas.DataFrame
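
A minimal pandas sketch of the idea, assuming the EC table is sorted by coupling strength and that the strongest cys-cys pair involving a cysteine "claims" it, so later cys-cys pairs reusing a claimed cysteine are flagged. The function name flag_cys_clashes and this greedy policy are assumptions, not necessarily what disulfide_clashes() does internally.

```python
import pandas as pd


def flag_cys_clashes(ec_pairs, output_column="cys_clash"):
    """Sketch: flag cys-cys ECs that reuse a cysteine already paired by a
    higher-ranked cys-cys EC (assumes ec_pairs is sorted by EC strength)."""
    claimed = set()
    flags = []
    for i, j, a_i, a_j in zip(ec_pairs["i"], ec_pairs["j"],
                              ec_pairs["A_i"], ec_pairs["A_j"]):
        if a_i == "C" and a_j == "C":
            clash = i in claimed or j in claimed
            if not clash:
                # this pair becomes the (only) disulfide bridge for i and j
                claimed.update((i, j))
            flags.append(clash)
        else:
            flags.append(False)
    annotated = ec_pairs.copy()  # leave the input table untouched
    annotated[output_column] = flags
    return annotated


ecs = pd.DataFrame({
    "i": [3, 3, 10, 5],
    "j": [20, 30, 30, 12],
    "A_i": ["C", "C", "C", "A"],
    "A_j": ["C", "C", "C", "L"],
})
print(flag_cys_clashes(ecs)["cys_clash"].tolist())
# [False, True, False, False]
```

Processing pairs in rank order means the stronger coupling wins whenever two cys-cys ECs compete for the same cysteine.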

evcouplings.fold.filter.secstruct_clashes(ec_pairs, residues, output_column='ss_clash', secstruct_column='sec_struct_3state')

Add secondary structure clashes to EC table

Parameters:
  • ec_pairs (pandas.DataFrame) – Table with EC pairs that will be tested for clashes with secondary structure (with columns i, j)
  • residues (pandas.DataFrame) – Table with residues in sequence and their secondary structure (column i and the column named by secstruct_column).
  • output_column (str, optional (default: "ss_clash")) – Target column indicating if pair is in a clash or not
  • secstruct_column (str, optional (default: "sec_struct_3state")) – Source column in residues with secondary structure states (H, E, C)
Returns:

Annotated EC table with clashes

Return type:

pandas.DataFrame
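
The annotation step can be sketched as building a position-to-state mapping from the residues table and applying a per-pair test to every EC. In this sketch the per-pair rule is passed in as a callable; the function name annotate_secstruct_clashes and the trivial both_helix stand-in are hypothetical and take the place of detect_secstruct_clash().

```python
import pandas as pd


def annotate_secstruct_clashes(ec_pairs, residues, detector,
                               output_column="ss_clash",
                               secstruct_column="sec_struct_3state"):
    """Sketch: apply a per-pair clash test to every EC in the table."""
    # position -> "H" / "E" / "C", the mapping format detect_secstruct_clash() expects
    secstruct = dict(zip(residues["i"], residues[secstruct_column]))
    annotated = ec_pairs.copy()
    annotated[output_column] = [
        bool(detector(i, j, secstruct))
        for i, j in zip(annotated["i"], annotated["j"])
    ]
    return annotated


def both_helix(i, j, secstruct):
    # trivial stand-in rule for illustration: both positions predicted as helix
    return secstruct[i] == secstruct[j] == "H"


residues = pd.DataFrame({"i": [1, 2, 3, 4, 5],
                         "sec_struct_3state": ["H", "H", "H", "C", "E"]})
ecs = pd.DataFrame({"i": [1, 3], "j": [3, 5]})
print(annotate_secstruct_clashes(ecs, residues, both_helix)["ss_clash"].tolist())
# [True, False]
```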

evcouplings.fold.protocol module

evcouplings.fold.ranking module

evcouplings.fold.restraints module

Functions for generating distance restraints from evolutionary couplings and secondary structure predictions

Authors:
Thomas A. Hopf, Anna G. Green (docking restraints)

evcouplings.fold.restraints.docking_restraints(ec_pairs, output_file, restraint_formatter, config_file=None)

Create .tbl file with distance restraints for docking

Parameters:
  • ec_pairs (pandas.DataFrame) – Table with EC pairs that will be turned into distance restraints (with columns i, j, A_i, A_j, segment_i, segment_j)
  • output_file (str) – Path to file in which restraints will be saved
  • restraint_formatter (function) – Function called to create string representation of restraint
  • config_file (str, optional (default: None)) – Path to config file with folding settings. If None, will use default settings included in package (restraints.yml).
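
As an illustration of what a restraint_formatter might return for docking, the sketch below emits one XPLOR/CNS-style NOE "assign" line (target distance, lower and upper deviation) between two segments, using segid selections. The formatter's signature, the use of segment_i/segment_j as segid values, and the distance values are all assumptions, not the package's formatter or the defaults in restraints.yml.

```python
def format_docking_restraint(seg_i, i, seg_j, j, dist, lower, upper, atom="CA"):
    """Hypothetical formatter: one NOE-style 'assign' line between residue i
    of segment seg_i and residue j of segment seg_j."""
    sel_i = f"(segid {seg_i} and resid {i} and name {atom})"
    sel_j = f"(segid {seg_j} and resid {j} and name {atom})"
    return f"assign {sel_i} {sel_j} {dist:.2f} {lower:.2f} {upper:.2f}"


print(format_docking_restraint("A", 42, "B", 17, 5.0, 1.0, 2.0))
# assign (segid A and resid 42 and name CA) (segid B and resid 17 and name CA) 5.00 1.00 2.00
```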

evcouplings.fold.restraints.ec_dist_restraints(ec_pairs, output_file, restraint_formatter, config_file=None)

Create .tbl file with distance restraints based on evolutionary couplings

Logic based on choose_CNS_constraint_set.m, lines 449-515

Parameters:
  • ec_pairs (pandas.DataFrame) – Table with EC pairs that will be turned into distance restraints (with columns i, j, A_i, A_j)
  • output_file (str) – Path to file in which restraints will be saved
  • restraint_formatter (function) – Function called to create string representation of restraint
  • config_file (str, optional (default: None)) – Path to config file with folding settings. If None, will use default settings included in package (restraints.yml).
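
The division of labour between the caller and restraint_formatter can be sketched as follows: the writer iterates over EC pairs and delegates the text of each restraint to the formatter. The helper write_tbl, the (i, j) formatter signature, and the placeholder distances in ca_restraint are hypothetical; in the package, distances and atoms are configured through the folding settings.

```python
import os
import tempfile


def write_tbl(pairs, output_file, restraint_formatter):
    """Write one restraint per EC pair to a .tbl file (hypothetical helper).

    pairs: iterable of (i, j) position tuples, e.g. the top-ranked ECs
    restraint_formatter: callable (i, j) -> restraint string (assumed signature)
    """
    lines = [restraint_formatter(i, j) for i, j in pairs]
    with open(output_file, "w") as f:
        f.write("\n".join(lines) + "\n")
    return len(lines)


def ca_restraint(i, j, dist=5.0, lower=1.0, upper=2.0):
    # placeholder distances, not the package's configured values
    return (f"assign (resid {i} and name CA) (resid {j} and name CA) "
            f"{dist:.1f} {lower:.1f} {upper:.1f}")


path = os.path.join(tempfile.mkdtemp(), "ec_restraints.tbl")
n = write_tbl([(3, 20), (7, 45)], path, ca_restraint)
print(n)  # 2
with open(path) as f:
    print(f.read(), end="")
```

Keeping the formatter separate means the same selection logic can target different restraint syntaxes.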

evcouplings.fold.restraints.secstruct_angle_restraints(residues, output_file, restraint_formatter, config_file=None, secstruct_column='sec_struct_3state')

Create .tbl file with dihedral angle restraints based on secondary structure prediction

Logic based on make_cns_angle_constraints.pl

Parameters:
  • residues (pandas.DataFrame) – Table containing positions (column i), residue type (column A_i), and secondary structure for each position
  • output_file (str) – Path to file in which restraints will be saved
  • restraint_formatter (function) – Function called to create string representation of restraint
  • config_file (str, optional (default: None)) – Path to config file with folding settings. If None, will use default settings included in package (restraints.yml).
  • secstruct_column (str, optional (default: "sec_struct_3state")) – Column name in residues dataframe from which secondary structure will be extracted (has to be H, E, or C).
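
A minimal sketch of dihedral restraints derived from 3-state predictions, written in the commonly used CNS/XPLOR dihedral layout (four atom selections, then force constant, target angle, range, exponent). The target phi/psi values are approximate textbook figures chosen for illustration, not the settings in restraints.yml, and dihedral_restraints is a hypothetical name.

```python
# Illustrative phi/psi targets in degrees (approximate textbook values,
# NOT the settings in restraints.yml)
TARGETS = {"H": (-60.0, -45.0), "E": (-120.0, 130.0)}


def dihedral_restraints(secstruct, force=1.0, width=20.0, exponent=2):
    """Hypothetical sketch: phi/psi 'assign' lines for H and E residues.

    phi(k) = C(k-1)-N(k)-CA(k)-C(k), psi(k) = N(k)-CA(k)-C(k)-N(k+1),
    so residues without both sequence neighbours are skipped.
    """
    lines = []
    for k, state in sorted(secstruct.items()):
        if state not in TARGETS or k - 1 not in secstruct or k + 1 not in secstruct:
            continue
        phi, psi = TARGETS[state]
        atoms_phi = [(k - 1, "C"), (k, "N"), (k, "CA"), (k, "C")]
        atoms_psi = [(k, "N"), (k, "CA"), (k, "C"), (k + 1, "N")]
        for atoms, angle in ((atoms_phi, phi), (atoms_psi, psi)):
            sel = " ".join(f"(resid {r} and name {a})" for r, a in atoms)
            lines.append(f"assign {sel} {force} {angle} {width} {exponent}")
    return lines


lines = dihedral_restraints({1: "C", 2: "H", 3: "H", 4: "C"})
print(len(lines))  # 4: phi and psi for residues 2 and 3
print(lines[0])
# assign (resid 1 and name C) (resid 2 and name N) (resid 2 and name CA) (resid 2 and name C) 1.0 -60.0 20.0 2
```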

evcouplings.fold.restraints.secstruct_dist_restraints(residues, output_file, restraint_formatter, config_file=None, secstruct_column='sec_struct_3state')

Create .tbl file with distance restraints based on secondary structure prediction

Logic based on choose_CNS_constraint_set.m, lines 519-1162

Parameters:
  • residues (pandas.DataFrame) – Table containing positions (column i), residue type (column A_i), and secondary structure for each position
  • output_file (str) – Path to file in which restraints will be saved
  • restraint_formatter (function) – Function called to create string representation of restraint
  • config_file (str, optional (default: None)) – Path to config file with folding settings. If None, will use default settings included in package (restraints.yml).
  • secstruct_column (str, optional (default: "sec_struct_3state")) – Column name in residues dataframe from which secondary structure will be extracted (has to be H, E, or C).
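
Secondary structure distance restraints are naturally generated per element, so one plausible first step is locating contiguous helix and strand segments. A minimal sketch using itertools.groupby; the function name is hypothetical and the actual segment handling in secstruct_dist_restraints() may differ.

```python
from itertools import groupby


def secstruct_segments(positions, states, keep=("H", "E")):
    """Return (state, first_pos, last_pos) for each contiguous run of an
    identical state in `keep`; assumes positions are sorted and consecutive."""
    segments = []
    idx = 0
    for state, run in groupby(states):
        length = len(list(run))
        if state in keep:
            segments.append((state, positions[idx], positions[idx + length - 1]))
        idx += length
    return segments


print(secstruct_segments(list(range(1, 14)), "CHHHHHCCEEEEC"))
# [('H', 2, 6), ('E', 9, 12)]
```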

evcouplings.fold.tools module

Wrappers for tools for 3D structure prediction from evolutionary couplings

Authors:
Thomas A. Hopf

evcouplings.fold.tools.parse_maxcluster_clustering(clustering_output)

Parse maxcluster clustering output into a DataFrame

Parameters:
  • clustering_output (str) – stdout output from maxcluster after clustering
Returns:

Parsed result table (columns: filename, cluster, cluster_size)

Return type:

pandas.DataFrame

evcouplings.fold.tools.parse_maxcluster_comparison(comparison_output)

Parse maxcluster comparison output into a DataFrame

Parameters:
  • comparison_output (str) – stdout output from maxcluster after comparison
Returns:

Parsed result table (columns: filename, num_pairs, rmsd, maxsub, tm, msi); refer to the maxcluster documentation for an explanation of the score fields.

Return type:

pandas.DataFrame

evcouplings.fold.tools.read_psipred_prediction(filename, first_index=1)

Read a psipred secondary structure prediction file in horizontal or vertical format (auto-detected).

Parameters:
  • filename (str) – Path to prediction output file
  • first_index (int, optional (default: 1)) – Index of first position in predicted sequence
Returns:

pred – Table containing secondary structure prediction, with the following columns:

  • i: position
  • A_i: amino acid
  • sec_struct_3state: prediction (H, E, C)

If reading VFORMAT, also contains columns with the individual per-state scores (score_coil, score_helix, score_strand)

If reading HFORMAT, also contains a confidence score between 1 and 9 (sec_struct_conf)

Return type:

pandas.DataFrame
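
For orientation, a minimal stdlib sketch of reading VFORMAT (.ss2) text, assuming data lines of the form index, residue, prediction, then coil/helix/strand scores, with "#" comment lines and blank lines skipped. It returns plain dicts rather than a DataFrame and does not handle HFORMAT; parse_ss2 is a hypothetical name and the example lines are made up.

```python
def parse_ss2(text, first_index=1):
    """Minimal sketch: parse psipred VFORMAT text into row dicts.

    Assumes data lines look like '   1 M C   0.998  0.001  0.001'.
    Positions are renumbered starting at first_index.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        _, aa, pred, coil, helix, strand = fields
        rows.append({
            "i": first_index + len(rows),
            "A_i": aa,
            "sec_struct_3state": pred,
            "score_coil": float(coil),
            "score_helix": float(helix),
            "score_strand": float(strand),
        })
    return rows


example = """# PSIPRED VFORMAT

   1 M C   0.998  0.001  0.001
   2 K H   0.100  0.850  0.050
"""
rows = parse_ss2(example, first_index=10)
print([(r["i"], r["sec_struct_3state"]) for r in rows])  # [(10, 'C'), (11, 'H')]
```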

evcouplings.fold.tools.run_cns(inp_script=None, inp_file=None, log_file=None, binary='cns')

Run CNSsolve 1.21 (without worrying about environment setup)

Note that the user is responsible for verifying the output products of CNS, since their paths are determined by .inp scripts and hard to check automatically and in a general way.

Either inp_script or inp_file has to be specified.

Parameters:
  • inp_script (str, optional (default: None)) – CNS “.inp” input script (actual commands, not file)
  • inp_file (str, optional (default: None)) – Path to .inp input script file. Will override inp_script if also specified.
  • log_file (str, optional (default: None)) – Save CNS stdout output to this file
  • binary (str, optional (default: "cns")) – Absolute path of CNS binary
Raises:
  • ExternalToolError – If call to CNS fails
  • InvalidParameterError – If no input script (file or string) given
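
A minimal sketch of this calling pattern, assuming CNS reads its commands from standard input (as in the usual `cns < script.inp` invocation). It mirrors the documented rule that inp_file overrides inp_script; run_cns_sketch is hypothetical, and ValueError / RuntimeError stand in for InvalidParameterError / ExternalToolError.

```python
import subprocess


def run_cns_sketch(inp_script=None, inp_file=None, log_file=None, binary="cns"):
    """Hypothetical sketch: feed a CNS input script to the binary on stdin."""
    if inp_file is not None:
        # inp_file takes precedence over inp_script
        with open(inp_file) as f:
            inp_script = f.read()
    if inp_script is None:
        raise ValueError("Either inp_script or inp_file has to be specified")

    result = subprocess.run([binary], input=inp_script,
                            capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"{binary} failed with exit code {result.returncode}")
    if log_file is not None:
        with open(log_file, "w") as f:
            f.write(result.stdout)
    return result.stdout


# `cat` stands in for the CNS binary here: it simply echoes stdin to stdout
print(run_cns_sketch("stop\n", binary="cat"), end="")  # stop
```

As the note above says, checking the files CNS writes remains the caller's job; this sketch only checks the exit code.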

evcouplings.fold.tools.run_cns_13(inp_script=None, inp_file=None, log_file=None, source_script=None, binary='cns')

Run CNSsolve 1.3

Note that the user is responsible for verifying the output products of CNS, since their paths are determined by .inp scripts and hard to check automatically and in a general way.

Either inp_script or inp_file has to be specified.

Parameters:
  • inp_script (str, optional (default: None)) – CNS “.inp” input script (actual commands, not file)
  • inp_file (str, optional (default: None)) – Path to .inp input script file. Will override inp_script if also specified.
  • log_file (str, optional (default: None)) – Save CNS stdout output to this file
  • source_script (str, optional (default: None)) – Script to set CNS environment variables. This should typically point to .cns_solve_env_sh in the CNS installation main directory (the shell script itself needs to be edited to contain the path of the installation)
  • binary (str, optional (default: "cns")) – Name of CNS binary
Raises:
  • ExternalToolError – If call to CNS fails
  • InvalidParameterError – If no input script (file or string) given
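
Environment variables set by sourcing a script only persist within the shell that sourced it, so the sourcing and the CNS call have to happen in one shell invocation. A minimal sketch of building such a POSIX shell command; cns13_shell_command and the /opt/cns path are hypothetical, and the resulting string could be run with subprocess.run(command, shell=True).

```python
import shlex


def cns13_shell_command(inp_file, source_script=None, binary="cns"):
    """Hypothetical sketch: source the CNS environment script (if given),
    then run CNS on an input file, all within one shell command."""
    command = f"{shlex.quote(binary)} < {shlex.quote(inp_file)}"
    if source_script is not None:
        command = f". {shlex.quote(source_script)} && {command}"
    return command


print(cns13_shell_command("fold.inp", source_script="/opt/cns/.cns_solve_env_sh"))
# . /opt/cns/.cns_solve_env_sh && cns < fold.inp
```

shlex.quote keeps paths containing spaces or shell metacharacters from breaking the command.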

evcouplings.fold.tools.run_maxcluster_cluster(predictions, method='average', rmsd=True, clustering_threshold=None, binary='maxcluster')

Cluster a set of predicted structures using maxcluster.

For comparison against an experimental structure, use the run_maxcluster_compare() function.

Parameters:
  • predictions (list(str)) – List of PDB files that should be clustered
  • method ({"single", "average", "maximum", "pairs_min", "pairs_abs"}, optional (default: "average")) – Clustering method (single / average / maximum linkage, or min / absolute size neighbour pairs)
  • clustering_threshold (float, optional (default: None)) – Initial clustering threshold (maxcluster -T option)
  • rmsd (bool, optional (default: True)) – Use RMSD-based clustering (faster)
  • binary (str, optional (default: "maxcluster")) – Path to maxcluster binary
Returns:

Clustering result table (see parse_maxcluster_clustering for more detailed explanation)

Return type:

pandas.DataFrame

evcouplings.fold.tools.run_maxcluster_compare(predictions, experiment, normalization_length=None, distance_cutoff=None, binary='maxcluster')

Compare a set of predicted structures to an experimental structure using maxcluster.

For clustering functionality, use the run_maxcluster_cluster() function.

For a high-level wrapper around this function that removes problematic atoms and compares multiple models, please look at evcouplings.fold.protocol.compare_models_maxcluster().

Parameters:
  • predictions (list(str)) – List of PDB files that should be compared against experiment
  • experiment (str) – Path of experimental structure PDB file. Note that the numbering and residues in this file must agree with the predicted structure, and that the structure must not contain duplicate atoms (multiple models, or alternative locations for the same atom).
  • normalization_length (int, optional (default: None)) – Use this length to normalize the Template Modeling (TM) score (-N option of maxcluster). If None, will normalize by length of experiment.
  • distance_cutoff (float, optional (default: None)) – Distance cutoff for MaxSub search (-d option of maxcluster). If None, will use maxcluster auto-calibration.
  • binary (str, optional (default: "maxcluster")) – Path to maxcluster binary
Returns:

Comparison result table (see parse_maxcluster_comparison for more detailed explanation)

Return type:

pandas.DataFrame

evcouplings.fold.tools.run_psipred(fasta_file, output_dir, binary='runpsipred')

Run psipred secondary structure prediction

psipred output file convention: runpsipred creates output files <rootname>.ss2 and <rootname>.horiz in the current working directory, where <rootname> is extracted from the basename of the input file (e.g. /home/test/<rootname>.fa).

Parameters:
  • fasta_file (str) – Input sequence file in FASTA format
  • output_dir (str) – Directory in which output will be saved
  • binary (str, optional (default: "runpsipred")) – Path of psipred executable (runpsipred)
Returns:

  • ss2_file (str) – Absolute path to prediction output in “VFORMAT”
  • horiz_file (str) – Absolute path to prediction output in “HFORMAT”

Raises:

ExternalToolError – If call to psipred fails
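
Because runpsipred writes into the current working directory, a wrapper can either run it with output_dir as the working directory or move the files afterwards. Either way, the expected output paths follow from the naming convention above. A minimal sketch; psipred_output_paths is a hypothetical helper and assumes the rootname is the basename with its last extension removed.

```python
import os


def psipred_output_paths(fasta_file, output_dir):
    """Hypothetical helper: expected (ss2_file, horiz_file) absolute paths,
    following the <rootname>.ss2 / <rootname>.horiz naming convention."""
    rootname = os.path.splitext(os.path.basename(fasta_file))[0]
    base = os.path.join(os.path.abspath(output_dir), rootname)
    return base + ".ss2", base + ".horiz"


print(psipred_output_paths("/home/test/query.fa", "/tmp/psipred"))
# ('/tmp/psipred/query.ss2', '/tmp/psipred/query.horiz')
```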