Welcome to EVcouplings’s documentation!

Alignment

evcouplings.align package

evcouplings.align.alignment module

evcouplings.align.pfam module

Code for identifying Pfam domains and mapping Pfam alignments and ECs into target sequence mode.

Authors:
Thomas A. Hopf

Todo

  1. Write code to create list of family sizes
  2. Implement alignments against Pfam-HMM so precomputed results can be reused in focus mode
evcouplings.align.pfam.create_family_size_table(full_pfam_file, outfile=None)[source]

Parse family size table from Pfam flat file (ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.full.gz)

Parameters:
  • full_pfam_file (str) – Path to the pfam file (gzip).
  • outfile (str, optional (default: None)) – Save the parsed table to this file as a csv file.
Returns:

Parsed Pfam table.

Return type:

pd.DataFrame

evcouplings.align.pfam.pfam_hits(query_file, hmm_database, prefix, clan_table_file, size_table_file, resolve_overlaps=True, **kwargs)[source]

Identify hits of Pfam HMMs in a set of sequences.

Parameters:
  • query_file (str) – File containing query sequence(s)
  • hmm_database (str) – File containing HMM database (Pfam-A.hmm, with hmmpress applied)
  • prefix (str) – Prefix path for output files. Folder structure in the prefix will be created if not existing.
  • clan_table_file (str) – File with table linking Pfam families to clans (Pfam-A.clans.tsv). Set to None if not available, but resolve_overlaps cannot be True in that case.
  • size_table_file (str) – File with table of family sizes. Create using create_family_size_table(). Set to None if not available.
  • resolve_overlaps (bool) – Resolve overlapping hits by families from the same clan. Only possible if clan_table_file is given.
  • **kwargs (dict) – kwargs passed on to evcouplings.align.tools.run_hmmscan
Returns:

Pfam hit table

Return type:

pd.DataFrame

evcouplings.align.pfam.remove_clan_overlaps(pfam_table)[source]

Remove overlapping Pfam hits from same Pfam clan (equivalent of PfamScan.pl). Currently only allows to remove overlaps by domain bitscore.

Todo

is bitscore the most sensible choice if different length hits?

Parameters:pfam_table (pd.DataFrame) – Pfam hit table as generated by pfam_hits() function (must contain Pfam clan annotation).
Returns:Pfam hit table with lower-scoring overlaps removed
Return type:pd.DataFrame

evcouplings.align.protocol module

Protein sequence alignment creation protocols/workflows.

Authors:
Thomas A. Hopf Anna G. Green - complex protocol, hmm_build_and_search Chan Kang - hmm_build_and_search
evcouplings.align.protocol.complex(**kwargs)[source]

Protocol:

Run monomer alignment protocol and postprocess it for EVcomplex calculations

Parameters:kwargs arguments (Mandatory) – See list below in code where calling check_required
Returns:outcfg – Output configuration of the alignment protocol, and the following additional field:
genome_location_file : path to file containing
the genomic locations for CDs’s corresponding to identifiers in the alignment.
Return type:dict
evcouplings.align.protocol.cut_sequence(sequence, sequence_id, region=None, first_index=None, out_file=None)[source]

Cut a given sequence to sub-range and save it in a file

Parameters:
  • sequence (str) – Full sequence that will be cut
  • sequence_id (str) – Identifier of sequence, used to construct header in output file
  • region (tuple(int, int), optional (default: None)) – Region that will be cut out of full sequence. If None, full sequence will be returned.
  • first_index (int, optional (default: None)) – Define index of first position in sequence. Will be set to 1 if None.
  • out_file (str, optional (default: None)) – Save sequence in a FASTA file (header: >sequence_id/start_region-end_region)
Returns:

  • str – Subsequence contained in region
  • tuple(int, int) – Region. If no input region is given, this will be (1, len(sequence)); otherwise, the input region is returned.

Raises:

InvalidParameterError – Upon invalid region specification (violating boundaries of sequence)

evcouplings.align.protocol.describe_coverage(alignment, prefix, first_index, minimum_column_coverage)[source]

Produce “classical” buildali coverage statistics, i.e. number of sequences, how many residues have too many gaps, etc.

Only to be applied to alignments focused around the target sequence.

Parameters:
  • alignment (Alignment) – Alignment for which coverage statistics will be calculated
  • prefix (str) – Prefix of alignment file that will be stored as identifier in table
  • first_index (int) – Sequence index of first position of target sequence
  • minimum_column_coverage (Iterable(float) or float) –

    Minimum column coverage threshold(s) that will be tested (creating one row for each threshold in output table).

    Note

    int values given to this function instead of a float will be divided by 100 to create the corresponding floating point representation. This parameter is 1.0 - maximum fraction of gaps per column.

Returns:

Table with coverage statistics for different gap thresholds

Return type:

pd.DataFrame

evcouplings.align.protocol.describe_frequencies(alignment, first_index, target_seq_index=None)[source]

Get parameters of alignment such as gaps, coverage, conservation and summarize.

Parameters:
  • alignment (Alignment) – Alignment for which description statistics will be calculated
  • first_index (int) – Sequence index of first residue in target sequence
  • target_seq_index (int, optional (default: None)) – If given, will add the symbol in the target sequence into a separate column of the output table
Returns:

Table detailing conservation and symbol frequencies for all positions in the alignment

Return type:

pandas.DataFrame

evcouplings.align.protocol.describe_seq_identities(alignment, target_seq_index=0)[source]

Calculate sequence identities of any sequence to target sequence and create result dataframe.

Parameters:alignment (Alignment) – Alignment for which description statistics will be calculated
Returns:Table giving the identity to target sequence for each sequence in alignment (in order of occurrence)
Return type:pandas.DataFrame
evcouplings.align.protocol.existing(**kwargs)[source]

Protocol:

Use external sequence alignment and extract all relevant information from there (e.g. sequence, region, etc.), then apply gap & fragment filtering as usual

Parameters:kwargs arguments (Mandatory) – See list below in code where calling check_required
Returns:outcfg – Output configuration of the pipeline, including the following fields:
  • sequence_id (passed through from input)
  • alignment_file
  • raw_focus_alignment_file
  • statistics_file
  • sequence_file
  • first_index
  • target_sequence_file
  • annotation_file (None)
  • frequencies_file
  • identities_file
  • focus_mode
  • focus_sequence
  • segments
Return type:dict
evcouplings.align.protocol.extract_header_annotation(alignment, from_annotation=True)[source]

Extract Uniprot/Uniref sequence annotation from Stockholm file (as output by jackhmmer). This function may not work for other formats.

Parameters:
  • alignment (Alignment) – Multiple sequence alignment object
  • from_annotation (bool, optional (default: True)) – Use annotation line (in Stockholm file) rather than sequence ID line (e.g. in FASTA file)
Returns:

Table containing all annotation (one row per sequence in alignment, in order of occurrence)

Return type:

pandas.DataFrame

evcouplings.align.protocol.fetch_sequence(sequence_id, sequence_file, sequence_download_url, out_file)[source]

Fetch sequence either from database based on identifier, or from input sequence file.

Parameters:
  • sequence_id (str) – Identifier of sequence that should be retrieved
  • sequence_file (str) – File containing sequence. If None, sqeuence will be downloaded from sequence_download_url
  • sequence_download_url (str) – URL from which to download missing sequence. Must contain “{}” at the position where sequence ID will be inserted into download URL (using str.format).
  • out_file (str) – Output file in which sequence will be stored, if sequence_file is not existing.
Returns:

  • str – Path of file with stored sequence (can be sequence_file or out_file)
  • tuple (str, str) – Identifier of sequence as stored in file, and sequence

Protocol:

Build HMM from sequence alignment using hmmbuild and search against a sequence database using hmmsearch.

Parameters:kwargs arguments (Mandatory) – See list below in code where calling check_required
Returns:outcfg – Output configuration of the protocol, including the following fields:
  • target_sequence_file
  • sequence_file
  • raw_alignment_file
  • hittable_file
  • focus_mode
  • focus_sequence
  • segments
Return type:dict

Protocol:

Iterative jackhmmer search against a sequence database.

Parameters:kwargs arguments (Mandatory) – See list below in code where calling check_required

:param .. todo::: explain meaning of parameters in detail.

Returns:outcfg – Output configuration of the protocol, including the following fields:
  • sequence_id (passed through from input)
  • first_index (passed through from input)
  • target_sequence_file
  • sequence_file
  • raw_alignment_file
  • hittable_file
  • focus_mode
  • focus_sequence
  • segments
Return type:dict
evcouplings.align.protocol.modify_alignment(focus_ali, target_seq_index, target_seq_id, region_start, **kwargs)[source]

Apply pairwise identity filtering, fragment filtering, and exclusion of columns with too many gaps to a sequence alignment. Also generates files describing properties of the alignment such as frequency distributions, conservation, and “old-style” alignment statistics files.

Note

assumes focus alignment (otherwise unprocessed) as input.

Todo

come up with something more clever to filter fragments than fixed width (e.g. use 95% quantile of length distribution as reference point)

Parameters:
  • focus_ali (Alignment) – Focus-mode input alignment
  • target_seq_index (int) – Index of target sequence in alignment
  • target_seq_id (str) – Identifier of target sequence (without range)
  • region_start (int) – Index of first sequence position in target sequence
  • kwargs (See required arguments in source code) –
Returns:

  • outcfg (Dict) – File products generated by the function:
    • alignment_file
    • statistics_file
    • frequencies_file
    • identities_file
    • raw_focus_alignment_file
  • ali (Alignment) – Final processed alignment

evcouplings.align.protocol.run(**kwargs)[source]

Run alignment protocol to generate multiple sequence alignment from input sequence.

Parameters:
  • kwargs arguments (Mandatory) – protocol: Alignment protocol to run prefix: Output prefix for all generated files
  • Optional
Returns:

  • Alignment
  • Dictionary with results of stage in following fields (in brackets - not returned by all protocols)
    • alignment_file
    • [raw_alignment_file]
    • statistics_file
    • target_sequence_file
    • sequence_file
    • [annotation_file]
    • frequencies_file
    • identities_file
    • [hittable_file]
    • focus_mode
    • focus_sequence
    • segments

evcouplings.align.protocol.search_thresholds(use_bitscores, seq_threshold, domain_threshold, seq_len)[source]

Set homology search inclusion parameters.

HMMER hits get included in the HMM according to a two-step rule
  1. sequence passes sequence-level treshold
  2. domain passes domain-level threshold
Therefore, search thresholds are set based on the following logic:
  1. If only sequence threshold is given, a MissingParameterException is raised
  2. If only bitscore threshold is given, sequence threshold is set to the same
  3. If both thresholds are given, they are according to defined values
Valid inputs for bitscore thresholds:
  1. int or str: taken as absolute score threshold
  2. float: taken as relative threshold (absolute threshold derived by

multiplication with domain length)

Valid inputs for integer thresholds:
  1. int: Used as negative exponent, threshold will be set to 1E-<exponent>
  2. float or str: Interpreted literally
Parameters:
  • use_bitscores (bool) – Use bitscore threshold instead of E-value threshold
  • domain_threshold (str or int or float) – Domain-level threshold. See rules above.
  • seq_threshold (str or int or float) – Sequence-level threshold. See rules above.
  • seq_len (int) – Length of sequence. Used to calculate absolute bitscore threshold for relative bitscore thresholds.
Returns:

Sequence- and domain-level thresholds ready to be fed into HMMER

Return type:

tuple (str, str)

evcouplings.align.protocol.standard(**kwargs)[source]

Protocol:

Standard buildali4 workflow (run iterative jackhmmer search against sequence database, than determine which sequences and columns to include in the calculation based on coverage and maximum gap thresholds).

Parameters:kwargs arguments (Mandatory) – See list below in code where calling check_required
Returns:
  • outcfg (dict) – Output configuration of the pipeline, including the following fields:
    • sequence_id (passed through from input)
    • first_index (passed through from input)
    • alignment_file
    • raw_alignment_file
    • raw_focus_alignment_file
    • statistics_file
    • target_sequence_file
    • sequence_file
    • annotation_file
    • frequencies_file
    • identities_file
    • hittable_file
    • focus_mode
    • focus_sequence
    • segments
  • ali (Alignment) – Final sequence alignment

evcouplings.align.tools module

Wrappers for running external sequence alignment tools

Authors:
Thomas A. Hopf Anna G. Green - run_hmmbuild, run_hmmsearch Chan Kang - run_hmmbuild, run_hmmsearch
class evcouplings.align.tools.HmmbuildResult(prefix, hmmfile, output)

Bases: tuple

hmmfile

Alias for field number 1

output

Alias for field number 2

prefix

Alias for field number 0

class evcouplings.align.tools.HmmscanResult(prefix, output, tblout, domtblout, pfamtblout)

Bases: tuple

domtblout

Alias for field number 3

output

Alias for field number 1

pfamtblout

Alias for field number 4

prefix

Alias for field number 0

tblout

Alias for field number 2

class evcouplings.align.tools.HmmsearchResult(prefix, alignment, output, tblout, domtblout)

Bases: tuple

alignment

Alias for field number 1

domtblout

Alias for field number 4

output

Alias for field number 2

prefix

Alias for field number 0

tblout

Alias for field number 3

class evcouplings.align.tools.JackhmmerResult(prefix, alignment, output, tblout, domtblout)

Bases: tuple

alignment

Alias for field number 1

domtblout

Alias for field number 4

output

Alias for field number 2

prefix

Alias for field number 0

tblout

Alias for field number 3

evcouplings.align.tools.read_hmmer_domtbl(filename)[source]

Read a HMMER domtbl file into DataFrame.

Parameters:filename (str) – Path of domtbl file
Returns:DataFrame with parsed domtbl
Return type:pd.DataFrame
evcouplings.align.tools.read_hmmer_tbl(filename)[source]

Read a HMMER tbl file into DataFrame.

Parameters:filename (str) – Path of tbl file
Returns:DataFrame with parsed tbl
Return type:pd.DataFrame
evcouplings.align.tools.run_hhfilter(input_file, output_file, threshold=95, columns='a2m', binary='hhfilter')[source]

Redundancy-reduce a sequence alignment using hhfilter from the HHsuite alignment suite.

Parameters:
  • input_file (str) – Path to input alignment in A2M/FASTA format
  • output_file (str) – Path to output alignment (will be in A3M format)
  • threshold (int, optional (default: 95)) – Sequence identity threshold for maximum pairwise identity (between 0 and 100)
  • columns ({"first", "a2m"}, optional (default: "a2m")) – Definition of match columns (based on first sequence or upper-case columns (a2m))
  • binary (str) – Path to hhfilter binary
Returns:

output_file

Return type:

str

Raises:
  • ResourceError – If output alignment is non-existent/empty
  • ValueError – Upon invalid value of columns parameter
evcouplings.align.tools.run_hmmbuild(alignment_file, prefix, cpu=None, stdout_redirect=None, symfrac=None, binary='hmmbuild')[source]

Profile HMM construction from multiple sequence alignments Refer to HMMER documentation for details.

http://eddylab.org/software/hmmer3/3.1b2/Userguide.pdf

Parameters:
  • alignment_file (str) – File containing the multiple sequence alignment. Can be in Stockholm, a2m, or clustal formats, or any other format recognized by hmmer. Please note that ALL POSITIONS above the symfrac cutoff will be used in HMM construction (if the alignment contains columns that are insertions relative to the query sequence, this may be problematic for structure comparison)
  • prefix (str) – Prefix path for output files. Folder structure in the prefix will be created if not existing.
  • cpu (int, optional (default: None)) – Number of CPUs to use for search. Uses all if None.
  • stdout_redirect (str, optional (default: None)) – Redirect bulky stdout instead of storing with rest of results (use “/dev/null” to dispose)
  • symfrac (float, optional (default: None)) – range 0.0 - 1.0, HMMbuild will use columns with > symfrac percent gaps to construct the HMM. If None provided, HMMbuild internal default is 0.5. (Note: this is calculated after their internal sequence weighting is calculated)
  • binary (str (default: "hmmbuild")) – Path to jackhmmer binary (put in PATH for default to work)
Returns:

namedtuple with fields corresponding to the different output files (prefix, alignment, output, tblout, domtblout)

Return type:

HmmbuildResult

Raises:

ExternalToolError, ResourceError

evcouplings.align.tools.run_hmmscan(query, database, prefix, use_model_threshold=True, threshold_type='cut_ga', use_bitscores=True, domain_threshold=None, seq_threshold=None, nobias=False, cpu=None, stdout_redirect=None, binary='hmmscan')[source]

Run hmmscan of HMMs in database against sequences in query to identify matches of these HMMs. Refer to HMMER Userguide for explanation of these parameters.

Parameters:
  • query (str) – File containing query sequence(s)
  • database (str) – File containing HMM database (prepared with hmmpress)
  • prefix (str) – Prefix path for output files. Folder structure in the prefix will be created if not existing.
  • use_model_threshold (bool (default: True)) – Use model-specific inclusion thresholds from HMM database rather than global bitscore/E-value thresholds (use_bitscores, domain_threshold and seq_threshold are overriden by this flag).
  • threshold-type ({"cut_ga", "cut_nc", "cut_tc"} (default: "cut_ga")) – Use gathering (default), noise or trusted cutoff to define scan hits. Please refer to HMMER manual for details.
  • use_bitscores (bool) – Use bitscore inclusion thresholds rather than E-values. Overriden by use_model_threshold flag.
  • domain_threshold (int or float or str) – Inclusion threshold applied on the domain level (e.g. “1E-03” or 0.001 or 50)
  • seq_threshold (int or float or str) – Inclusion threshold applied on the sequence level (e.g. “1E-03” or 0.001 or 50)
  • nobias (bool, optional (default: False)) – Turn of bias correction
  • cpu (int, optional (default: None)) – Number of CPUs to use for search. Uses all if None.
  • stdout_redirect (str, optional (default: None)) – Redirect bulky stdout instead of storing with rest of results (use “/dev/null” to dispose)
  • binary (str (default: "hmmscan")) – Path to hmmscan binary (put in PATH for default to work)
Returns:

namedtuple with fields corresponding to the different output files (prefix, output, tblout, domtblout, pfamtblout)

Return type:

HmmscanResult

Raises:

ExternalToolError, ResourceError

evcouplings.align.tools.run_hmmsearch(hmmfile, database, prefix, use_bitscores, domain_threshold, seq_threshold, nobias=False, cpu=None, stdout_redirect=None, binary='hmmsearch')[source]

Search profile(s) against a sequence database. Refer to HMMER documentation for details.

http://eddylab.org/software/hmmer3/3.1b2/Userguide.pdf

Parameters:
  • hmmfile (str) – File containing the profile(s)
  • database (str) – File containing sequence database
  • prefix (str) – Prefix path for output files. Folder structure in the prefix will be created if not existing.
  • use_bitscores (bool) – Use bitscore inclusion thresholds rather than E-values.
  • domain_threshold (int or float or str) – Inclusion threshold applied on the domain level (e.g. “1E-03” or 0.001 or 50)
  • seq_threshold (int or float or str) – Inclusion threshold applied on the sequence level (e.g. “1E-03” or 0.001 or 50)
  • nobias (bool, optional (default: False)) – Turn of bias correction
  • cpu (int, optional (default: None)) – Number of CPUs to use for search. Uses all if None.
  • stdout_redirect (str, optional (default: None)) – Redirect bulky stdout instead of storing with rest of results (use “/dev/null” to dispose)
  • binary (str (default: "hmmsearch")) – Path to jackhmmer binary (put in PATH for default to work)
Returns:

namedtuple with fields corresponding to the different output files (prefix, alignment, output, tblout, domtblout)

Return type:

HmmsearchResult

Raises:

ExternalToolError, ResourceError

evcouplings.align.tools.run_jackhmmer(query, database, prefix, use_bitscores, domain_threshold, seq_threshold, iterations=5, nobias=False, cpu=None, stdout_redirect=None, checkpoints_hmm=False, checkpoints_ali=False, binary='jackhmmer')[source]

Run jackhmmer sequence search against target database. Refer to HMMER Userguide for explanation of these parameters.

Parameters:
  • query (str) – File containing query sequence
  • database (str) – File containing sequence database
  • prefix (str) – Prefix path for output files. Folder structure in the prefix will be created if not existing.
  • use_bitscores (bool) – Use bitscore inclusion thresholds rather than E-values.
  • domain_threshold (int or float or str) – Inclusion threshold applied on the domain level (e.g. “1E-03” or 0.001 or 50)
  • seq_threshold (int or float or str) – Inclusion threshold applied on the sequence level (e.g. “1E-03” or 0.001 or 50)
  • iterations (int) – number of jackhmmer search iterations
  • nobias (bool, optional (default: False)) – Turn of bias correction
  • cpu (int, optional (default: None)) – Number of CPUs to use for search. Uses all if None.
  • stdout_redirect (str, optional (default: None)) – Redirect bulky stdout instead of storing with rest of results (use “/dev/null” to dispose)
  • checkpoints_hmm (bool, optional (default: False)) – Store checkpoint HMMs to prefix.<iter>.hmm
  • checkpoints_ali (bool, optional (default: False)) – Store checkpoint alignments to prefix.<iter>.sto
  • binary (str (default: "jackhmmer")) – Path to jackhmmer binary (put in PATH for default to work)
Returns:

namedtuple with fields corresponding to the different output files (prefix, alignment, output, tblout, domtblout)

Return type:

JackhmmerResult

Raises:

ExternalToolError, ResourceError

Couplings Analysis

evcouplings.compare package

evcouplings.compare.distances module

Distance calculations on PDB 3D coordinates

Authors:
Thomas A. Hopf Anna G. Green (remap_complex_chains)
class evcouplings.compare.distances.DistanceMap(residues_i, residues_j, dist_matrix, symmetric)[source]

Bases: object

Compute, store and accesss pairwise residue distances in PDB 3D structures

classmethod aggregate(*matrices, intersect=False, agg_func=<MagicMock id='139675017884048'>)[source]

Aggregate with other distance map(s). Secondary structure will be aggregated by assigning the most frequent state across all distance matrices; if there are equal counts, H (helix) will be chosen over E (strand) over C (coil).

Parameters:
  • *matrices (DistanceMap) –

    *args-style list of DistanceMaps that will be aggregated.

    Note

    The id column of each axis may only contain numeric residue ids (and no characters such as insertion codes)

  • intersect (bool, optional (default: False)) – If True, intersect indices of the given distance maps. Otherwise, union of indices will be used.
  • agg_func (function (default: numpy.nanmin)) – Function that will be used to aggregate distance matrices. Needs to take a parameter “axis” to aggregate over all matrices.
Returns:

Aggregated distance map

Return type:

DistanceMap

Raises:

ValueError – If residue identifiers are not numeric, or if intersect is True, but positions on axis do not overlap.

contacts(max_dist=5.0, min_dist=None)[source]

Return list of pairs below distance threshold

Parameters:
  • max_dist (float, optional (default: 5.0)) – Maximum distance for any pair to be considered a contact
  • min_dist (float, optional (default: None)) – Minimum distance of any pair to be returned (may be useful if extracting different distance ranges from matrix). Distance has to be > min_dist, (not >=).
Returns:

contacts – Table with residue-residue contacts, with the following columns:

  1. id_i: identifier of residue in chain i
  2. id_j: identifier of residue in chain j
  3. dist: pair distance

Return type:

pandas.DataFrame

dist(i, j, raise_na=True)[source]

Return distance of residue pair

Parameters:
  • i (int or str) – Identifier of position on first axis
  • j (int or str) – Identifier of position on second axis
  • raise_na (bool, optional (default: True)) – Raise error if i or j is not contained in either axis. If False, returns np.nan for undefined entries.
Returns:

Distance of pair (i, j). If raise_na is False and identifiers are not valid, distance will be np.nan

Return type:

np.float

Raises:

KeyError – If index i or j is not a valid identifier for respective chain

classmethod from_coords(chain_i, chain_j=None)[source]

Compute distance matrix from PDB chain coordinates.

Parameters:
  • chain_i (Chain) – PDB chain to be used for first axis of matrix
  • chain_j (Chain, optional (default: None)) – PDB chain to be used for second axis of matrix. If not given, will be set to chain_i, resulting in a symmetric distance matrix
Returns:

Distance map computed from given coordinates

Return type:

DistanceMap

classmethod from_file(filename)[source]

Load existing distance map using filename prefix (each distance map consist of .csv and .npy file)

Parameters:filename (str) – Prefix of path to distance map files (excluding .csv/.npy)
Returns:Loaded distance map
Return type:DistanceMap
classmethod from_files(residue_table_file, distance_matrix_file)[source]

Load existing distance map with explicit paths to residue table (.csv) and distance matrix (.npy). Use DistanceMap.from_file to load using joint prefix of both files.

Parameters:
  • residue_table_file (str or file-like object) – Path to residue table file (prefix + .csv)
  • distance_matrix_file (str or file-like object) – Path to distance matrix file (prefix + .npy)
Returns:

Loaded distance map

Return type:

DistanceMap

structure_coverage()[source]

Find covered residue segments for individual structures that this distance map was computed from (either directly from structure or through aggregation of multiple structures). Only works if all residue identifiers of DistanceMap are numeric (i.e., do not have insertion codes)

Returns:coverage – Returns tuples of the form (coverage_i, coverage_j, coverage_id), where * coverage_i and coverage_j are lists of tuples
(segment_start, segment_end) of residue coverage along axis i and j, with segment_end being included in the range
  • coverage_id is the identifier of the individual substructure the coverage segments belong to (only set if an aggregated structure, None otherwise)
Return type:list of tuple
to_file(filename)[source]

Store distance map in files

Parameters:filename (str) – Prefix of distance map files (will create .csv and .npy file)
Returns:
  • residue_table_filename (str) – Path to residue table (will be filename + .csv)
  • dist_mat_filename (str) – Path to distance matrix file in numpy format (will be filename + .npy)
transpose()[source]

Transpose distance map (i.e. swap axes)

Returns:Transposed copy of distance map
Return type:DistanceMap
evcouplings.compare.distances.inter_dists(sifts_result_i, sifts_result_j, structures=None, atom_filter=None, intersect=False, output_prefix=None, model=0, raise_missing=True)[source]

Compute inter-chain distances (between different entities) in PDB file. Resulting distance map is typically not symmetric, with either axis corresponding to either chain. Inter-distances are calculated on all combinations of chains that have the same PDB id in sifts_result_i and sifts_result_j.

Parameters:
  • sifts_result_i (SIFTSResult) – Input structures and mapping to use for first axis of computed distance map
  • sifts_result_j (SIFTSResult) – Input structures and mapping to use for second axis of computed distance map
  • structures (str or dict, optional (default: None)) –
    • If str: Load structures from directory this string points to. Missing structures will be fetched from web.
    • If dict: dictionary with lower-case PDB ids as keys and PDB objects as values. This dictionary has to contain all necessary structures, missing ones will not be fetched. This dictionary can be created using pdb.load_structures.
  • atom_filter (str, optional (default: None)) – Filter coordinates to contain only these atoms. E.g. set to “CA” to compute C_alpha - C_alpha distances instead of minimum atom distance over all atoms in both residues.
  • intersect (bool, optional (default: False)) – If True, intersect indices of the given distance maps. Otherwise, union of indices will be used.
  • output_prefix (str, optional (default: None)) – If given, save individual contact maps to files prefixed with this string. The appended file suffixes map to row index in sifts_results.hits
  • model (int, optional (default: 0)) – Index of model in PDB structure that should be used
  • raise_missing (bool, optional (default: True)) – Raise a ResourceError if any of the input structures can not be loaded; otherwise, ignore missing entries.
Returns:

agg_distmap – Computed aggregated distance map across all input structures

If output_prefix is given, agg_distmap will have an additional attribute individual_distance_map_table:

pd.DataFrame with all individual distance maps that went into the aggregated distance map, with columns “sifts_table_index_i”, “sifts_table_index_j” (linking to SIFTS hit table) and “residue_table” and “distance_matrix” (file names of .csv and .npy files constituting the respective distance map).

Will be None if output_prefix is None.

Return type:

DistanceMap

Raises:
  • ValueError – If sifts_result_i or sifts_result_j is empty (no structure hits)
  • ResourceError – If any structure could not be loaded and raise_missing is True
evcouplings.compare.distances.intra_dists(sifts_result, structures=None, atom_filter=None, intersect=False, output_prefix=None, model=0, raise_missing=True)[source]

Compute intra-chain distances in PDB files.

Parameters:
  • sifts_result (SIFTSResult) – Input structures and mapping to use for distance map calculation
  • structures (str or dict, optional (default: None)) –

    If str: Load structures from directory this string points to. Missing structures will be fetched from web.

    If dict: dictionary with lower-case PDB ids as keys and PDB objects as values. This dictionary has to contain all necessary structures, missing ones will not be fetched. This dictionary can be created using pdb.load_structures.

  • atom_filter (str, optional (default: None)) – Filter coordinates to contain only these atoms. E.g. set to “CA” to compute C_alpha - C_alpha distances instead of minimum atom distance over all atoms in both residues.
  • intersect (bool, optional (default: False)) – If True, intersect indices of the given distance maps. Otherwise, union of indices will be used.
  • output_prefix (str, optional (default: None)) – If given, save individual contact maps to files prefixed with this string. The appended file suffixes map to row index in sifts_results.hits
  • model (int, optional (default: 0)) – Index of model in PDB structure that should be used
  • raise_missing (bool, optional (default: True)) – Raise a ResourceError if any of the input structures can not be loaded; otherwise, ignore missing entries.
Returns:

agg_distmap – Computed aggregated distance map across all input structures

Contains an additional attribute aggregated_residue_maps, a pd.DataFrame with the concatenated residue maps of all individual chains used to compute this DistanceMap. Individual chains are linked to the input sifts_results through the column sifts_table_index.

If output_prefix is given, agg_distmap will have an additional attribute individual_distance_map_table: pd.DataFrame with all individual distance maps that went into the aggregated distance map, with columns “sifts_table_index” (linking to SIFTS hit table) and “residue_table” and “distance_matrix” (file names of .csv and .npy files constituting the respective distance map). Will be None if output_prefix is None.

Return type:

DistanceMap

Raises:
  • ValueError – If sifts_result is empty (no structure hits)
  • ResourceError – If any structure could not be loaded and raise_missing is True
evcouplings.compare.distances.multimer_dists(sifts_result, structures=None, atom_filter=None, intersect=False, output_prefix=None, model=0, raise_missing=True)[source]

Compute homomultimer distances (between repeated copies of the same entity) in PDB file. Resulting distance matrix will be symmetric by minimization over upper and lower triangle of matrix, even if the complex structure is not symmetric.

Parameters:
  • sifts_result (SIFTSResult) – Input structures and mapping to use for distance map calculation
  • structures (str or dict, optional (default: None)) –

    If str: Load structures from directory this string points to. Missing structures will be fetched from web.

    If dict: dictionary with lower-case PDB ids as keys and PDB objects as values. This dictionary has to contain all necessary structures, missing ones will not be fetched. This dictionary can be created using pdb.load_structures.

  • atom_filter (str, optional (default: None)) – Filter coordinates to contain only these atoms. E.g. set to “CA” to compute C_alpha - C_alpha distances instead of minimum atom distance over all atoms in both residues.
  • intersect (bool, optional (default: False)) – If True, intersect indices of the given distance maps. Otherwise, union of indices will be used.
  • output_prefix (str, optional (default: None)) – If given, save individual contact maps to files prefixed with this string. The appended file suffixes map to row index in sifts_results.hits
  • model (int, optional (default: 0)) – Index of model in PDB structure that should be used
  • raise_missing (bool, optional (default: True)) – Raise a ResourceError if any of the input structures can not be loaded; otherwise, ignore missing entries.
Returns:

agg_distmap – Computed aggregated distance map across all input structures

If output_prefix is given, agg_distmap will have an additional attribute individual_distance_map_table: pd.DataFrame with all individual distance maps that went into the aggregated distance map, with columns “sifts_table_index_i”, “sifts_table_index_j” (linking to SIFTS hit table) and “residue_table” and “distance_matrix” (file names of .csv and .npy files constituting the respective distance map).

Will be None if output_prefix is None.

Return type:

DistanceMap

Raises:
  • ValueError – If sifts_result is empty (no structure hits)
  • ResourceError – If any structure could not be loaded and raise_missing is True
evcouplings.compare.distances.remap_chains(sifts_result, output_prefix, sequence=None, structures=None, atom_filter=('N', 'CA', 'C', 'O'), model=0, chain_name='A', raise_missing=True)[source]

Remap a set of PDB chains into the numbering scheme (and amino acid sequence) of a target sequence (a.k.a. the poorest homology model possible).

(This function is placed here because of close relationship to intra_dists and reusing functionality for it).

Parameters:
  • sifts_result (SIFTSResult) – Input structures and mapping to use for remapping
  • output_prefix (str) – Save remapped structures to files prefixed with this string
  • sequence (dict, optional (default: None)) –

    Mapping from sequence position (int or str) to residue. If this parameter is given, residues in the output structures will be renamed to the residues in this mapping.

    Note

    if side-chain residues are not taken off using atom_filter, this will e.g. happily label an actual glutamate as an alanine).

  • structures (str or dict, optional (default: None)) –
    • If str: Load structures from directory this string points to. Missing structures will be fetched from web.
    • If dict: dictionary with lower-case PDB ids as keys and PDB objects as values. This dictionary has to contain all necessary structures, missing ones will not be fetched. This dictionary can be created using pdb.load_structures.
  • atom_filter (str, optional (default: ("N", "CA", "C", "O"))) – Filter coordinates to contain only these atoms. If None, will retain all atoms; the default value will only keep backbone atoms.
  • model (int, optional (default: 0)) – Index of model in PDB structure that should be used
  • chain_name (str, optional (default: "A")) – Rename the PDB chain to this when saving the file. This will not affect the file name, only the name of the chain in the PDB object.
  • raise_missing (bool, optional (default: True)) – Raise a ResourceError if any of the input structures can not be loaded; otherwise, ignore missing entries.
Returns:

remapped – Mapping from index of each structure hit in sifts_results.hits to filename of stored remapped structure

Return type:

dict

evcouplings.compare.distances.remap_complex_chains(sifts_result_i, sifts_result_j, sequence_i=None, sequence_j=None, structures=None, atom_filter=('N', 'CA', 'C', 'O'), output_prefix=None, raise_missing=True, chain_name_i='A', chain_name_j='B', model=0)[source]

Remap a pair of PDB chains from the same structure into the numbering scheme (and amino acid sequence) of a target sequence.

Parameters:
  • sifts_result_i (SIFTSResult) – Input structures and mapping to use for remapping
  • sifts_result_j (SIFTSResult) – Input structures and mapping to use for remapping
  • output_prefix (str) – Save remapped structures to files prefixed with this string
  • sequence_i (dict, optional (default: None)) –

    Mapping from sequence position (int or str) in the first sequence to residue. If this parameter is given, residues in the output structures will be renamed to the residues in this mapping.

    Note

    if side-chain residues are not taken off using atom_filter, this will e.g. happily label an actual glutamate as an alanine).

  • sequence_j (dict, optional (default: None)) – Same as sequence_j for second sequence.
  • structures (str or dict, optional (default: None)) –
    • If str: Load structures from directory this string points to. Missing structures will be fetched from web.
    • If dict: dictionary with lower-case PDB ids as keys and PDB objects as values. This dictionary has to contain all necessary structures, missing ones will not be fetched. This dictionary can be created using pdb.load_structures.
  • atom_filter (str, optional (default: ("N", "CA", "C", "O"))) – Filter coordinates to contain only these atoms. If None, will retain all atoms; the default value will only keep backbone atoms.
  • model (int, optional (default: 0)) – Index of model in PDB structure that should be used
  • raise_missing (bool, optional (default: True)) – Raise a ResourceError if any of the input structures can not be loaded; otherwise, ignore missing entries.
  • chain_name_i (str, optional (default: "A")) – Renames the first chain to this string
  • chain_name_j (str, optional (default: "B")) – Renames the second chain to this string
Returns:

remapped – Mapping from index of each structure hit in sifts_results.hits to filename of stored remapped structure

Return type:

dict

Raises:
  • ValueError – If sifts_result_i or sifts_result_j is empty (no structure hits)
  • ResourceError – If any structure could not be loaded and raise_missing is True

evcouplings.compare.ecs module

Compare evolutionary couplings to distances in 3D structures

Authors:
Thomas A. Hopf
evcouplings.compare.ecs.add_distances(ec_table, dist_map, target_column='dist')[source]

Add pair distances to EC score table

Parameters:
  • ec_table (pandas.DataFrame) – List of evolutionary couplings, with pair positions in columns i and j
  • dist_map (DistanceMap) – Distance map that will be used to annotate distances in ec_table
  • target_column (str) – Name of column in which distances will be stored
Returns:

Couplings table with added distances in target_column. Pairs where no distance information is available will be np.nan

Return type:

pandas.DataFrame

evcouplings.compare.ecs.add_precision(ec_table, dist_cutoff=5, score='cn', min_sequence_dist=6, target_column='precision', dist_column='dist')[source]

Compute precision of evolutionary couplings as predictor of 3D structure contacts

Parameters:
  • ec_table (pandas.DataFrame) – List of evolutionary couplings
  • dist_cutoff (float, optional (default: 5)) – Upper distance cutoff (in Angstrom) for a pair to be considered a true positive contact
  • score (str, optional (default: "cn")) – Column which contains coupling score. Table will be sorted in descending order by this score.
  • min_sequence_dist (int, optional (default: 6)) – Minimal distance in primary sequence for an EC to be included in precision calculation
  • target_column (str, optional (default: "precision")) – Name of column in which precision will be stored
  • dist_column (str, optional (default: "dist")) – Name of column which contains pair distances
Returns:

EC table with added precision values as a function of EC rank (returned table will be sorted by score column)

Return type:

pandas.DataFrame

evcouplings.compare.ecs.coupling_scores_compared(ec_table, dist_map, dist_map_multimer=None, dist_cutoff=5, output_file=None, score='cn', min_sequence_dist=6)[source]

Utility function to create “CouplingScores.csv”-style table

Parameters:
  • ec_table (pandas.DataFrame) – List of evolutionary couplings
  • dist_map (DistanceMap) – Distance map that will be used to annotate distances in ec_table
  • dist_map_multimer (DistanceMap, optional (default: None)) – Additional multimer distance map. If given, the distance for any EC pair will be the minimum out of the monomer and multimer distances.
  • dist_cutoff (float, optional (default: 5)) – Upper distance cutoff (in Angstrom) for a pair to be considered a true positive contact
  • output_file (str, optional (default: None)) – Store final table to this file
  • score (str, optional (default: "cn")) – Column which contains coupling score. Table will be sorted in descending order by this score.
  • min_sequence_dist (int, optional (default: 6)) – Minimal distance in primary sequence for an EC to be included in precision calculation
Returns:

EC table with added distances, and precision if dist_cutoff is given.

Return type:

pandas.DataFrame

evcouplings.compare.mapping module

Index mapping for PDB structures

Authors:
Thomas A. Hopf Charlotta P. Schärfe
evcouplings.compare.mapping.alignment_index_mapping(alignment_file, format='stockholm', target_seq=None)[source]

Create index mapping table between sequence positions based on a sequence alignment.

Parameters:
  • alignment_file (str) – Path of alignment file containing sequences for which indices shoul dbe mapped
  • format ({"stockholm", "fasta"}) – Format of alignment file
  • target_seq (str, optional (default: None)) – Identifier of sequence around which the index mapping will be centered. If None, first sequence in alignment will be used.
Returns:

Mapping table containing assignment of

  1. index in target sequence (i)
  2. symbol in target sequence (A_i)

For all other sequences in alignment, the following two columns:

  1. index in second sequence (j_<sequence id>)
  2. symbol in second sequence (A_j_<sequence_id>)

Return type:

pandas.DataFrame

evcouplings.compare.mapping.map_indices(seq_i, start_i, end_i, seq_j, start_j, end_j, gaps=('-', '.'))[source]

Compute index mapping between positions in two aligned sequences

Parameters:
  • seq_i (str) – First aligned sequence
  • start_i (int) – Index of first position in first sequence
  • end_i (int) – Index of last position in first sequence (used for verification purposes only)
  • seq_j (str) – Second aligned sequence
  • start_j (int) – Index of first position in second sequence
  • end_j (int) – Index of last position in second sequence (used for verification purposes only)
Returns:

Mapping table containing assignment of

  1. index in first sequence (i)
  2. symbol in first sequence (A_i)
  3. index in second sequence (j)
  4. symbol in second sequence (A_j)

Return type:

pandas.DataFrame

evcouplings.compare.pdb module

PDB structure handling based on MMTF format

Authors:
Thomas A. Hopf
class evcouplings.compare.pdb.Chain(residues, coords)[source]

Bases: object

Container for PDB chain residue and coordinate information

filter_atoms(atom_name='CA')[source]

Filter coordinates of chain, e.g. to compute C_alpha-C_alpha distances

Parameters:atom_name (str or list-like, optional (default: "CA")) – Name(s) of atoms to keep
Returns:Chain containing only filtered atoms (and those residues that have such an atom)
Return type:Chain
filter_positions(positions)[source]

Select a subset of positions from the chain

Parameters:positions (list-like) – Set of residues that will be kept
Returns:Chain containing only the selected residues
Return type:Chain
remap(mapping, source_id='seqres_id')[source]

Remap chain into different numbering scheme (e.g. from seqres to uniprot numbering)

Parameters:
  • mapping (dict) –

    Mapping of residue identifiers from source_id (current main index of PDB chain) to new identifiers.

    mapping may either be:

    1. dict(str -> str) to map individual residue IDs. Keys and values of dictionary will be typecast to string before the mapping, so it is possible to pass in integer values too (if the source or target IDs are numbers)
    2. dict((int, int) -> (int, int)) to map ranges of numbers to ranges of numbers. This should typically be only used with RESSEQ or UniProt numbering. End index or range is *inclusive* Note that residue IDs in the end will still be handled as strings when mapping.
  • source_id ({"seqres_id", "coord_id", "id"}, optional (default: "seqres_id")) – Residue identifier in chain to map *from* (will be used as key to access mapping)
Returns:

Chain with remapped numbering (“id” column in residues DataFrame)

Return type:

Chain

to_file(fileobj, chain_id='A', end=True, first_atom_id=1)[source]

Write chain to a file in PDB format (mmCIF not yet supported).

Note that PDB files written this function may not be 100% compliant with the PDB format standards, in particular:

  • some HETATM records may turn into ATOM records when starting from an mmtf file, if the record has a one-letter code (such as MSE / M).
  • code does not print TER record at the end of a peptide chain
Parameters:
  • fileobj (file-like object) – Write to this file handle
  • chain_id (str, optional (default: "A")) – Assign this chain name in file (allows to redefine chain name from whatever chain was originally)
  • end (bool, optional (default: True)) – Print “END” record after chain (signals end of PDB file)
  • first_atom_id (int, optional (default: 1)) – Renumber atoms to start with this index (set to None to keep default indices)
Raises:

ValueError – If atom or residue numbers are too wide and cannot be written to old fixed-column PDB file format

to_seqres()[source]

Return copy of chain with main index set to SEQRES numbering. Residues that do not have a SEQRES id will be dropped.

Returns:Chain with seqres IDs as main index
Return type:Chain
class evcouplings.compare.pdb.ClassicPDB(structure)[source]

Bases: object

Class to handle “classic” PDB and mmCIF formats (for new mmtf format see PDB class above). Wraps around Biopython PDB functionality to provide a consistent interface.

Unlike the PDB class (based on mmtf), this object will not be able to extract SEQRES indices corresponding to ATOM-record residue indices.

classmethod from_file(filename, file_format='pdb')[source]

Initialize structure from PDB/mmCIF file

Parameters:
  • filename (str) – Path of file
  • file_format ({"pdb", "cif"}, optional (default: "pdb")) – Format of structure (old PDB format or mmCIF)
Returns:

Initialized PDB structure

Return type:

ClassicPDB

classmethod from_id(pdb_id)[source]

Initialize structure by PDB ID (fetches structure from RCSB servers)

Parameters:pdb_id (str) – PDB identifier (e.g. 1hzx)
Returns:initialized PDB structure
Return type:PDB
get_chain(chain, model=0)[source]

Extract residue information and atom coordinates for a given chain in PDB structure

Parameters:
  • chain (str) – Name of chain to be extracted (e.g. “A”)
  • model (int, optional (default: 0)) – Index of model to be extracted
Returns:

Chain object containing DataFrames listing residues and atom coordinates

Return type:

Chain

class evcouplings.compare.pdb.PDB(mmtf)[source]

Bases: object

Wrapper around PDB MMTF decoder object to access residue and coordinate information

classmethod from_file(filename)[source]

Initialize structure from MMTF file

Parameters:filename (str) – Path of MMTF file
Returns:initialized PDB structure
Return type:PDB
classmethod from_id(pdb_id)[source]

Initialize structure by PDB ID (fetches structure from RCSB servers)

Parameters:pdb_id (str) – PDB identifier (e.g. 1hzx)
Returns:initialized PDB structure
Return type:PDB
get_chain(chain, model=0)[source]

Extract residue information and atom coordinates for a given chain in PDB structure

Parameters:
  • chain (str) – Name of chain to be extracted (e.g. “A”)
  • model (int, optional (default: 0)) – Index of model to be extracted
Returns:

Chain object containing DataFrames listing residues and atom coordinates

Return type:

Chain

evcouplings.compare.pdb.load_structures(pdb_ids, structure_dir=None, raise_missing=True)[source]

Load PDB structures from files / web

Parameters:
  • pdb_ids (Iterable) – List / iterable containing PDB identifiers to be loaded.
  • structure_dir (str, optional (default: None)) – Path to directory with structures. Structures filenames must be in the format 5p21.mmtf. If a file can not be found, will try to fetch from web instead.
  • raise_missing (bool, optional (default: True)) – Raise a ResourceError exception if any of the PDB IDs cannot be loaded. If False, missing entries will be ignored.
Returns:

structures – Dictionary containing loaded structures. Keys (PDB identifiers) will be lower-case.

Return type:

dict(str -> PDB)

Raises:

ResourceError – Raised if raise_missing is True and any of the given PDB IDs cannot be loaded.

evcouplings.compare.protocol module

evcouplings.compare.sifts module

Uniprot to PDB structure identification and index mapping using the SIFTS database (https://www.ebi.ac.uk/pdbe/docs/sifts/)

This functionality is centered around the pdb_chain_uniprot.csv table available from SIFTS. (ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/flatfiles/csv/pdb_chain_uniprot.csv.gz)

Authors:
Thomas A. Hopf Anna G. Green (find_homologs) Chan Kang (find_homologs)
class evcouplings.compare.sifts.SIFTS(sifts_table_file, sequence_file=None)[source]

Bases: object

Provide Uniprot to PDB mapping data and functions starting from SIFTS mapping table.

by_alignment(min_overlap=20, reduce_chains=False, **kwargs)[source]

Find structures by sequence alignment between query sequence and sequences in PDB.

Parameters:
  • min_overlap (int, optional (default: 20)) – Require at least this many aligned positions with the target structure
  • reduce_chains (bool, optional (Default: True)) – If true, keep only first chain per PDB ID (i.e. remove redundant occurrences of same protein in PDB structures). Should be set to False to identify homomultimeric contacts.
  • **kwargs

    Defines the behaviour of find_homologs() function used to find homologs by sequence alignment: - which alignment method is used

    (pdb_alignment_method: {“jackhmmer”, “hmmsearch”}, default: “jackhmmer”),
    • parameters passed into the protocol for the selected alignment method (evcouplings.align.jackhmmer_search or evcouplings.align.hmmbuild_and_search).

      Default parameters are set in the HMMER_CONFIG string in this module, other parameters will need to be overriden; these minimally are: - for pdb_alignment_method == “jackhmmer”:

      • sequence_id : str, identifier of target sequence
      • jackhmmer : str, path to jackhmmer binary if not on path
      • for pdb_alignment_method == “hmmsearch”: - sequence_id : str, identifier of target sequence - raw_focus_alignment_file : str, path to input alignment file - hmmbuild : str, path to hmmbuild binary if not on path - hmmsearch : str, path to search binary if not on path
    • additionally, if “prefix” is given, individual mappings will be saved to files suffixed by the respective key in mapping table.
Returns:

Record of hits and mappings found for this query sequence by alignment. See by_pdb_id() for detailed explanation of fields.

Return type:

SIFTSResult

by_pdb_id(pdb_id, pdb_chain=None, uniprot_id=None)[source]

Find structures and mapping by PDB id and chain name

Parameters:
  • pdb_id (str) – 4-letter PDB identifier
  • pdb_chain (str, optional (default: None)) – PDB chain name (if not given, all chains for PDB entry will be returned)
  • uniprot_id (str, optional (default: None)) – Filter to keep only this Uniprot accession number or identifier (necessary for chimeras, or multi-chain complexes with different proteins)
Returns:

Identified hits plus index mappings to Uniprot

Return type:

SIFTSResult

Raises:

ValueError – If selected segments in PDB file do not unambigously map to one Uniprot entry

by_uniprot_id(uniprot_id, reduce_chains=False)[source]

Find structures and mapping by Uniprot access number.

Parameters:
  • uniprot_ac (str) – Find PDB structures for this Uniprot accession number. If sequence_file was given while creating the SIFTS object, Uniprot identifiers can also be used.
  • reduce_chains (bool, optional (Default: True)) – If true, keep only first chain per PDB ID (i.e. remove redundant occurrences of same protein in PDB structures). Should be set to False to identify homomultimeric contacts.
Returns:

Record of hits and mappings found for this Uniprot protein. See by_pdb_id() for detailed explanation of fields.

Return type:

SIFTSResult

create_sequence_file(output_file, chunk_size=1000, max_retries=100)[source]

Create FASTA sequence file containing all UniProt sequences of proteins in SIFTS. This file is required for homology-based structure identification and index remapping. This function will also automatically associate the sequence file with the SIFTS object.

Parameters:
  • output_file (str) – Path at which to store sequence file
  • chunk_size (int, optional (default: 1000)) – Retrieve sequences from UniProt in chunks of this size (too large chunks cause the mapping service to stall)
  • max_retries (int, optional (default: 100)) – Allow this many retries when fetching sequences from UniProt ID mapping service, which unfortunately often suffers from connection failures.
class evcouplings.compare.sifts.SIFTSResult(hits, mapping)[source]

Bases: object

Store results of SIFTS structure/mapping identification.

(Full class defined for easify modification of fields)

evcouplings.compare.sifts.fetch_uniprot_mapping(ids, from_='ACC', to='ACC', format='fasta')[source]

Fetch data from UniProt ID mapping service (e.g. download set of sequences)

Parameters:
  • ids (list(str)) – List of UniProt identifiers for which to retrieve mapping
  • from (str, optional (default: "ACC")) – Source identifier (i.e. contained in “ids” list)
  • to (str, optional (default: "ACC")) – Target identifier (to which source should be mapped)
  • format (str, optional (default: "fasta")) – Output format to request from Uniprot server
Returns:

Response from UniProt server

Return type:

str

evcouplings.compare.sifts.find_homologs(pdb_alignment_method='jackhmmer', **kwargs)[source]

Identify homologs using jackhmmer or hmmbuild/hmmsearch

Parameters:
  • pdb_alignment_method ({"jackhmmer", "hmmsearch"},) – optional (default: “jackhmmer”) Sequence alignment method used for searching the PDB
  • **kwargs – Passed into jackhmmer / hmmbuild_and_search protocol (see documentation for available options)
Returns:

  • ali (evcouplings.align.Alignment) – Alignment of homologs of query sequence in sequence database
  • hits (pandas.DataFrame) – Tabular representation of hits

evcouplings.complex package

evcouplings.complex.protocol module

Protocols for matching putatively interacting sequences in protein complexes to create a concatenated sequence alignment

Authors:
Anna G. Green Thomas A. Hopf
evcouplings.complex.protocol.best_hit(**kwargs)[source]

Protocol:

Concatenate alignments based on the best hit to the focus sequence in each species

Parameters:kwargs arguments (Mandatory) – See list below in code where calling check_required
Returns:outcfg – Output configuration of the pipeline, including the following fields:

alignment_file raw_alignment_file focus_mode focus_sequence segments frequencies_file identities_file num_sequences num_sites raw_focus_alignment_file statistics_file

Return type:dict
evcouplings.complex.protocol.describe_concatenation(annotation_file_1, annotation_file_2, genome_location_filename_1, genome_location_filename_2, outfile)[source]

Describes properties of concatenated alignment.

Writes a csv with the following columns

num_seqs_1 : number of sequences in the first monomer alignment num_seqs_2 : number of sequences in the second monomer alignment num_nonred_species_1 : number of unique species annotations in the

first monomer alignment
num_nonred_species_2 : number of unique species annotations in the
second monomer alignment

num_species_overlap: number of unique species found in both alignments median_num_per_species_1 : median number of paralogs per species in the

first monomer alignmment
median_num_per_species_2 : median number of paralogs per species in
the second monomer alignment
num_with_embl_cds_1 : number of IDs for which we found an EMBL CDS in the
first monomer alignment (relevant to distance concatention only)
num_with_embl_cds_2 : number of IDs for which we found an EMBL CDS in the
first monomer alignment (relevant to distance concatention only)
Parameters:
  • annotation_file_1 (str) – Path to annotation.csv file for first monomer alignment
  • annotation_file_2 (str) – Path to annotation.csv file for second monomer alignment
  • genome_location_filename_1 (str) – Path to genome location mapping file for first alignment
  • genome_location_filename_2 (str) – Path to genome location mapping file for second alignment
  • outfile (str) – Path to output file
evcouplings.complex.protocol.genome_distance(**kwargs)[source]

Protocol:

Concatenate alignments based on genomic distance

Parameters:kwargs arguments (Mandatory) – See list below in code where calling check_required
Returns:outcfg – Output configuration of the pipeline, including the following fields:
  • alignment_file
  • raw_alignment_file
  • focus_mode
  • focus_sequence
  • segments
  • frequencies_file
  • identities_file
  • num_sequences
  • num_sites
  • raw_focus_alignment_file
  • statistics_file
Return type:dict
evcouplings.complex.protocol.modify_complex_segments(outcfg, **kwargs)[source]

Modifies the output configuration so that the segments are correct for a concatenated alignment

Parameters:outcfg (dict) – The output configuration
Returns:outcfg – The output configuration, with a new field called “segments”
Return type:dict
evcouplings.complex.protocol.run(**kwargs)[source]

Run alignment concatenation protocol

Parameters:kwargs arguments (Mandatory) – protocol: concatenation protocol to run prefix: Output prefix for all generated files
Returns:outcfg – Output configuration of concatenation stage Dictionary with results in following fields: (in brackets: not mandatory)

alignment_file raw_alignment_file focus_mode focus_sequence segments frequencies_file identities_file num_sequences num_sites raw_focus_alignment_file statistics_file

Return type:dict

evcouplings.couplings package

evcouplings.couplings.mapping module

Mapping indices for complexes / multi-domain sequences to internal model numbering.

Authors:
Thomas A. Hopf Anna G. Green (MultiSegmentCouplingsModel)
class evcouplings.couplings.mapping.MultiSegmentCouplingsModel(filename, *segments, precision='float32', file_format='plmc_v2', **kwargs)[source]

Bases: evcouplings.couplings.model.CouplingsModel

Complex specific Couplings Model that handles segments and provides the option to convert model into inter-segment only.

to_inter_segment_model()[source]

Convert model to inter-segment only parameters, ie the J_ijs that correspond to inter-protein or inter-domain residue pairs. All other parameters are set to 0.

Returns:Copy of object turned into inter-only Epistatic model
Return type:CouplingsModel
class evcouplings.couplings.mapping.Segment(segment_type, sequence_id, region_start, region_end, positions=None, segment_id='A')[source]

Bases: object

Represents a continuous stretch of sequence in a sequence alignment to infer evolutionary couplings (e.g. multiple domains, or monomers in a concatenated complex alignment)

default_chain_name()[source]

Retrieve default PDB chain identifier the segment will be mapped to in 3D structures (by convention, segments in the pipeline are named A_1, A_2, …, B_1, B_2, …; the default chain identifier is anything before the underscore).

Returns:chain – Default PDB chain identifier the segment maps to
Return type:str
classmethod from_list(segment)[source]

Create a segment object from list representation (e.g. from config).

Parameters:segment (list) – List representation of segment, with the following items: segment_id (str), segment_type (str), sequence_id (str), region_start (int), region_end (int), positions (list(int))
Returns:New Segment instance from list
Return type:Segment
to_list()[source]

Represent segment as list (for storing in configs)

Returns:List representation of segment, with the following items: segment_id (str), segment_type (str), sequence_id (str), region_start (int), region_end (int), positions (list(int))
Return type:list
class evcouplings.couplings.mapping.SegmentIndexMapper(focus_mode, first_index, *segments)[source]

Bases: object

Map indices of one or more sequence segments into CouplingsModel internal numbering space. Can also be used to (trivially) remap indices for a single sequence.

patch_model(model, inplace=True)[source]

Change numbering of CouplingModel object so that it uses segment-based numbering

Parameters:
  • model (CouplingsModel) – Model that will be updated to segment- based numbering
  • inplace (bool, optional (default: True)) – If True, change passed model; otherwise returnnew object
Returns:

Model with updated numbering (if inplace is False, this will point to original model)

Return type:

CouplingsModel

Raises:

ValueError – If segment mapping does not match internal model numbering

to_model(x)[source]

Map target index to model index

Parameters:x ((str, int), or list of (str, int)) – Indices in target indexing (segment_id, index_in_segment)
Returns:Monomer indices mapped into couplings object numbering
Return type:int, or list of int
to_target(x)[source]

Map model index to target index

Parameters:x (int, or list of ints) – Indices in model numbering
Returns:Indices mapped into target numbering. Tuples are (segment_id, index_in_segment)
Return type:(str, int), or list of (str, int)
evcouplings.couplings.mapping.segment_map_ecs(ecs, mapper)[source]

Map EC dataframe in model numbering into segment numbering

Parameters:ecs (pandas.DataFrame) – EC table (with columns i and j)
Returns:Mapped EC table (with columns i and j mapped, and additional columns segment_i and segment_j)
Return type:pandas.DataFrame

evcouplings.couplings.mean_field module

evcouplings.couplings.model module

Class to store parameters of undirected graphical model of sequences and perform calculations using the model (statistical energies, coupling scores).

Authors:
Thomas A. Hopf
class evcouplings.couplings.model.CouplingsModel(model_file, precision='float32', file_format='plmc_v2', **kwargs)[source]

Bases: object

Class to store parameters of pairwise undirected graphical model of sequences and compute evolutionary couplings, sequence statistical energies, etc.

Jij(i=None, j=None, A_i=None, A_j=None)[source]

Quick access to J_ij matrix with automatic index mapping. See __4d_access for explanation of parameters.

classmethod apc(matrix)[source]

Apply average product correction (Dunn et al., Bioinformatics, 2008) to matrix

Parameters:matrix (np.array) – Symmetric L x L matrix which should be corrected by APC
Returns:Symmetric L x L matrix with APC correction applied
Return type:np.array
cn(i=None, j=None)[source]

Quick access to cn_scores matrix with automatic index mapping. See __2d_access_score_matrix for explanation of parameters.

cn_scores

L x L numpy matrix with CN (corrected norm) scores

convert_sequences(sequences)[source]

Converts sequences in string format into internal symbol representation according to alphabet of model

Parameters:sequences (list of str) – List of sequences (must have same length and correspond to model states)
Returns:Matrix of size len(sequences) x L of sequences converted to integer symbols
Return type:np.array
delta_hamiltonian(substitutions, verify_mutants=True)[source]

Calculate difference in statistical energy relative to self.target_seq by changing sequence according to list of substitutions

Parameters:
  • substitutions (list of tuple(pos, subs_from, subs_to)) – Substitutions to be applied to target sequence
  • verify_mutants (bool, optional) – Test if subs_from is consistent with self.target_seq
Returns:

Vector of length 3 with 1) total delta Hamiltonian, 2) delta J_ij, 3) delta h_i

Return type:

np.array

dmm(i=None, j=None, A_i=None, A_j=None)[source]

Access delta_Hamiltonian matrix of double mutants of target sequence

Parameters:
  • i (Iterable(int) or int) – Position(s) of first substitution(s)
  • j (Iterable(int) or int) – Position(s) of second substitution(s)
  • A_i (Iterable(char) or char) – Substitution(s) to first position
  • A_j (Iterable(char) or char) – Substitution(s) to second position
Returns:

4D matrix containing energy differences for slices along both axes of double mutation matrix (axes 1/2: position, axis 3/4: substitutions).

Return type:

np.array(float)

double_mut_mat

Hamiltonian difference for all possible double mutant variants

L x L x num_symbol x num_symbol matrix containing delta Hamiltonians for all possible double mutants of target sequence

ecs

DataFrame with evolutionary couplings, sorted by CN score (all scores: CN, FN, MI)

fi(i=None, A_i=None)[source]

Quick access to f_i matrix with automatic index mapping. See __2d_access for explanation of parameters.

fij(i=None, j=None, A_i=None, A_j=None)[source]

Quick access to f_ij matrix with automatic index mapping. See __4d_access for explanation of parameters.

fn(i=None, j=None)[source]

Quick access to fn_scores matrix with automatic index mapping. See __2d_access_score_matrix for explanation of parameters.

fn_scores

L x L numpy matrix with FN (Frobenius norm) scores

hamiltonians(sequences)[source]

Calculates the Hamiltonians of the global probability distribution P(A_1, …, A_L) for the given sequences A_1,…,A_L from J_ij and h_i parameters

Parameters:sequences (list of str) – List of sequences for which Hamiltonian will be computed, or converted np.array obtained using convert_sequences method
Returns:Float matrix of size len(sequences) x 3, where each row corresponds to the 1) total Hamiltonian of sequence and the 2) J_ij and 3) h_i sub-sums
Return type:np.array
hi(i=None, A_i=None)[source]

Quick access to h_i matrix with automatic index mapping. See __2d_access for explanation of parameters.

index_list

Target/Focus sequence of model used for delta_hamiltonian calculations (including single and double mutation matrices)

itu(i=None)[source]

Legacy method for backwards compatibility. See self.sn for explanation.

mi_apc(i=None, j=None)[source]

Quick access to mi_scores_apc matrix with automatic index mapping. See __2d_access_score_matrix for explanation of parameters.

mi_raw(i=None, j=None)[source]

Quick access to mi_scores_raw matrix with automatic index mapping. See __2d_access_score_matrix for explanation of parameters.

mi_scores_apc

L x L numpy matrix with MI (mutual information) scores with APC correction

mi_scores_raw

L x L numpy matrix with MI (mutual information) scores without APC correction

mn(i=None)[source]

Map model numbering to internal numbering

Parameters:i (Iterable(int) or int) – Position(s) to be mapped from model numbering space into internal numbering space
Returns:Remapped position(s)
Return type:Iterable(int) or int
mui(i=None)[source]

Legacy method for backwards compatibility. See self.mn for explanation.

seq(i=None)[source]

Access target sequence of model

Parameters:i (Iterable(int) or int) – Position(s) for which symbol should be retrieved
Returns:Sequence symbols
Return type:Iterable(char) or char
single_mut_mat

Hamiltonian difference for all possible single-site variants

L x num_symbol matrix (np.array) containing delta Hamiltonians for all possible single mutants of target sequence.

single_mut_mat_full

Hamiltonian difference for all possible single-site variants

L x num_symbol x 3 matrix (np.array) containing delta Hamiltonians for all possible single mutants of target sequence. Third dimension: 1) full Hamiltonian, 2) J_ij, 3) h_i

smm(i=None, A_i=None)[source]

Access delta_Hamiltonian matrix of single mutants of target sequence

Parameters:
  • i (Iterable(int) or int) – Position(s) for which energy difference should be retrieved
  • A_i (Iterable(char) or char) – Substitutions for which energy difference should be retrieved
Returns:

2D matrix containing energy differences for slices along both axes of single mutation matrix (first axis: position, second axis: substitution).

Return type:

np.array(float)

sn(i=None)[source]

Map internal numbering to sequence numbering

Parameters:i (Iterable(int) or int) – Position(s) to be mapped from internal numbering space into sequence numbering space.
Returns:Remapped position(s)
Return type:Iterable(int) or int
target_seq

Target/Focus sequence of model used for delta_hamiltonian calculations (including single and double mutation matrices)

to_file(out_file, precision='float32', file_format='plmc_v2')[source]

Writes the potentially modified model again to binary file

Parameters:
  • out_file (str) – A string specifying the path to a file
  • precision ({"float16", "float32", "float64"}, optional (default: "float32")) – Numerical NumPy data type specifying the precision used to write numerical values to file
  • file_format ({"plmc_v1", "plmc_v2"}, optional (default: "plmc_v2")) – Available file formats
to_independent_model()[source]

Estimate parameters of a single-site model using Gaussian prior/L2 regularization.

Returns:Copy of object turned into independent model
Return type:CouplingsModel

evcouplings.couplings.pairs module

evcouplings.couplings.protocol module

evcouplings.couplings.tools module

evcouplings.mutate package

evcouplings.mutate.calculations module

High-level mutation calculation functions for EVmutation

Todo

implement segment handling

Authors:
Thomas A. Hopf Anna G. Green (generalization for multiple segments)
evcouplings.mutate.calculations.extract_mutations(mutation_string, offset=0, sep=', ')[source]

Turns a string containing mutations of the format I100V into a list of tuples with format (100, ‘I’, ‘V’) (index, from, to)

Parameters:
  • mutation_string (str) – Comma-separated list of one or more mutations (e.g. “K50R,I100V”)
  • offset (int, default: 0) – Offset to be added to the index/position of each mutation
  • sep (str, default ",") – String used to separate multiple mutations
Returns:

List of tuples of the form (index+offset, from, to)

Return type:

list of tuples

evcouplings.mutate.calculations.predict_mutation_table(model, table, output_column='prediction_epistatic', mutant_column='mutant', hamiltonian='full', segment=None)[source]

Predicts all mutants in a dataframe and adds predictions as a new column.

If mutant_column is None, the dataframe index is used, otherwise the given column.

Mutations which cannot be calculated (e.g. not covered by alignment, or invalid substitution) using object are set to NaN.

Parameters:
  • model (CouplingsModel) – CouplingsModel instance used to compute mutation effects
  • table (pandas.DataFrame) – DataFrame with mutants to which delta of statistical energy will be added
  • mutant_column (str) – Name of column in table that contains mutants
  • output_column (str) – Name of column in returned dataframe that will contain computed effects
  • hamiltonian ({"full", "couplings", "fields"},) – default: “full” Use full Hamiltonian of exponential model (default), or only couplings / fields for statistical energy calculation.
  • segment (str, default: None) – Specificy a segment identifier to use for the positions in the mutation table. This will only be used if the mutation table doesn’t already have a segments column.
Returns:

Dataframe with added column (mutant_column) that contains computed mutation effects

Return type:

pandas.DataFrame

evcouplings.mutate.calculations.single_mutant_matrix(model, output_column='prediction_epistatic', exclude_self_subs=True)[source]

Create table with all possible single substitutions of target sequence in CouplingsModel object.

Parameters:
  • model (CouplingsModel) – Model that will be used to predict single mutants
  • output_column (str, default: "prediction_epistatic") – Name of column in Dataframe that will contain predictions
  • exclude_self_subs (bool, default: True) – Exclude self-substitutions (e.g. A100A) from results
Returns:

DataFrame with predictions for all single mutants

Return type:

pandas.DataFrame

evcouplings.mutate.calculations.split_mutants(x, mutant_column='mutant')[source]

Splits mutation strings into individual columns in DataFrame (wild-type symbol(s), position(s), substitution(s), number of mutations). This function is e.g. helpful when computing average effects per position using pandas groupby() operations

Parameters:
  • x (pandas.DataFrame) – Table with mutants
  • mutant_column (str, default: "mutant") – Column which contains mutants, set to None to use index of DataFrame
Returns:

DataFrame with added columns “num_subs”, “pos”, “wt” and “subs” that contain the number of mutations, and split mutation strings (if higher-order mutations, symbols/numbers are comma-separated)

Return type:

pandas.DataFrame

evcouplings.mutate.protocol module

Sequence statistical energy and mutation effect computation protocols

Authors:
Thomas A. Hopf Anna G. Green (complex)
evcouplings.mutate.protocol.complex(**kwargs)[source]

Protocol: Mutation effect prediction and visualization for protein complexes

Parameters:kwargs arguments (Mandatory) – See list below in code where calling check_required
Returns:outcfg – Output configuration of the pipeline, including the following fields:
  • mutation_matrix_file
  • [mutation_dataset_predicted_file]
Return type:dict
evcouplings.mutate.protocol.run(**kwargs)[source]

Run mutation protocol

Parameters:kwargs arguments (Mandatory) – protocol: EC protocol to run prefix: Output prefix for all generated files
Returns:outcfg – Output configuration of stage (see individual protocol for fields)
Return type:dict
evcouplings.mutate.protocol.standard(**kwargs)[source]

Protocol: Mutation effect calculation and visualization for protein monomers

TODO: eventually merge with complexes to make a protocol agnostic to the number of segments

Parameters:kwargs arguments (Mandatory) – See list below in code where calling check_required
Returns:outcfg – Output configuration of the pipeline, including the following fields:
  • mutation_matrix_file
  • [mutation_dataset_predicted_file]
Return type:dict

Folding Analysis

evcouplings.fold package

evcouplings.fold.cns module

evcouplings.fold.filter module

Functions for detecting ECs that should not be included in 3D structure prediction

Most functions in this module are rewritten from older pipeline code in choose_CNS_constraint_set.m

Authors:
Thomas A. Hopf
evcouplings.fold.filter.detect_secstruct_clash(i, j, secstruct)[source]

Detect if an EC pair (i, j) is geometrically impossible given a predicted secondary structure

Based on direct port of the logic implemented in choose_CNS_constraint_set.m from original pipeline, lines 351-407.

Use secstruct_clashes() to annotate an entire table of ECs.

Parameters:
  • i (int) – Index of first position
  • j (int) – Index of second position
  • secstruct (dict) – Mapping from position (int) to secondary structure (“H”, “E”, “C”)
Returns:

clashes – True if (i, j) clashes with secondary structure

Return type:

bool

evcouplings.fold.filter.disulfide_clashes(ec_pairs, output_column='cys_clash')[source]

Add disulfide bridge clashes to EC table (i.e. if any cysteine residue is coupled to another cysteine). This flag is necessary if disulfide bridges are created during folding, since only one bridge is possible per cysteine.

Parameters:
  • ec_pairs (pandas.DataFrame) – Table with EC pairs that will be tested for the occurrence of multiple cys-cys pairings (with columns i, j, A_i, A_j)
  • output_column (str, optional (default: "cys_clash")) – Target column indicating if pair is in a clash or not
Returns:

Annotated EC table with clashes

Return type:

pandas.DataFrame

evcouplings.fold.filter.secstruct_clashes(ec_pairs, residues, output_column='ss_clash', secstruct_column='sec_struct_3state')[source]

Add secondary structure clashes to EC table

Parameters:
  • ec_pairs (pandas.DataFrame) – Table with EC pairs that will be tested for clashes with secondary structure (with columns i, j)
  • residues (pandas.DataFrame) – Table with residues in sequence and their secondary structure (columns i, ss_pred).
  • output_column (str, optional (default: "secstruct_clash")) – Target column indicating if pair is in a clash or not
  • secstruct_column (str, optional (default: "sec_struct_3state")) – Source column in ec_pairs with secondary structure states (H, E, C)
Returns:

Annotated EC table with clashes

Return type:

pandas.DataFrame

evcouplings.fold.protocol module

evcouplings.fold.ranking module

evcouplings.fold.restraints module

Functions for generating distance restraints from evolutionary couplings and secondary structure predictions

Authors:
Thomas A. Hopf Anna G. Green (docking restraints)
evcouplings.fold.restraints.docking_restraints(ec_pairs, output_file, restraint_formatter, config_file=None)[source]

Create .tbl file with distance restraints for docking

Parameters:
  • ec_pairs (pandas.DataFrame) – Table with EC pairs that will be turned into distance restraints (with columns i, j, A_i, A_j, segment_i, segment_j)
  • output_file (str) – Path to file in which restraints will be saved
  • restraint_formatter (function) – Function called to create string representation of restraint
  • config_file (str, optional (default: None)) – Path to config file with folding settings. If None, will use default settings included in package (restraints.yml).
evcouplings.fold.restraints.ec_dist_restraints(ec_pairs, output_file, restraint_formatter, config_file=None)[source]

Create .tbl file with distance restraints based on evolutionary couplings

Logic based on choose_CNS_constraint_set.m, lines 449-515

Parameters:
  • ec_pairs (pandas.DataFrame) – Table with EC pairs that will be turned into distance restraints (with columns i, j, A_i, A_j)
  • output_file (str) – Path to file in which restraints will be saved
  • restraint_formatter (function) – Function called to create string representation of restraint
  • config_file (str, optional (default: None)) – Path to config file with folding settings. If None, will use default settings included in package (restraints.yml).
evcouplings.fold.restraints.secstruct_angle_restraints(residues, output_file, restraint_formatter, config_file=None, secstruct_column='sec_struct_3state')[source]

Create .tbl file with dihedral angle restraints based on secondary structure prediction

Logic based on make_cns_angle_constraints.pl

Parameters:
  • residues (pandas.DataFrame) – Table containing positions (column i), residue type (column A_i), and secondary structure for each position
  • output_file (str) – Path to file in which restraints will be saved
  • restraint_formatter (function, optional) – Function called to create string representation of restraint
  • config_file (str, optional (default: None)) – Path to config file with folding settings. If None, will use default settings included in package (restraints.yml).
  • secstruct_column (str, optional (default: sec_struct_3state)) – Column name in residues dataframe from which secondary structure will be extracted (has to be H, E, or C).
evcouplings.fold.restraints.secstruct_dist_restraints(residues, output_file, restraint_formatter, config_file=None, secstruct_column='sec_struct_3state')[source]

Create .tbl file with distance restraints based on secondary structure prediction

Logic based on choose_CNS_constraint_set.m, lines 519-1162

Parameters:
  • residues (pandas.DataFrame) – Table containing positions (column i), residue type (column A_i), and secondary structure for each position
  • output_file (str) – Path to file in which restraints will be saved
  • restraint_formatter (function) – Function called to create string representation of restraint
  • config_file (str, optional (default: None)) – Path to config file with folding settings. If None, will use default settings included in package (restraints.yml).
  • secstruct_column (str, optional (default: sec_struct_3state)) – Column name in residues dataframe from which secondary structure will be extracted (has to be H, E, or C).

evcouplings.fold.tools module

Wrappers for tools for 3D structure prediction from evolutionary couplings

Authors:
Thomas A. Hopf
evcouplings.fold.tools.parse_maxcluster_clustering(clustering_output)[source]

Parse maxcluster clustering output into a DataFrame

Parameters:clustering_output (str) – stdout output from maxcluster after clustering
Returns:Parsed result table (columns: filename, cluster, cluster_size)
Return type:pandas.DataFrame
evcouplings.fold.tools.parse_maxcluster_comparison(comparison_output)[source]

Parse maxcluster output into a DataFrame

Parameters:comparison_output (str) – stdout output from maxcluster after comparison
Returns:Parsed result table (columns: filename, num_pairs, rmsd, maxsub, tm, msi), refer to maxcluster documentation for explanation of the score fields.
Return type:pandas.DataFrame
evcouplings.fold.tools.read_psipred_prediction(filename, first_index=1)[source]

Read a psipred secondary structure prediction file in horizontal or vertical format (auto-detected).

Parameters:
  • filename (str) – Path to prediction output file
  • first_index (int, optional (default: 1)) – Index of first position in predicted sequence
Returns:

pred – Table containing secondary structure prediction, with the following columns:

  • i: position
  • A_i: amino acid
  • sec_struct_3state: prediction (H, E, C)

If reading vformat, also contains columns for the individual (score_coil/helix/strand)

If reading hformat, also contains confidence score between 1 and 9 (sec_struct_conf)

Return type:

pandas.DataFrame

evcouplings.fold.tools.run_cns(inp_script=None, inp_file=None, log_file=None, binary='cns')[source]

Run CNSsolve 1.21 (without worrying about environment setup)

Note that the user is responsible for verifying the output products of CNS, since their paths are determined by .inp scripts and hard to check automatically and in a general way.

Either input_script or input_file has to be specified.

Parameters:
  • inp_script (str, optional (default: None)) – CNS “.inp” input script (actual commands, not file)
  • inp_file (str, optional (default: None)) – Path to .inp input script file. Will override inp_script if also specified.
  • log_file (str, optional (default: None)) – Save CNS stdout output to this file
  • binary (str, optional (default: "cns")) – Absolute path of CNS binary
Raises:
  • ExternalToolError – If call to CNS fails
  • InvalidParameterError – If no input script (file or string) given
evcouplings.fold.tools.run_cns_13(inp_script=None, inp_file=None, log_file=None, source_script=None, binary='cns')[source]

Run CNSsolve 1.3

Note that the user is responsible for verifying the output products of CNS, since their paths are determined by .inp scripts and hard to check automatically and in a general way.

Either input_script or input_file has to be specified.

Parameters:
  • inp_script (str, optional (default: None)) – CNS “.inp” input script (actual commands, not file)
  • inp_file (str, optional (default: None)) – Path to .inp input script file. Will override inp_script if also specified.
  • log_file (str, optional (default: None)) – Save CNS stdout output to this file
  • source_script (str, optional (default: None)) – Script to set CNS environment variables. This should typically point to .cns_solve_env_sh in the CNS installation main directory (the shell script itself needs to be edited to contain the path of the installation)
  • binary (str, optional (default: "cns")) – Name of CNS binary
Raises:
  • ExternalToolError – If call to CNS fails
  • InvalidParameterError – If no input script (file or string) given
evcouplings.fold.tools.run_maxcluster_cluster(predictions, method='average', rmsd=True, clustering_threshold=None, binary='maxcluster')[source]

Compare a set of predicted structures to an experimental structure using maxcluster.

For clustering functionality, use run_maxcluster_clustering() function.

Parameters:
  • predictions (list(str)) – List of PDB files that should be compared against experiment
  • method ({"single", "average", "maximum", "pairs_min", "pairs_abs"}, optional (default: "average")) – Clustering method (single / average / maximum linkage, or min / absolute size neighbour pairs
  • clustering_threshold (float (optional, default: None)) – Initial clustering threshold (maxcluster -T option)
  • rmsd (bool, optional (default: True)) – Use RMSD-based clustering (faster)
  • binary (str, optional (default: "maxcluster")) – Path to maxcluster binary
Returns:

Clustering result table (see parse_maxcluster_clustering for more detailed explanation)

Return type:

pandas.DataFrame

evcouplings.fold.tools.run_maxcluster_compare(predictions, experiment, normalization_length=None, distance_cutoff=None, binary='maxcluster')[source]

Compare a set of predicted structures to an experimental structure using maxcluster.

For clustering functionality, use run_maxcluster_clustering() function.

For a high-level wrapper around this function that removes problematic atoms and compares multiple models, please look at evcouplings.fold.protocol.compare_models_maxcluster().

Parameters:
  • predictions (list(str)) – List of PDB files that should be compared against experiment
  • experiment (str) – Path of experimental structure PDB file. Note that the numbering and residues in this file must agree with the predicted structure, and that the structure may not contain duplicate atoms (multiple models, or alternative locations for the same atom).
  • normalization_length (int, optional (default: None)) – Use this length to normalize the Template Modeling (TM) score (-N option of maxcluster). If None, will normalize by length of experiment.
  • distance_cutoff (float, optional (default: None)) – Distance cutoff for MaxSub search (-d option of maxcluster). If None, will use maxcluster auto-calibration.
  • binary (str, optional (default: "maxcluster")) – Path to maxcluster binary
Returns:

Comparison result table (see parse_maxcluster_comparison for more detailed explanation)

Return type:

pandas.DataFrame

evcouplings.fold.tools.run_psipred(fasta_file, output_dir, binary='runpsipred')[source]

Run psipred secondary structure prediction

psipred output file convention: run_psipred creates output files <rootname>.ss2 and <rootname2>.horiz in the current working directory, where <rootname> is extracted from the basename of the input file (e.g. /home/test/<rootname>.fa)

Parameters:
  • fasta_file (str) – Input sequence file in FASTA format
  • output_dir (str) – Directory in which output will be saved
  • binary (str, optional (default: "cns")) – Path of psipred executable (runpsipred)
Returns:

  • ss2_file (str) – Absolute path to prediction output in “VFORMAT”
  • horiz_file (str) – Absolute path to prediction output in “HFORMAT”

Raises:

ExternalToolError – If call to psipred fails

Visualization

evcouplings.visualize package

evcouplings.visualize.misc module

evcouplings.visualize.mutations module

evcouplings.visualize.pairs module

evcouplings.visualize.parameters module

evcouplings.visualize.pymol module

Utilities

evcouplings.utils package

evcouplings.utils.app module

evcouplings.utils.batch module

Looping through batches of jobs (former submit_job.py and buildali loop)

Authors:
Benjamin Schubert, Thomas A. Hopf
class evcouplings.utils.batch.AClusterSubmitter[source]

Bases: evcouplings.utils.batch.ASubmitter

Abstract subclass of a cluster submitter

cancel(command)[source]

Consumes a list of jobIDs and trys to cancel them

Parameters:command (Command) – The Command jobejct to cancel
Returns:If job was canceled
Return type:bool
cancel_command
db

The persistent DB to keep track of all submitted jobs and their status

Returns:The Persistent DB
Return type:PersistentDict
job_id_pattern
join()[source]

Blocks script if so desired until all jobs have be finished canceled or died

monitor(command)[source]

Returns the status of the consumed command

Parameters:command (Command) – The command object whose status is inquired
Returns:The status of the Command
Return type:Enum(Status)
monitor_command
resource_flags
submit(command, dependent=None)[source]

Consumes job objects and starts them

Parameters:
  • jobs (Command) – A list of Job objects that should be submitted
  • dependent (list(Command)) – A list of command objects the Command depend on
Returns:

A list of jobIDs

Return type:

list(str)

submit_command()[source]
class evcouplings.utils.batch.APluginRegister(name, bases, nmspc)[source]

Bases: abc.ABCMeta

This class allows automatic registration of new plugins.

class evcouplings.utils.batch.ASubmitter[source]

Bases: object

Interface for all submitters

cancel(command)[source]

Consumes a list of jobIDs and trys to cancel them

Parameters:command (Command) – The Command jobejct to cancel
Returns:If job was canceled
Return type:bool
isBlocking

Indicator whether the submitter is blocking or not

Returns:whether submitter blocks by calling join or not
Return type:bool
join()[source]

Blocks script if so desired until all jobs have be finished canceled or died

monitor(command)[source]

Returns the status of the consumed command

Parameters:command (Command) – The command object whose status is inquired
Returns:The status of the Command
Return type:Enum(Status)
name

The name of the submitter

Returns:The name of the submitter
Return type:str
registry = {'local': <class 'evcouplings.utils.batch.LocalSubmitter'>, 'lsf': <class 'evcouplings.utils.batch.LSFSubmitter'>, 'sge': <class 'evcouplings.utils.batch.SGESubmitter'>, 'slurm': <class 'evcouplings.utils.batch.SlurmSubmitter'>}
submit(command, dependent=None)[source]

Consumes job objects and starts them

Parameters:
  • jobs (Command) – A list of Job objects that should be submitted
  • dependent (list(Command)) – A list of command objects the Command depend on
Returns:

A list of jobIDs

Return type:

list(str)

class evcouplings.utils.batch.Command(command, name=None, environment=None, workdir=None, resources=None)[source]

Bases: object

Wrapper around the command parameters needed to execute a script

class evcouplings.utils.batch.EJob[source]

Bases: enum.Enum

An enumeration.

CANCEL = 2
MONITOR = 1
PID = 5
STOP = 3
SUBMIT = 0
UPDATE = 4
evcouplings.utils.batch.EResource

alias of evcouplings.utils.batch.Enum

evcouplings.utils.batch.EStatus

alias of evcouplings.utils.batch.Enum

class evcouplings.utils.batch.LSFSubmitter(blocking=False, db_path=None)[source]

Bases: evcouplings.utils.batch.AClusterSubmitter

Implements an LSF submitter

cancel_command
db

The persistent DB to keep track of all submitted jobs and their status

Returns:The Persistent DB
Return type:PersistentDict
isBlocking

Indicator whether the submitter is blocking or not

Returns:whether submitter blocks by calling join or not
Return type:bool
job_id_pattern
monitor_command
name

The name of the submitter

Returns:The name of the submitter
Return type:str
resource_flags
submit_command
class evcouplings.utils.batch.LocalSubmitter(blocking=True, db_path=None, ncpu=1)[source]

Bases: evcouplings.utils.batch.ASubmitter

cancel(command)[source]

Consumes a list of jobIDs and trys to cancel them

Parameters:command (Command) – The Command jobejct to cancel
Returns:If job was canceled
Return type:bool
isBlocking

Indicator whether the submitter is blocking or not

Returns:whether submitter blocks by calling join or not
Return type:bool
join()[source]

Blocks script if so desired until all jobs have be finished canceled or died

monitor(command)[source]

Returns the status of the consumed command

Parameters:command (Command) – The command object whose status is inquired
Returns:The status of the Command
Return type:Enum(Status)
name

The name of the submitter

Returns:The name of the submitter
Return type:str
submit(command, dependent=None)[source]

Consumes job objects and starts them

Parameters:
  • jobs (Command) – A list of Job objects that should be submitted
  • dependent (list(Command)) – A list of command objects the Command depend on
Returns:

A list of jobIDs

Return type:

list(str)

class evcouplings.utils.batch.SGESubmitter(blocking=False, db_path=None)[source]

Bases: evcouplings.utils.batch.AClusterSubmitter

Implements an LSF submitter

cancel_command
db

The persistent DB to keep track of all submitted jobs and their status

Returns:The Persistent DB
Return type:PersistentDict
isBlocking

Indicator whether the submitter is blocking or not

Returns:whether submitter blocks by calling join or not
Return type:bool
job_id_pattern
monitor_command
name

The name of the submitter

Returns:The name of the submitter
Return type:str
resource_flags
submit_command
class evcouplings.utils.batch.SlurmSubmitter(blocking=False, db_path=None)[source]

Bases: evcouplings.utils.batch.AClusterSubmitter

Implements an LSF submitter

cancel_command
db

The persistent DB to keep track of all submitted jobs and their status

Returns:The Persistent DB
Return type:PersistentDict
isBlocking

Indicator whether the submitter is blocking or not

Returns:whether submitter blocks by calling join or not
Return type:bool
job_id_pattern
monitor_command
name

The name of the submitter

Returns:The name of the submitter
Return type:str
resource_flags
submit_command

evcouplings.utils.calculations module

General calculation functions.

Authors:
Thomas A. Hopf
evcouplings.utils.calculations.dihedral_angle(p0, p1, p2, p3)[source]

Compute dihedral angle given four points

Adapted from the following source: http://stackoverflow.com/questions/20305272/dihedral-torsion-angle-from-four-points-in-cartesian-coordinates-in-python (answer by user Praxeolitic)

Parameters:
  • p0 (np.array) – Coordinates of first point
  • p1 (np.array) – Coordinates of second point
  • p2 (np.array) – Coordinates of third point
  • p3 (np.array) – Coordinates of fourth point
Returns:

Dihedral angle (in radians)

Return type:

numpy.float

evcouplings.utils.calculations.entropy(X, normalize=False)[source]

Calculate entropy of distribution

Parameters:
  • X (np.array) – Vector for which entropy will be calculated
  • normalize – Rescale entropy to range from 0 (“variable”, “flat”) to 1 (“conserved”)
Returns:

Entropy of X

Return type:

float

evcouplings.utils.calculations.entropy_map(model, normalize=True)[source]

Compute dictionary of positional entropies for single-site frequencies in a CouplingsModel

Parameters:
  • model (CouplingsModel) – Model for which entropy of sequence alignment will be computed (based on single-site frequencies f_i(A_i) contained in model)
  • normalize (bool, default: True) – Normalize entropy to range 0 (variable) to 1 (conserved) instead of raw values
Returns:

Map from positions in sequence (int) to entropy of column (float) in alignment

Return type:

dict

evcouplings.utils.calculations.entropy_vector(model, normalize=True)[source]

Compute vector of positional entropies for single-site frequencies in a CouplingsModel

Parameters:
  • model (CouplingsModel) – Model for which entropy of sequence alignment will be computed (based on single-site frequencies f_i(A_i) contained in model)
  • normalize (bool, default: True) – Normalize entropy to range 0 (variable) to 1 (conserved) instead of raw values
Returns:

Vector of length model.L containing entropy for each position

Return type:

np.array

evcouplings.utils.calculations.median_absolute_deviation(x, scale=1.4826)[source]

Compute median absolute deviation of a set of numbers (median of deviations from median)

Parameters:
  • x (list-like of float) – Numbers for which median absolute deviation will be computed
  • scale (float, optional (default: 1.4826)) – Rescale median absolute deviation by this factor; default value is such that median absolute deviation will match regular standard deviation of Gaussian distribution

evcouplings.utils.config module

Configuration handling

Todo

switch ruamel.yaml to round trip loading to preserver order and comments?

Authors:
Thomas A. Hopf
exception evcouplings.utils.config.InvalidParameterError[source]

Bases: Exception

Exception for invalid parameter settings

exception evcouplings.utils.config.MissingParameterError[source]

Bases: Exception

Exception for missing parameters

evcouplings.utils.config.check_required(params, keys)[source]

Verify if required set of parameters is present in configuration

Parameters:
  • params (dict) – Dictionary with parameters
  • keys (list-like) – Set of parameters that has to be present in params
Raises:

MissingParameterError

evcouplings.utils.config.iterate_files(outcfg, subset=None)[source]

Generator function to iterate a list of file items in an outconfig

Parameters:
  • outcfg (dict(str)) – Configuration to extract file items for iteration from
  • subset (list(str)) – List of keys in outcfg to restrict iteration to
Returns:

Generator over tuples (file path, entry key, index). index will be None if this is a single file entry (i.e. ending with _file rather than _files).

Return type:

tuple(str, str, int)

evcouplings.utils.config.parse_config(config_str, preserve_order=False)[source]

Parse a configuration string

Parameters:
  • config_str (str) – Configuration to be parsed
  • preserve_order (bool, optional (default: True)) – Preserve formatting of input configuration string
Returns:

Configuration dictionary

Return type:

dict

evcouplings.utils.config.read_config_file(filename, preserve_order=False)[source]

Read and parse a configuration file.

Parameters:filename (str) – Path of configuration file
Returns:Configuration dictionary
Return type:dict
evcouplings.utils.config.write_config_file(out_filename, config)[source]

Save configuration data structure in YAML file.

Parameters:
  • out_filename (str) – Filename of output file
  • config (dict) – Config data that will be written to file

evcouplings.utils.constants module

Useful values and constants for all of package

Authors:
Thomas A. Hopf

evcouplings.utils.database module

evcouplings.utils.helpers module

Useful Python helpers

Authors:
Thomas A. Hopf, Benjamin Schubert
class evcouplings.utils.helpers.DefaultOrderedDict(default_factory=None, **kwargs)[source]

Bases: collections.OrderedDict

Source: http://stackoverflow.com/questions/36727877/inheriting-from-defaultddict-and-ordereddict Answer by http://stackoverflow.com/users/3555845/daniel

Maybe this one would be better? http://stackoverflow.com/questions/6190331/can-i-do-an-ordered-default-dict-in-python

class evcouplings.utils.helpers.PersistentDict(filename, flag='c', mode=None, format='json', *args, **kwds)[source]

Bases: dict

Persistent dictionary with an API compatible with shelve and anydbm.

The dict is kept in memory, so the dictionary operations run as fast as a regular dictionary.

Write to disk is delayed until close or sync (similar to gdbm’s fast mode).

Input file format is automatically discovered. Output file format is selectable between pickle, json, and csv. All three serialization formats are backed by fast C implementations.

https://code.activestate.com/recipes/576642/

close()[source]
dump(fileobj)[source]
load(fileobj)[source]
sync()[source]

Write dict to disk

class evcouplings.utils.helpers.Progressbar(total_size, bar_length=60)[source]

Bases: object

Progress bar for command line programs

Parameters:
  • total_size (int) – The total size of the iteration
  • bar_length (int) – The visual bar length that gets printed on stdout
update(chunk)[source]

Updates and prints the progress of the progressbar

Parameters:chunk (int) – The size of the elements that are processed in the current iteration
evcouplings.utils.helpers.find_segments(data)[source]

Find consecutive number segments, based on Python 2.7 itertools recipe

Parameters:data (iterable) – Iterable in which to look for consecutive number segments (has to be in order)
evcouplings.utils.helpers.range_overlap(a, b)[source]
Source: http://stackoverflow.com/questions/2953967/
built-in-function-for-computing-overlap-in-python

Function assumes that start < end for a and b

Note

Ends of range are not inclusive

Parameters:
  • a (tuple(int, int)) – Start and end of first range (end of range is not inclusive)
  • b (tuple(int, int)) – Start and end of second range (end of range is not inclusive)
Returns:

Length of overlap between ranges a and b

Return type:

int

evcouplings.utils.helpers.render_template(template_file, mapping)[source]

Render a template using jinja2 and substitute values from mapping

Parameters:
  • template_file (str) – Path to jinja2 template
  • mapping (dict) – Mapping used to substitute values in the template
Returns:

Rendered template

Return type:

str

evcouplings.utils.helpers.retry(func, retry_max_number=None, retry_wait=None, exceptions=None, retry_action=None, fail_action=None)[source]

Retry to execute a function as often as requested

Parameters:
  • func (callable) – Function to be executed until succcessful
  • retry_max_number (int, optional (default: None)) – Maximum number of retries. If None, will retry forever.
  • retry_wait (int, optional (default: None)) – Number of seconds to wait before attempting retry
  • exceptions (exception or tuple(exception)) – Single or tuple of exceptions to catch for retrying (any other exception will cause immediate fail)
  • retry_action (callable) – Function to execute upon a retry
  • fail_action – Function to execute upon final failure
evcouplings.utils.helpers.wrap(text, width=80)[source]

Wraps a string at a fixed width.

Parameters:
  • text (str) – Text to be wrapped
  • width (int) – Line width
Returns:

Wrapped string

Return type:

str

evcouplings.utils.pipeline module

evcouplings.utils.summarize module

evcouplings.utils.system module

System-level calls to external tools, directory creation, etc.

Authors:
Thomas A. Hopf
exception evcouplings.utils.system.ExternalToolError[source]

Bases: Exception

Exception for failing external calculations

exception evcouplings.utils.system.ResourceError[source]

Bases: Exception

Exception for missing resources (files, URLs, …)

evcouplings.utils.system.create_prefix_folders(prefix)[source]

Create a directory tree contained in a prefix.

prefix : str
Prefix containing directory tree
evcouplings.utils.system.get(url, output_path=None, allow_redirects=False)[source]

Download external resource

Parameters:
  • url (str) – URL of resource that should be downloaded
  • output_path (str, optional) – Save contents of URL to this file (only for text files)
  • allow_redirects (bool) – Allow redirects by server or not
Returns:

r – Response object, use r.text to access text, r.json() to decode json, and r.content for raw bytestring

Return type:

requests.models.Response

Raises:

ResourceError

evcouplings.utils.system.get_urllib(url, output_path)[source]

Download external resource to file using urllib. This function is intended for cases where get() implemented using requests can not be used, e.g. for download from an FTP server.

Parameters:
  • url (str) – URL of resource that should be downloaded
  • output_path (str, optional) – Save contents of URL to this file (only for text files)
evcouplings.utils.system.insert_dir(prefix, *dirs, rootname_subdir=True)[source]

Create new path by inserting additional directories into the folder tree of prefix (but keeping the filename prefix at the end),

Parameters:
  • prefix (str) – Prefix of path that should be extended
  • *dirs (str) – Add these directories at the end of path
  • rootname_subdir (bool, optional (default: True)) –

    Given /my/path/prefix,

    • if True, creates structure like /my/path/prefix/*dirs/prefix
    • if False, creates structure like /my/path/*dirs/prefix
Returns:

Extended path

Return type:

str

evcouplings.utils.system.makedirs(directories)[source]

Create directory subtree, some or all of the folders may already exist.

Parameters:directories (str) – Directory subtree to create
evcouplings.utils.system.run(cmd, stdin=None, check_returncode=True, working_dir=None, shell=False, env=None)[source]

Run external program as subprocess.

Parameters:
  • cmd (str or list of str) – Command (and optional command line arguments)
  • stdin (str or byte sequence, optional (default: None)) – Input to be sent to STDIN of the process
  • check_returncode (bool, optional (default=True)) – Verify if call had returncode == 0, otherwise raise ExternalToolError
  • working_dir (str, optional (default: None)) – Change to this directory before running command
  • shell (bool, optional (default: False)) – Invoke shell when calling subprocess (default: False)
  • env (dict, optional (default: None)) – Use this environment for executing the subprocess
Returns:

  • int – Return code of process
  • stdout – Byte string with stdout output
  • stderr – Byte string of stderr output

Raises:

ExternalToolError

evcouplings.utils.system.temp()[source]

Create a temporary file

Returns:Path of temporary file
Return type:str
evcouplings.utils.system.tempdir()[source]

Create a temporary directory

Returns:Path of temporary directory
Return type:str
evcouplings.utils.system.valid_file(file_path)[source]

Verify if a file exists and is not empty.

Parameters:file_path (str) – Path to file to check
Returns:True if file exists and is non-zero size, False otherwise.
Return type:bool
evcouplings.utils.system.verify_resources(message, *args)[source]

Verify if a set of files exists and is not empty.

Parameters:
  • message (str) – Message to display with raised ResourceError
  • *args (List of str) – Path(s) of file(s) to be checked
Raises:

ResourceError – If any of the resources does not exist or is empty

evcouplings.utils.system.write_file(file_path, content)[source]

Writes content to output file

Parameters:
  • file_path (str) – Path of output file
  • content (str) – Content to be written to file

evcouplings.utils.update_database module

command-line app to update the necessary databases

Authors:
Benjamin Schubert
evcouplings.utils.update_database.download_ftp_file(ftp_url, ftp_cwd, file_url, output_path, file_handling='wb', gziped=False, verbose=False)[source]

Downloads a gzip file from a remote ftp server and decompresses it on the fly into an output file

Parameters:
  • ftp_url (str) – the FTP server url
  • ftp_cwd (str) – the FTP directory of the file to download
  • file_url (str) – the file name that gets downloaded
  • output_path (str) – the path to the output file on the local system
  • file_handling (str) – the file handling mode (default: ‘wb’)
  • verbose (bool) – determines whether a progressbar is printed
evcouplings.utils.update_database.run(**kwargs)[source]

Exposes command line interface as a Python function.

Parameters:kwargs – See click.option decorators for app() function

Creates or overwrites an existing symlink

Parameters:
  • target (str) – the target file path
  • link_name (str) – the symlink name

Indices and tables