evcouplings.utils package

evcouplings.utils.app module

evcouplings.utils.batch module

Looping through batches of jobs (former submit_job.py and buildali loop)

Authors:
Benjamin Schubert, Thomas A. Hopf
class evcouplings.utils.batch.AClusterSubmitter[source]

Bases: evcouplings.utils.batch.ASubmitter

Abstract subclass of a cluster submitter

cancel(command)[source]

Consumes a list of jobIDs and tries to cancel them

Parameters:command (Command) – The Command object to cancel
Returns:Whether the job was canceled
Return type:bool
cancel_command
db

The persistent DB to keep track of all submitted jobs and their status

Returns:The Persistent DB
Return type:PersistentDict
job_id_pattern
join()[source]

Blocks the script, if so desired, until all jobs have finished, been canceled, or died

monitor(command)[source]

Returns the status of the consumed command

Parameters:command (Command) – The command object whose status is inquired
Returns:The status of the Command
Return type:Enum(Status)
monitor_command
resource_flags
submit(command, dependent=None)[source]

Consumes job objects and starts them

Parameters:
  • command (Command) – The Command object that should be submitted
  • dependent (list(Command)) – A list of Command objects the Command depends on
Returns:

A list of jobIDs

Return type:

list(str)

submit_command()[source]
class evcouplings.utils.batch.APluginRegister(name, bases, nmspc)[source]

Bases: abc.ABCMeta

This class allows automatic registration of new plugins.
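
As a rough illustration of how such automatic registration can work, the following sketch uses a metaclass derived from abc.ABCMeta that records every concrete (non-abstract) subclass in a shared registry dictionary. The names PluginRegisterSketch, BaseSubmitterSketch, LocalDemo and the plugin_name attribute are hypothetical and not part of evcouplings; the actual implementation may differ.

```python
import abc
import inspect


class PluginRegisterSketch(abc.ABCMeta):
    """Hypothetical metaclass that registers concrete subclasses by name."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # The first class created with this metaclass gets the registry;
        # subclasses inherit and share the same dictionary.
        if not hasattr(cls, "registry"):
            cls.registry = {}
        # Only classes without remaining abstract methods are registered.
        if not inspect.isabstract(cls):
            cls.registry[cls.plugin_name.lower()] = cls


class BaseSubmitterSketch(metaclass=PluginRegisterSketch):
    """Abstract interface; not registered itself."""

    @abc.abstractmethod
    def submit(self, command):
        ...


class LocalDemo(BaseSubmitterSketch):
    """Concrete plugin; registered automatically under 'local'."""

    plugin_name = "local"  # assumed attribute, for illustration only

    def submit(self, command):
        return [command]
```

With this pattern, a lookup such as BaseSubmitterSketch.registry["local"] returns the LocalDemo class without any explicit registration call.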

class evcouplings.utils.batch.ASubmitter[source]

Bases: object

Interface for all submitters

cancel(command)[source]

Consumes a list of jobIDs and tries to cancel them

Parameters:command (Command) – The Command object to cancel
Returns:Whether the job was canceled
Return type:bool
isBlocking

Indicator whether the submitter is blocking or not

Returns:whether submitter blocks by calling join or not
Return type:bool
join()[source]

Blocks the script, if so desired, until all jobs have finished, been canceled, or died

monitor(command)[source]

Returns the status of the consumed command

Parameters:command (Command) – The command object whose status is inquired
Returns:The status of the Command
Return type:Enum(Status)
name

The name of the submitter

Returns:The name of the submitter
Return type:str
registry = {'local': <class 'evcouplings.utils.batch.LocalSubmitter'>, 'lsf': <class 'evcouplings.utils.batch.LSFSubmitter'>, 'sge': <class 'evcouplings.utils.batch.SGESubmitter'>, 'slurm': <class 'evcouplings.utils.batch.SlurmSubmitter'>}
submit(command, dependent=None)[source]

Consumes job objects and starts them

Parameters:
  • command (Command) – The Command object that should be submitted
  • dependent (list(Command)) – A list of Command objects the Command depends on
Returns:

A list of jobIDs

Return type:

list(str)

class evcouplings.utils.batch.Command(command, name=None, environment=None, workdir=None, resources=None)[source]

Bases: object

Wrapper around the command parameters needed to execute a script

class evcouplings.utils.batch.EJob[source]

Bases: enum.Enum

An enumeration.

CANCEL = 2
MONITOR = 1
PID = 5
STOP = 3
SUBMIT = 0
UPDATE = 4
evcouplings.utils.batch.EResource

alias of evcouplings.utils.batch.Enum

evcouplings.utils.batch.EStatus

alias of evcouplings.utils.batch.Enum

class evcouplings.utils.batch.LSFSubmitter(blocking=False, db_path=None)[source]

Bases: evcouplings.utils.batch.AClusterSubmitter

Implements an LSF submitter

cancel_command
db

The persistent DB to keep track of all submitted jobs and their status

Returns:The Persistent DB
Return type:PersistentDict
isBlocking

Indicator whether the submitter is blocking or not

Returns:whether submitter blocks by calling join or not
Return type:bool
job_id_pattern
monitor_command
name

The name of the submitter

Returns:The name of the submitter
Return type:str
resource_flags
submit_command
class evcouplings.utils.batch.LocalSubmitter(blocking=True, db_path=None, ncpu=1)[source]

Bases: evcouplings.utils.batch.ASubmitter

cancel(command)[source]

Consumes a list of jobIDs and tries to cancel them

Parameters:command (Command) – The Command object to cancel
Returns:Whether the job was canceled
Return type:bool
isBlocking

Indicator whether the submitter is blocking or not

Returns:whether submitter blocks by calling join or not
Return type:bool
join()[source]

Blocks the script, if so desired, until all jobs have finished, been canceled, or died

monitor(command)[source]

Returns the status of the consumed command

Parameters:command (Command) – The command object whose status is inquired
Returns:The status of the Command
Return type:Enum(Status)
name

The name of the submitter

Returns:The name of the submitter
Return type:str
submit(command, dependent=None)[source]

Consumes job objects and starts them

Parameters:
  • command (Command) – The Command object that should be submitted
  • dependent (list(Command)) – A list of Command objects the Command depends on
Returns:

A list of jobIDs

Return type:

list(str)

class evcouplings.utils.batch.SGESubmitter(blocking=False, db_path=None)[source]

Bases: evcouplings.utils.batch.AClusterSubmitter

Implements an SGE submitter

cancel_command
db

The persistent DB to keep track of all submitted jobs and their status

Returns:The Persistent DB
Return type:PersistentDict
isBlocking

Indicator whether the submitter is blocking or not

Returns:whether submitter blocks by calling join or not
Return type:bool
job_id_pattern
monitor_command
name

The name of the submitter

Returns:The name of the submitter
Return type:str
resource_flags
submit_command
class evcouplings.utils.batch.SlurmSubmitter(blocking=False, db_path=None)[source]

Bases: evcouplings.utils.batch.AClusterSubmitter

Implements a Slurm submitter

cancel_command
db

The persistent DB to keep track of all submitted jobs and their status

Returns:The Persistent DB
Return type:PersistentDict
isBlocking

Indicator whether the submitter is blocking or not

Returns:whether submitter blocks by calling join or not
Return type:bool
job_id_pattern
monitor_command
name

The name of the submitter

Returns:The name of the submitter
Return type:str
resource_flags
submit_command

evcouplings.utils.calculations module

General calculation functions.

Authors:
Thomas A. Hopf
evcouplings.utils.calculations.dihedral_angle(p0, p1, p2, p3)[source]

Compute dihedral angle given four points

Adapted from the following source: http://stackoverflow.com/questions/20305272/dihedral-torsion-angle-from-four-points-in-cartesian-coordinates-in-python (answer by user Praxeolitic)

Parameters:
  • p0 (np.array) – Coordinates of first point
  • p1 (np.array) – Coordinates of second point
  • p2 (np.array) – Coordinates of third point
  • p3 (np.array) – Coordinates of fourth point
Returns:

Dihedral angle (in radians)

Return type:

numpy.float
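
The computation can be sketched without NumPy as follows. This is a minimal illustration of the projection-based approach from the referenced Stack Overflow answer, returning radians as documented above; the function name dihedral_angle_sketch and the use of plain Python sequences instead of np.array are assumptions made for this example.

```python
import math


def dihedral_angle_sketch(p0, p1, p2, p3):
    """Dihedral angle (radians) defined by four 3D points (sequences of 3 floats)."""

    def sub(a, b):
        return [x - y for x, y in zip(a, b)]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return [
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0],
        ]

    b0 = sub(p0, p1)  # vector from p1 back to p0
    b1 = sub(p2, p1)  # central bond vector
    b2 = sub(p3, p2)

    # Normalize the central bond so projections onto it are simple dot products
    norm = math.sqrt(dot(b1, b1))
    b1 = [x / norm for x in b1]

    # v and w are the components of b0 and b2 perpendicular to the central bond
    v = sub(b0, [dot(b0, b1) * x for x in b1])
    w = sub(b2, [dot(b2, b1) * x for x in b1])

    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.atan2(y, x)
```

For example, the points (1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1) form a right-angle torsion, so the sketch returns π/2.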

evcouplings.utils.calculations.entropy(X, normalize=False)[source]

Calculate entropy of distribution

Parameters:
  • X (np.array) – Vector for which entropy will be calculated
  • normalize – Rescale entropy to range from 0 (“variable”, “flat”) to 1 (“conserved”)
Returns:

Entropy of X

Return type:

float
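
To make the effect of normalize concrete, here is a small sketch in plain Python. It assumes Shannon entropy with base-2 logarithm and a normalization of the form 1 - H / log2(number of states); both choices, and the name entropy_sketch, are assumptions for illustration and may not match the library's implementation exactly.

```python
import math


def entropy_sketch(x, normalize=False):
    """Shannon entropy of a probability vector x (entries should sum to 1)."""
    # Zero-probability states contribute nothing to the entropy
    h = -sum(p * math.log2(p) for p in x if p > 0)
    if normalize:
        # Assumed rescaling: 0 = flat/variable distribution, 1 = fully conserved
        return 1 - h / math.log2(len(x))
    return h
```

Under these assumptions a uniform distribution over four states has raw entropy 2 bits and normalized value 0, while a distribution concentrated on a single state has normalized value 1.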

evcouplings.utils.calculations.entropy_map(model, normalize=True)[source]

Compute dictionary of positional entropies for single-site frequencies in a CouplingsModel

Parameters:
  • model (CouplingsModel) – Model for which entropy of sequence alignment will be computed (based on single-site frequencies f_i(A_i) contained in model)
  • normalize (bool, default: True) – Normalize entropy to range 0 (variable) to 1 (conserved) instead of raw values
Returns:

Map from positions in sequence (int) to entropy of column (float) in alignment

Return type:

dict

evcouplings.utils.calculations.entropy_vector(model, normalize=True)[source]

Compute vector of positional entropies for single-site frequencies in a CouplingsModel

Parameters:
  • model (CouplingsModel) – Model for which entropy of sequence alignment will be computed (based on single-site frequencies f_i(A_i) contained in model)
  • normalize (bool, default: True) – Normalize entropy to range 0 (variable) to 1 (conserved) instead of raw values
Returns:

Vector of length model.L containing entropy for each position

Return type:

np.array

evcouplings.utils.calculations.median_absolute_deviation(x, scale=1.4826)[source]

Compute median absolute deviation of a set of numbers (median of deviations from median)

Parameters:
  • x (list-like of float) – Numbers for which median absolute deviation will be computed
  • scale (float, optional (default: 1.4826)) – Rescale median absolute deviation by this factor; default value is such that median absolute deviation will match regular standard deviation of Gaussian distribution
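
The definition above (median of absolute deviations from the median, multiplied by scale) can be sketched with the standard library alone. The name mad_sketch is hypothetical; the library function operates on list-like inputs and may be implemented differently.

```python
from statistics import median


def mad_sketch(x, scale=1.4826):
    """Scaled median absolute deviation of a sequence of numbers."""
    center = median(x)
    # Median of the absolute deviations from the median, then rescale
    return scale * median(abs(v - center) for v in x)
```

Because it is based on medians, the result is robust to outliers: for the values 1, 2, 3, 4, 100 the unscaled deviation is 1, regardless of how extreme the last value is.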

evcouplings.utils.config module

Configuration handling

Todo

switch ruamel.yaml to round trip loading to preserve order and comments?

Authors:
Thomas A. Hopf
exception evcouplings.utils.config.InvalidParameterError[source]

Bases: Exception

Exception for invalid parameter settings

exception evcouplings.utils.config.MissingParameterError[source]

Bases: Exception

Exception for missing parameters

evcouplings.utils.config.check_required(params, keys)[source]

Verify if required set of parameters is present in configuration

Parameters:
  • params (dict) – Dictionary with parameters
  • keys (list-like) – Set of parameters that has to be present in params
Raises:

MissingParameterError
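
A check of this kind might look like the sketch below. It only tests whether each key is present in the dictionary; whether the library also treats None values as missing is not stated here, so that is left out. MissingParameterErrorSketch and check_required_sketch are stand-in names for this example.

```python
class MissingParameterErrorSketch(Exception):
    """Stand-in for the library's MissingParameterError."""


def check_required_sketch(params, keys):
    """Raise if any of the required keys is absent from params."""
    missing = [key for key in keys if key not in params]
    if missing:
        raise MissingParameterErrorSketch(
            "Missing required parameters: {}".format(", ".join(missing))
        )
```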

evcouplings.utils.config.iterate_files(outcfg, subset=None)[source]

Generator function to iterate a list of file items in an outconfig

Parameters:
  • outcfg (dict(str)) – Configuration to extract file items for iteration from
  • subset (list(str)) – List of keys in outcfg to restrict iteration to
Returns:

Generator over tuples (file path, entry key, index). index will be None if this is a single file entry (i.e. ending with _file rather than _files).

Return type:

tuple(str, str, int)
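
The iteration pattern described above can be sketched as a generator that distinguishes single-file entries (keys ending in _file) from multi-file entries (keys ending in _files). The name iterate_files_sketch, the example keys, and the choice to skip entries whose value is None are assumptions for illustration.

```python
def iterate_files_sketch(outcfg, subset=None):
    """Yield (file path, entry key, index) tuples from an output configuration."""
    keys = outcfg.keys() if subset is None else subset
    for key in keys:
        value = outcfg.get(key)
        if value is None:
            # Assumption: unset entries are skipped
            continue
        if key.endswith("_file"):
            # Single file entry: no index
            yield value, key, None
        elif key.endswith("_files"):
            # List of files: index is the position in the list
            for index, path in enumerate(value):
                yield path, key, index


example_cfg = {
    "alignment_file": "out/align.a2m",   # hypothetical entries
    "model_files": ["out/m1.model", "out/m2.model"],
    "num_sites": 120,                     # not a file entry, ignored
}
for path, key, index in iterate_files_sketch(example_cfg):
    print(path, key, index)
```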

evcouplings.utils.config.parse_config(config_str, preserve_order=False)[source]

Parse a configuration string

Parameters:
  • config_str (str) – Configuration to be parsed
  • preserve_order (bool, optional (default: False)) – Preserve formatting of input configuration string
Returns:

Configuration dictionary

Return type:

dict

evcouplings.utils.config.read_config_file(filename, preserve_order=False)[source]

Read and parse a configuration file.

Parameters:filename (str) – Path of configuration file
Returns:Configuration dictionary
Return type:dict
evcouplings.utils.config.write_config_file(out_filename, config)[source]

Save configuration data structure in YAML file.

Parameters:
  • out_filename (str) – Filename of output file
  • config (dict) – Config data that will be written to file

evcouplings.utils.constants module

Useful values and constants for all of package

Authors:
Thomas A. Hopf

evcouplings.utils.database module

evcouplings.utils.helpers module

Useful Python helpers

Authors:
Thomas A. Hopf, Benjamin Schubert
class evcouplings.utils.helpers.DefaultOrderedDict(default_factory=None, **kwargs)[source]

Bases: collections.OrderedDict

Source: http://stackoverflow.com/questions/36727877/inheriting-from-defaultddict-and-ordereddict Answer by http://stackoverflow.com/users/3555845/daniel

Maybe this one would be better? http://stackoverflow.com/questions/6190331/can-i-do-an-ordered-default-dict-in-python

class evcouplings.utils.helpers.PersistentDict(filename, flag='c', mode=None, format='json', *args, **kwds)[source]

Bases: dict

Persistent dictionary with an API compatible with shelve and anydbm.

The dict is kept in memory, so the dictionary operations run as fast as a regular dictionary.

Write to disk is delayed until close or sync (similar to gdbm’s fast mode).

Input file format is automatically discovered. Output file format is selectable between pickle, json, and csv. All three serialization formats are backed by fast C implementations.

https://code.activestate.com/recipes/576642/

close()[source]
dump(fileobj)[source]
load(fileobj)[source]
sync()[source]

Write dict to disk

class evcouplings.utils.helpers.Progressbar(total_size, bar_length=60)[source]

Bases: object

Progress bar for command line programs

Parameters:
  • total_size (int) – The total size of the iteration
  • bar_length (int) – The visual bar length that gets printed on stdout
update(chunk)[source]

Updates and prints the progress of the progressbar

Parameters:chunk (int) – The size of the elements that are processed in the current iteration
evcouplings.utils.helpers.find_segments(data)[source]

Find consecutive number segments, based on Python 2.7 itertools recipe

Parameters:data (iterable) – Iterable in which to look for consecutive number segments (has to be in order)
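
The itertools recipe mentioned above groups values by the difference between each value and its position, which is constant within a run of consecutive numbers. The sketch below assumes segments are reported as (first, last) pairs; that output format and the name find_segments_sketch are assumptions for this example.

```python
from itertools import groupby


def find_segments_sketch(data):
    """Yield (first, last) for each run of consecutive integers in ordered data."""
    # value - position is constant inside a run of consecutive numbers
    for _, group in groupby(enumerate(data), key=lambda pair: pair[1] - pair[0]):
        values = [value for _, value in group]
        yield values[0], values[-1]
```

For the input 1, 2, 3, 7, 8, 10 this yields the segments (1, 3), (7, 8) and (10, 10).
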
evcouplings.utils.helpers.range_overlap(a, b)[source]
Source: http://stackoverflow.com/questions/2953967/built-in-function-for-computing-overlap-in-python

Function assumes that start < end for a and b

Note

Ends of range are not inclusive

Parameters:
  • a (tuple(int, int)) – Start and end of first range (end of range is not inclusive)
  • b (tuple(int, int)) – Start and end of second range (end of range is not inclusive)
Returns:

Length of overlap between ranges a and b

Return type:

int
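
A common way to compute this overlap for half-open ranges is shown below. This is a sketch of the general technique and not necessarily the library's exact code; the name range_overlap_sketch is hypothetical.

```python
def range_overlap_sketch(a, b):
    """Length of overlap between half-open ranges a=(start, end) and b=(start, end)."""
    # The overlap starts at the later start and ends at the earlier end;
    # a negative difference means the ranges do not overlap at all.
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))
```

Because the ends are not inclusive, the ranges (0, 5) and (5, 10) have overlap 0, while (0, 10) and (5, 15) overlap by 5.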

evcouplings.utils.helpers.render_template(template_file, mapping)[source]

Render a template using jinja2 and substitute values from mapping

Parameters:
  • template_file (str) – Path to jinja2 template
  • mapping (dict) – Mapping used to substitute values in the template
Returns:

Rendered template

Return type:

str

evcouplings.utils.helpers.retry(func, retry_max_number=None, retry_wait=None, exceptions=None, retry_action=None, fail_action=None)[source]

Retry to execute a function as often as requested

Parameters:
  • func (callable) – Function to be executed until successful
  • retry_max_number (int, optional (default: None)) – Maximum number of retries. If None, will retry forever.
  • retry_wait (int, optional (default: None)) – Number of seconds to wait before attempting retry
  • exceptions (exception or tuple(exception)) – Single or tuple of exceptions to catch for retrying (any other exception will cause immediate fail)
  • retry_action (callable) – Function to execute upon a retry
  • fail_action – Function to execute upon final failure
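
How these parameters could interact is sketched below. The control flow (retry_action before each retry, fail_action before re-raising on final failure, catching Exception when exceptions is None) is an assumption based on the parameter descriptions, not a copy of the library code; retry_sketch is a hypothetical name.

```python
import time


def retry_sketch(func, retry_max_number=None, retry_wait=None,
                 exceptions=None, retry_action=None, fail_action=None):
    """Call func() until it succeeds or the retry budget is exhausted."""
    # Assumption: catch any Exception if no specific exceptions are given
    catch = Exception if exceptions is None else exceptions
    retries = 0
    while True:
        try:
            return func()
        except catch:
            retries += 1
            # Give up once the number of retries would exceed the maximum
            if retry_max_number is not None and retries > retry_max_number:
                if fail_action is not None:
                    fail_action()
                raise
            if retry_action is not None:
                retry_action()
            if retry_wait is not None:
                time.sleep(retry_wait)
```

With retry_max_number=1, the function is attempted at most twice: the initial call plus one retry.
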
evcouplings.utils.helpers.wrap(text, width=80)[source]

Wraps a string at a fixed width.

Parameters:
  • text (str) – Text to be wrapped
  • width (int) – Line width
Returns:

Wrapped string

Return type:

str

evcouplings.utils.pipeline module

evcouplings.utils.summarize module

evcouplings.utils.system module

System-level calls to external tools, directory creation, etc.

Authors:
Thomas A. Hopf
exception evcouplings.utils.system.ExternalToolError[source]

Bases: Exception

Exception for failing external calculations

exception evcouplings.utils.system.ResourceError[source]

Bases: Exception

Exception for missing resources (files, URLs, …)

evcouplings.utils.system.create_prefix_folders(prefix)[source]

Create a directory tree contained in a prefix.

Parameters:prefix (str) – Prefix containing directory tree
evcouplings.utils.system.get(url, output_path=None, allow_redirects=False)[source]

Download external resource

Parameters:
  • url (str) – URL of resource that should be downloaded
  • output_path (str, optional) – Save contents of URL to this file (only for text files)
  • allow_redirects (bool) – Allow redirects by server or not
Returns:

r – Response object, use r.text to access text, r.json() to decode json, and r.content for raw bytestring

Return type:

requests.models.Response

Raises:

ResourceError

evcouplings.utils.system.get_urllib(url, output_path)[source]

Download external resource to file using urllib. This function is intended for cases where get() implemented using requests cannot be used, e.g. for download from an FTP server.

Parameters:
  • url (str) – URL of resource that should be downloaded
  • output_path (str, optional) – Save contents of URL to this file (only for text files)
evcouplings.utils.system.insert_dir(prefix, *dirs, rootname_subdir=True)[source]

Create new path by inserting additional directories into the folder tree of prefix (but keeping the filename prefix at the end).

Parameters:
  • prefix (str) – Prefix of path that should be extended
  • *dirs (str) – Add these directories at the end of path
  • rootname_subdir (bool, optional (default: True)) –

    Given /my/path/prefix,

    • if True, creates structure like /my/path/prefix/*dirs/prefix
    • if False, creates structure like /my/path/*dirs/prefix
Returns:

Extended path

Return type:

str
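
The two layouts described for rootname_subdir can be illustrated with a short sketch. It uses posixpath so the result is the same on every platform; the library itself may use os.path, and the name insert_dir_sketch is hypothetical.

```python
import posixpath


def insert_dir_sketch(prefix, *dirs, rootname_subdir=True):
    """Insert directories into a prefix path while keeping the trailing file prefix."""
    base_dir, rootname = posixpath.split(prefix)
    if rootname_subdir:
        # /my/path/prefix -> /my/path/prefix/*dirs/prefix
        return posixpath.join(base_dir, rootname, *dirs, rootname)
    # /my/path/prefix -> /my/path/*dirs/prefix
    return posixpath.join(base_dir, *dirs, rootname)


print(insert_dir_sketch("/my/path/prefix", "align"))
print(insert_dir_sketch("/my/path/prefix", "align", rootname_subdir=False))
```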

evcouplings.utils.system.makedirs(directories)[source]

Create directory subtree, some or all of the folders may already exist.

Parameters:directories (str) – Directory subtree to create
evcouplings.utils.system.run(cmd, stdin=None, check_returncode=True, working_dir=None, shell=False, env=None)[source]

Run external program as subprocess.

Parameters:
  • cmd (str or list of str) – Command (and optional command line arguments)
  • stdin (str or byte sequence, optional (default: None)) – Input to be sent to STDIN of the process
  • check_returncode (bool, optional (default=True)) – Verify if call had returncode == 0, otherwise raise ExternalToolError
  • working_dir (str, optional (default: None)) – Change to this directory before running command
  • shell (bool, optional (default: False)) – Invoke shell when calling subprocess (default: False)
  • env (dict, optional (default: None)) – Use this environment for executing the subprocess
Returns:

  • int – Return code of process
  • stdout – Byte string with stdout output
  • stderr – Byte string of stderr output

Raises:

ExternalToolError
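
The documented behavior can be approximated with the standard subprocess module, as in the sketch below. It mirrors the parameters listed above and returns byte strings as documented, but it is an illustration only: run_sketch and ExternalToolErrorSketch are stand-in names, and the library's own implementation may differ in details.

```python
import subprocess


class ExternalToolErrorSketch(Exception):
    """Stand-in for the library's ExternalToolError."""


def run_sketch(cmd, stdin=None, check_returncode=True,
               working_dir=None, shell=False, env=None):
    """Run an external command and return (return code, stdout, stderr)."""
    # subprocess expects bytes on stdin when output is captured as bytes
    if isinstance(stdin, str):
        stdin = stdin.encode()
    proc = subprocess.run(
        cmd,
        input=stdin,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        cwd=working_dir,
        shell=shell,
        env=env,
    )
    if check_returncode and proc.returncode != 0:
        raise ExternalToolErrorSketch(
            "Call failed with return code {}: {}".format(proc.returncode, cmd)
        )
    return proc.returncode, proc.stdout, proc.stderr
```

Passing check_returncode=False lets the caller inspect a non-zero return code instead of handling an exception.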

evcouplings.utils.system.temp()[source]

Create a temporary file

Returns:Path of temporary file
Return type:str
evcouplings.utils.system.tempdir()[source]

Create a temporary directory

Returns:Path of temporary directory
Return type:str
evcouplings.utils.system.valid_file(file_path)[source]

Verify if a file exists and is not empty.

Parameters:file_path (str) – Path to file to check
Returns:True if file exists and is non-zero size, False otherwise.
Return type:bool
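
A check with these semantics can be written with os.path alone. The sketch below is one plausible way to do it; valid_file_sketch is a hypothetical name, not the library function.

```python
import os


def valid_file_sketch(file_path):
    """Return True if file_path is an existing file with non-zero size."""
    return os.path.isfile(file_path) and os.path.getsize(file_path) > 0
```
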
evcouplings.utils.system.verify_resources(message, *args)[source]

Verify if a set of files exists and is not empty.

Parameters:
  • message (str) – Message to display with raised ResourceError
  • *args (List of str) – Path(s) of file(s) to be checked
Raises:

ResourceError – If any of the resources does not exist or is empty

evcouplings.utils.system.write_file(file_path, content)[source]

Writes content to output file

Parameters:
  • file_path (str) – Path of output file
  • content (str) – Content to be written to file

evcouplings.utils.update_database module

Command-line app to update the necessary databases

Authors:
Benjamin Schubert
evcouplings.utils.update_database.download_ftp_file(ftp_url, ftp_cwd, file_url, output_path, file_handling='wb', gziped=False, verbose=False)[source]

Downloads a gzip file from a remote ftp server and decompresses it on the fly into an output file

Parameters:
  • ftp_url (str) – the FTP server url
  • ftp_cwd (str) – the FTP directory of the file to download
  • file_url (str) – the file name that gets downloaded
  • output_path (str) – the path to the output file on the local system
  • file_handling (str) – the file handling mode (default: ‘wb’)
  • verbose (bool) – determines whether a progressbar is printed
evcouplings.utils.update_database.run(**kwargs)[source]

Exposes command line interface as a Python function.

Parameters:kwargs – See click.option decorators for app() function

Creates or overwrites an existing symlink

Parameters:
  • target (str) – the target file path
  • link_name (str) – the symlink name