MASA-Core
IAligner Class Reference

Detailed Description

Interface between the MASA extension and the MASA framework.

The IAligner is a pure abstract class that serves as the interface point between the portable code of MASA (MASA-Core) and the non-portable code (MASA extension). Each MASA extension must contain its own Aligner, which must implement the IAligner interface for a successful integration.

Instead of implementing the IAligner directly, we recommend that the Aligner extend the AbstractAligner class or one of its subclasses, which already implement some methods and so simplify the construction of a new IAligner implementation.

The subclasses of the AbstractAligner already implement code that is common to certain kinds of alignment. For instance, the AbstractBlockAligner divides the dynamic programming matrix into blocks, the OpenMPAligner class processes blocks using OpenMP, and the AbstractDiagonalAligner processes the blocks per anti-diagonal. See the documentation of these classes for their benefits and usage.
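
As an illustration of the anti-diagonal idea, the sketch below groups the blocks of a grid by anti-diagonal. In a Smith-Waterman or Needleman-Wunsch matrix each cell depends on its top, left and top-left neighbours, so the blocks with the same value of bi + bj do not depend on each other and may be processed in parallel. The function antiDiagonals() is a hypothetical helper written for this example; it is not part of MASA-Core and it says nothing about how AbstractDiagonalAligner is actually implemented.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper (not a MASA-Core function): returns the blocks of a
// rows x cols grid grouped by anti-diagonal. Wave d holds every block
// (bi, bj) with bi + bj == d.
std::vector<std::vector<std::pair<int, int> > > antiDiagonals(int rows, int cols) {
    assert(rows > 0 && cols > 0);
    std::vector<std::vector<std::pair<int, int> > > waves(rows + cols - 1);
    for (int bi = 0; bi < rows; ++bi) {
        for (int bj = 0; bj < cols; ++bj) {
            // Blocks in the same wave have no dependency on each other.
            waves[bi + bj].push_back(std::make_pair(bi, bj));
        }
    }
    return waves;
}
```

A caller would visit the waves in increasing order of d and could process all the blocks of one wave concurrently before moving to the next wave.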

Aligner's life cycle

The MASA-Core uses a single Aligner object throughout the comparison, so it is important to understand the life cycle of this object.

1. Instantiation: The Aligner object is created in the C main entry point and is passed to the libmasa_entry_point() function call. Inside the constructor, the aligner should initialize all the data structures that will be used during its whole life cycle. At this moment, there is no information at all about the arguments supplied by the user.

2. Initialization: At this point, the MASA-Core has already read the command line arguments and it may have already forked many processes. So, the IAligner::initialize() method is called once in each process, and each process has a unique identification. This identification may be used, for instance, to initialize the hardware dedicated to this process. See AbstractAlignerParameters::getForkId().

3. Stage Execution: On every stage, the MASA-Core changes the sequence orientation and aligns one or more partitions.

3.1. Sequence Configuration: On every stage, the MASA-Core defines the orientation of the sequences and the range of nucleotides that may be processed during the stage. So, each new stage generates a call to IAligner::setSequences(), notifying the aligner of the sequences (possibly trimmed) and their maximum accessible lengths. Using this information, the aligner may allocate the sequence-related structures with the correct size and data.

3.2. Partition Alignment: Each stage may process one or more partitions, each of which is associated with one call to the IAligner::alignPartition() method. Each partition is guaranteed to lie within the sequence lengths supplied to IAligner::setSequences(). The calls to IAligner::alignPartition() are made serially, but inside this method the Aligner should use parallelism to speed up the computation. See the IAligner::alignPartition() documentation to understand how to compute a partition.

3.3. Sequence Deallocation: After each stage, the MASA-Core calls IAligner::unsetSequences() to notify the aligner that it should deallocate the sequence-related structures.

4. Finalization: This is done only once, at the end of the process. The IAligner::finalize() method should be used to deallocate any structure allocated during the Initialization step.
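
The call order of the life cycle can be sketched with a small stand-in. The code below does not include the MASA-Core headers: LifeCycleSketch keeps only the life-cycle methods of IAligner, PartitionSketch replaces the real Partition class, and runStages() is a hypothetical driver that imitates the order in which the steps above are described. It is an assumption-laden illustration, not the MASA-Core control loop.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the MASA Partition class (assumed fields, illustration only).
struct PartitionSketch { int i0, j0, i1, j1; };

// Reduced stand-in for IAligner: only the life-cycle methods are kept.
class LifeCycleSketch {
public:
    virtual ~LifeCycleSketch() {}
    virtual void initialize() = 0;
    virtual void setSequences(const char* seq0, const char* seq1,
                              int seq0_len, int seq1_len) = 0;
    virtual void alignPartition(PartitionSketch partition) = 0;
    virtual void unsetSequences() = 0;
    virtual void finalize() = 0;
};

// Records the name of every call it receives, in order.
class TracingAligner : public LifeCycleSketch {
public:
    std::vector<std::string> calls;
    void initialize() override { calls.push_back("initialize"); }
    void setSequences(const char*, const char*, int, int) override {
        calls.push_back("setSequences");
    }
    void alignPartition(PartitionSketch) override { calls.push_back("alignPartition"); }
    void unsetSequences() override { calls.push_back("unsetSequences"); }
    void finalize() override { calls.push_back("finalize"); }
};

// Hypothetical driver: one initialize(), then for each stage one
// setSequences(), serial alignPartition() calls and one unsetSequences(),
// and a single finalize() at the end.
void runStages(LifeCycleSketch& aligner, int stages, int partitionsPerStage) {
    assert(stages >= 0 && partitionsPerStage >= 0);
    const char* seq0 = "ACGT";
    const char* seq1 = "AGGT";
    aligner.initialize();
    for (int s = 0; s < stages; ++s) {
        aligner.setSequences(seq0, seq1, 4, 4);
        for (int p = 0; p < partitionsPerStage; ++p) {
            aligner.alignPartition(PartitionSketch{0, 0, 4, 4});
        }
        aligner.unsetSequences();
    }
    aligner.finalize();
}
```

Running the driver with a TracingAligner for two stages of two partitions each records ten calls, starting with initialize and ending with finalize.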

Statistics

The MASA-Core expects the Aligner to collect some statistics. For instance, the number of processed cells is used to estimate the GCUPS performance of the Aligner. Furthermore, some strings are written to log files, so the Aligner can print internal information in these logs.
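
GCUPS stands for giga (10^9) cell updates per second, so the estimate reduces to a division of the processed cell count by the elapsed time. The helper below is a minimal sketch with a hypothetical name, gcups(); how MASA-Core measures the elapsed time is not described here and is left to the caller.

```cpp
#include <cassert>

// Hypothetical helper (not a MASA-Core function): converts a processed-cell
// count, such as the value returned by getProcessedCells(), and an elapsed
// time in seconds into GCUPS.
double gcups(long long processedCells, double elapsedSeconds) {
    assert(processedCells >= 0 && elapsedSeconds > 0.0);
    return static_cast<double>(processedCells) / elapsedSeconds / 1e9;
}
```

For instance, 2,000,000,000 cells processed in one second correspond to 2 GCUPS.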

Capabilities

Besides implementing all the virtual methods (by extending one of the AbstractAligner subclasses or by implementing the IAligner class directly), the Aligner must also comply with some requirements in order to integrate properly. If the Aligner is fully compliant with a given requirement, we say that this Aligner implements a capability. A MASA-Extension is expected to implement a list of capabilities, which can be seen in the aligner_capabilities_t structure documentation.

The MASA-Core calls the aligner many times during the execution and, in each invocation, it may or may not require a list of capabilities. For instance, MASA-Core may require the Smith-Waterman (SW) capability in stage 1 and the Needleman-Wunsch (NW) capability in stage 2. Even if the Aligner implements both the SW and NW capabilities, it must execute each capability only when requested, otherwise the integration may fail.

Each capability is associated with a conditional requirement test that must be verified before its execution. In order to test these conditional requirements, each aligner must call some methods from the IManager interface (see the setManager() function). Besides the conditional requirement tests, the Manager also provides all the parameters necessary to customize the alignment, for example the sequences and the partition coordinates. The AbstractAligner hides the IManager invocation behind some delegate methods with protected visibility.

The MASA-Core may fork many processes to work in parallel. Furthermore, MASA-Core performs load balancing that takes into account the computational power of each process. The maximum number of forked processes and their computational power are architecture dependent and are reported by the MASA-Extension using the IAlignerParameters::getForkWeights() method.

See also:
The aligner_capabilities_t struct describes all the possible capabilities and the requirements necessary to implement each of them.
The AbstractAligner has many methods that help with the implementation of the IAligner interface.
The IManager interface manages the execution of the IAligner.

Definition at line 149 of file IAligner.hpp.

#include <IAligner.hpp>

Inheritance diagram for IAligner:
AbstractAligner, AbstractAlignerSafe, AbstractBlockAligner, AbstractDiagonalAligner


Public Member Functions

virtual aligner_capabilities_t getCapabilities ()=0
 Returns the capabilities of the aligner.
virtual void setManager (IManager *manager)=0
 Associates this IAligner with an instance of IManager.
virtual const int * getForkWeights ()=0
 Supplies the computational power weight of each forked process.
virtual IAlignerParameters * getParameters ()=0
 Get the command line parameters of the IAligner class.
virtual const score_params_t * getScoreParameters ()=0
 Returns the match/mismatch parameters and the gap penalties used by this IAligner.
virtual void initialize ()=0
 Initializes the Aligner before the execution of the alignment procedure.
virtual void setSequences (const char *seq0, const char *seq1, int seq0_len, int seq1_len)=0
 This method is called at the beginning of each stage to inform the aligner about the sequences to be aligned.
virtual void unsetSequences ()=0
 Signals that the sequences will no longer be used and that the Aligner should deallocate the memory used for them.
virtual void alignPartition (Partition partition)=0
 Executes the alignment procedure.
virtual void finalize ()=0
 Finalizes the execution of this IAligner.
virtual match_result_t matchLastColumn (const cell_t *buffer, const cell_t *base, int len, int goalScore)=0
 This method executes the Myers-Miller matching procedure.
virtual const Grid * getGrid () const =0
 Returns the grid of blocks.
virtual void clearStatistics ()=0
 Clears all internal statistics of the aligner.
virtual void printInitialStatistics (FILE *file)=0
 This method is called immediately after initialize(), allowing the aligner to print some initial information.
virtual void printStageStatistics (FILE *file)=0
 This method is called immediately after setSequences(), allowing the aligner to print some information before a new stage.
virtual void printFinalStatistics (FILE *file)=0
 This method is called immediately after finalize(), allowing the aligner to print some finalization information.
virtual void printStatistics (FILE *file)=0
 This method allows the aligner to print the internal statistics, considering that they were cleared in the last call to the clearStatistics() method.
virtual const char * getProgressString () const =0
 Returns a string that will be appended into some intermediate statistics information of stage 1.
virtual long long getProcessedCells ()=0
 Returns the number of cells that have been processed since the last call to clearStatistics.

Protected Member Functions

 ~IAligner ()
 IAligner ()

Constructor & Destructor Documentation

IAligner::~IAligner ( ) [inline, protected]

Definition at line 383 of file IAligner.hpp.

IAligner::IAligner ( ) [inline, protected]

Definition at line 384 of file IAligner.hpp.


Member Function Documentation

virtual void IAligner::alignPartition ( Partition  partition) [pure virtual]

Executes the alignment procedure.

During the call of this method, all the methods of the IManager can be called to obtain the alignment parameters (partition boundaries, row/column data, conditional requirements, etc). Note that the sequence data is already available during the initialize() invocation, but the other information is only available during the invocation of the alignPartition() method.

The alignPartition() method may be called multiple times between setSequences() method calls.

Parameters:
partition: the partition to be aligned.

Implemented in AbstractBlockAligner, and AbstractDiagonalAligner.

virtual void IAligner::clearStatistics ( ) [pure virtual]

Clears all internal statistics of the aligner.

Implemented in AbstractBlockAligner, and AbstractDiagonalAligner.

virtual void IAligner::finalize ( ) [pure virtual]

Finalizes the execution of this IAligner.

Use this method to free any memory allocated during the lifetime of the IAligner.

Implemented in AbstractBlockAligner.

virtual aligner_capabilities_t IAligner::getCapabilities ( ) [pure virtual]

Returns the capabilities of the aligner.

Returns:
the capabilities.
See also:
aligner_capabilities_t

Implemented in AbstractBlockAligner.

virtual const int* IAligner::getForkWeights ( ) [pure virtual]

Supplies the computational power weight of each forked process.

The returned vector must contain the weight of each process and it must be terminated with a 0 element. The amount of the matrix processed by each process will be determined by the ratio between each weight and the sum of the weights.

For example, the vector $\{10,20,20,0\}$ will allow the MASA framework to fork 3 processes. The first process will process $\frac{10}{50} = 20\%$ of the matrix and the other two processes will process $\frac{20}{50} = 40\%$ of the matrix each.

Returns:
an integer vector with the weight of each process. The last element must be zero. If NULL is returned, no forked processes will be allowed.

Implemented in AbstractAligner.
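
The ratio rule above can be checked with a short sketch. forkShares() is a hypothetical helper, not a MASA-Core function: it walks a zero-terminated weight vector and returns the fraction of the matrix assigned to each process. It assumes that every weight before the terminator is positive, and it treats a NULL vector as "no forked processes".

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not a MASA-Core function): turns a zero-terminated
// weight vector into the fraction of the matrix given to each process.
std::vector<double> forkShares(const int* weights) {
    std::vector<double> shares;
    if (weights == nullptr) {
        return shares;  // NULL means that no forked processes are allowed
    }
    long long total = 0;
    for (const int* w = weights; *w != 0; ++w) {
        assert(*w > 0);  // assumption of this sketch: weights are positive
        total += *w;
    }
    for (const int* w = weights; *w != 0; ++w) {
        shares.push_back(static_cast<double>(*w) / static_cast<double>(total));
    }
    return shares;
}
```

With the vector {10, 20, 20, 0} from the example above, the function returns three shares of 0.2, 0.4 and 0.4.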

virtual const Grid* IAligner::getGrid ( ) const [pure virtual]

Returns the grid of blocks.

This method is only necessary if aligner_capabilities_t::dispatch_block_scores is SUPPORTED.

Returns:
a pointer to the grid of blocks.

Implemented in AbstractAligner.

virtual IAlignerParameters* IAligner::getParameters ( ) [pure virtual]

Get the command line parameters of the IAligner class.

The IAlignerParameters interface is used by MASA to present extra command line parameters to each IAligner subclass. Be warned that the MASA-Core is responsible for presenting all the command line options, so any attempt to modify the command line parameters must be made through the IAlignerParameters class, otherwise the behavior of the entire MASA-Core may be compromised. The AbstractAlignerParameters implements the base operations of the IAlignerParameters interface.

Returns:
The customized parameters for this IAligner.
See also:
The IAlignerParameters class presents the details to customize these parameters.

Implemented in AbstractBlockAligner.

virtual long long IAligner::getProcessedCells ( ) [pure virtual]

Returns the number of cells that have been processed since the last call to clearStatistics.

Returns:
the number of processed cells.

Implemented in AbstractBlockAligner, and AbstractDiagonalAligner.

virtual const char* IAligner::getProgressString ( ) const [pure virtual]

Returns a string that will be appended into some intermediate statistics information of stage 1.

Basically, the aligner should report how many steps have been calculated, giving an idea of the completion percentage, and some quick information about the pruning status. The whole string should fit on a single line (around 80 characters).

Returns:
a single-line progress string without '\n'.

Implemented in AbstractBlockAligner, and AbstractDiagonalAligner.
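
A progress line of this kind can be produced with a fixed 80-character buffer, as in the sketch below. The function progressLine(), its parameters and the text layout are all assumptions made for this example; the format that the existing aligners print is not documented here.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not a MASA-Core function): formats a progress line
// of at most 80 characters, without a trailing '\n'.
std::string progressLine(long long doneSteps, long long totalSteps,
                         double prunedPercent) {
    assert(totalSteps > 0 && doneSteps >= 0 && doneSteps <= totalSteps);
    char line[81];  // 80 characters plus the terminating '\0'
    double percent = 100.0 * static_cast<double>(doneSteps)
                           / static_cast<double>(totalSteps);
    // snprintf truncates the text if it would not fit in the buffer.
    std::snprintf(line, sizeof(line), "steps %lld/%lld (%.1f%%) pruned %.1f%%",
                  doneSteps, totalSteps, percent, prunedPercent);
    return std::string(line);
}
```

Since getProgressString() returns a const char *, a real implementation would need to keep the formatted text in storage that outlives the call, for example a member buffer of the Aligner.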

virtual const score_params_t* IAligner::getScoreParameters ( ) [pure virtual]

Returns the match/mismatch parameters and the gap penalties used by this IAligner.

Returns:
the score parameters of this IAligner.

Implemented in AbstractBlockAligner.

virtual void IAligner::initialize ( ) [pure virtual]

Initializes the Aligner before the execution of the alignment procedure.

The IManager associated with this IAligner may only be called to obtain the command line parameters, especially AbstractAlignerParameters::getForkId() in multi-process executions.

The IManager is not set and must not be queried. The initialize() method is called only once per process. Here, we may initialize the hardware and allocate some global structures that are not associated with the sequence sizes.

The initialize() method will be called once for each MASA stage and the sequences will not be changed until the finalize() method is called. Meanwhile, the alignPartition() method may be called multiple times before the finalize() method is called.

The initialize() method may be used to process and allocate the sequences in memory. Note that the MASA stages may change the direction of the sequences, so consider that each call to the initialize() method will change the sequence data.

Implemented in AbstractBlockAligner.

virtual match_result_t IAligner::matchLastColumn ( const cell_t *  buffer,
const cell_t *  base,
int  len,
int  goalScore 
) [pure virtual]

This method executes the Myers-Miller matching procedure.

Parameters:
buffer: the vector with the last column data.
base: the vector with the special row in the reverse direction. This vector is the special row computed in the previous stage.
len: defines that we must match the buffers in the range [0,len).
goalScore: the score that will be searched during the matching procedure.
Returns:
A match_result_t struct. If match_result_t::found is false, then the match procedure did not find the goal score. Otherwise, match_result_t::found is true and match_result_t::i and match_result_t::j contain the coordinates where the goal score was found. Additionally, match_result_t::type is MATCH_ALIGNED if the goal was found in the $H$ (match) boundary or MATCH_GAPPED if it was found in the $F$ (gap) boundary. If both MATCH_ALIGNED and MATCH_GAPPED apply, MATCH_ALIGNED must be preferred.

Implemented in AbstractAligner.
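
The matching step can be sketched as a linear scan over the two vectors. Everything in the code below is an assumption made for illustration: CellSketch (with h and f fields) stands in for cell_t, MatchSketch stands in for match_result_t and reports a single index k instead of the i and j coordinates, both vectors are assumed to be indexed in the same direction, and gapOpen is passed explicitly as a positive penalty. The gapped test follows the usual Myers-Miller rule for affine gaps: when the two halves meet inside a gap, the gap-opening penalty has been charged in both halves, so it is added back once.

```cpp
#include <cassert>

// Assumed stand-ins for cell_t and match_result_t (illustration only).
struct CellSketch { int h; int f; };
enum MatchTypeSketch { SKETCH_ALIGNED, SKETCH_GAPPED };
struct MatchSketch { bool found; int k; MatchTypeSketch type; };

// Scans positions [0, len) for the goal score. At each position the H
// (aligned) test runs before the F (gapped) test, so an aligned match is
// preferred when both apply.
MatchSketch matchSketch(const CellSketch* buffer, const CellSketch* base,
                        int len, int goalScore, int gapOpen) {
    assert(buffer != nullptr && base != nullptr && len >= 0);
    MatchSketch result = {false, -1, SKETCH_ALIGNED};
    for (int k = 0; k < len; ++k) {
        if (buffer[k].h + base[k].h == goalScore) {
            result.found = true;
            result.k = k;
            result.type = SKETCH_ALIGNED;
            return result;
        }
        // The gap-opening penalty was paid twice, so add it back once.
        if (buffer[k].f + base[k].f + gapOpen == goalScore) {
            result.found = true;
            result.k = k;
            result.type = SKETCH_GAPPED;
            return result;
        }
    }
    return result;  // found stays false when no position reaches the goal
}
```

A real implementation would translate the matching position into the matrix coordinates expected in match_result_t::i and match_result_t::j, which this sketch does not attempt.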

virtual void IAligner::printFinalStatistics ( FILE *  file) [pure virtual]

This method is called immediately after finalize(), allowing the aligner to print some finalization information.

Parameters:
file: The log file where the statistics will be written.

Implemented in AbstractBlockAligner, and AbstractDiagonalAligner.

virtual void IAligner::printInitialStatistics ( FILE *  file) [pure virtual]

This method is called immediately after initialize(), allowing the aligner to print some initial information.

Parameters:
file: The log file where the statistics will be written.

Implemented in AbstractBlockAligner, and AbstractDiagonalAligner.

virtual void IAligner::printStageStatistics ( FILE *  file) [pure virtual]

This method is called immediately after setSequences(), allowing the aligner to print some information before a new stage.

Parameters:
file: The log file where the statistics will be written.

Implemented in AbstractBlockAligner, and AbstractDiagonalAligner.

virtual void IAligner::printStatistics ( FILE *  file) [pure virtual]

This method allows the aligner to print the internal statistics, considering that they were cleared in the last call to the clearStatistics() method.

Parameters:
file: The log file where the statistics will be written.

Implemented in AbstractBlockAligner, and AbstractDiagonalAligner.

virtual void IAligner::setManager ( IManager *  manager) [pure virtual]

Associates this IAligner with an instance of IManager.

The IManager controls the execution of the aligner.

Parameters:
manager: the IManager that will control the execution of this IAligner.

Implemented in AbstractAligner.

virtual void IAligner::setSequences ( const char *  seq0,
const char *  seq1,
int  seq0_len,
int  seq1_len 
) [pure virtual]

This method is called at the beginning of each stage to inform the aligner about the sequences to be aligned.

The MASA stages alternate the direction of the sequences in each stage, possibly trimming them if the beginning and the end of the sequences will not be used in this stage. So consider that each call to the setSequences() method may completely change the sequence data for the subsequent calls to alignPartition().

Note that the seq0_len and seq1_len parameters are not the sizes of the original sequences, but the sizes of the trimmed sequences.

Parameters:
seq0: trimmed vertical sequence data.
seq1: trimmed horizontal sequence data.
seq0_len: length of the trimmed vertical sequence.
seq1_len: length of the trimmed horizontal sequence.

Implemented in AbstractBlockAligner.
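
Because the lengths refer to the trimmed sequences, an implementation that keeps its own copy should copy exactly seq0_len and seq1_len characters instead of measuring the buffers. The class below is a minimal sketch with hypothetical names; it is not how AbstractBlockAligner stores the sequences. It also shows the matching release that unsetSequences() would perform at the end of the stage.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical holder (not a MASA-Core class) for per-stage sequence copies.
class SequenceHolderSketch {
public:
    // Copies exactly the trimmed lengths supplied by the caller.
    void setSequences(const char* seq0, const char* seq1,
                      int seq0_len, int seq1_len) {
        assert(seq0 != nullptr && seq1 != nullptr);
        assert(seq0_len >= 0 && seq1_len >= 0);
        vertical_.assign(seq0, static_cast<std::size_t>(seq0_len));
        horizontal_.assign(seq1, static_cast<std::size_t>(seq1_len));
    }

    // Releases the per-stage copies; swapping with an empty string also
    // gives back the capacity, which clear() alone need not do.
    void unsetSequences() {
        std::string().swap(vertical_);
        std::string().swap(horizontal_);
    }

    const std::string& vertical() const { return vertical_; }
    const std::string& horizontal() const { return horizontal_; }

private:
    std::string vertical_;
    std::string horizontal_;
};
```

Calling setSequences("ACGTAC", "TTGA", 4, 3) on this holder stores "ACGT" and "TTG", and a later unsetSequences() leaves both copies empty.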

virtual void IAligner::unsetSequences ( ) [pure virtual]

Signals that the sequences will no longer be used and that the Aligner should deallocate the memory used for them.

This method is called at the end of each stage.

Implemented in AbstractBlockAligner.

