MASA-Core
IAligner.hpp
/*******************************************************************************
 *
 * Copyright (c) 2010-2015   Edans Sandes
 *
 * This file is part of MASA-Core.
 *
 * MASA-Core is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * MASA-Core is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with MASA-Core.  If not, see <http://www.gnu.org/licenses/>.
 *
 ******************************************************************************/

#ifndef IALIGNER_HPP_
#define IALIGNER_HPP_

#include <string.h>
#include <stdio.h>

#include "libmasaTypes.hpp"
#include "IManager.hpp"
#include "parameters/AbstractAlignerParameters.hpp"
#include "capabilities.hpp"
#include "Grid.hpp"
#include "Partition.hpp"


/** @brief Interface between the MASA extension and the MASA framework.
 *
 * IAligner is a pure abstract class that serves as the interface point between
 * the portable code of MASA (MASA-Core) and the non-portable code
 * (MASA extension). Each MASA extension must contain its own Aligner, which
 * must implement the IAligner interface for a successful integration.
 *
 * Instead of implementing IAligner directly, we recommend that the Aligner
 * extend the AbstractAligner class or one of its subclasses, which already
 * implement some methods that simplify the construction of
 * a new IAligner implementation.
 *
 * The subclasses of AbstractAligner already implement some
 * common code used in certain kinds of alignment. For instance,
 * AbstractBlockAligner divides the dynamic programming matrix into blocks,
 * OpenMPAligner processes blocks using OpenMP and
 * AbstractDiagonalAligner processes the blocks per anti-diagonal. See the
 * documentation of these classes for their benefits and usage.
 *
 * \section sec_lc Aligner's life cycle
 *
 * MASA-Core uses a single Aligner object during the whole comparison, so
 * it is important to understand the life cycle of this object.
 *
 * 1. <b>Instantiation</b>: The Aligner object is created in the C main entry point
 *    and is passed to the libmasa_entry_point() function call. Inside the
 *    constructor, the aligner should initialize all the data structures that
 *    will be used during the whole life cycle. At this moment, there is
 *    absolutely no information about the arguments supplied by the
 *    user.
 *
 * 2. <b>Initialization</b>: At this point, MASA-Core has already read the command
 *    line arguments and it may have already forked many processes. The
 *    IAligner::initialize() method is called once in each process, which can
 *    obtain its unique identification from
 *    AbstractAlignerParameters::getForkId(). This identification may be used,
 *    for instance, to initialize the hardware dedicated to this process.
 *
 * 3. <b>Stage Execution</b>: On every stage, MASA-Core changes the sequence
 *    orientation and aligns one or more partitions.
 *
 *    3.1. <b>%Sequence Configuration</b>: On every stage, MASA-Core
 *    defines the orientation of the sequences and the range of nucleotides
 *    that may be processed during the stage. Each new stage therefore generates
 *    a call to IAligner::setSequences(), notifying the aligner of
 *    the sequences (possibly trimmed) and their maximum accessible lengths.
 *    Using this information, the aligner may allocate the sequence-related
 *    structures using the correct size and data.
 *
 *    3.2. <b>%Partition %Alignment</b>: Each stage may process one or more partitions
 *    to be aligned, each of which is associated with one call to the
 *    IAligner::alignPartition() method. Each partition is guaranteed to reside
 *    inside the sequence lengths supplied to IAligner::setSequences(). The
 *    calls to the IAligner::alignPartition() method are made serially,
 *    but inside this method the Aligner should use parallelism in order
 *    to speed up computation. See the IAligner::alignPartition() documentation
 *    to understand how to compute a partition.
 *
 *    3.3. <b>%Sequence Deallocation</b>: After each stage, MASA-Core calls
 *    IAligner::unsetSequences() to notify the aligner that it should deallocate
 *    the sequence-related structures.
 *
 * 4. <b>Finalization</b>: IAligner::finalize() is called only once, at the end of
 *    the process. This method should be used to deallocate any structure
 *    previously allocated during the Initialization step.
 *
 * \section Statistics
 *
 * MASA-Core expects the Aligner to collect some statistics.
 * For instance, the number of processed cells is used to estimate the
 * GCUPS performance of the Aligner. Furthermore, some strings are written
 * to log files, so the Aligner can print internal information in these logs.
 *
 *
 * \section Capabilities
 *
 * Although the Aligner must implement all the virtual methods (by extending
 * one of the AbstractAligner subclasses or by implementing the
 * IAligner class directly), the Aligner must also comply with
 * some requirements in order to integrate properly.
 * If the Aligner is fully compliant with a given requirement,
 * we say that this Aligner implements a capability. A MASA extension is
 * expected to implement a list of capabilities, which are described in the
 * aligner_capabilities_t structure documentation.
 *
 * MASA-Core calls the aligner many times during the execution and, in each
 * invocation, it may or may not require a list of capabilities. For instance,
 * MASA-Core may require the Smith-Waterman (SW) capability in stage 1 and
 * the Needleman-Wunsch (NW) capability in stage 2. Even if the Aligner
 * implements both the SW and NW capabilities, it must execute each
 * capability only when requested, otherwise the integration may fail.
 *
 * Each capability is associated with a conditional requirement test that
 * must be verified before its execution. In order to test these conditional
 * requirements, each aligner must call some methods of the IManager
 * interface (see the setManager() function). Besides the conditional requirement
 * tests, the Manager also provides all the parameters necessary to customize
 * the alignment, for example the sequences and the partition coordinates.
 * AbstractAligner hides the IManager invocations behind some delegate
 * methods with protected visibility.
 *
 * MASA-Core may fork many processes to work in parallel. Furthermore,
 * MASA-Core performs load balancing that considers the computational power of
 * each process. The maximum number of forked processes and their computational
 * power are architecture-dependent and are reported by the MASA extension
 * through the IAligner::getForkWeights() method.
 *
 * @see The aligner_capabilities_t struct describes all the possible capabilities
 * and the requirements necessary to implement them.
 * @see AbstractAligner has many methods that help with the implementation of
 * the IAligner interface.
 * @see The IManager interface manages the execution of the IAligner.
 */
class IAligner {

public:

    /**
     * Returns the capabilities of the aligner.
     *
     * @return the capabilities.
     * @see aligner_capabilities_t
     */
    virtual aligner_capabilities_t getCapabilities() = 0;

    /**
     * Associates this IAligner with an instance of IManager. The IManager
     * controls the execution of the aligner.
     *
     * @param manager the IManager that will control the execution of
     * this IAligner.
     */
    virtual void setManager(IManager* manager) = 0;

    /**
     * Supplies the computational power weight of each forked process.
     *
     * The returned vector must contain the weight of each process and
     * must be terminated with a 0 element. The fraction of the matrix
     * processed by each process is determined by the ratio between
     * its weight and the sum of the weights.
     *
     * For example, the vector \f$\{10,20,20,0\}\f$ will
     * allow the MASA framework to fork 3 processes. The first process
     * will process \f$\frac{10}{50} = 20\%\f$ of the matrix and the
     * other two processes will process \f$\frac{20}{50} = 40\%\f$ of
     * the matrix each.
     *
     * @return an integer vector with the weight of each process.
     * The last element must be zero. If NULL is returned, no forked
     * processes will be allowed.
     */
    virtual const int* getForkWeights() = 0;

    /**
     * Gets the command line parameters of the IAligner class. The
     * IAlignerParameters interface is used by MASA to present
     * extra command line parameters for each IAligner subclass. Be
     * warned that MASA-Core is responsible for presenting
     * all the command line options, so any attempt to modify
     * the command line parameters must be made through the
     * IAlignerParameters class, otherwise the behavior of the
     * entire MASA-Core may be compromised. AbstractAlignerParameters
     * implements the base operations of the IAlignerParameters
     * interface.
     *
     * @return The customized parameters for this IAligner.
     *
     * @see The IAlignerParameters class presents the details
     * to customize these parameters.
     */
    virtual IAlignerParameters* getParameters() = 0;

    /**
     * Returns the match/mismatch parameters and the gap penalties used
     * by this IAligner.
     * @return the score parameters of this IAligner.
     */
    virtual const score_params_t* getScoreParameters() = 0;

    /**
     * Initializes the Aligner before the execution of the alignment
     * procedure. The IManager associated with this IAligner may only
     * be called to obtain the command line parameters, especially
     * AbstractAlignerParameters::getForkId() in multi-process executions.
     *
     * Other than that, the IManager is not set and must not be queried.
     * The initialize() method is called only once per process. Here, the
     * aligner may initialize the hardware and allocate global structures
     * that are not associated with the sequence sizes.
     *
     * Per-stage work does not belong here: the sequences are supplied at the
     * beginning of each MASA stage by setSequences() and released by
     * unsetSequences(), and alignPartition() may be called multiple times
     * in between.
     *
     * Because the MASA stages may change the direction of the sequences,
     * sequence-related structures should be processed and allocated in
     * setSequences() rather than in initialize().
     */
    virtual void initialize() = 0;

    /**
     * This method is called at the beginning of each stage to inform
     * the aligner about the sequences to be aligned. The MASA stages
     * alternate the direction of the sequences in each
     * stage, possibly trimming them if the beginning and end of the sequences
     * will not be used in this stage. So consider that each call to
     * the setSequences() method may completely change the sequence
     * data for the subsequent calls to alignPartition().
     *
     * Note that the seq0_len and seq1_len parameters are not
     * the sizes of the original sequences, but the sizes of the trimmed
     * sequences.
     *
     * @param seq0  trimmed vertical sequence data.
     * @param seq1  trimmed horizontal sequence data.
     * @param seq0_len      length of the trimmed vertical sequence.
     * @param seq1_len      length of the trimmed horizontal sequence.
     */
    virtual void setSequences(const char* seq0, const char* seq1, int seq0_len, int seq1_len) = 0;

    /**
     * Signals that the sequences will not be used anymore and that the Aligner
     * should deallocate the memory used for them. This method is called
     * at the end of each stage.
     */
    virtual void unsetSequences() = 0;

    /**
     * Executes the alignment procedure.
     *
     * During the call of this method, all the methods of the IManager
     * can be called to obtain the alignment parameters (partition
     * boundaries, row/column data, conditional requirements, etc).
     * Note that the sequence data is already available
     * after the setSequences() invocation,
     * but the other information is only available during the invocation of
     * the alignPartition() method.
     *
     * The alignPartition() method may be called multiple
     * times between the setSequences() and unsetSequences() calls.
     *
     * @param partition the partition to be aligned.
     */
    virtual void alignPartition(Partition partition) = 0;

    /**
     * Finalizes the execution of this IAligner. Use this method to free
     * any memory allocated during the lifetime of the IAligner.
     */
    virtual void finalize() = 0;

    /**
     * This method executes the Myers-Miller matching procedure.
     *
     * @param buffer the vector with the last column data.
     * @param base the vector with the special row in the reverse direction.
     *             This vector is the special row computed in the previous stage.
     * @param len the buffers must be matched in the range [0,len).
     * @param goalScore the score that will be searched for during the matching procedure.
     * @return A match_result_t struct. If match_result_t::found is false, then
     *         the match procedure did not find the goal score. Otherwise,
     *         match_result_t::found is true and match_result_t::i and match_result_t::j
     *         contain the coordinates where the goal score was found. Additionally,
     *         match_result_t::type is MATCH_ALIGNED if the goal was found in
     *         the \f$H\f$ (match) boundary or MATCH_GAPPED if it was found in the \f$F\f$ (gap) boundary.
     *         If both MATCH_ALIGNED and MATCH_GAPPED apply, MATCH_ALIGNED must be preferred.
     */
    virtual match_result_t matchLastColumn(const cell_t* buffer, const cell_t* base, int len, int goalScore) = 0;

    /**
     * Returns the grid of blocks. This method is only necessary if
     * capabilities_t::dispatch_block_scores is SUPPORTED.
     *
     * @return the grid of blocks.
     */
    virtual const Grid* getGrid() const = 0;

    /* Statistic functions */

    /**
     * Clears all internal statistics of the aligner.
     */
    virtual void clearStatistics() = 0;

    /**
     * This method is called immediately after initialize(), allowing
     * the aligner to print some initial information.
     *
     * @param file The log file where the statistics will be written.
     */
    virtual void printInitialStatistics(FILE* file) = 0;

    /**
     * This method is called immediately after setSequences(), allowing
     * the aligner to print some information before a new stage.
     *
     * @param file The log file where the statistics will be written.
     */
    virtual void printStageStatistics(FILE* file) = 0;

    /**
     * This method is called immediately after finalize(), allowing
     * the aligner to print some finalization information.
     *
     * @param file The log file where the statistics will be written.
     */
    virtual void printFinalStatistics(FILE* file) = 0;

    /**
     * This method allows the aligner to print the internal statistics
     * collected since they were cleared by the last call to the
     * clearStatistics() method.
     *
     * @param file The log file where the statistics will be written.
     */
    virtual void printStatistics(FILE* file) = 0;

    /**
     * Returns a string that will be appended to some intermediate
     * statistics information of stage 1. Basically, the aligner should
     * report how many steps have been calculated, giving an idea of
     * the completion percentage, and some quick information about the pruning
     * status. The whole string should fit on a single line (around 80
     * characters).
     *
     * @return a single-line progress string without '\\n'.
     */
    virtual const char* getProgressString() const = 0;

    /**
     * Returns the number of cells that have been processed since the last
     * call to clearStatistics().
     *
     * @return the number of processed cells.
     */
    virtual long long getProcessedCells() = 0;

protected:
    /* The protected constructor and destructor prevent the direct creation/deletion of this interface. */
    ~IAligner() {}
    IAligner() {}

};

#endif /* IALIGNER_HPP_ */
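
The weight example in the getForkWeights() documentation can be checked with a small sketch. The helper `forkShares` below is hypothetical and is not part of MASA-Core. It assumes strictly positive weights and only reproduces the ratio described in the comment (each weight divided by the sum of the weights), not the actual load balancing performed by MASA-Core.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of MASA-Core): converts a zero-terminated
// weight vector, in the format returned by IAligner::getForkWeights(), into
// the fraction of the matrix assigned to each forked process.
std::vector<double> forkShares(const int* weights) {
    std::vector<double> shares;
    if (weights == nullptr) {
        return shares;  // NULL means that no forked processes are allowed.
    }

    // First pass: sum the weights up to the terminating 0 element.
    long total = 0;
    for (const int* w = weights; *w != 0; ++w) {
        assert(*w > 0);  // Assumption of this sketch: weights are positive.
        total += *w;
    }

    // Second pass: each share is the weight divided by the sum of weights.
    for (const int* w = weights; *w != 0; ++w) {
        shares.push_back(static_cast<double>(*w) / total);
    }
    return shares;
}
```

For the vector {10, 20, 20, 0}, `forkShares` returns three shares of approximately 0.2, 0.4 and 0.4, matching the percentages in the documentation. A NULL pointer yields an empty result, mirroring the case in which no forked processes are allowed.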