MASA-Core
/*******************************************************************************
 *
 * Copyright (c) 2010-2015   Edans Sandes
 *
 * This file is part of MASA-Core.
 *
 * MASA-Core is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * MASA-Core is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with MASA-Core.  If not, see <http://www.gnu.org/licenses/>.
 *
 ******************************************************************************/

#ifndef IALIGNER_HPP_
#define IALIGNER_HPP_

#include <string.h>
#include <stdio.h>

#include "libmasaTypes.hpp"
#include "IManager.hpp"
#include "parameters/AbstractAlignerParameters.hpp"
#include "capabilities.hpp"
#include "Grid.hpp"
#include "Partition.hpp"


/** @brief Interface between the MASA extension and the MASA framework.
 *
 * The IAligner is a pure abstract class that serves as the interface point
 * between the portable code of MASA (MASA-Core) and the non-portable code
 * (MASA extension). Each MASA extension must contain its own Aligner, which
 * must implement the IAligner interface for a successful integration.
 *
 * Instead of implementing the IAligner directly, we recommend that the Aligner
 * extend the AbstractAligner class or one of its subclasses, which already
 * implement some methods that simplify the construction of
 * a new IAligner implementation.
 *
 * The subclasses of the AbstractAligner already implement some
 * common code used in some kinds of alignment. For instance, the
 * AbstractBlockAligner divides the Dynamic Programming Matrix into blocks, the
 * OpenMPAligner class processes blocks using OpenMP and the
 * AbstractDiagonalAligner processes the blocks per anti-diagonal. See the
 * documentation of these classes for their benefits and usage.
 *
 * \section sec_lc Aligner's life cycle
 *
 * The MASA-Core uses a single Aligner object during the whole comparison, so
 * it is important to understand the life cycle of this object.
 *
 * 1. <b>Instantiation</b>: The Aligner object is created in the C main entry point
 *    and is passed to the libmasa_entry_point() function call. Inside the
 *    constructor, the aligner should initialize all the data structures that
 *    will be used during its whole life cycle. At this moment, there is
 *    absolutely no information about the arguments supplied by the
 *    user.
 *
 * 2. <b>Initialization</b>: At this point, the MASA-Core has already read the command
 *    line arguments and may have already forked many processes. So, the
 *    IAligner::initialize() method is called in each process, and each
 *    process has a unique identification. This identification may be used,
 *    for instance, to initialize the hardware dedicated to this process. See
 *    AbstractAlignerParameters::getForkId().
 *
 * 3. <b>Stage Execution</b>: On every stage, the MASA-Core changes the sequence
 *    orientation and aligns one or more partitions.
 *
 *    3.1. <b>%Sequence Configuration</b>: On every stage, the MASA-Core
 *       defines the orientation of the sequences and the range of nucleotides
 *       that may be processed during the stage. So, each new stage generates
 *       a call to IAligner::setSequences(), notifying the aligner of
 *       the sequences (possibly trimmed) and their maximum accessible lengths.
 *       Using this information, the aligner may allocate the sequence-related
 *       structures using the correct size and data.
 *
 *    3.2. <b>%Partition %Alignment</b>: Each stage may process one or more partitions,
 *       each of which is associated with one call to the
 *       IAligner::alignPartition() method. Each partition is guaranteed to reside
 *       inside the sequence lengths supplied by IAligner::setSequences(). The
 *       calls to the IAligner::alignPartition() method are done serially,
 *       but inside this method the Aligner should use parallelism in order
 *       to speed up the computation. See the IAligner::alignPartition()
 *       documentation to understand how to compute a partition.
 *
 *    3.3. <b>%Sequence Deallocation</b>: After each stage, the MASA-Core calls
 *       IAligner::unsetSequences() to notify the aligner that it should
 *       deallocate the sequence-related structures.
 *
 * 4. <b>Finalization</b>: The IAligner::finalize() method is called only once,
 *    at the end of the process. It should be used to deallocate any structure
 *    previously allocated during the Initialization step.
 *
 * \section Statistics
 *
 * The MASA-Core expects the Aligner to collect some statistics.
 * For instance, the number of processed cells is used to estimate the
 * GCUPS performance of the Aligner. Furthermore, some strings are logged
 * in some files, so the Aligner can print internal information in these logs.
 *
 *
 * \section Capabilities
 *
 * Although the Aligner must implement all the virtual methods (by extending
 * one of the AbstractAligner subclasses or by implementing the
 * IAligner class directly), the Aligner must also comply with
 * some requirements in order to integrate properly.
 * If the Aligner is fully compliant with a given requirement,
 * we say that this Aligner implements a capability. A MASA-Extension is
 * expected to implement a list of capabilities, which can be seen in the
 * aligner_capabilities_t structure documentation.
 *
 * The MASA-Core calls the aligner many times during the execution and, in each
 * invocation, it may or may not require a list of capabilities. For instance,
 * MASA-Core may require the Smith-Waterman (SW) capability in stage 1 and
 * require the Needleman-Wunsch (NW) capability in stage 2. Even if the Aligner
 * implements both SW and NW capabilities, the Aligner must execute each
 * capability only when requested, otherwise the integration may fail.
 *
 * Each capability is associated with a conditional requirement test that
 * must be verified before its execution. In order to test these conditional
 * requirements, each aligner must call some methods from the IManager
 * interface (see the setManager() function). Besides the conditional requirement
 * tests, the Manager also provides all the parameters necessary to customize
 * the alignment, for example the sequences and the partition coordinates.
 * The AbstractAligner hides the IManager invocation using some delegate
 * methods with protected visibility.
 *
 * The MASA-Core may fork many processes to work in parallel. Furthermore,
 * MASA-Core executes a load balancing considering the computational power of
 * each process. The maximum number of forked processes and their computational
 * power are architecture dependent and are informed by the MASA-Extension
 * using the IAlignerParameters::getForkWeights() method.
 *
 * @see The aligner_capabilities_t struct describes all the possible capabilities
 *      and the requirements necessary to implement them.
 * @see The AbstractAligner has many methods that help the implementation of
 *      the IAligner interface.
 * @see The IManager interface manages the execution of the IAligner.
 */
class IAligner {

public:

    /**
     * Returns the capabilities of the aligner.
     *
     * @return the capabilities.
     * @see aligner_capabilities_t
     */
    virtual aligner_capabilities_t getCapabilities() = 0;

    /**
     * Associates this IAligner with an instance of IManager. The IManager
     * controls the execution of the aligner.
     *
     * @param manager the IManager that will control the execution of
     *      this IAligner.
     */
    virtual void setManager(IManager* manager) = 0;

    /**
     * Supplies the computational power weight of each forked process.
     *
     * The returned vector must contain the weight of each process and it
     * must be terminated with a 0 element. The amount of the matrix
     * processed by each process is determined by the ratio between
     * its weight and the sum of the weights.
     *
     * For example, the vector \f$\{10,20,20,0\}\f$ will
     * allow the MASA framework to fork 3 processes. The first process
     * will process \f$\frac{10}{50} = 20\%\f$ of the matrix and the
     * other two processes will process \f$\frac{20}{50} = 40\%\f$ of
     * the matrix each.
     *
     * @return an integer vector with the weight of each process.
     * The last element must be zero. If NULL is returned, no forked
     * processes will be allowed.
     */
    virtual const int* getForkWeights() = 0;

    /**
     * Gets the command line parameters of the IAligner class. The
     * IAlignerParameters interface is used by MASA to present
     * extra command line parameters to each IAligner subclass. Be
     * warned that the MASA-Core is responsible for presenting
     * all the command line options, so any attempt to modify
     * the command line parameters must be done through the
     * IAlignerParameters class, otherwise the behavior of the
     * entire MASA-Core may be compromised. The AbstractAlignerParameters
     * implements the base operations of the IAlignerParameters
     * interface.
     *
     *
     * @return The customized parameters for this IAligner.
     *
     * @see The IAlignerParameters class presents the details
     * needed to customize these parameters.
     */
    virtual IAlignerParameters* getParameters() = 0;

    /**
     * Returns the match/mismatch parameters and the gap penalties used
     * by this IAligner.
     * @return the score parameters of this IAligner.
     */
    virtual const score_params_t* getScoreParameters() = 0;

    /**
     * Initializes the Aligner before the execution of the alignment
     * procedure. The IManager associated with this IAligner may only
     * be called to obtain the command line parameters, especially the
     * AbstractAlignerParameters::getForkId() in multi-process executions.
     *
     * The IManager is not set and must not be queried. The initialize()
     * method is called only once per process. Here, we may initialize
     * the hardware and allocate some global structures that are not
     * associated with the sequence sizes.
     *
     *
     * The initialize() method will be called once for each MASA stage and
     * the sequences will not be changed until the finalize() method is called.
     * Meanwhile, the alignPartition() method may be called multiple times
     * before the finalize() method is called.
     *
     * The initialize() method may be used to process and allocate the
     * sequences in memory. Note that the MASA stages may change the
     * direction of the sequences, so consider that each call to the
     * initialize() method will change the sequence data.
     */
    virtual void initialize() = 0;

    /**
     * This method is called at the beginning of each stage to inform
     * the aligner about the sequences to be aligned. The MASA stages
     * alternate the direction of the sequences in each
     * stage, possibly trimming them if the beginning and end of the sequences
     * will not be used in this stage. So consider that each call to
     * the setSequences() method may completely change the sequence
     * data for the subsequent calls to alignPartition().
     *
     * Note that the seq0_len and seq1_len parameters are not
     * the sizes of the original sequences, but the sizes of the trimmed
     * sequences.
     *
     * @param seq0 trimmed vertical sequence data.
     * @param seq1 trimmed horizontal sequence data.
     * @param seq0_len length of the trimmed vertical sequence.
     * @param seq1_len length of the trimmed horizontal sequence.
     */
    virtual void setSequences(const char* seq0, const char* seq1, int seq0_len, int seq1_len) = 0;

    /**
     * Signals that the sequences will not be used anymore and that the Aligner
     * should deallocate the memory used for them. This method is called
     * at the end of each stage.
     */
    virtual void unsetSequences() = 0;

    /**
     * Executes the alignment procedure.
     *
     * During the call of this method, all the methods of the IManager
     * can be called to obtain the alignment parameters (partition
     * boundaries, row/column data, conditional requirements, etc).
     * Note that the sequence data is already available
     * during the initialize() invocation,
     * but the other information is only available during the invocation of
     * the alignPartition() method.
     *
     * The alignPartition() method may be called multiple
     * times between setSequences() method calls.
     *
     * @param partition the partition to be aligned.
     */
    virtual void alignPartition(Partition partition) = 0;


    /**
     * Finalizes the execution of this IAligner. Use this method to free
     * any memory allocated during the lifetime of the IAligner.
     */
    virtual void finalize() = 0;

    /**
     * This method executes the Myers-Miller matching procedure.
     *
     * @param buffer the vector with the last column data.
     * @param base the vector with the special row in the reverse direction.
     *          This vector is the special row computed in the previous stage.
     * @param len Defines that we must match the buffers in the range [0,len).
     * @param goalScore the score that will be searched for during the matching procedure.
     * @return A match_result_t struct. If match_result_t::found is false, then
     *      the match procedure did not find the goal score. Otherwise,
     *      match_result_t::found is true and match_result_t::i and match_result_t::j
     *      contain the coordinates where the goal score was found. Additionally,
     *      match_result_t::type is MATCH_ALIGNED if the goal was found in
     *      the \f$H\f$ (match) boundary or MATCH_GAPPED if it was found in the \f$F\f$ (gap) boundary.
     *      If both MATCH_ALIGNED and MATCH_GAPPED apply, MATCH_ALIGNED must be preferred.
     */
    virtual match_result_t matchLastColumn(const cell_t* buffer, const cell_t* base, int len, int goalScore) = 0;

    /**
     * Returns the grid of blocks. This method is only necessary if
     * aligner_capabilities_t::dispatch_block_scores is SUPPORTED.
     *
     * @return the grid of blocks.
     */
    virtual const Grid* getGrid() const = 0;

    /* Statistic functions */

    /**
     * Clears all internal statistics of the aligner.
     */
    virtual void clearStatistics() = 0;

    /**
     * This method is called immediately after initialize(), allowing
     * the aligner to print some initial information.
     *
     * @param file The log file where the statistics will be written.
     */
    virtual void printInitialStatistics(FILE* file) = 0;

    /**
     * This method is called immediately after setSequences(), allowing
     * the aligner to print some information before a new stage.
     *
     * @param file The log file where the statistics will be written.
     */
    virtual void printStageStatistics(FILE* file) = 0;

    /**
     * This method is called immediately after finalize(), allowing
     * the aligner to print some finalization information.
     *
     * @param file The log file where the statistics will be written.
     */
    virtual void printFinalStatistics(FILE* file) = 0;

    /**
     * This method allows the aligner to print the internal statistics
     * collected since they were cleared in the last call to the
     * clearStatistics() method.
     *
     * @param file The log file where the statistics will be written.
     */
    virtual void printStatistics(FILE* file) = 0;

    /**
     * Returns a string that will be appended to some intermediate
     * statistics information of stage 1. Basically, the aligner should
     * present how many steps have been calculated, giving an idea of
     * the completion percentage, and some quick information about the pruning
     * status. The whole string should fit in one line (around 80
     * characters).
     *
     * @return a single-line progress string without '\\n'.
     */
    virtual const char* getProgressString() const = 0;

    /**
     * Returns the number of cells that have been processed since the last
     * call to clearStatistics().
     *
     * @return the number of processed cells.
     */
    virtual long long getProcessedCells() = 0;



protected:
    /* protected constructors avoid the direct creation/deletion of this interface */
    ~IAligner() {};
    IAligner() {};


};

#endif /* IALIGNER_HPP_ */
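The load-balancing rule documented for getForkWeights() can be illustrated with a small standalone sketch. The helper name `forkShares` is hypothetical and is not part of MASA-Core; it only assumes what the documentation states, namely that the weight vector is terminated by a 0 element, that a NULL vector disables forking, and that each process receives the ratio between its weight and the sum of the weights.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of MASA-Core): converts a zero-terminated
// weight vector, in the format documented for IAligner::getForkWeights(),
// into the fraction of the matrix assigned to each process.
std::vector<double> forkShares(const int* weights) {
    std::vector<double> shares;
    if (weights == NULL) {
        return shares;  // NULL means that no forked processes are allowed
    }

    // First pass: count the processes and sum the weights up to the 0 terminator.
    long sum = 0;
    std::size_t count = 0;
    while (weights[count] != 0) {
        sum += weights[count];
        ++count;
    }

    // Second pass: each share is the weight divided by the sum of the weights.
    for (std::size_t i = 0; i < count; ++i) {
        shares.push_back(static_cast<double>(weights[i]) / static_cast<double>(sum));
    }
    return shares;
}
```

For the vector {10, 20, 20, 0} from the documentation, this sketch yields three shares of 0.2, 0.4 and 0.4. How MASA-Core actually maps these ratios onto matrix regions is not described in this file.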
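The call order described in the "Aligner's life cycle" section can be sketched with a toy driver. Everything here is an assumption made for illustration: `TracingAligner` and `runLifeCycle` are hypothetical names, the class does not derive from IAligner (its methods take no arguments so the example stays self-contained), and the real MASA-Core driver is certainly more involved.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an aligner: it only records which callbacks ran.
struct TracingAligner {
    std::vector<std::string> calls;
    void initialize()     { calls.push_back("initialize"); }
    void setSequences()   { calls.push_back("setSequences"); }
    void alignPartition() { calls.push_back("alignPartition"); }
    void unsetSequences() { calls.push_back("unsetSequences"); }
    void finalize()       { calls.push_back("finalize"); }
};

// Hypothetical driver that mimics the documented call order for one process.
// partitionsPerStage[i] is the number of partitions aligned in stage i.
void runLifeCycle(TracingAligner& aligner, const std::vector<int>& partitionsPerStage) {
    aligner.initialize();                       // once per process
    for (std::size_t stage = 0; stage < partitionsPerStage.size(); ++stage) {
        aligner.setSequences();                 // start of each stage
        for (int p = 0; p < partitionsPerStage[stage]; ++p) {
            aligner.alignPartition();           // serial calls, one per partition
        }
        aligner.unsetSequences();               // end of each stage
    }
    aligner.finalize();                         // once, at the end of the process
}
```

With two stages of 2 and 1 partitions, the recorded trace starts with initialize, contains one setSequences/unsetSequences pair per stage around the alignPartition calls, and ends with finalize.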