MAiA
-
MAIA is an algorithm for integrating multiple genome assemblies, for example assemblies originating from:
- Different runs of a de novo assembler
- Assemblies of different data types
- Comparative assemblies
What you need is:
- A set of assemblies
- A relatively closely related reference genome
PubMed link to the corresponding paper.
-
An overview of the process of integrating several assemblies with MAIA:
- Multiple de novo and comparative assemblies are created using specialized assemblers.
- The resulting contigs are pairwise aligned to each other to find end-to-end overlaps.
- An overlap graph is constructed in which nodes represent contigs and edges represent overlaps. A forward and a reverse edge are added between each pair of overlapping nodes, but these are drawn as a single undirected edge for simplicity. A start node and an end node are determined using a reference genome. Edges are assigned weights based on several properties of the alignments and contigs, combined using weighted Z-scores.
- An orientation is assigned to the contigs by traversing the graph depth-first in order of weight (indicated by the numbers). When an edge assigns a reverse orientation to a node that has already been assigned a forward orientation via another edge, the edge is recognized as conflicting and is removed.
- Oriented contigs and end-to-end overlaps form a directed graph.
- The highest scoring path is found using a Tabu search procedure, which leads to the assembly of a chromosome.
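
The orientation step in the overview can be illustrated with a small Python sketch. This is not MAIA's MATLAB code: the function name `assign_orientations`, the edge tuple layout, and the toy weights are assumptions made for illustration. Each edge carries a weight and a flag saying whether the overlap implies that the two contigs share the same orientation. The graph is traversed depth-first, heaviest edges first, and an edge that contradicts an orientation that was already assigned is dropped.

```python
from collections import defaultdict

def assign_orientations(edges, start):
    """Orient contigs by a weight-ordered depth-first traversal.

    edges: list of (contig_a, contig_b, weight, same_strand) tuples, where
           same_strand is True if the overlap implies equal orientation.
    start: contig whose orientation is fixed to forward (+1).
    Returns (orientation dict, kept edges, removed edges).
    """
    adjacency = defaultdict(list)
    for idx, (a, b, weight, same) in enumerate(edges):
        adjacency[a].append((weight, idx, b, same))
        adjacency[b].append((weight, idx, a, same))

    orientation = {start: 1}
    kept, removed = [], []
    seen_edges = set()

    def visit(node):
        # Follow the heaviest edges first.
        for weight, idx, neighbour, same in sorted(adjacency[node], key=lambda e: -e[0]):
            if idx in seen_edges:
                continue
            seen_edges.add(idx)
            implied = orientation[node] if same else -orientation[node]
            if neighbour not in orientation:
                orientation[neighbour] = implied
                kept.append(edges[idx])
                visit(neighbour)
            elif orientation[neighbour] == implied:
                kept.append(edges[idx])
            else:
                # Conflicts with an orientation assigned via a heavier edge.
                removed.append(edges[idx])

    visit(start)
    return orientation, kept, removed

# Toy graph (hypothetical contigs and weights): c1-c3 conflicts with c1-c2-c3.
edges = [
    ("c1", "c2", 0.9, True),
    ("c2", "c3", 0.8, False),
    ("c1", "c3", 0.2, True),
]
orientation, kept, removed = assign_orientations(edges, "c1")
print(orientation)
print("removed:", removed)
```

In this toy graph the low-weight edge between c1 and c3 is the one that gets removed, because c3 has already been given a reverse orientation through the heavier c1-c2-c3 route. The recursion is fine for a sketch; a graph with many thousands of contigs would call for an explicit stack.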
-
MAIA produces one .xgmml file per chromosome, which you can visualize in Cytoscape (File > Import > Network). Inspecting the graph may help you interpret the output.
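
If you prefer a quick look at a graph without opening Cytoscape, XGMML is plain XML and can be read with Python's standard library. The sketch below is an illustration only: the sample document is hand-written and hypothetical, and it assumes that nodes and edges use the usual XGMML `node` (`id`, `label`) and `edge` (`source`, `target`) elements. The files MAIA writes may carry additional attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written XGMML snippet; a real MAIA output file may carry
# more attributes on each node and edge.
SAMPLE = """<?xml version="1.0"?>
<graph label="chr9" xmlns="http://www.cs.rpi.edu/XGMML">
  <node id="1" label="contigA"/>
  <node id="2" label="contigB"/>
  <edge source="1" target="2" label="overlap"/>
</graph>"""

def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}node' down to 'node'."""
    return tag.rsplit("}", 1)[-1]

def summarize_xgmml(root):
    """Return (node labels keyed by id, list of (source, target) edges)."""
    nodes, edges = {}, []
    for element in root.iter():
        name = local_name(element.tag)
        if name == "node":
            nodes[element.get("id")] = element.get("label")
        elif name == "edge":
            edges.append((element.get("source"), element.get("target")))
    return nodes, edges

root = ET.fromstring(SAMPLE)
nodes, edges = summarize_xgmml(root)
print(len(nodes), "contigs,", len(edges), "overlaps")
```

To inspect a real file, replace `ET.fromstring(SAMPLE)` with `ET.parse(path).getroot()`, where `path` points to one of the .xgmml files in your output folder.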
-
- A Unix operating system
- Matlab 2009b or later (with Bioinformatics toolbox)
- The MAIA Matlab code: MAIA v0.5
- The MUMmer package (nucmer and delta-filter)
- The GAIMC Graph toolbox for Matlab
-
- Extract the MAIA code and add the folder to your MATLAB path (File > Set Path > Add with Subfolders)
- Install MUMmer and make sure nucmer and delta-filter can be found on the Unix PATH
- Install the GAIMC Graph toolbox
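
A quick way to confirm that the MUMmer tools are visible is to look them up on the PATH before starting MATLAB. The following Python sketch is just a convenience check and not part of MAIA; the helper name `find_tools` is made up for this example.

```python
import shutil

def find_tools(names=("nucmer", "delta-filter")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}

if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

MATLAB normally inherits the environment it was started from, so it is probably safest to run this check in the same shell from which you launch MATLAB.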
-
- Start MATLAB.
- Try the CENPK chromosome 9 example, which is in the 'example' folder
- cd into the ./maia/example folder
- Run the example by typing:
>> maia('assembly_list.txt','./data/ref_genome/chr9_s288c.fa')
- Now the example folder contains the file maia_assembly.fa and one Cytoscape .xgmml file per chromosome (only one in this case)
- Now run maia with your own data
>> maia('tab delimited assembly list', 'reference genome')
The tab delimited assembly list should be in the format:
AssemblyName1 TAB FastaFileName1 TAB Zscore1
AssemblyName2 TAB FastaFileName2 TAB Zscore2
... etc...
- Check out the optional parameters with
>> help maia
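
The tab delimited assembly list described above is easy to get wrong by hand, since tabs and spaces look alike in most editors. A minimal Python sketch for generating it is shown here. The assembly names, FASTA paths and third-column values are placeholders, and `write_assembly_list` is a hypothetical helper, not part of MAIA.

```python
import csv
import io

def write_assembly_list(rows, handle):
    """Write (name, fasta_path, zscore) rows as tab-delimited lines."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for name, fasta_path, zscore in rows:
        if "\t" in name or "\t" in fasta_path:
            raise ValueError("fields must not contain tab characters")
        writer.writerow([name, fasta_path, zscore])

# Placeholder names, paths and values for illustration only.
rows = [
    ("denovo_run1", "./data/denovo_run1.fa", 1),
    ("comparative", "./data/comparative.fa", 1),
]

buffer = io.StringIO()
write_assembly_list(rows, buffer)
print(buffer.getvalue(), end="")
```

To write the list to disk, pass a file opened with `open(filename, "w", newline="")` instead of the in-memory buffer, and give that file name as the first argument to `maia`.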
Last update: December 15th, 2010. Contact: Jurgen Nijkamp