
Index of /EvidentialGene/plants/pine/pine_evigene201308

      Name                            Last modified       Size

[DIR] Parent Directory                20-Dec-2016 12:57      -
[TXT] pine9genesets_homolstats.txt    09-May-2018 15:19     4k
[DIR] publicset/                      16-Dec-2016 21:31      -


Pine tree gene sets, EvidentialGene compared with other methods

This Evigene reconstruction of loblolly pine tree genes, from August 2013,
is in pine_evigene201308/publicset/

For this example of pine trees, the EvidentialGene recipe for reconstructing accurate gene sets
works well in comparison to three other approaches: genes modeled on the genome, genes assembled
with Trinity from Illumina read pairs, and genes assembled from PacBio reads, which are longer
but less accurate.

Evigene methods in summary: start with >=100 million Illumina read pairs, run several gene assemblers
with varying k-mer and other options, then reduce this over-assembly to a species-accurate gene set
with the Evigene-R pipeline.

Conserved gene accuracy and completeness are measured by protein
homology to reference species genes, for gene sets of pine tree species
(loblolly, sugar and 3 other pine trees). Results are summarized here for
EvidentialGene methods in comparison with transcript assemblies and
genome-predicted gene sets from treegenesdb.org and from GenBank-TSA.

Pine tree gene sets compared
      REFERENCE       Arabidopsis           Grape (Vitis vin.)
Geneset      Rank Found% Align% AlignAA   Found% Align% AlignAA  Source/Method
Pta.Tra13Evg   1   97.8   80.9   374.1     98.5   82.7   428.2   Evigene of multi-kmer, 4 assemblers, Illumina, 2013 
Pta.Gmod1v     5   89.5   70.5   318.0     89.3   71.0   361.6   Genome gene models, 2016
Pta.Tra16Pb    8   85.4   67.1   289.3     84.5   66.7   314.2   PacBio asm, 2016
Pla.Tra16IlPb  3   88.7   75.3   351.7     90.2   76.7   396.1   mix of Trinity/Illumina and PacBio asm, 2016 
Pla.Gmod1v     5   87.5   70.1   317.4     86.9   70.0   356.4   Genome gene models, 2016
Ppa.Tra15Evg   1   95.5   80.4   375.6     96.9   82.4   426.8   Evigene of multi-kmer, 3 assemblers, Illumina, 2015  
Pca.Tra15Il    3   92.0   74.7   344.6     94.7   77.1   393.5   Trinity of Illumina, 2015
Pal.Tra15Il    7   81.3   67.6   344.5     84.7   70.8   397.5   Trinity of Illumina, 2015

Source/Method Ranked by gene set completeness
Geneset     Rank DiffAln%    Source/Method
Pta.Tra13Evg   1     0   Evigene of multi-kmer, 4 assemblers, Illumina; LPG.2013
Ppa.Tra15Evg   1     0   Evigene of multi-kmer, 3 assemblers, Illumina; TSA.GECO 2015
Pca.Tra15Il    3    -6   Trinity 1-kmer asm of Illumina pairs; TSA.GBLJ 2015
Pla.Tra16IlPb  3    -6   mix of Trinity and PacBio asms; TSA.GEUZ, SPG.2016
Pta.Gmod1v     5   -11   Genome gene models, LPG.2015, v1.1
Pla.Gmod1v     5   -11   Genome gene models, SPG.2016, v1
Pal.Tra15Il    7   -13   Trinity 1-kmer asm of Illumina, TSA.GDQR 2015
Pta.Tra16Pb    8   -15   PacBio asm, LPG.2016
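
The DiffAln column can be approximated from the Align% columns of the first table. The exact
formula is not stated on this page, so the sketch below is only an assumption: it averages Align%
over the two references (Arabidopsis, grape) and subtracts the best gene set's average. The names
ALIGN_PCT, PUBLISHED_DIFF and diff_aln are made up for this illustration. Under that assumption
the values land within about one point of the published column, so the real calculation may
differ in detail.

```python
# Align% per gene set, copied from the comparison table: (Arabidopsis, Grape).
ALIGN_PCT = {
    "Pta.Tra13Evg": (80.9, 82.7),
    "Pta.Gmod1v": (70.5, 71.0),
    "Pta.Tra16Pb": (67.1, 66.7),
    "Pla.Tra16IlPb": (75.3, 76.7),
    "Pla.Gmod1v": (70.1, 70.0),
    "Ppa.Tra15Evg": (80.4, 82.4),
    "Pca.Tra15Il": (74.7, 77.1),
    "Pal.Tra15Il": (67.6, 70.8),
}

# DiffAln% as published in the ranking table, for comparison only.
PUBLISHED_DIFF = {
    "Pta.Tra13Evg": 0, "Ppa.Tra15Evg": 0, "Pca.Tra15Il": -6,
    "Pla.Tra16IlPb": -6, "Pta.Gmod1v": -11, "Pla.Gmod1v": -11,
    "Pal.Tra15Il": -13, "Pta.Tra16Pb": -15,
}


def diff_aln(align_pct):
    """Mean Align% over references, relative to the best gene set (assumed formula)."""
    mean = {name: sum(vals) / len(vals) for name, vals in align_pct.items()}
    best = max(mean.values())
    return {name: m - best for name, m in mean.items()}


diffs = diff_aln(ALIGN_PCT)
# List gene sets from most to least complete, next to the published value.
for name, d in sorted(diffs.items(), key=lambda kv: -kv[1]):
    print(f"{name:14s} {d:6.1f}  (published {PUBLISHED_DIFF[name]})")
```

Sorting by the computed difference gives an ordering close to the Rank column, with the two Evigene sets at the top and the PacBio-only assembly last.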

Statistics:
  Found = % reference proteins with significant alignment to test gene sets
  Align = % alignment of target protein sets to reference proteins
  AlignAA = average alignment size (in amino acids) to reference proteins
  DiffAln = Difference in % alignment from Rank 1
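
As a rough illustration of these statistics, the sketch below computes Found, Align and AlignAA
from a list of best hits, one per reference protein. It is a minimal sketch under assumptions,
not the Evigene code: the RefHit record and homology_stats function are hypothetical, Align is
assumed to be total aligned residues over total reference residues, and AlignAA is assumed to be
averaged over found proteins only. The toy numbers are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class RefHit:
    ref_len: int     # reference protein length, in amino acids
    aligned_aa: int  # residues aligned by the best test-set hit; 0 if no significant hit


def homology_stats(hits):
    """Return (Found%, Align%, AlignAA) under the assumptions stated above."""
    found = [h for h in hits if h.aligned_aa > 0]
    found_pct = 100.0 * len(found) / len(hits)
    align_pct = 100.0 * sum(h.aligned_aa for h in hits) / sum(h.ref_len for h in hits)
    align_aa = sum(h.aligned_aa for h in found) / len(found) if found else 0.0
    return found_pct, align_pct, align_aa


# Toy input: four reference proteins, one of them without a significant hit.
hits = [RefHit(400, 360), RefHit(500, 400), RefHit(300, 0), RefHit(800, 640)]
found_pct, align_pct, align_aa = homology_stats(hits)
print(f"Found={found_pct:.1f}% Align={align_pct:.1f}% AlignAA={align_aa:.1f}")
```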

Species Geneset key:
Pta = Pinus taeda (loblolly); PRJNA174450 for genome annotation
 Pta.Tra13Evg = Evigene of multi-kmer (Oases, Soap) + Trinity assemblies of Pinus taeda, August 2013
  at /eugenes.org/EvidentialGene/plants/pine/pine_evigene201308/publicset/
 Pta.Gmod1v = Maker gene models of Pinus taeda, 2015, v1.1
  at /treegenesdb.org/ftp/Genome_Data/genome/pinerefseq/Pita/
 Pta.Tra16Pb = PacBio sequences + assemblies of Pinus taeda genes, 2016

Pla = Pinus lambertiana (sugar)
 Pla.Tra16IlPb = TSA.GEUZ PRJNA174450 2016; project mix of Illumina/Trinity 1-kmer and PacBio asm
 Pla.Gmod1v = Maker gene models of Pinus lambertiana (sugar), 2016
  at /treegenesdb.org/ftp/Genome_Data/genome/pinerefseq/Pila/

Ppa = Pinus patula; TSA.GECO PRJNA301922 2015; Evigene of multi-kmer, 3 assemblers, Illumina pairs from GenBank-TSA; 
  doi:10.1186/s12864-015-2277-7 
Pca = Pinus canariensis; TSA.GBLJ PRJNA255888 2015; Trinity 1-kmer asm of Illumina pairs from GenBank-TSA 
Pal = Pinus albicaulis; TSA.GDQR PRJNA294917 2015; Trinity 1-kmer asm of Illumina pairs from GenBank-TSA 
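
The geneset IDs in this key appear to follow a pattern: species prefix, then Tra (transcript
assembly) with a two-digit year and method tags, or Gmod (genome gene models) with a version
number. That reading is inferred from the key above and is not documented here, so the decoder
below is a hedged sketch. The function decode_geneset_id and its lookup tables are hypothetical
names introduced for illustration.

```python
import re

# Species prefixes and method tags as they appear in the key above.
SPECIES = {
    "Pta": "Pinus taeda",
    "Pla": "Pinus lambertiana",
    "Ppa": "Pinus patula",
    "Pca": "Pinus canariensis",
    "Pal": "Pinus albicaulis",
}
METHODS = {"Evg": "Evigene", "Il": "Illumina", "Pb": "PacBio"}

# Assumed layout: <species>.<Tra|Gmod><digits><letters>
ID_PATTERN = re.compile(r"^([A-Z][a-z]{2})\.(Tra|Gmod)(\d+)([A-Za-z]*)$")


def decode_geneset_id(geneset_id):
    """Split an ID such as 'Pla.Tra16IlPb' into its assumed parts."""
    m = ID_PATTERN.match(geneset_id)
    if m is None:
        raise ValueError(f"unrecognized geneset ID: {geneset_id}")
    prefix, kind, number, suffix = m.groups()
    info = {"species": SPECIES[prefix]}
    if kind == "Tra":
        info["kind"] = "transcript assembly"
        info["year"] = 2000 + int(number)  # two-digit year, assumed 20xx
        info["methods"] = [METHODS[tag] for tag in re.findall(r"[A-Z][a-z]+", suffix)]
    else:
        info["kind"] = "genome gene models"
        info["version"] = int(number)
    return info


print(decode_geneset_id("Pla.Tra16IlPb"))
```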

REFERENCES
 Arabidopsis thaliana model plant, Araport 2015 version, nprotein=nnnnn, nloci=28902 
 Vitis vinifera grape, NCBI RefSeq 2014 version, nprotein=35618, nloci=nnnnn

Don Gilbert, gilbertd at_indiana_edu 
update: 18 Dec 2016 


Developed at the Genome Informatics Lab of Indiana University Biology Department