Index of /EvidentialGene/cacao

      Name                              Last modified       Size

[DIR] Parent Directory                  21-May-2017 19:09   -
[TXT] cacao_geneset_quality.txt         09-Feb-2014 18:21   5k
[DIR] genes/                            15-Jun-2014 14:31   -


Theobroma cacao (chocolate bean tree) genes and genome, 
2012 March data release.
See also http://www.phytozome.net/cacao.php, NCBI Bioproject PRJNA51633

Publication Title:
The genome sequence of the most widely cultivated cacao type and its use to identify
candidate genes regulating pod color.
http://genomebiology.com/2013/14/6/r53   doi:10.1186/gb-2013-14-6-r53

Authors:
Juan C Motamayor1*^, Keithanne Mockaitis2^, Jeremy Schmutz1,3^, Niina
Haiminen4^, Donald Livingstone III1,5, Omar Cornejo6, Seth Findley1,
Ping Zheng7, Filippo Utro4, Stefan Royaert5, Christopher Saski8, Jerry
Jenkins1,3, Ram Podicheti9, Meixia Zhao10, Brian Scheffler11, Joseph C
Stack1, Alex Feltus8, Guiliana Mustiga1, Freddy Amores12, Wilbert
Phillips13, Jean Philippe Marelli14, Gregory D May15, Howard Shapiro1,
Jianxin Ma10, Carlos D. Bustamante6, Raymond J. Schnell1,5, Dorrie
Main7, Don Gilbert2, Laxmi Parida4 and David N. Kuhn5

Accession numbers
The Whole Genome Shotgun project is deposited at DDBJ/EMBL/GenBank under accession number ALXC00000000.
The annotated genome assembly is at http://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA51633
...........

Update 2014 Feb
Cacao genes from the Mars/USDA-sponsored project rank at the top of plant gene sets for completeness.
They were built using mixed methods that draw more on mRNA assembly than on genome-gene models.

  Gene set completeness for plant orthologs
  ranked by completeness (Bitscores, aaSize, nGroup, Tiny)
            Common families   All families  
Geneset     cBits    dSize   aBits   nGroup Tiny   
------------------------------------------------------
cacao1ma     671     15       544    15161  111 (0.7%) 
cotton       653      3       519    15026  153 (1%) 
orange1cn    648      0       499    14249  198 (1.3%) 
poplar       639     -2       512    15130  244 (1.6%)   
castorbean   631     -7       493    14605  460 (3.1%)  
capsella     603      0       435    13397  171 (1.2%)
eucalypt     624     -5       468    13877  312 (2.2%)
soybean      618    -17       477    14559  402 (2.7%) 
arabido.th   600     -1       428    13345  135 (1.0%) 
arabido.ly   604     -1       430    13304  253 (1.9%) 
brassica     594      2       432    13714  283 (2%) 
grape        611    -20       447    13203  726 (5.4%) 
amborella    548     -6       355    11766  489 (4.1%) 
banana1g     542    -19       369    12537  577 (4.6%) 
------------------------------------------------------
    Common families n=7540, All families n=15928
    Bits  = bitscore from blastp, for groups common (cBits) to all and for 
            all (aBits) families with 3+ plants
    dSize = protein size difference from family median
    Tiny  = count of tiny protein size outliers (more than 3 sd below family median)
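
A minimal Python sketch of how dSize and Tiny could be computed from the
definitions above. The input layout (family id mapped to per-gene-set protein
lengths), the function name size_stats, averaging dSize over families, and
estimating sd from all members of a family are assumptions for illustration,
not the Evigene implementation.

```python
from statistics import median, pstdev


def size_stats(families, geneset):
    """Return (dSize, Tiny) for one gene set.

    families: dict of family id -> dict of gene set name -> protein length (aa).
    This layout is hypothetical; the real tables were built from blastp output.
    """
    diffs = []
    tiny = 0
    for sizes in families.values():
        if geneset not in sizes:
            continue  # this gene set has no member in the family
        values = list(sizes.values())
        med = median(values)
        sd = pstdev(values)  # assumption: sd taken over all family members
        size = sizes[geneset]
        diffs.append(size - med)
        # Tiny outlier: more than 3 sd below the family median
        if sd > 0 and size < med - 3 * sd:
            tiny += 1
    d_size = sum(diffs) / len(diffs) if diffs else 0.0
    return d_size, tiny


if __name__ == "__main__":
    # Toy data: gene set "x" is far too short in F1 and typical in F2.
    toy = {
        "F1": dict({f"sp{i}": 300 for i in range(11)}, x=30),
        "F2": dict({f"sp{i}": 200 for i in range(11)}, x=200),
    }
    print(size_stats(toy, "x"))
```

With very small families a single short protein inflates the sd enough that it
can never fall 3 sd below the median, so a real implementation would likely
need a minimum family size or an sd estimated without the tested member.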

Notes: cacao1ma, orange1cn and banana1g are each the better of 2 independent gene sets for 
those species.  Cotton is a close relative of cacao, and its gene set was 
built using the cacao1ma gene set (among others).  Bitscores are influenced
by phylogeny as well as quality; scores by alignment (somewhat less phylogeny-dependent)
show the same ordering.  Protein size is closely and positively correlated with bitscore.
Ranking quality by protein size and orthology families (nGroup) gives a similar
result, but arabido.th and brassica move up to the middle (6th and 7th).

  Gene set completeness for plant orthologs
  comparing 2 independent gene sets for 3 species
              Common families      All families  
Geneset     cBits    dSize      aBits  nGroup  Tiny   
--------------------------------------------------------
cacao1ma     653     15         547    15161  112 (0.7%) 
cacao1cr     641     11         530    14897  235 (1.5%) 
orange1cn    629     0          502    14249  199 (1.3%) 
orange1jg    610     -21        480    14039  658 (4.6%) 
banana1g     522     -19        371    12537  577 (4.6%) 
banana1e     521     -21        349    11733  880 (7.5%) 
--------------------------------------------------------
    Common families n=8461, All families n=15838
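
The legend does not define the percentage printed beside Tiny. One reading
consistent with the values in the table above is Tiny divided by nGroup,
truncated to one decimal. The Python sketch below checks that assumption
against the table rows; tiny_percent is a hypothetical helper name, and the
formula is inferred from the printed numbers, not stated by the authors.

```python
def tiny_percent(tiny, n_group):
    """Tiny as a percentage of nGroup, truncated (not rounded) to 0.1%."""
    # Integer arithmetic avoids floating-point surprises before truncation.
    return (1000 * tiny) // n_group / 10


# (gene set, nGroup, Tiny, printed percent) copied from the table above
ROWS = [
    ("cacao1ma", 15161, 112, 0.7),
    ("cacao1cr", 14897, 235, 1.5),
    ("orange1cn", 14249, 199, 1.3),
    ("orange1jg", 14039, 658, 4.6),
    ("banana1g", 12537, 577, 4.6),
    ("banana1e", 11733, 880, 7.5),
]

if __name__ == "__main__":
    for name, n_group, tiny, printed in ROWS:
        computed = tiny_percent(tiny, n_group)
        flag = "ok" if computed == printed else "differs"
        print(f"{name:10s} {computed:4.1f}% printed {printed:4.1f}% {flag}")
```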

Plant comparison gene sets
  amborella = Amborella genome-gene predictions
              BioProject PRJNA212863, http://www.amborella.org/, doi:10.1126/science.1241089
  banana1g  = Banana genome-gene predictions
              BioProject PRJNA81189, http://www.musagenomics.org/, doi:10.1038/nature11241
  banana1e  = Banana mRNA-seq only assembly with Evigene
              http://arthropods.eugenes.org/EvidentialGene/plants/banana/
  cacao1cr  = Cacao Cirad genome-gene predictions
              http://cocoagendb.cirad.fr/, doi:10.1038/ng.736
  cacao1ma  = Cacao Mars mRNA-assembly + genome-genes with Evigene
              BioProject PRJNA51633, http://arthropods.eugenes.org/EvidentialGene/plants/cacao/, doi:10.1186/gb-2013-14-6-r53
  orange1cn = Sweet orange, Cn genome-genes gene set
              BioProject PRJNA86123, http://citrus.hzau.edu.cn/orange, doi:10.1038/ng.2472
  orange1jg = Sweet orange, JGI genome-genes gene set
              http://www.phytozome.net/citrus.php
              
  arath = arabido.th, arabidopsis TAIR10
  poptr = poplar, Populus poptr_Ptrichocarpa_156 JGI phytozome
  ricco = castorbean, Ricinus v0.1 from castorbean.jcvi.org
  soybn = soybean, soybn_Gmax_109 JGI phytozome
  vitvi = grape, vitvi_Vvinifera_145 JGI phytozome
  soltu = potato, Solanum v3.4 from potatogenomics.plantbiology.msu.edu/
  sorbi = sorghum, sorbi_Sbicolor_79 JGI phytozome

  cotton = gossypium phytozome/v9.0/Graimondii/
  capsella = phytozome/v9.0/Crubella/
  eucalypt = eucalyptus phytozome/v9.0/Egrandis/
  brassica = phytozome/v9.0/Brapa/
  arabido.ly = phytozome/v9.0/Alyrata/
................................................................................  


Developed at the Genome Informatics Lab of Indiana University Biology Department