Name Last modified Size
publicset/ 05-Jun-2017 16:09 -
whitefly_chrasm2016/ 05-Jun-2017 16:54 -
whitefly_evigene16news.txt 07-Jun-2017 15:53 4k
whitefly_evigene16news0.txt 05-Jan-2017 21:35 3k
whitefly_evigene16sum.txt 07-Jun-2017 15:54 5k
whitefly_other_gene_models/ 07-Jun-2017 15:59 -
EvidentialGene for Bemisia tabaci whitefly, 2016/17
The EvidentialGene assembly of Bemisia tabaci whitefly genes is an update of
the one I did in 2012. By objective orthology measures, this update is more
accurate and complete than recently available genome gene models for Bemisia
tabaci. This Evigene assembly is "reference free": it is assembled directly
from RNA-seq, without using chromosomes for modeling genes.
Bemisia tabaci whitefly (cotton/crop plant pest)
Gene sets compared for reference proteins & expression
Reference:   Pea aphid        Fruit fly        RNA-Introns
Geneset      Found%  AlnT%    Found%  AlnT%    Found%
BtEvigene    81.2    88.0     74.1    74.9     68.5
BtNCBI       79.7    82.3     73.4    71.6     69.4
BtMaker      77.4    73.8     72.1    66.0     57.7
BtTrinity    73.5    59.2     68.0    53.2     50.5
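For readers who want to check the comparison themselves, here is a minimal Python sketch that finds the top-scoring gene set in each column and the margin between two gene sets. The numbers are copied from the table above; the short column names and the function names are my own labels for illustration and are not part of any Evigene software.

```python
# Scores copied from the comparison table; column names are illustrative shorthand.
COLUMNS = ["aphid_found", "aphid_aln", "fly_found", "fly_aln", "intron_found"]

SCORES = {
    "BtEvigene": [81.2, 88.0, 74.1, 74.9, 68.5],
    "BtNCBI":    [79.7, 82.3, 73.4, 71.6, 69.4],
    "BtMaker":   [77.4, 73.8, 72.1, 66.0, 57.7],
    "BtTrinity": [73.5, 59.2, 68.0, 53.2, 50.5],
}


def best_per_column(scores=SCORES, columns=COLUMNS):
    """Return {column: name of the gene set with the highest score}."""
    best = {}
    for i, col in enumerate(columns):
        best[col] = max(scores, key=lambda name: scores[name][i])
    return best


def margin_over(a, b, scores=SCORES):
    """Return per-column differences (a minus b), rounded to one decimal."""
    return [round(x - y, 1) for x, y in zip(scores[a], scores[b])]


if __name__ == "__main__":
    for col, name in best_per_column().items():
        print(f"{col:13s} {name}")
    print("BtEvigene - BtNCBI:", margin_over("BtEvigene", "BtNCBI"))
```

On these numbers BtEvigene leads the four reference-protein columns, while BtNCBI is slightly ahead on RNA-Introns Found% (69.4 versus 68.5).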
Evigene methods compare very well with currently popular gene
reconstruction methods (Trinity-only RNA assembly, PacBio RNA
assembly, MAKER genome gene modeling, NCBI EGAP/RefSeq modeling,
Ensembl gene modeling) for animals and plants including Arabidopsis,
maize, pine trees, mosquitoes, honey bee, beetles, water fleas, ticks,
fishes, and mice.
Reconstruction from RNA alone provides independent gene evidence, free of
errors and biases from chromosome assemblies and other species' gene sets. Not
only are the easy, well-known ortholog genes reconstructed well, but the harder
problems of alternate transcripts, paralogs, and complex structured genes
are usually solved more completely by Evigene methods.
See this recent work at
For the genome sleuths among you, here is a puzzle: there are scores of
whitefly RNA-expressed genes with near-perfect nucleotide identity to some
plant genomes, including cotton plant, yet about 30 have good protein
alignment to pea aphid and other insect genes. Most are fully located on both
whitefly genome and plant genome assemblies. For example, this one is found in
both whitefly and cotton genomes with high identity, and has a pea aphid
homolog: Bemtab3dEVm002101t1, 931 aa; the transcript aligns 99% to whitefly
chromosomes, 66% of it aligns at 99% identity to Gossypium hirsutum chromosomes,
and the protein aligns 92% to pea aphid ncbi:XP_008188404.1, a zinc finger protein.
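If you want to hunt for such cases in your own alignment summaries, a rough Python sketch of the screen follows. It is an illustration only, not the procedure used for this gene set: the record fields, the function name, and every threshold are assumptions chosen so that the example gene above passes, and you would tune them to your own data.

```python
from dataclasses import dataclass


@dataclass
class TranscriptHit:
    """Hypothetical per-transcript alignment summary (field names are assumed)."""
    transcript_id: str
    whitefly_align_pct: float        # % of transcript aligned to whitefly genome
    plant_align_pct: float           # % of transcript aligned to a plant genome
    plant_identity_pct: float        # nucleotide identity of that plant alignment
    insect_protein_align_pct: float  # % of protein aligned to an insect protein


def is_puzzle_candidate(hit,
                        min_genome_align=90.0,    # assumed cutoff
                        min_plant_align=50.0,     # assumed cutoff
                        min_plant_identity=98.0,  # assumed cutoff
                        min_insect_align=80.0):   # assumed cutoff
    """True when a transcript sits on the whitefly genome, matches a plant
    genome at near-perfect identity, and still has a good insect homolog."""
    return (hit.whitefly_align_pct >= min_genome_align
            and hit.plant_align_pct >= min_plant_align
            and hit.plant_identity_pct >= min_plant_identity
            and hit.insect_protein_align_pct >= min_insect_align)


# The one example given in the text above.
EXAMPLE = TranscriptHit("Bemtab3dEVm002101t1", 99.0, 66.0, 99.0, 92.0)

if __name__ == "__main__":
    print(EXAMPLE.transcript_id, is_puzzle_candidate(EXAMPLE))
```

Requiring the insect protein alignment is what separates the puzzling genes from plain plant-derived sequence in a whitefly RNA sample.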
Who should consider EvidentialGene for gene reconstruction?
* genomicists who want accurate, complete and objectively reconstructed genes,
including those of you who may not believe my claims, but will look at
objective results on this.
* model and well-supported genome projects, where curators can use these
to improve precision of high value gene information.
* new species genomes, for use as a primary gene set with alternate transcripts,
and/or to assess gene predictions and chromosome assemblies for accuracy.
* gene/genome improvement projects, to add alternate transcripts,
un-discovered and fragmented gene models.
* transcriptome and expression projects for more accurate genes.
One of my goals with this work is to reconstruct many high-value (model,
otherwise) animal and plant gene sets in coming years. I welcome
collaborations, especially from groups with genomics + informatics
expertise. This methodology is highly automatable (think BIG DATA), but still
needs improvement. Species genes built with Evigene by independent
authors include a range of plants and animals, and several of these papers
provide independent reviews of Evigene versus other methods.
-- Don Gilbert, June 2017
gilbertd at indiana.edu