
Index of /EvidentialGene/daphnia/daphnia_similoides/evg1dapsim

      Name                              Last modified       Size

[DIR] Parent Directory                  30-May-2017 14:31      -
[   ] evg1dapsim-refdadrtr.blastp.gz    06-Apr-2017 21:14   129M
[   ] evg1dapsim.names.gz               06-Apr-2017 21:17   2.5M
[TXT]                                   20-Mar-2017 20:43     1k
[   ] evg1dapsim.trclass.gz             20-Mar-2017 17:57  12.3M
[DIR] evigene_methods/                  30-May-2017 14:19      -
[DIR] inputset/                         30-May-2017 13:51      -
[DIR] okayset/                          30-May-2017 13:51      -
[DIR] publicset/                        30-May-2017 13:46      -
[TXT] rna_assemblies.aastat.txt         20-Mar-2017 16:44     1k
[TXT] rnasra_daphsim16huau.csv          18-Mar-2017 16:23     1k
[TXT]                                   21-Mar-2017 08:09     2k

Gene assembly for Daphnia_similoides water flea from RNA-Seq with EvidentialGene methods 
This is a "reference-free" assembly: no chromosome sequences and no genes from other
species are used to assemble the Daphnia genes.

RNA source is from NCBI SRA, listed in rnasra_daphsim16huau.csv

Four gene assembly runs, each multi-kmer, were done with velvet/oases and idba_tran;
they are summarized in rna_assemblies.aastat.txt

Those inputs of 3 million transcripts are reduced by evigene to
157,459 non-redundant transcripts, comprising 46,000 putative coding gene loci.
tr2aacds result is the class table evg1dapsim.trclass.gz, and the intermediate 
classified gene set of okayset/, summarized in
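
A minimal sketch of how the class table could be tallied, in Python. It only assumes
that evg1dapsim.trclass.gz is a gzipped, tab-delimited text table; the column index
holding the class label (here 2, the third column) is an assumption and should be
checked against the file itself. The helper name tally_column is hypothetical and is
not part of evigene.

```python
import gzip
import os
from collections import Counter


def tally_column(path, column, sep="\t"):
    """Count the values found in one column of a gzipped, delimited text table."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # skip blank lines and comment lines
            fields = line.split(sep)
            if column < len(fields):
                counts[fields[column]] += 1
    return counts


if __name__ == "__main__":
    path = "evg1dapsim.trclass.gz"  # file listed in the index above
    class_column = 2  # ASSUMPTION: class label is in the third column
    if os.path.exists(path):
        for value, n in tally_column(path, class_column).most_common():
            print(value, n)
```

If the assumed column is right, the printed tallies give the number of transcripts
per class; if not, trying neighboring column indexes on the first few lines should
show which column holds the class labels.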

These are inputs then for two further steps, reference protein blast and public 
annotated sequence set, using evigene scripts and

evgmrna2tsa uses the reference protein scores and other coding metrics to reclassify
and remove some redundant or fragment transcripts, resulting in 
31,000 putative loci with alternates (main class), plus 95,000 alternates,
and 7,000 loci without alternates (noclass).  Removed were 23,000 
redundant/fragment/nohomology transcripts.
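
The rounded counts above can be cross-checked with simple arithmetic. This sketch
assumes that each main and each noclass locus is represented by exactly one primary
transcript, which the text does not state explicitly; the variable names are
illustrative only.

```python
# Rounded counts quoted in the text (thousands are approximate).
main_loci = 31_000        # loci with alternates (main class)
alternates = 95_000       # alternate transcripts
noclass_loci = 7_000      # loci without alternates (noclass)
removed = 23_000          # redundant/fragment/nohomology transcripts
nonredundant = 157_459    # non-redundant transcripts before reclassification

# ASSUMPTION: one primary transcript per main locus and per noclass locus.
final_loci = main_loci + noclass_loci
kept_transcripts = main_loci + alternates + noclass_loci
accounted = kept_transcripts + removed
gap = nonredundant - accounted

print("final putative loci:", final_loci)
print("transcripts kept:", kept_transcripts)
print("kept + removed:", accounted)
print("difference from 157,459:", gap)
```

Under that assumption the reclassified set holds about 38,000 loci and 133,000
transcripts; kept plus removed comes to 156,000, and the remaining difference of
1,459 is plausibly due to the quoted counts being rounded to thousands.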

Run information is summarized in, with scripts and logs in folder evigene_methods/
The final public sequence set, with names from reference protein blast, is in publicset/
Intermediate transcript assemblies are not provided here.

Developed at the Genome Informatics Lab of Indiana University Biology Department