Index of /EvidentialGene/daphnia/daphnia_similoides/evg1dapsim
Name Last modified Size
evg1dapsim-refdadrtr.blastp.gz 06-Apr-2017 21:14 129M
evg1dapsim.names.gz 06-Apr-2017 21:17 2.5M
evg1dapsim.tr2aacds.info 20-Mar-2017 20:43 1k
evg1dapsim.trclass.gz 20-Mar-2017 17:57 12.3M
evigene_methods/ 03-Nov-2017 16:37 -
inputset/ 30-May-2017 13:51 -
okayset/ 30-May-2017 13:51 -
publicset/ 30-May-2017 13:46 -
rna_assemblies.aastat.txt 20-Mar-2017 16:44 1k
rnasra_daphsim16huau.csv 18-Mar-2017 16:23 1k
runevg.info 21-Mar-2017 08:09 2k
Gene assembly for Daphnia_similoides water flea from RNA-Seq with EvidentialGene methods
This assembly is "reference-free": neither chromosome sequences nor genes of other
species are used to assemble the Daphnia genes.
The RNA source is NCBI SRA; the data sets are listed in rnasra_daphsim16huau.csv.
Four gene assembly runs were done, each multi-kmer, with velvet/oases and idba_tran;
they are summarized in rna_assemblies.aastat.txt.
The resulting 3 million input transcripts are reduced by tr2aacds.pl of evigene to
157,459 non-redundant transcripts, comprising 46,000 putative coding gene loci.
The tr2aacds result is the class table evg1dapsim.trclass.gz and the intermediate
classified gene set in okayset/, summarized in evg1dapsim.tr2aacds.info.
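The class table can be tallied with a few lines of Python. This is a minimal sketch,
not part of evigene: it assumes the table is gzipped, tab-delimited text with the
transcript ID in the first column, an okay/drop status in the second, and the class
in the third. That column layout is an assumption to check against the file itself;
the function name count_classes and its parameters are illustrative.

```python
import gzip
from collections import Counter


def count_classes(path, status_col=1, class_col=2):
    """Count (status, class) pairs in a gzipped, tab-delimited class table.

    Column positions are assumptions (0-based); adjust them after
    inspecting the actual file.
    """
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip blank lines and comment lines.
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            # Skip rows too short to hold both columns.
            if len(fields) <= max(status_col, class_col):
                continue
            counts[(fields[status_col], fields[class_col])] += 1
    return counts


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        # Print the most frequent (status, class) pairs first.
        for (status, cls), n in count_classes(sys.argv[1]).most_common():
            print(f"{status}\t{cls}\t{n}")
```

Run it as "python count_classes.py evg1dapsim.trclass.gz" (the script file name is
hypothetical) to get one line per status and class combination.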
These are then the inputs for two further steps, a reference protein blast and creation
of the public annotated sequence set, using the evigene scripts run_evgaablast.sh and
run_evgmrna2tsa.sh.
evgmrna2tsa uses the reference protein scores and other coding metrics to reclassify
the transcripts and to remove some that are redundant or fragments, resulting in
31,000 putative loci with alternates (main class), plus 95,000 alternates,
and 7,000 loci without alternates (noclass). Removed were 23,000.
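As a rough consistency check, the rounded counts quoted in this README can be added up.
The sketch below rests on two assumptions that the text does not state: that each main
and noclass locus contributes one primary transcript, and that the 23,000 removed items
are transcripts. The variable names are illustrative.

```python
# Rounded counts quoted in this README.
tr2aacds_transcripts = 157_459  # non-redundant transcripts from tr2aacds
main_loci = 31_000              # loci with alternates (main class)
alternates = 95_000             # alternate transcripts
noclass_loci = 7_000            # loci without alternates (noclass)
removed = 23_000                # assumed here to be a transcript count

# Assumption: one primary transcript per main or noclass locus.
kept_transcripts = main_loci + alternates + noclass_loci
kept_loci = main_loci + noclass_loci

# Kept plus removed should roughly match the tr2aacds output,
# allowing for rounding of each figure to the nearest thousand.
accounted = kept_transcripts + removed
difference = tr2aacds_transcripts - accounted

print("kept transcripts:", kept_transcripts)
print("kept loci:", kept_loci)
print("transcripts per locus:", kept_transcripts / kept_loci)
print("difference from tr2aacds total:", difference)
```

Under these assumptions the kept and removed counts add up to within about 1,500 of
the 157,459 transcripts from tr2aacds, which fits figures rounded to the nearest
thousand.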
Run information is summarized in runevg.info, with scripts and logs in folder evigene_methods/.
The final public sequence set, with names from the reference protein blast, is in publicset/.
Intermediate transcript assemblies are not provided here.