Coral transcriptome annotations April-July 2014
================================================

The transcriptome fasta file contains contigs (or "isotigs") representing splice variants of genes. The gene identifier is the "isogroup": an isogroup is a collection of splice variants of the same gene. Most isogroups contain just a single contig. Isogroup IDs are listed in the fasta headers and also in the seq2iso lookup table. All annotations are performed on a per-isogroup basis.

There are two main sources of annotation:

blastx matches to the N.vectensis proteome (FilteredModels, version 1)
  e-value cutoff for annotation transfer: 1e-4
  => KOGClass
  => GO
  => gene names (= KOG definitions)

blastx matches to the A.digitifera proteome, with annotations from ZoophyteBase filtered for match length (>60) and e-value (<1e-20)
  e-value cutoff for annotation transfer: 1e-4
  => KEGG
  => GO
  => gene names

Additional manual iso2gene annotation is based on tblastn (e=1e-4) with hand-picked galaxins, small cysteine-rich proteins, GFP-like fluorescent proteins, and DNA methyltransferases.

Final GO tables combined (iso2go): pileup(N.vect + A.dig) + manual (see addGO_manually.R)
Piled up gene names (iso2gene): manual ; N.vec ; A.dig

Resulting lookup tables (2-column tab-delimited):

  seq2iso : sequence to isogroup
  iso2gene : gene names
  iso2go : GO terms
  iso2kogClass : KOG class
  defog_iso2go, defog_iso2kogClass : same tables as above but without genes that got a FOG ("fuzzy orthologous group") annotation in the KOG definition
  iso2kegg : KEGG terms, lifted from A.digitifera matches

The "defog" tables are preferable because they are more conservative: proteins with FOG in their definitions consist of common domains that are found in a variety of different proteins and therefore cannot be functionally annotated with confidence.
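
As a rough illustration of how the 2-column tab-delimited lookup tables can be chained (contig -> isogroup -> annotation), here is a minimal Python sketch. The function names, the example IDs, and the example annotations are all made up for illustration; only the 2-column tab-delimited layout comes from the description above. The second column is kept as a raw string because the delimiter used inside multi-term fields is not specified here, and repeated keys are collected into a list in case a key appears on more than one row.

```python
from collections import defaultdict


def read_two_column(lines):
    """Parse 2-column tab-delimited lines into {key: [value, ...]}."""
    table = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split("\t", 1)
        table[key].append(value)
    return dict(table)


def annotate_contig(contig, seq2iso, iso2gene, iso2go):
    """Return (isogroup, gene names, GO entries) for one contig ID."""
    isogroup = seq2iso[contig][0]  # each contig belongs to one isogroup
    return isogroup, iso2gene.get(isogroup, []), iso2go.get(isogroup, [])


# Hypothetical rows; real tables would be read with open("seq2iso file") etc.
seq2iso = read_two_column([
    "contig1\tisogroup1\n",
    "contig2\tisogroup1\n",
    "contig3\tisogroup2\n",
])
iso2gene = read_two_column(["isogroup1\tGalaxin\n"])
iso2go = read_two_column(["isogroup1\tGO:0005576\n"])

result = annotate_contig("contig2", seq2iso, iso2gene, iso2go)
print(result)
```

Passing the defog_iso2go table in place of iso2go would give the more conservative annotation for the same isogroup; isogroups without an entry simply return empty lists.
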
In addition to the main fasta file with contig, isogroup, and gene names, there are two fasta files produced by CDS_extractor_v2.pl (a translator script based on observed blastx hits):

  _PRO.fas : translations based on blastx hits (merging multiple HSPs)
  _CDS.fas : coding sequences corresponding to the translations

These files are generated from blastx matches to the combined protein collection from N.vectensis and A.digitifera.
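
Since each _CDS.fas record is meant to correspond to a translation in _PRO.fas, a quick sanity check is to pair the two files by record ID and compare lengths. The sketch below is only an illustration under several assumptions that are not stated above: that both files use the same record IDs (taken here as the first word of the header), that translations contain no stop-codon character, and that a coding sequence may or may not include a trailing stop codon. The function names and example records are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA lines into {record ID: sequence}.

    The record ID is assumed to be the first word of the header line.
    """
    chunks = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {rid: "".join(seq) for rid, seq in chunks.items()}


def length_mismatches(pro, cds):
    """List IDs whose CDS length is neither 3*len(protein) nor 3*len(protein)+3."""
    bad = []
    for rid in sorted(set(pro) & set(cds)):
        n = len(pro[rid])
        if len(cds[rid]) not in (3 * n, 3 * n + 3):
            bad.append(rid)
    return bad


# Hypothetical records standing in for the _PRO.fas and _CDS.fas contents.
pro = read_fasta([">c1 isogroup1", "MKV", ">c2 isogroup2", "ML"])
cds = read_fasta([">c1 isogroup1", "ATGAAA", "GTG", ">c2 isogroup2", "ATGCT"])

mismatched = length_mismatches(pro, cds)
print(mismatched)
```

Records flagged this way are not necessarily wrong; they only indicate that one of the assumptions above does not hold for that sequence and that it is worth inspecting by hand.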