ECB-FEAT-23138504

From EchinoWiki

From Matt Glasenapp:

The ebr1 gene annotation in Spur_5.0 and Echinobase spans 56,133 base pairs. The gene contains more than 40 exons spread across this region.

From previous cDNA sequencing, we know that the S. purpuratus ebr1 mRNA is 12,074 base pairs (https://www.ncbi.nlm.nih.gov/nuccore/NM_214665.1) and the mature protein is 3,712 amino acids, which corresponds to 11,136 base pairs of coding sequence (https://www.ncbi.nlm.nih.gov/protein/47551295).

However, in the Spur_5.0 GFF/GTF annotation files, the annotated exons sum to only 9,421 base pairs, and the annotated CDS segments sum to only 8,486 base pairs.
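The exon and CDS totals above can be reproduced with a short script. The following is a minimal sketch, not the method actually used for this page. It assumes a GFF3 file in which every exon and CDS row carries a `gene=` attribute (as NCBI-style GFF3 files usually do) and in which the gene has a single transcript, so no exon is counted twice. The function names and the toy coordinates are hypothetical and are not taken from Spur_5.0.

```python
def parse_attributes(field):
    """Turn a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def feature_lengths(gff_lines, gene_name):
    """Sum the lengths of exon and CDS rows annotated with gene=<gene_name>.

    Assumption: one transcript per gene. With several transcripts, filter on
    the transcript ID as well, or shared exons are counted more than once.
    """
    totals = {"exon": 0, "CDS": 0}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete 9-column feature row
        feature_type = cols[2]
        if feature_type not in totals:
            continue
        if parse_attributes(cols[8]).get("gene") != gene_name:
            continue
        start, end = int(cols[3]), int(cols[4])
        totals[feature_type] += end - start + 1  # coordinates are 1-based, inclusive
    return totals


# Toy rows with made-up coordinates, only to show the expected input shape.
TOY_GFF = [
    "##gff-version 3",
    "chrA\tsrc\texon\t101\t200\t.\t-\t.\tID=e1;gene=ebr1",
    "chrA\tsrc\tCDS\t101\t150\t.\t-\t0\tID=c1;gene=ebr1",
    "chrA\tsrc\texon\t301\t350\t.\t-\t.\tID=e2;gene=ebr1",
    "chrA\tsrc\texon\t1\t50\t.\t+\t.\tID=e3;gene=other",
]

toy_totals = feature_lengths(TOY_GFF, "ebr1")
print(toy_totals)
```

To run this on a real annotation, pass an open file handle instead of the toy list, for example `feature_lengths(open("annotation.gff"), "ebr1")`, where the file name is a placeholder. A GTF file uses a different attribute syntax and would need a different parser.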

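So the Spur_5.0 gene prediction appears to be missing exons: the annotated exons fall 2,653 base pairs short of the mRNA (12,074 - 9,421), and the annotated CDS falls 2,650 base pairs short of the coding sequence (11,136 - 8,486). The missing sequence appears to be at the 5' end of the gene; because ebr1 is on the (-) strand, that is the end with the higher genomic coordinates. This is evident in the Echinobase CDS Gene Model, which does not begin with a start codon.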
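The start-codon check can be sketched as follows. This is an illustrative example only: the function names are hypothetical and the sequences are toy strings, not ebr1 sequence. It assumes the input is either a CDS already written in transcript orientation (strand "+") or a genomic stretch from the forward strand of a (-) strand gene, which must be reverse-complemented before the first codon is read.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]


def begins_with_start_codon(seq, strand="+"):
    """Report whether the sequence opens with ATG once oriented 5' to 3'.

    strand="-" means seq was copied from the forward genomic strand, so the
    5' end of the gene sits at the right-hand (high-coordinate) end of seq.
    """
    oriented = reverse_complement(seq) if strand == "-" else seq
    return oriented[:3].upper() == "ATG"


# Toy sequences for illustration only.
print(begins_with_start_codon("ATGAAACCC", "+"))  # CDS in transcript orientation
print(begins_with_start_codon("GGGTTTCAT", "-"))  # forward-strand genomic copy
print(begins_with_start_codon("CCCAAATTT", "+"))  # truncated 5' end, no ATG
```

A gene model that fails this check after correct orientation is missing at least its first coding exon, which is consistent with the length shortfall described above.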