Progress Report on D. ananassae fosmid 1475K17: Ten-m

Paul Szauter - June 24, 2013


1. Retrieval of D. melanogaster Ten-m CDS's from Gene Record Finder and BLASTX results
2. CDS 11
3. CDS 10
4. CDS 9
5. CDS 8
6. CDS 7
7. CDS 6
8. CDS 5
9. CDS 4
10. CDS 3
11. CDS 2
12. CDS 1
13. Summary of results
14. Alignments to D. melanogaster
15. Comparison of genomic sequences from D. ananassae and D. melanogaster CDS 9

Retrieval of D. melanogaster Ten-m CDS's from Gene Record Finder and BLASTX results

The D. ananassae Ten-m gene is located on the minus strand of fosmid 1475K17.

The Gene Record Finder gives thirteen CDS segments for D. melanogaster Ten-m:

>Ten-m:13_536_0
MNPYEYESTLDCRDVGGGPTPAHAHPHAQGRTLPMSGHGRPTTDLGPVHG
SQTLQHQNQQNLQAAQAAAQSSHYDYEYQHLAHRPPDTANNTAQRTHGRQ


>Ten-m:12_536_2
FLLEGVTPTAPPDVPPRNPTMSRMQNGRLTVNNPNDADFEPSCLVRTPSG
NVYIPSGNL


>Ten-m:11_536_2
INKGSPIDFKSGSACSTPTKDTLKGYERSTQGCMGPVLPQRSVMNGLPAH
HYSAPMNFRKDLVARCSSPWFGIGSISVLFAFVVMLILLTTTGVIKWNQS
PPCSVLVGNEASEVTAAKSTNTDLSKLHNSSVRAKNGQGIGLAQGQSGLG
AAGVGSGGGSSAATVTTATSNSGTAQGLQSTSASAEATSSAATSSSQSSL
TPSLSSSLANANNG


>Ten-m:10_602_2
RAMSTRLSVRGAGERGRRHRRSLNEEQGEDDVATDGTFSDLITNESLNQQ
AAEKYLATTLAKSPTDVHGSGNKTLPRMDGVYGTQRSEDTPDTSYDYVYE
DEVEPETTPSLIRRTKTGQQFGKSLNSNLRSAAKTLVNKRRKYDHGTVEA
EHIKHEEEEEEDEQKLERHEAIGMATELTTESETSTLPAVIDDDNQSDNS
SSGPTPETTVRSDTDDIVEINTPPSQTAQRTFAAVSHQPAIEHDFQIKGT
DAGGLQTEKPATDDINNERDLADNYEVDSKEPTSPGTPPQ


>Ten-m:9_602_2
KVSQQTGKASLQSLQSESDLMMNDASHYEDIDIVKLDGLTISHEEEIYKT
ADKENMAPKNQPSQHIDRSQNEVLKGHQQGDEKQPQLEPLKPYVSERVDL
PGKRIFLNLTIATDEGSDSVYTLHVEVPTGGGPHFIKEVLTHEKPTAQAD
SCVPEPPPRMPDCPCSCLPPPAPIYLDDTVDIDSAPPAKTVTTSTISAPI
NPFHSEEEDDEDGVRDEEQTPSSSTATNLPSTEIDNHIAAFTEPAVGAGG
VPFACPDVMPVLILE


>Ten-m:8_536_2
ARTFPARSFPPDGTTFGQITLGQKLTKEIQPYSYWNMQFYQSEPAYVKFD
YTIPRGASIGVYGRRNALPTHTQYHFKEVLSGFSASTRTARAAH


>Ten-m:7_536_0
LSITREVTRYMEPGHWFVSLYNDDGDVQELTFYAAVAEDMTQNCPNGCSG
NGQCLLGHCQCNPGFGGDDCSESVCPVLCSQHGEYTNGECICNPGWKGKE
CSLRHDECEVADCSGHGHCVSGKCQCMRGYKGKFCEE


>Ten-m:6_536_2
DCPHPNCSGHGFCADGTCICKKGWKGPDCATMDQDALQCLPDCSGHGTFD
LDTQTCTCEAKWSGDDCSKELCDLDCGQHGRCEGDACACDPEWGGEYCNT
RLCDVRCNEHGQCKNGTCLCVTGWNGKHCTIEGCPNSCAGHGQCRVSGEG
QWECRCYEGWDGPDCGIALELNCGDSKDNDK


>Ten-m:5_536_2
GLVDCEDPECCASHVCKTSQLCVSAPKPIDVLLRKQPPAITASFFERMKF
LIDESSLQNYAKLETFNE


>Ten-m:4_602_1
IFWNYFNA


>Ten-m:3_536_1
RSAVIRGRVVTSLGMGLVGVRVSTTTLLEGFTLTRDDGWFDLMVNGGGAV
TLQFGRAPFRPQSRIVQVPWNEVVIIDLVVMSMSEEKGLAVTTTHTCFAH
DYDLMKPVVLASWKHGFQGACPDRSAILAESQVIQESLQIPGTGLNLVYH
SSRAAGYLSTIKLQLTPDVIPTSLHLIHLRITIEGILFERIFEADPGIKF
TYAWNRLNIYRQRVYGVTTAVVKVGYQYTDCTDIVWDIQTTKLSGHDMSI
SEVGGWNLDIHHRYNFHEGILQKGDGSNIYLRNKPRIILTTMGDGHQRPL
ECPDCDGQATKQRLLAPVALAAAPDGSLFVGDFNYIRRIMTDGSIRTVVK
LNATRVSYRYHMALSPLDGTLYVSDPESHQIIRVRDTNDYSQPELNWEAV
VGSGERCLPGDEAHCGDGALAKDAKLAYPKGIAISSDNILYFADGTNIRM
VDRDGIVSTLIGNHMHKSHWKPIPCEGTLKLEEMHLRWPTELAVSPMDNT
LHIIDDHMILRMTPDGRVRVISGRPLHCATASTAYDTDLATHATLVMPQS
IAFGPLGELYVAESDSQRINRVRVIGTDGRIAPFAGAESKCNCLERGCDC
FEAEHYLATSAKFNTIAALAVTPDSHVHIADQANYRIRSVMSSIPEASPS
REYEIYAPDMQEIYIFNRFGQHVSTRNILTGETTYVFTYNVNTSNGKLST
VTDAAGNKVFLLRDYTSQVNSIENTKGQKCRLRMTRMKMLHELSTPDNYN
VTYEYHGPTGLLRTKLDSTGRSYVYNYDEFGRLTSAVTPTGRVIELSFDL
SVKGAQVKVSENAQKEMSLLIQGATVIVRNGAAESRTTVDMDGSTTSITP
WGHNLQMEVAPYTILAEQSPLLGESYPVPAKQRTEIAGDLANRFEWRYFV
RRQQPLQAGKQSKGPPRPVTEVGRKLRVNGDNVLTLEYDRETQSVVVMVD
DKQELLNVTYDRTSRPISFRPQSGDYADVDLEYDRFGRLVSWKWGVLQEA
YSFDRNGRLNEIKYGDGSTMVYAFKDMFGSLPLKVTTPRRSDYLLQYDDA
GALQSLTTPRGHIHAFSLQTSLGFFKYQYYSPINRHPFEILYNDEGQILA
KIHPHQSGK


>Ten-m:1_536_0
VAFVHDTAGRLETILAGLSSTHYTYQDTTSLVKSVEVQEPGFELRREFKY
HAGILKDEKLRFGSKNSLASARYKYAYDGNARLSGIEMAIDDKELPTTRY
KYSQNLGQLEVVQDLKITRNAFNRTVIQDSAKQFFAIVDYDQHGRVKSVL
MNVKNIDVFRLELDYDLRNRIKSQKTTFGRSTAFDKINYNADGHVVEVLG
TNNWKYLFDENGNTVGVVDQGEKFNLGYDIGDRVIKVGDVEFNNYDARGF
VVKRGEQKYRYNNRGQLIHSFERERFQSWYYYDDRSRLVAWHDNKGNTTQ
YYYANPRTPHLVTHVHFPKISRTMKLFYDDRDMLIALEHEDQRYYVATDQ
NGSPLAFFDQNGSIVKEMKRTPFGRIIKDTKPEFFVPIDFHGGLIDPHTK
LVYTEQRQYDPHVGQWMTPLWETLATEMSHPTDVFIYRYHNNDPINPNKP
QNYMIDLDSWLQLFGYDLNNMQSSRYTKLAQYTPQASIKSNTLAPDFGVI
SGLECIVEKTSEKFSDFDFVPKPLLKMEPKMRNLLPRVSYRRGVFGEGVL
LSRIGGRALVSVVDGSNSVVQDVVSSVFNNSYFLDLHFSIHDQDVFYFVK
DNVLKLRDDNEELRRLGGMFNISTHEISDHGGSAAKELRLHGPDAVVIIK
YGVDPEQERHRILKHAHKRAVERAWELEKQLVAAGFQGRGDWTEEEKEEL
VQHGDVDGWNGIDIHSIHKYPQLADDPGNVAFQRDAKRKRRKTGSSHRSA
SNRRQLKFGELSA*


>Ten-m:2_602_0
VAFVHDTAGRLETILAGLSSTHYTYQDTTSLVKSVEVQEPGFELRREFKY
HAGILKDEKLRFGSKNSLASARYKYAYDGNARLSGIEMAIDDKELPTTRY
KYSQNLGQLEVVQDLKITRNAFNRTVIQDSAKQFFAIVDYDQHGRVKSVL
MNVKNIDVFRLELDYDLRNRIKSQKTTFGRSTAFDKINYNADGHVVEVLG
TNNWKYLFDENGNTVGVVDQGEKFNLGYDIGDRVIKVGDVEFNNYDARGF
VVKRGEQKYRYNNRGQLIHSFERERFQSWYYYDDRSRLVAWHDNKGNTTQ
YYYANPRTPHLVTHVHFPKISRTMKLFYDDRDMLIALEHEDQRYYVATDQ
NGSPLAFFDQNGSIVKEMKRTPFGRIIKDTKPEFFVPIDFHGGLIDPHTK
LVYTEQRQYDPHVGQWMTPLWETLATEMSHPTDVFIYRYHNNDPINPNKP
QNYMIDLDSWLQLFGYDLNNMQSSRYTKLAQYTPQASIKSNTLAPDFGVI
SGLECIVEKTSEKFSDFDFVPKPLLKMEPKMRNLLPRVSYRRGVFGEGVL
LSRIGGRALVSVVDGSNSVVQDVVSSVFNNSYFLDLHFSIHDQDVFYFVK
DNVLKLRDDNEELRRLGGMFNISTHEISDHGGSAAKELRLHGPDAVVIIK
YGVDPEQERHRILKHAHKRAVERAWELEKQLVAAGFQGRGDWTEEEKEEL
VQHGDVDGWNGIDIHSIHKYPQLADDPGNVAFQRDAKRKRRKTGSSHRSA
SNRRQLKFGELSA*VEAYREKLVAETKPEESDMTDEEYGDQEDYLIAVGK
MEPRDSSEALDEQIL*

I executed a bl2seq using BLASTX with the fosmid as the query sequence and the D. melanogaster CDS peptides above as subject sequences. Results are tabulated below. We already know that CDS 12 and CDS 13 are not on the fosmid, so they are omitted from the table.

D. melanogaster Ten-m CDS segments BLASTX
fosmid alignment
start end frame E identity positive
Ten-m:11_536_2 25756 25316 -2 7e-66 80% 86%
Ten-m:10_602_2 24496 23612 -2 2e-54 48% 62%
Ten-m:9_602_2 23192 22365 -1 6e-68 59% 68%
Ten-m:8_536_2 21938 21657 -1 1e-60 99% 100%
Ten-m:7_536_0 21528 21118 -3 2e-92 98% 99%
Ten-m:6_536_2 21025 20483 -2 2e-114 99% 99%
Ten-m:5_536_2 20421 20218 -3 1e-42 100% 100%
Ten-m:4_602_1 19180 19157 -2 0.036 100% 100%
Ten-m:3_536_1 18563 15240 -1 0.0 98% 99%
Ten-m:1_536_0 15175 12884 -2 0.0 96% 98%
Ten-m:2_602_0 15175 12701 -2 0.0 92% 95%

The alignment of D. melanogaster CDS 4 to the fosmid is of marginal significance because the sequences is so short.


CDS 11

We begin with CDS 11. Here is the 5' end of the segment aligning to CDS 11 in the UCSC Genome Browser.

UCSC Ten-m

There is a high acceptor site predicted at the AG at the position of the alignment to the D. melanogaster protein, matching the 5' end predicted by most gene finders. The RNA-Seq data (TopHat) supports this position as well. The first base of this exon is 25,758.

Because we are in frame -2, the phase of the exon is 0.


Here is the 3' end of the segment aligning to CDS 11 in the UCSC Genome Browser.

UCSC Ten-m

There is a medium donor predicted at the GT at the position of the alignment to the D. melanogaster protein, matching the 3' end predicted by all gene finders. The RNA-Seq data (TopHat) supports this position as well. The last base of this exon is 25,120.

Because we are in frame -2, the phase of the next exon is 0.


CDS 10

Here is the 5' end of the segment aligning to CDS 10 in the UCSC Genome Browser.

UCSC Ten-m

There is an AG predicted to be a medium acceptor at the end of the alignment to the D. melanogaster protein, matching the 5' end predicted by all gene finders. The RNA-Seq data (TopHat) supports this position as well. The first base of this exon is 24,498.

Because we are in frame -2, the phase of the exon is 0.


Here is the 3' end of the segment aligning to CDS 10 in the UCSC Genome Browser.

UCSC Ten-m

There is a GT predicted to be a medium donor at the end of the alignment to the D. melanogaster protein, matching the 3' end predicted by most gene finders. The RNA-Seq data (TopHat) supports this position as well. The last base of this exon is 23,611.

Because we are in frame -2, the phase of the next exon is 0.


CDS 9

Here is the 5' end of the segment aligning to CDS 9 in the UCSC Genome Browser.

UCSC Ten-m

There are two possible splice acceptors here. One is an AG not identified as an acceptor at 23,250. There are four TopHat junctions here, and a strong species conservation signal at this position. The first base of the exon given this splice is 23,248.

The other is an AG identified as a medium acceptor at 23,202. There is a single TopHat junction here and a weaker conservation signal. The first base of the exon given this splice is 23,200.

The frame for this exon is -1. The previous exon, which we predict with confidence, gives the phase of this exon as 0. If the first base is 23,248, the phase is 0. If the first base is 23,200, the phase is also 0.


Here is the 3' end of the segment aligning to CDS 9 in the UCSC Genome Browser.

UCSC Ten-m

There is a GT predicted to be a medium donor at the end of the alignment to the D. melanogaster protein, matching the 3' end predicted by all gene finders. The RNA-Seq data (TopHat) supports this position as well. The last base of this exon is 22,364.

Because we are in frame -1, the phase of the next exon is 2.


CDS 8

Here is the 5' end of the segment aligning to CDS 8 in the UCSC Genome Browser.

UCSC Ten-m

There is an AG predicted to be a medium acceptor at the end of the alignment to the D. melanogaster protein, matching the 5' end predicted by all gene finders. The RNA-Seq data (TopHat) supports this position as well. The first base of this exon is 21,940.


Here is the 3' end of the segment aligning to CDS 8 in the UCSC Genome Browser.

UCSC Ten-m

There is a GT predicted to be a high donor at the end of the alignment to the D. melanogaster protein, matching the 3' end predicted by all gene finders. The RNA-Seq data (TopHat) supports this position as well. The last base of this exon is 21,657.


CDS 7

Here is the 5' end of the segment aligning to CDS 7 in the UCSC Genome Browser.

UCSC Ten-m

There is an AG predicted to be a low acceptor at the end of the alignment to the D. melanogaster protein, matching the 5' end predicted by all gene finders. The RNA-Seq data (TopHat) supports this position as well. The first base of this exon is 21,528.


Here is the 3' end of the segment aligning to CDS 7 in the UCSC Genome Browser.

UCSC Ten-m

There is a GT predicted to be a medium donor at the end of the alignment to the D. melanogaster protein, matching the 3' end predicted by all gene finders. The RNA-Seq data (TopHat) supports this position as well.The last base of this exon is 21,117.


CDS 6

Here is the 5' end of the segment aligning to CDS 6 in the UCSC Genome Browser.

UCSC Ten-m

There is an AG predicted to be a high acceptor at the end of the alignment to the D. melanogaster protein, matching the 5' end predicted by all gene finders. The RNA-Seq data (TopHat) supports this position as well. The first base of this exon is 21,027.


Here is the 3' end of the segment aligning to CDS 6 in the UCSC Genome Browser.

UCSC Ten-m

There is a GT predicted to be a high donor at the end of the alignment to the D. melanogaster protein, matching the 3' end predicted by all gene finders. The RNA-Seq data (TopHat) supports this position as well. The last base of this exon is 20,482.


CDS 5

Here is the 5' end of the segment aligning to CDS 5 in the UCSC Genome Browser.

UCSC Ten-m

There is an AG predicted to be a high acceptor at the end of the alignment to the D. melanogaster protein, matching the 5' end predicted by all gene finders. The RNA-Seq data (TopHat) supports this position as well. The first base of this exon is 20,423.


Here is the 3' end of the segment aligning to CDS 5 in the UCSC Genome Browser.

UCSC Ten-m

There is a GT predicted to be a high donor at the end of the alignment to the D. melanogaster protein, matching the 3' end predicted by all gene finders. The RNA-Seq data (TopHat) supports this position as well. The last base of this exon is 20,216.


CDS 4

Here are the both ends of the segment aligning to CDS 4 in the UCSC Genome Browser.

UCSC Ten-m

There is no alignment to D. melanogaster proteins displayed. Our BLASTX analysis gave an alignment to the eight amino acid CDS 4 at this approximate position with a score that was not significant. The gene finders all miss this segment also, but RNA-Seq (TopHat) data show this to be a real exon. The sequence conservation across five species is absolute.

There is an AG not predicted to be an acceptor at the position identified as a splice acceptor by RNA-Seq (TopHat). The first base of this exon is 19,181.

There is a GT not predicted to be a donor at the position identified as a splice donor by RNA-Seq (TopHat). The last base of this exon is 19,155.


CDS 3

Here is the 5' end of the segment aligning to CDS 3 in the UCSC Genome Browser.

UCSC Ten-m

There is an AG predicted to be a medium acceptor at the end of the alignment to the D. melanogaster protein, matching the 5' end predicted by all gene finders. The RNA-Seq data (TopHat) supports this position as well. The first base of this exon is 18,564.


Here is the 3' end of the segment aligning to CDS 3 in the UCSC Genome Browser.

UCSC Ten-m

There is a GT predicted to be a high donor at the end of the alignment to the D. melanogaster protein, matching the 3' end predicted by all gene finders. The RNA-Seq data (TopHat) supports this position as well. The last base of this exon is 15,237.


CDS 1

Here is the 5' end of the segment aligning to CDS 1 (the same as for CDS 2) in the UCSC Genome Browser.

UCSC Ten-m

There is an AG predicted to be a high acceptor at the end of the alignment to the D. melanogaster protein, matching the 5' end predicted by all gene finders. The RNA-Seq data (TopHat) supports this position as well. The first base of this exon is 15,175.


Here is the 3' end of the segment aligning to CDS 1 in the UCSC Genome Browser.

UCSC Ten-m

The last base of this exon is 12,887. The coordinates of the stop codon are 12,886 - 12,884.


CDS 2

Here is the 3' end of the segment aligning to CDS 2 in the UCSC Genome Browser.

UCSC Ten-m

The last base of this exon is 12,704. The coordinates of the stop codon are 12,703 - 12,701.


Summary of results

D. melanogaster Ten-m CDS segments BLASTX Model
fosmid coords alignment
start end frame E identity positive phase start end phase of
next exon
Ten-m:11_536_2 25,756 25,316 -2 7e-66 80% 86% 0 25,758 25,120 0
Ten-m:10_602_2 24,496 23,612 -2 2e-54 48% 62% 0 24,498 23,611 0
Ten-m:9_602_2 23,192 22,365 -1 6e-68 59% 68% 0 23,248 or
23,200
22,364 2
Ten-m:8_536_2 21,938 21,657 -1 1e-60 99% 100% 2 21,940 21,657 0
Ten-m:7_536_0 21,528 21,118 -3 2e-92 98% 99% 0 21,528 21,117 2
Ten-m:6_536_2 21,025 20,483 -2 2e-114 99% 99% 2 21,027 20,482 2
Ten-m:5_536_2 20,421 20,218 -3 1e-42 100% 100% 2 20,423 20,216 1
Ten-m:4_602_1 19,180 19,157 -2 0.036 100% 100% 1 19,181 19,155 1
Ten-m:3_536_1 18,563 15,240 -1 0.0 98% 99% 1 18,564 15,237 0
Ten-m:1_536_0 15,175 12,884 -2 0.0 96% 98% 0 15,175 12,887 N/A
Ten-m:2_602_0 15,175 12,701 -2 0.0 92% 95% 0 15,175 12,704 N/A

We will use the Gene Model Checker to determine if one of the two 5' ends for CDS 9 is better. First, here is the summary of CDS usage by isoform from the Gene Record Finder:

Ten-m Gene Record Finder

We will use Ten-m-PD to check the model. Ten-m-PB lacks CDS segments CDS 10 and CDS 9. Ten-m-PE has an unusual terminal coding segment (CDS 2) in which the stop codon that ends CDS 1 is read through.

Ten-m GMC checklist

Ten-m GMC dotplot

Ten-m GMC checklist

Ten-m GMC dotplot

The checklist in the Gene Model Checker and the dotplots are good for both models. This is not surprising, as both splice acceptors (resulting in first coding base 23,248 and 23,200) are in phase 0.


Alignments to D. melanogaster

Two models

Both sequences aligned to NP_001262211.1. In the alignments, the longer form (splice at 23,248) requires a four amino acid gap to be introduced into the D. melanogaster protein to optimize the alignment. The shorter form (splice at 23,200) requires a thirteen amino acid gap to be introduced into the D. ananassae protein to optimize the alignment.

There is RNA-Seq evidence that both splice acceptors are used.

Given that isoform B (Ten-m-PB) entirely omits CDS 10 and CDS 9, these segments do not seem to be critical for protein function, so it is likely that both isoforms are made in D. ananassae. The complete protein sequences are given below.


>Ten-m-PD_peptide
INKGSPIDFKSGSACSTPTKETLKGGYDRGTQGCMGPVLPPRSVMNGLPSHHYSAPMNFR
KDLAARCSSPWVGVAAISALVLVVLMLILLTTFGGLHWTQSAPCTVLVGNEASEVTAAKS
TNTDLSKLHNSATRSKNGQAIGPVPGQYGSGGGMGSSTATVTAATSNSGTAQGLQSTSAS
AEASSSAATTSSQSSLTPSTSLSSSLANANNGGRDILTRMAADGAGKNKRRNRRSMDVAE
NGGDVATDETFSNFITIESLNREQTGEYFATTPARKLQEVERSSSDRTSFGINGVLSPQG
DEEVEDITSDYVYEDEPVPDTSPATQRPRTRQQFGKSLNSNLRSAAKTLVNKKTKYEGEA
GKNIRLEQEQKLEAFIEAGMTLESTTTTATRATTTTESGTSTFVAVIDDDNQSDSSSSGA
VPPVLTVLRSDTDDIVEINTALPEPTEGSILAPFPRSSMANDFQIKGKAVSESGQEKPAT
DDNNNERDLADNYELKPEEPAASATPLQGKEIQKDVTPSTPLTVEINKSESDFMVNDMAS
QFEDIDIVKLGEAPSSHEEAVYKTSSSKDKAVPMAPAQPEAIENEVLKDQDEARQVPLHL
RPLKPYVSETIDQPGRRILVNLTIATDDGPDNVYTLHVEVPTGGGPHTIKEVLTHEKPPQ
QADHQAENCVPEPPPRMPDCPCSCLPPPAPIYLDDRGDSGDSGSAPPVDTDSAPLASTTN
GASTSPPLETATILGDRHSEKDDHGVGNGNSSTVEGESTASSCATNTPSTEIDNHIDAFH
TEPPVGGGEPFACPDVMPVLILEGARTFPARSFPPDGTTFGQISLGQKLTKEIQPYSYWN
MQFYQSEPAYVKFDYTIPRGASIGVYGRRNALPTHTQYHFKEVLSGFSASTRTARAAHLS
ITREVTRYMEPGHWFMSLYNDDGDVQELTFYAAVAEDMTQNCPNGCSGNGQCLLGHCQCN
PGFGGHDCSESVCPVLCSQHGEYTNGECICNPGWKGKECSLRHDECEVADCNGHGHCVSG
KCQCMRGYKGKFCEEVDCPHPNCSGHGFCADGTCICKKGWKGPDCATMDQDALQCLPDCS
GHGTFDLDTQTCTCEAKWSGDDCSKELCDLDCGQHGRCEGDACACDPEWGGEYCNTRLCD
TRCNEHGQCKNGTCLCVTGWNGKHCTIEGCPNSCAGHGQCRVSGEGQWECRCYEGWDGPD
CGIALELNCGDSKDNDKDGLVDCEDPECCASHVCKTSQLCVSAPKPIDVLLRKQPPAITA
SFFERMKFLIDESSLQNYAKLETFNESIFWNYFNASRSAVIRGRVVTSLGMGLVGVRVST
TTLLEGFTLTRDDGWFDLMVNGGGAVTLQFGRAPFRPQSRIVQVPWNEVVIIDGVVMSMS
EEKGLATTTTHTCFAHDYDLMKPVVLASWKHGFQGACPDRSAILAESQVIQESLQIPGTG
LNLVYHSSRAAGYLSTIKLQLTPDNIPPTLHLIHLRITIEGILFERVFEADPGIKFTYAW
NRLNIYRQRVYGVTTAVVKVGYQYTDCTDIVWDIQTTKLSGHDMSISEVGGWNLDIHHRY
NFHEGILQKGDGSNIYLRNKPRIILTTMGDGHQRPLECPDCDGLATKQRLLAPVALAAAP
DGSLFVGDFNYIRRIMSDGSIRTVVKLNATRVSYRYHMALSPLDGTLYVSDPESHQIIRV
RDTNNYSQPELNWEAVVGSGERCLPGDEAHCGDGALAKDAKLAYPKGIAISSDNILYFAD
GTNIRMVDRDGIVSTLIGNHMHKSHWKPIPCEGTLKLEEMHLRWPTELAVSPMDNTLHII
DDHMILRMTPDGRVRVISGRPLHCATASTAYDTDLATHATLVMPQSIAFGPLGELYVAES
DSQRINRVRVIGTDGRIAPFAGAESKCNCLERGCDCFEAEHYLATSAKFNTIAALSVTPD
GHVHIADQANYRIRSVMSSIPEASPSREYEIYAPDMQEIYIFNRFGQHVSTRNILTGETT
YVFTYNVNTSNGKLSTVTDAAGNKVFLLRDYTSQVNSIENTKGQKCRLRMTRMKMLHELS
TPDNYNVTYEYHGPTGLLKTKLDSTGRSYVYNYDEFGRLTSAVTPTGRVIELSFDLSVKG
AQVKVSENAQKEQSLLIQGATVTVRNGAAESRTSVDMDGSTTSITPWGHNVQMEVAPYTI
LAEQSPLLGESYPVPAKQRTEIAGDLANRFEWRYFVRRQQPLQAGKQSKGAPRPVTEVGR
KLRVNGDNVLTLEYDRETQSVVVLVDDKQELLNVTYDRTSRPISFRPQSGDYADVDLEYD
RFGRLVSWKWGVLQEAYSFDRNGRLNEIKYGDGSTMVYAFKDMFGSLPLKVTTPRRSDYL
LQYDDAGALQSLTTPRGHIHAFSLQTSLGFFKYQYFSPINRHPFEILYNDEGQILAKIHP
HQSGKVAFVYDAAGRLETILAGLSSTHYTYQDTTSLVKTVEVQEPGFELRREFKYHAGIL
KDEKLRFGSKNSLASAHYKYAYDGNARLSGIEMAIDDKELPTTRYKYSQNLGQLEVVQDL
KITRNAFNRTVIQDSAKQFFAIVDYDQHGRVKSVLMNVKNIDVFRLELDYDLRNRIKSQK
TTFGRSTAFDKINYNADGHVVEVLGTNNWKYLYDENGNTVGVVDQGEKFNLGYDIGDRVI
KVGDVEFNNYDARGFVVRRGEQKYRYNNRGQLIHAFERERFQSWYYYDDRSRLVAWHDNQ
GNTTQYYYANPRTPHLVTHAHFPKLARTMKFFYDDRDMLIAMENADQRYYVATDQNGSPL
AFFDLNGGIAKELKRTPFGRIIKDTKPDFFVPIDFHGGLIDPHTKLIYTEQRQYDPHVGQ
WMTPQWETLATEMSHPTDVFIYRYHNNDPINPNRPQNYMIDLDAWLQLFGYDLDNMQSRR
YTKLAQYTPQASIKSNMLAPDFGVISGLECIVEKTSEKFSDFDFVPKPLLKMEPKMRNLL
PRISYRRGVFGEGVLLSRIGGRALVSVVDGSNSVVQDVVSSVFNNSYFLDLHFSIHDQDV
FYFVKDNVLKLRDDNEELRRLGGMFNISTHEVSDHGGSAAKELRLHGPDAVVIVKYGVDP
EQERHRILKHAHKRAVERAWELEKQLVAAGFQGRGDWTEEEKEELVQHGDVDGWIGIDIH
SIHKYPQLADDPGNVAFQRDAKRKRRKTGNSHRSASSRRQMKFGELSA

>Ten-m-PD_peptide
INKGSPIDFKSGSACSTPTKETLKGGYDRGTQGCMGPVLPPRSVMNGLPSHHYSAPMNFR
KDLAARCSSPWVGVAAISALVLVVLMLILLTTFGGLHWTQSAPCTVLVGNEASEVTAAKS
TNTDLSKLHNSATRSKNGQAIGPVPGQYGSGGGMGSSTATVTAATSNSGTAQGLQSTSAS
AEASSSAATTSSQSSLTPSTSLSSSLANANNGGRDILTRMAADGAGKNKRRNRRSMDVAE
NGGDVATDETFSNFITIESLNREQTGEYFATTPARKLQEVERSSSDRTSFGINGVLSPQG
DEEVEDITSDYVYEDEPVPDTSPATQRPRTRQQFGKSLNSNLRSAAKTLVNKKTKYEGEA
GKNIRLEQEQKLEAFIEAGMTLESTTTTATRATTTTESGTSTFVAVIDDDNQSDSSSSGA
VPPVLTVLRSDTDDIVEINTALPEPTEGSILAPFPRSSMANDFQIKGKAVSESGQEKPAT
DDNNNERDLADNYELKPEEPAASATPLQEINKSESDFMVNDMASQFEDIDIVKLGEAPSS
HEEAVYKTSSSKDKAVPMAPAQPEAIENEVLKDQDEARQVPLHLRPLKPYVSETIDQPGR
RILVNLTIATDDGPDNVYTLHVEVPTGGGPHTIKEVLTHEKPPQQADHQAENCVPEPPPR
MPDCPCSCLPPPAPIYLDDRGDSGDSGSAPPVDTDSAPLASTTNGASTSPPLETATILGD
RHSEKDDHGVGNGNSSTVEGESTASSCATNTPSTEIDNHIDAFHTEPPVGGGEPFACPDV
MPVLILEGARTFPARSFPPDGTTFGQISLGQKLTKEIQPYSYWNMQFYQSEPAYVKFDYT
IPRGASIGVYGRRNALPTHTQYHFKEVLSGFSASTRTARAAHLSITREVTRYMEPGHWFM
SLYNDDGDVQELTFYAAVAEDMTQNCPNGCSGNGQCLLGHCQCNPGFGGHDCSESVCPVL
CSQHGEYTNGECICNPGWKGKECSLRHDECEVADCNGHGHCVSGKCQCMRGYKGKFCEEV
DCPHPNCSGHGFCADGTCICKKGWKGPDCATMDQDALQCLPDCSGHGTFDLDTQTCTCEA
KWSGDDCSKELCDLDCGQHGRCEGDACACDPEWGGEYCNTRLCDTRCNEHGQCKNGTCLC
VTGWNGKHCTIEGCPNSCAGHGQCRVSGEGQWECRCYEGWDGPDCGIALELNCGDSKDND
KDGLVDCEDPECCASHVCKTSQLCVSAPKPIDVLLRKQPPAITASFFERMKFLIDESSLQ
NYAKLETFNESIFWNYFNASRSAVIRGRVVTSLGMGLVGVRVSTTTLLEGFTLTRDDGWF
DLMVNGGGAVTLQFGRAPFRPQSRIVQVPWNEVVIIDGVVMSMSEEKGLATTTTHTCFAH
DYDLMKPVVLASWKHGFQGACPDRSAILAESQVIQESLQIPGTGLNLVYHSSRAAGYLST
IKLQLTPDNIPPTLHLIHLRITIEGILFERVFEADPGIKFTYAWNRLNIYRQRVYGVTTA
VVKVGYQYTDCTDIVWDIQTTKLSGHDMSISEVGGWNLDIHHRYNFHEGILQKGDGSNIY
LRNKPRIILTTMGDGHQRPLECPDCDGLATKQRLLAPVALAAAPDGSLFVGDFNYIRRIM
SDGSIRTVVKLNATRVSYRYHMALSPLDGTLYVSDPESHQIIRVRDTNNYSQPELNWEAV
VGSGERCLPGDEAHCGDGALAKDAKLAYPKGIAISSDNILYFADGTNIRMVDRDGIVSTL
IGNHMHKSHWKPIPCEGTLKLEEMHLRWPTELAVSPMDNTLHIIDDHMILRMTPDGRVRV
ISGRPLHCATASTAYDTDLATHATLVMPQSIAFGPLGELYVAESDSQRINRVRVIGTDGR
IAPFAGAESKCNCLERGCDCFEAEHYLATSAKFNTIAALSVTPDGHVHIADQANYRIRSV
MSSIPEASPSREYEIYAPDMQEIYIFNRFGQHVSTRNILTGETTYVFTYNVNTSNGKLST
VTDAAGNKVFLLRDYTSQVNSIENTKGQKCRLRMTRMKMLHELSTPDNYNVTYEYHGPTG
LLKTKLDSTGRSYVYNYDEFGRLTSAVTPTGRVIELSFDLSVKGAQVKVSENAQKEQSLL
IQGATVTVRNGAAESRTSVDMDGSTTSITPWGHNVQMEVAPYTILAEQSPLLGESYPVPA
KQRTEIAGDLANRFEWRYFVRRQQPLQAGKQSKGAPRPVTEVGRKLRVNGDNVLTLEYDR
ETQSVVVLVDDKQELLNVTYDRTSRPISFRPQSGDYADVDLEYDRFGRLVSWKWGVLQEA
YSFDRNGRLNEIKYGDGSTMVYAFKDMFGSLPLKVTTPRRSDYLLQYDDAGALQSLTTPR
GHIHAFSLQTSLGFFKYQYFSPINRHPFEILYNDEGQILAKIHPHQSGKVAFVYDAAGRL
ETILAGLSSTHYTYQDTTSLVKTVEVQEPGFELRREFKYHAGILKDEKLRFGSKNSLASA
HYKYAYDGNARLSGIEMAIDDKELPTTRYKYSQNLGQLEVVQDLKITRNAFNRTVIQDSA
KQFFAIVDYDQHGRVKSVLMNVKNIDVFRLELDYDLRNRIKSQKTTFGRSTAFDKINYNA
DGHVVEVLGTNNWKYLYDENGNTVGVVDQGEKFNLGYDIGDRVIKVGDVEFNNYDARGFV
VRRGEQKYRYNNRGQLIHAFERERFQSWYYYDDRSRLVAWHDNQGNTTQYYYANPRTPHL
VTHAHFPKLARTMKFFYDDRDMLIAMENADQRYYVATDQNGSPLAFFDLNGGIAKELKRT
PFGRIIKDTKPDFFVPIDFHGGLIDPHTKLIYTEQRQYDPHVGQWMTPQWETLATEMSHP
TDVFIYRYHNNDPINPNRPQNYMIDLDAWLQLFGYDLDNMQSRRYTKLAQYTPQASIKSN
MLAPDFGVISGLECIVEKTSEKFSDFDFVPKPLLKMEPKMRNLLPRISYRRGVFGEGVLL
SRIGGRALVSVVDGSNSVVQDVVSSVFNNSYFLDLHFSIHDQDVFYFVKDNVLKLRDDNE
ELRRLGGMFNISTHEVSDHGGSAAKELRLHGPDAVVIVKYGVDPEQERHRILKHAHKRAV
ERAWELEKQLVAAGFQGRGDWTEEEKEELVQHGDVDGWIGIDIHSIHKYPQLADDPGNVA
FQRDAKRKRRKTGNSHRSASSRRQMKFGELSA


Comparison of D. melanogaster and D. ananassae genomic sequence

We would like to know if the two different splice acceptors predicted at the beginning of D. ananassae CDS 9 are possible in D. melanogaster. Decorated genomic sequences are presented below.

Dana decorated sequence Dana decorated sequence Dmel decorated sequence
D. ananassae genomic sequence
Splice acceptors in green, CDS 9 peptide in yellow.
First splice acceptor used.
D. ananassae genomic sequence
Splice acceptors in green, CDS 9 peptide in yellow.
Second splice acceptor used.
D. melanogaster genomic sequence
Splice acceptor in green, CDS 9 peptide in yellow.
First splice acceptor used, no second acceptor.

It appears that the splice acceptor that makes the shorter isoform possible in D. ananassae is not present in D. melanogaster.