May 2, 2020
In this post, we will continue building on what we learnt in the last post by looking into what DNA replication is, the process behind it, and how we can create our own complementary strands of DNA on the computer using Python. There’s a lot to cover in this one, but we’ll get through it 👌.
What is DNA replication?
DNA replication is the semi-conservative duplication of DNA, and thus of genetic information, allowing a cell to produce two genetically identical daughter cells. Each DNA copy contains one newly synthesized strand and one original strand (Figure 1). New strands can only be synthesized in the 5’ to 3’ direction, as adding a nucleotide requires the free -OH group at the 3’ end of the strand. Replication must be highly accurate to maintain the integrity of the encoded genetic information and keep mutations as low as possible. Roughly 1 mistake occurs per 10⁹ nucleotides, although this can vary among different species and parts of the genome ⁽¹⁾. This is a huge topic all on its own, as many molecular mechanisms work together to accomplish it. Replication occurs mostly during the S phase (synthesis phase) of interphase in the cell cycle, between the G₁ and G₂ phases ⁽²⁾.
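To get a feel for what that error rate means in practice, here is a quick back-of-the-envelope sketch. The rate of 1 mistake per 10⁹ nucleotides comes from the text above; the genome sizes (about 4.6 million base pairs for E. coli and about 3.2 billion for a haploid human genome) are rounded figures I’m assuming for illustration, and the function name is made up for this example.

```python
# Rough estimate of replication mistakes per genome copy.
ERROR_RATE = 1e-9  # about 1 mistake per 10^9 nucleotides (from the text)

# Assumed, rounded genome sizes in nucleotides (not from the post)
GENOME_SIZES = {
    "E. coli": 4_600_000,
    "Human (haploid)": 3_200_000_000,
}


def expected_mistakes(genome_size, error_rate=ERROR_RATE):
    """Expected number of mistakes made when copying a genome once."""
    return genome_size * error_rate


if __name__ == "__main__":
    for organism, size in GENOME_SIZES.items():
        print(f"{organism}: ~{expected_mistakes(size):.4f} mistakes per copy")
```

So even a genome billions of nucleotides long would pick up only a handful of mistakes per copy at this rate.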
The prokaryotic bacterium E. coli has circular DNA and replicates it at up to 1,000 nucleotides per second, so each round of replication lasts only around 40 minutes. Eukaryotes, by comparison, replicate at about 40 nucleotides per second ⁽³⁾, and the length of the cell replication cycle varies between species.
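As a sanity check on those numbers, the sketch below estimates how long E. coli needs to copy its genome. The fork speed is the figure from the text; the genome size of roughly 4.6 million base pairs and the two replication forks moving in opposite directions around the circular chromosome are assumptions I’ve added, and replication_time_minutes is a name invented for this example.

```python
def replication_time_minutes(genome_size, fork_speed, forks=2):
    """Minutes needed to copy a genome if every fork runs at full speed."""
    seconds = genome_size / (fork_speed * forks)
    return seconds / 60


if __name__ == "__main__":
    # Assumed: ~4.6 million bp genome and 2 forks; 1,000 nt/s is from the text
    minutes = replication_time_minutes(4_600_000, 1_000)
    print(f"E. coli: roughly {minutes:.0f} minutes")
```

That comes out a little under 40 minutes, which lines up with the figure above.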
How DNA replicates in eukaryotes
First, let’s go over how DNA replication is initiated.
Regulation of DNA replication
It is crucial to control DNA replication, as the genome must be replicated only once per cell cycle; replicating it more than once can lead to mutations. This is accomplished through tight regulation by CDKs (cyclin-dependent kinases) and initiator proteins, e.g. Cdc6 and Cdt1 (Figure 2).
Initiation of DNA replication is biphasic:
Replicator selection: formation of the pre-replicative complex (pre-RC), made of initiator proteins, on the replicator sequences (origins of replication) of DNA. This can only occur in G₁ phase, when CDK activity levels are low.
Origin activation: existing pre-RCs are activated, unwinding the DNA and recruiting DNA polymerases. This can only occur in S phase, when CDK activity levels are high.
During replicator selection, the Origin Recognition Complex (ORC)* first binds to replicator sequences on DNA ⁽⁴⁾ and then recruits the helicase loading proteins Cdc6 (cell division cycle 6) ⁽⁵⁾ and Cdt1 (chromatin licensing and DNA replication factor 1), along with Mcm2-7 (minichromosome maintenance protein complex 2-7) ⁽⁶⁾, to form the pre-RC. Subunits Orc1-5 of ORC bind to DNA ⁽⁷⁾, Orc1 also binds to Cdc6 ⁽⁸⁾, and Orc6 is required to bind Cdt1 and recruit Mcm2-7 ⁽⁹⁾ (Figure 3).
Once the pre-RC is formed and CDK activity levels are high again, existing pre-RCs can activate. Cdc45 is recruited and forms a complex with Mcm2-7, causing the double-stranded DNA to unwind ⁽¹⁰⁾ and form replication forks (Figure 5). Proteins such as RPA (replication protein A) and DNA polymerases α and ɛ ⁽¹¹⁾⁽¹²⁾ are also recruited in preparation for synthesis.
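The two-phase rule above is what stops an origin from firing twice in one cycle, and a toy model makes the logic easier to see. This is only a sketch of the idea, not a simulation of the real proteins: the Origin class, its attribute names and the assumption that CDK activity stays high through G₂ and M are all mine.

```python
class Origin:
    """Toy model of one origin of replication."""

    def __init__(self):
        self.has_pre_rc = False  # True once a pre-RC has assembled
        self.times_fired = 0     # how many times this origin has activated

    def update(self, cdk_high):
        if not cdk_high:
            # Low CDK activity: a pre-RC can form but cannot activate
            self.has_pre_rc = True
        elif self.has_pre_rc:
            # High CDK activity: the existing pre-RC activates and is used up,
            # and no new pre-RC can form until CDK activity drops again
            self.has_pre_rc = False
            self.times_fired += 1


# Assumed CDK activity per phase: low in G1, high from S onwards
CELL_CYCLE = [("G1", False), ("S", True), ("G2", True), ("M", True)]


def run_cell_cycles(origin, cycles):
    """Step the origin through whole cell cycles and count activations."""
    for _ in range(cycles):
        for _phase, cdk_high in CELL_CYCLE:
            origin.update(cdk_high)
    return origin.times_fired


if __name__ == "__main__":
    # The count should match the number of cycles: one firing per cycle
    print(run_cell_cycles(Origin(), cycles=3))
```

Because forming a pre-RC and activating it need opposite CDK levels, the origin in this model can never fire twice without passing through G₁ again.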
DNA strand synthesis
Now that we know how DNA replication is initiated, we can move on to how DNA is synthesized. One of the main enzymes in the process is DNA polymerase, which has a few key features:
Can only add nucleotides at the 3’ end of a DNA strand, as the free -OH group on the 3’ end is needed to form a phosphodiester bond with the incoming nucleotide.
Has high processivity: in the presence of a template strand and deoxyribonucleotides, it tends to keep polymerizing the polynucleotide chain rather than stop and fall off.
Can’t start a DNA strand from scratch, so it needs a short pre-made strand called a primer, which it can continue synthesizing from.
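These features can be mimicked in a few lines of Python. The sketch below is a simplification I’ve made up for illustration: extend_primer is a hypothetical name, both strands are written 5’ to 3’, and the primer is written with DNA letters even though real primers are RNA.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def extend_primer(template, primer):
    """Extend a primer along a template; both strings are written 5' to 3'."""
    if not primer:
        raise ValueError("DNA polymerase needs a primer to extend from")

    # Strands pair antiparallel, so the primer must match the reverse
    # complement of the 3' end of the template
    binding_site = template[-len(primer):]
    expected = "".join(COMPLEMENT[base] for base in reversed(binding_site))
    if primer != expected:
        raise ValueError("primer does not pair with the 3' end of the template")

    new_strand = primer
    # Read the rest of the template 3' to 5' and add each complementary
    # base to the 3' end of the growing strand
    for base in reversed(template[:-len(primer)]):
        new_strand += COMPLEMENT[base]
    return new_strand


if __name__ == "__main__":
    print(extend_primer("ATGTTCAAA", "TTT"))
```

Note that the new strand only ever grows at its 3’ end, and the function refuses to start without a primer.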
The process of DNA synthesis:
As DNA polymerase can’t start synthesis without a primer, DNA primase lays down a small strand of RNA primer.
The binding of DNA polymerase to the template strand, and its processivity, are enhanced by a sliding clamp protein, which is loaded onto DNA by forming a complex with a clamp loader (Figure 4). The complex is then able to bind to the primer-template junction at the replication fork. ATP hydrolysis causes the clamp loader to dissociate ⁽¹³⁾, leaving the sliding clamp to sit behind DNA polymerase and help it stay bound as it moves along the template.
DNA polymerase must move towards the replication fork to synthesize along the strand newly exposed by helicase (Mcm2-7). This is perfect for one template strand, as DNA polymerase can synthesize its complementary strand continuously in the 5’ to 3’ direction; this new strand is called the leading strand. The other new strand, called the lagging strand, has to be synthesized away from the fork in short pieces called Okazaki fragments ⁽¹⁴⁾ (Figure 5), with DNA polymerase stopping synthesis when it reaches another RNA primer.
Ribonuclease H removes the RNA primer, which is replaced with DNA by DNA polymerase, but a small gap (a nick) is still left.
These nicks between adjacent DNA fragments are sealed by DNA ligase, using ATP hydrolysis, in a multi-step catalytic reaction called DNA ligation.
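The lagging strand is the trickiest part of this process, so here is a toy sketch of it. It is a heavy simplification under my own assumptions: primers are skipped entirely, the fragment length of 7 is purely illustrative, and the function names are invented for this example. The template is written 5’ to 3’ and is exposed piece by piece as the fork opens.

```python
COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Complement each base, then reverse so the result reads 5' to 3'."""
    return sequence.translate(COMPLEMENT_TABLE)[::-1]


def okazaki_fragments(lagging_template, fragment_length):
    """Return fragments in the order they are made as the fork opens."""
    fragments = []
    for start in range(0, len(lagging_template), fragment_length):
        exposed = lagging_template[start:start + fragment_length]
        # Each fragment is synthesized 5' to 3', away from the fork
        fragments.append(reverse_complement(exposed))
    return fragments


def ligate(fragments):
    """Join fragments; the newest fragment ends up at the 5' end."""
    return "".join(reversed(fragments))


if __name__ == "__main__":
    template = "ATGCCGTGGTAAAGCCTTAAG"
    fragments = okazaki_fragments(template, fragment_length=7)
    print(fragments)
    print(ligate(fragments) == reverse_complement(template))
```

Once the fragments are joined, the result is the same strand you would get by copying the whole template in one go, which is the point of the ligation step.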
Extra proteins involved in synthesis you should know:
Topoisomerases prevent DNA from being tangled and release superhelical tension created at the replication fork as DNA is unwound by helicase.
Single-stranded binding proteins (SSBs) bind to the exposed single-stranded DNA at the replication fork, preventing bases from pairing with each other and forming “hairpins”.
Now that we have a basic understanding of what DNA replication is, how it’s regulated, and the process behind it, we can finally move on to the coding section! 👏
We know DNA is always read from the 5’ end to the 3’ end. So if I asked you to write the complementary strand for 5’ ATGTTCAAA 3’ in Python, most people might write “TACAAGTTT”, which is usually correct in biology-related exams but wrong when working in Python. This is because on computers we conventionally write strands from the 5’ end to the 3’ end. If “TACAAGTTT” was used to figure out what protein it translates into, the results wouldn’t be right, as it’s written 3’ to 5’ and would be read in the wrong direction. The correct answer is the reverse complement, which is just the complement reversed. The following should help; it took me a moment to wrap my head around it.
original           = 5' ATGTTCAAA 3'
complement         = 3' TACAAGTTT 5'
reverse complement = 5' TTTGAACAT 3'
Let’s write a function that will do this for us, always returning the right answer. This will make it easier to work with DNA when we go into transcription and translation in future posts.
Calculating reverse complements
You should use the open_and_parse_fasta() function we made in the last post to get a sequence. I’ll just be using a short sequence 5’ “ATGCCGTGGTAAAGCCTTAAG” 3’ to explain the function we create (I’ve added the full code using a fasta file to GitHub). Let me know what sequences you tested the function on in the comments 👍.
```python
# Function takes in a string DNA sequence and returns the reverse complement
# Input is uppercased first so lowercase bases are handled too
# Chains 4 replace() calls, each one replacing a different base
# Replacements are lowercase so swapped bases are not swapped back again
# 1st replace swaps 'A' for 't' e.g. "tTGCCGTGGTtttGCCTTttG"
# 2nd replace swaps 'T' for 'a' e.g. "taGCCGaGGatttGCCaattG" etc..
# Left with "tacggcaccatttcggaattc" so must change to uppercase
# Finally reverse the complement string
def reverse_complement_DNA(dna_sequence):
    complement = dna_sequence.upper().replace('A', 't').replace('T', 'a')
    complement = complement.replace('C', 'g').replace('G', 'c').upper()
    return complement[::-1]


if __name__ == "__main__":
    print(reverse_complement_DNA("ATGCCGTGGTAAAGCCTTAAG"))
```
This gives us the following result if everything is working correctly:
CTTAAGGCTTTACCACGGCAT
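If you’d like to experiment, here is an alternative sketch of the same idea using Python’s built-in str.maketrans() and str.translate(), which swap all four bases in a single pass. This is not the version in the GitHub code; the function name and the choice to raise an error for anything other than A, C, G or T are my own assumptions.

```python
# Translation table mapping each base to its complement (case is preserved)
COMPLEMENT_TABLE = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna_sequence):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    unexpected = set(dna_sequence) - set("ACGTacgt")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return dna_sequence.translate(COMPLEMENT_TABLE)[::-1]


if __name__ == "__main__":
    sequence = "ATGCCGTGGTAAAGCCTTAAG"
    result = reverse_complement(sequence)
    print(result)
    # Taking the reverse complement twice should give back the original
    print(reverse_complement(result) == sequence)
```

The round trip in the last line is a handy check to run on any sequence you test.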
Congratulations! You’ve made it through another post 🎉. It’s such a simple function for such a complex mechanism 😅. But we’re finally ready to move on to transcription, translation and the encoding of proteins in the next post! If you have any questions or feedback on how to improve the code, you can post them in the comments below.
References
1. Wilson, J. and Hunt, T., 2002. Molecular Biology Of The Cell, 4th Edition. New York: Garland Science, pp.263-265. Available at: https://www.ncbi.nlm.nih.gov/books/NBK26881/ [Accessed 30 April 2020].
2. Takeda, D. and Dutta, A., 2005. DNA replication and progression through S phase. Oncogene, [online] 24(17), pp.2827-2843. Available at: https://www.nature.com/articles/1208616 [Accessed 30 April 2020].
3. Pray, L., 2008. Molecular Events Of DNA Replication. [online] Nature.com. Available at: https://www.nature.com/scitable/topicpage/major-molecular-events-of-dna-replication-413/ [Accessed 30 April 2020].
4. Bell, S. and Stillman, B., 1992. ATP-dependent recognition of eukaryotic origins of DNA replication by a multiprotein complex. Nature, [online] 357(6374), pp.128-134. Available at: https://www.ncbi.nlm.nih.gov/pubmed/1579162 [Accessed 1 May 2020].
5. Speck, C., Chen, Z., Li, H. and Stillman, B., 2005. ATPase-dependent cooperative binding of ORC and Cdc6 to origin DNA. Nature Structural & Molecular Biology, [online] 12(11), pp.965-971. Available at: https://www.ncbi.nlm.nih.gov/pubmed/16228006/ [Accessed 1 May 2020].
6. Tsakraklides, V. and Bell, S., 2010. Dynamics of Pre-replicative Complex Assembly. Journal of Biological Chemistry, [online] 285(13), pp.9437-9443. Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2843193/ [Accessed 1 May 2020].
7. Stillman, B. and Li, H., 2012. The origin recognition complex: a biochemical and structural view. Subcell Biochem, [online] 62, pp.37-58. Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3779782/ [Accessed 1 May 2020].
8. Capaldi, S., 2004. Biochemical characterization of Cdc6/Orc1 binding to the replication origin of the euryarchaeon Methanothermobacter thermoautotrophicus. Nucleic Acids Research, [online] 32(16), pp.4821-4832. Available at: https://www.ncbi.nlm.nih.gov/pubmed/15358831/ [Accessed 1 May 2020].
9. Chen, S., de Vries, M. and Bell, S., 2007. Orc6 is required for dynamic recruitment of Cdt1 during repeated Mcm2-7 loading. Genes & Development, [online] 21(22), pp.2897-2907. Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2049192/ [Accessed 1 May 2020].
10. Takisawa, H., Mimura, S. and Kubota, Y., 2000. Eukaryotic DNA replication: from pre-replication complex to initiation complex. Current Opinion in Cell Biology, [online] 12(6), pp.690-696. Available at: https://www.ncbi.nlm.nih.gov/pubmed/11063933 [Accessed 1 May 2020].
11. Walter, J. and Newport, J., 2000. Initiation of Eukaryotic DNA Replication. Molecular Cell, [online] 5(4), pp.617-627. Available at: https://www.ncbi.nlm.nih.gov/pubmed/10882098 [Accessed 1 May 2020].
12. Nishitani, H. and Lygerou, Z., 2002. Control of DNA replication licensing in a cell cycle. Genes to Cells, [online] 7(6), pp.523-534. Available at: https://onlinelibrary.wiley.com/doi/full/10.1046/j.1365-2443.2002.00544.x [Accessed 1 May 2020].
13. Kelch, B., Makino, D., O’Donnell, M. and Kuriyan, J., 2012. Clamp loader ATPases and the evolution of DNA replication machinery. BMC Biology, [online] 10(1). Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3331839/ [Accessed 1 May 2020].
14. Okazaki, R., Okazaki, T., Sakabe, K., Sugimoto, K. and Sugino, A., 1968. Mechanism of DNA chain growth. I. Possible discontinuity and unusual secondary structure of newly synthesized chains. Proceedings of the National Academy of Sciences, [online] 59(2), pp.598-605. Available at: https://www.pnas.org/content/59/2/598 [Accessed 1 May 2020].
(*) ORC: a complex made of six protein subunits (Orc1-6) that binds to replicator sequences and recruits Cdc6, Cdt1 and Mcm2-7.
Full code on GitHub.