Version 1.21

A wizard for protein display and image production


Teaching Portal


This page will introduce you to the fundamental principles of protein and nucleic acid structure. Click on a question below to expand the answer. You can also download a printer-friendly PDF version of the protein structure section, or of the nucleic acid section.

Click on the questions below to learn about proteins.

Proteins are biological molecules performing a wide variety of functions. For instance, some proteins catalyse a reaction, i.e. they make it go faster than it normally would: those proteins are called enzymes. Other proteins transport molecules throughout the body, others yet provide structural support for cells so they have the right shape, etc.

Proteins are involved in nearly every biological process and their function is very often tightly linked to their three-dimensional structure. Therefore, it is crucial to determine the structure of a protein in order to understand fully how it works inside a cell.

The many functions of proteins are reflected by the wide variety of 3D structures they adopt. However, all proteins are made of the same constituents: amino acids.

Proteins are polymers: similar molecules (called monomers) are repeated many times to form a chain (the polymer). The monomers making up proteins are amino acids, whose general structure is shown in Figure 1.

Amino acids contain a central carbon atom, often called the α-carbon (Cα). In every amino acid except glycine, this carbon is chiral, i.e. it is covalently linked to four different groups of atoms. The Cα is linked to an amino group and a carboxylic acid group, hence the term amino acid. It is also attached to a hydrogen atom and a side chain (sometimes called the R group). An amino acid’s side chain is what sets it apart from the other amino acids and is often responsible for the special chemical and biological properties of the amino acid.

Amino acid
Figure 1 | An amino acid with the amino group and the carboxylic acid group in orange and blue, respectively. The side chain is represented by the letter R and the Cα is shown by the asterisk (*).

In water at pH 7 (which is close to physiological conditions inside cells), the amino group is protonated and positively charged, while the carboxylic acid group is deprotonated and negatively charged, as shown in Figure 2.

Amino acid at pH 7
Figure 2 | An amino acid in water at pH 7. R: side chain; *: chiral Cα.

An amino acid in water at pH 7 has a positively charged amino group and a negatively charged acid group, but is neutral overall (as the two charges cancel out). Such compounds are called zwitterions.
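The charge balance described above can be estimated with the Henderson-Hasselbalch relationship. The sketch below is illustrative only: the pKa values (about 2.3 for the carboxylic acid group and about 9.7 for the amino group) are approximate textbook values for alanine and are not taken from this page, and the function names are hypothetical.

```python
def fraction_protonated(pH, pKa):
    """Fraction of a group that still carries its proton at a given pH."""
    return 1.0 / (1.0 + 10 ** (pH - pKa))


def net_charge(pH, pKa_carboxyl=2.3, pKa_amino=9.7):
    """Approximate net charge of an amino acid without a charged side chain.

    The pKa defaults are assumed, rounded values for alanine.
    """
    # The amino group carries +1 when protonated, 0 otherwise.
    amino_charge = fraction_protonated(pH, pKa_amino)
    # The carboxylic acid group carries -1 when deprotonated, 0 otherwise.
    carboxyl_charge = -(1.0 - fraction_protonated(pH, pKa_carboxyl))
    return amino_charge + carboxyl_charge


if __name__ == "__main__":
    for pH in (1.0, 7.0, 12.0):
        print(f"pH {pH:4.1f}: net charge = {net_charge(pH):+.3f}")
```

Under these assumptions the net charge is very close to zero at pH 7 (the zwitterion), positive in strongly acidic conditions and negative in strongly basic conditions.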

There are twenty standard side chains (R groups), thus there are twenty standard amino acids. Plants can make all of them, whereas animals can only synthesise some of them. The others must be acquired through the diet and are called essential amino acids.

There are twenty standard amino acids, which can be grouped according to their properties, as in Table 1. Each amino acid can be identified by its name (e.g. Alanine), its three-letter code (e.g. Ala) and its one-letter code (e.g. A).

Table 1 | The 20 standard amino acids making up proteins
Aliphatic side chains (non-aromatic hydrocarbons)
Alanine – Ala – A; Valine – Val – V; Leucine – Leu – L; Isoleucine – Ile – I

Non-polar side chains
Glycine – Gly – G; Proline – Pro – P; Cysteine – Cys – C; Methionine – Met – M

Aromatic side chains
Histidine – His – H; Phenylalanine – Phe – F; Tyrosine – Tyr – Y; Tryptophan – Trp – W

Polar side chains
Asparagine – Asn – N; Glutamine – Gln – Q; Serine – Ser – S; Threonine – Thr – T

Charged side chains
Aspartic acid – Asp – D; Glutamic acid – Glu – E; Lysine – Lys – K; Arginine – Arg – R

At pH 7, aspartic acid and glutamic acid lose the proton on their side chain carboxylic acid, making them negatively charged. Lysine and arginine, on the other hand, gain a proton attached to a nitrogen atom in their side chain, making them positively charged.

Some amino acids may belong to several categories, depending on the conditions in which they find themselves. For instance, at pH 7, a small proportion of histidine molecules will have an extra hydrogen attached to one of the nitrogen atoms in the side chain, making the side chain positively charged.
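The three-letter and one-letter codes of Table 1, together with the side-chain charges at pH 7 described above, can be captured in two small lookup tables. This is a minimal sketch; the names `THREE_TO_ONE`, `SIDE_CHAIN_CHARGE`, `to_one_letter` and `side_chain_charge` are hypothetical, and histidine is treated as uncharged because only a small proportion of it is protonated at pH 7.

```python
# Three-letter to one-letter codes, in the order of Table 1.
THREE_TO_ONE = {
    "Ala": "A", "Val": "V", "Leu": "L", "Ile": "I",   # aliphatic
    "Gly": "G", "Pro": "P", "Cys": "C", "Met": "M",   # non-polar
    "His": "H", "Phe": "F", "Tyr": "Y", "Trp": "W",   # aromatic
    "Asn": "N", "Gln": "Q", "Ser": "S", "Thr": "T",   # polar
    "Asp": "D", "Glu": "E", "Lys": "K", "Arg": "R",   # charged
}

# Side-chain charge at pH 7 (one-letter codes); all other residues count as 0.
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def to_one_letter(residues):
    """Convert a list of three-letter codes (any capitalisation) to a string."""
    return "".join(THREE_TO_ONE[name.capitalize()] for name in residues)


def side_chain_charge(sequence):
    """Sum of the side-chain charges at pH 7 for a one-letter sequence."""
    return sum(SIDE_CHAIN_CHARGE.get(letter, 0) for letter in sequence.upper())


if __name__ == "__main__":
    peptide = to_one_letter(["Ala", "Asp", "Lys", "Lys"])
    print(peptide, side_chain_charge(peptide))
```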

Amino acids are joined together to form proteins. The covalent bond between two amino acids is the result of a condensation reaction (where water is released, as shown in Figure 3) and is called the peptide bond. Two amino acids joined together form a dipeptide and a longer chain of amino acids is called a polypeptide, or sometimes simply peptide.

Condensation reaction between alanine and valine yielding a dipeptide
Figure 3 | Formation of an alanine-valine dipeptide. A water molecule (blue) is released by the condensation reaction and the peptide bond (red) is formed. The backbone is highlighted in green.

When they are part of a peptide, amino acids are called residues. The peptide bond is formed between the carboxylic carbon of one residue and the nitrogen of the next. The chain made up by the amide nitrogen-Cα-carboxyl carbon of all the residues constitutes the backbone of the peptide (shown in green in Figure 3).
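Because each peptide bond releases one water molecule, the mass of a peptide is the sum of the masses of its free amino acids minus one water per bond. The sketch below illustrates this bookkeeping for the alanine-valine dipeptide of Figure 3. The molar masses (in g/mol) are standard rounded values that do not come from this page, and the function name is hypothetical.

```python
WATER_MASS = 18.015  # g/mol, released once per peptide bond

# Approximate molar masses of the free amino acids (g/mol).
FREE_AMINO_ACID_MASS = {"Ala": 89.09, "Val": 117.15}


def peptide_mass(residues):
    """Approximate molar mass of a linear peptide built from `residues`."""
    total = sum(FREE_AMINO_ACID_MASS[name] for name in residues)
    # A chain of n residues contains n - 1 peptide bonds.
    peptide_bonds = max(len(residues) - 1, 0)
    return total - peptide_bonds * WATER_MASS


if __name__ == "__main__":
    print(f"Ala-Val: {peptide_mass(['Ala', 'Val']):.3f} g/mol")
```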

Proteins are made up of polypeptide chains, i.e. polymers of amino acids joined together. The structure of a protein can be studied at four different levels.

Primary structure

⇒ The primary structure of a protein is the sequence of the amino acids that constitute it.

Because of the nature of the peptide bond (cf. above), the backbone of a polypeptide will have a single primary amine at one end and a single carboxylic acid at the other end (as they do not take part in a peptide bond). Those ends are called the N-terminus (primary amine) and the C-terminus (carboxylic acid). The sequence of a polypeptide is always read from the N-terminus to the C-terminus (Figure 4).

Amino acid sequence
Figure 4 | The sequence of a polypeptide chain from an antibody, with the N- and the C-termini marked in blue and green respectively.

Secondary structure

The peptide bond between two residues is a single bond, but it is said to have a partial double-bond character. This means that it is particularly rigid for a single bond, forming a planar structure called the amide plane, as shown in Figure 5.

The amide planes
Figure 5 | The amide planes of a tripeptide. Each peptide bond (red) forms a planar structure, the amide plane (orange), due to its partial double-bond character. R1-3: side chains.

The angles between subsequent amide planes in a polypeptide are called torsion angles. They can only adopt certain values, and those values impose certain conformations (or folds) on the backbone.
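A torsion angle can be computed from the coordinates of four consecutive atoms: it is the angle between the plane containing the first three atoms and the plane containing the last three. The sketch below uses the standard vector formula with plain Python tuples. The function names and the test coordinates are hypothetical, and the sign follows the usual convention in which a clockwise rotation, viewed along the central bond, is positive.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(p1, p2, p3, p4):
    """Torsion (dihedral) angle in degrees defined by four points."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal to the plane through p1, p2, p3
    n2 = _cross(b2, b3)  # normal to the plane through p2, p3, p4
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Four points in one plane with the outer points on the same side (cis).
    print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
    # Outer points on opposite sides (trans).
    print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
```

In a real protein, the four points would be backbone atoms read from a structure file.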

⇒ The secondary structure of a protein is the local fold of the protein backbone.

Some of those local folds form precise, regular structures, often stabilised by hydrogen bonds. The two most common examples of secondary structure elements are α-helices and β-sheets.

In an α-helix, the polypeptide chain forms a right-handed helical structure with 3.6 residues per turn (Figure 6). The helix is stabilised by hydrogen bonds between the backbone N–H of each residue and the backbone C=O of the amino acid four residues earlier in the sequence. The core of the helix is tightly packed and all the side chains project outward.
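The hydrogen-bonding pattern of an α-helix follows directly from the description above: the N–H of residue i pairs with the C=O of residue i − 4. The sketch below lists those pairs for an idealised helix and derives the rotation per residue from the figure of 3.6 residues per turn. The function names are hypothetical, and real helices deviate from this ideal pattern, especially at their ends.

```python
RESIDUES_PER_TURN = 3.6


def helix_hydrogen_bonds(n_residues):
    """(donor, acceptor) residue numbers, 1-based, for an ideal α-helix.

    The donor is the backbone N-H of residue i; the acceptor is the
    backbone C=O of residue i - 4.
    """
    return [(i, i - 4) for i in range(5, n_residues + 1)]


def rotation_per_residue():
    """Angle in degrees by which each residue is rotated around the helix axis."""
    return 360.0 / RESIDUES_PER_TURN


if __name__ == "__main__":
    print(helix_hydrogen_bonds(8))
    print(f"{rotation_per_residue():.1f} degrees per residue")
```

Note that the first four residues of the helix have no acceptor four positions earlier, so a helix of n residues has at most n − 4 such hydrogen bonds.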

Example of an alpha-helix
Figure 6 | Example of an α-helix containing alanine residues only. Hydrogen bonds (orange) between backbone atoms four residues apart stabilise the α-helix. Dark grey: carbon; blue: nitrogen; red: oxygen; white: hydrogen. Only the hydrogen atoms involved in hydrogen bonds are displayed.

β-sheets are the other most common type of secondary structure element, shown in Figure 7. They are also stabilised by hydrogen bonds, but between adjacent strands (called β-strands), whereas in an α-helix, the hydrogen bonds are all within the same helix. As in α-helices, however, the hydrogen bonds stabilising β-sheets are between backbone N–H and C=O groups.

Example of a beta-sheet
Figure 7 | Example of a β-sheet. All side chain atoms are hidden for simplicity. Hydrogen bonds (orange) between backbone atoms from adjacent β-strands stabilise the β-sheet. Dark grey: carbon; blue: nitrogen; red: oxygen; white: hydrogen. Only the hydrogen atoms involved in hydrogen bonds are displayed.

Proteins also often contain regions of non-repetitive secondary structure. Those regions are called coils or loops. Although they are not as regular as α-helices or β-sheets, those regions still have a defined structure and should not be confused with the term random coil, which is used to describe a protein that has lost its secondary structure (the protein is then said to be denatured).

Some amino acids play a specific role in protein secondary structure. For instance, glycine does not have a side chain (it simply has two hydrogen atoms attached to its Cα) and is therefore able to adopt many more folds than other residues. Proline, on the other hand, has a side chain covalently attached to its backbone nitrogen, which means that it cannot adopt as many conformations as other amino acids, and often disrupts secondary structure elements or introduces kinks in α-helices.

Tertiary structure

⇒ The tertiary structure of a protein is its overall 3D arrangement: the folding of secondary structure elements and the position of side chains.

The hydrophobic effect is responsible for most of the tertiary structure of a protein: it is energetically favourable for the protein to fold and bury its hydrophobic residues within its core, away from the surrounding water.

Other bonds and interactions also help the protein fold into the correct tertiary structure. Disulphide bonds are covalent bonds between the sulphur atoms of two cysteine residues. Salt bridges are electrostatic interactions between a negatively charged side chain and a positively charged one. Hydrogen bonds and van der Waals interactions (between hydrophobic residues) are also involved in the tertiary structure. All those forces, interactions and bonds are shown in Figure 8.

Tertiary structure
Figure 8 | The forces, bonds and interactions responsible for protein tertiary structure.

Quaternary structure

Some proteins are made up of several polypeptide chains, which assemble once they have adopted their individual tertiary structures. The polypeptide chains may be identical or not: haemoglobin, for instance, has two copies of the same chain and two copies of another, different chain. Antibodies also contain four chains: two heavy chains and two light chains, as shown in Figure 9.

Some proteins are also covalently attached to a non-protein element, e.g. the haem cofactor in haemoglobin (cf. worksheet on haemoglobin).

⇒ The quaternary structure of a protein is the assembly of several polypeptide chains, and sometimes the addition of a non-protein element, to form a functional protein.

Antibody quaternary structure
Figure 9 | The quaternary structure of an antibody, comprising two heavy chains (blue and green) and two light chains (yellow and red).

The same forces, bonds and interactions responsible for tertiary structure may be involved in holding different polypeptide chains together.

When a protein structure is determined experimentally, the 3D coordinates of its constituent atoms are stored in the Protein Data Bank (PDB), in a PDB file. The Protein Data Bank is the result of a worldwide effort to collect all known structures of large biological molecules (proteins, DNA and RNA) in standardised files, allowing anyone to visualise them using tools like EzMol. Each PDB file can be easily accessed using its unique, 4-character PDB ID (e.g. 2HHB for deoxyhaemoglobin).
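In the classic PDB file format, each atom is described by one fixed-width text line (an ATOM or HETATM record) in which every field sits in a defined range of columns. The sketch below reads a few of those fields. The column positions follow the published PDB format, but the function names are hypothetical, the example record is made up for illustration, and the helper that builds it ignores finer points of the format such as the alignment of atom names.

```python
def parse_atom_record(line):
    """Extract a few fields from one fixed-width ATOM/HETATM record."""
    return {
        "atom_name": line[12:16].strip(),      # columns 13-16
        "residue_name": line[17:20].strip(),   # columns 18-20
        "chain_id": line[21],                  # column 22
        "residue_number": int(line[22:26]),    # columns 23-26
        "x": float(line[30:38]),               # columns 31-38, in Å
        "y": float(line[38:46]),               # columns 39-46, in Å
        "z": float(line[46:54]),               # columns 47-54, in Å
    }


def make_atom_record(serial, atom_name, residue_name, chain_id,
                     residue_number, x, y, z):
    """Hypothetical helper: build an illustrative record with aligned columns."""
    return (f"ATOM  {serial:>5} {atom_name:<4} {residue_name:>3} {chain_id}"
            f"{residue_number:>4}    {x:8.3f}{y:8.3f}{z:8.3f}")


def read_atoms(pdb_text):
    """Parse every ATOM or HETATM line found in the text of a PDB file."""
    return [parse_atom_record(line) for line in pdb_text.splitlines()
            if line.startswith(("ATOM", "HETATM"))]


if __name__ == "__main__":
    record = make_atom_record(1, "CA", "VAL", "A", 1, 6.204, 16.869, 4.854)
    print(record)
    print(parse_atom_record(record))
```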

Different national and international entities collaborate to contribute to the global Protein Data Bank, including the RCSB PDB in the United States and the PDBe in Europe. They store the same 3D coordinates, but provide different kinds of information and annotations about each structure.

There are three main techniques for solving the structure of a protein. The first, which has contributed the most structures to the PDB, is X-ray crystallography. The protein is crystallised, and X-rays are shot at the crystals. The crystals diffract the X-rays (i.e. they change their direction), and the way those X-rays are diffracted depends directly on the structure of the protein. The resulting diffraction pattern is recorded, and the structure of the protein can be calculated from it.

The second technique is nuclear magnetic resonance. The protein is in solution (and not in a crystal) and is placed inside a magnetic field. The protein is irradiated with electromagnetic waves, which will excite the nuclei of its atoms. After a time, those nuclei relax and, in doing so, produce a signal that reveals information about the other nuclei around them. Then, all that information is pieced together to determine which atoms are near which other atoms in the protein, therefore solving the 3D structure.

The third major technique is cryo-electron microscopy, for which the 2017 Nobel Prize in Chemistry was awarded to Joachim Frank, Richard Henderson and Jacques Dubochet. The protein is neither in a crystal nor in solution, but this time in a thin layer of very cold ice. An electron microscope fires electrons at the protein sample and those electrons are scattered (i.e. deflected) when they hit the sample. This produces an image of the protein, which is recorded. This phenomenon is very similar to what happens in a ‘normal’ light microscope, except that photons (from light) are replaced by electrons, allowing the imaging of much smaller samples. Thousands of images are recorded, with the protein in all possible orientations, and they are then assembled back together to create a 3D model representing the structure of the protein.

The resolution of an experimentally determined structure is the smallest distance between two distinguishable features. For instance, if a structure has a resolution of 3Å (0.3nm or 0.0000000003m), it means that we can distinguish two atoms which are 3Å apart or more, but not if they are closer than 3Å. The higher the resolution (i.e. the smaller the number), the better the structure is.
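The unit conversion and the rule of thumb above can be written as a few small helper functions. This is a minimal sketch with hypothetical function names. It treats resolution as a sharp cut-off, which is a simplification of what happens in practice.

```python
import math

ANGSTROM_IN_METRES = 1e-10


def angstrom_to_nm(value):
    """Convert a length in Å to nanometres (1 nm = 10 Å)."""
    return value / 10


def angstrom_to_metres(value):
    """Convert a length in Å to metres."""
    return value * ANGSTROM_IN_METRES


def can_distinguish(distance, resolution):
    """True if two features `distance` Å apart are separable at `resolution` Å."""
    return distance >= resolution


if __name__ == "__main__":
    print(angstrom_to_nm(3), "nm")
    print(angstrom_to_metres(3), "m")
    print(can_distinguish(3.0, 3.0), can_distinguish(1.5, 3.0))
    print(math.isclose(angstrom_to_metres(3), 3e-10))
```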

The resolution is often expressed in Ångströms (Å), as it is the most useful unit to describe the length of covalent bonds between atoms (1Å = 10⁻¹⁰ m). Table 2 is a general guide of what can be seen at different resolutions.

Table 2 | Examples of features revealed at different resolutions
Resolution Features
6Å General shape of the protein and some α-helices.
4Å Backbone of the protein, secondary structure.
3.5Å Start to see side chains.
2.7Å Can see side chains and start seeing water molecules.
1.5Å Start reaching atomic resolution, where we can make out two covalently bonded carbon atoms.
1.2Å Can distinguish almost any two covalently linked atoms, except hydrogen (1.2Å is the length of a C=O double bond).

2.7Å is a good resolution for a structure solved by X-ray crystallography, but many structures now achieve much higher resolution: most structures in the PDB are between 1.8Å and 2Å resolution. With cryo-electron microscopy, it is difficult to achieve such high resolutions and 3.5Å is considered good, as it allows the visualisation of side chains.
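Table 2 can be turned into a simple lookup that lists what should be visible at a given resolution. The sketch below copies the thresholds and descriptions from Table 2. It assumes that a structure shows every feature listed at its own resolution and at all coarser ones, which is an interpretation of the table rather than a strict rule, and the names are hypothetical.

```python
# (resolution in Å, feature), copied from Table 2, coarsest first.
RESOLUTION_GUIDE = [
    (6.0, "General shape of the protein and some α-helices"),
    (4.0, "Backbone of the protein, secondary structure"),
    (3.5, "Start to see side chains"),
    (2.7, "Side chains and first water molecules"),
    (1.5, "Two covalently bonded carbon atoms"),
    (1.2, "Almost any two covalently linked atoms, except hydrogen"),
]


def visible_features(resolution):
    """Features expected to be visible at `resolution` (in Å)."""
    return [feature for limit, feature in RESOLUTION_GUIDE
            if resolution <= limit]


if __name__ == "__main__":
    for feature in visible_features(3.0):
        print(feature)
```

Remember that a smaller number means a higher resolution, so the list grows as the value passed in decreases.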

Click on the questions below to learn about nucleic acids.

Nucleic acids are the carriers of genetic information. In all living organisms, the hereditary information is stored in deoxyribonucleic acid (DNA), which is a molecule formed by the repetition of nucleotides (making DNA a polymer). There are four different nucleotides in DNA, which form a universal code for hereditary information.

Ribonucleic acid (RNA), the other kind of nucleic acid, is a related molecule to DNA. It is also a polymer of four nucleotides, three of which are the same as in DNA while the fourth one is slightly different. It has many functions in cells, notably acting as the intermediate between DNA and proteins. Some viruses even store their genome in the form of an RNA molecule rather than DNA.

Nucleotides are the building blocks of nucleic acids: they are the monomers which, repeated many times, form the polymers DNA and RNA. Nucleotides are composed of a five-carbon sugar covalently attached to a phosphate group and a base containing nitrogen atoms. Figure 1 shows the structure of the nucleotides making up nucleic acids.

Figure 1 | The chemical structure of a nucleotide. A nucleotide comprises a five-carbon sugar molecule: deoxyribose in DNA (A) and ribose in RNA (B). The carbon atoms on the sugar molecule are numbered in red. Deoxyribose (A) is different from ribose (B) in that it lacks an –OH group at carbon 2’. The 5’-carbon atom is attached to a phosphate group (here a monophosphate in orange) and the 1’-carbon is attached to a base (blue).

The main difference between nucleotides from DNA and those from RNA is the nature of the sugar. Nucleotides making up RNA (Figure 1B) contain ribose, making them ribonucleotides. In DNA, however, the sugar lacks an –OH group at the 2’-carbon, making it deoxyribose and the corresponding nucleotides deoxyribonucleotides.

A nucleotide may contain more than one phosphate at its 5’-carbon, for instance the nucleotide adenosine triphosphate has three, as shown in Figure 2. When there is no phosphate group, the molecule is no longer called a nucleotide, but a nucleoside.

A molecule of adenosine triphosphate
Figure 2 | Adenosine triphosphate, often abbreviated to ATP.

The nucleotides making up DNA contain one of four nitrogenous bases (i.e. bases that contain nitrogen atoms). From a chemical perspective, two of those bases are purines, while the other two are pyrimidines. To each base corresponds a name (e.g. adenine), a nucleoside (e.g. adenosine) and a one-letter code (e.g. A). This information is included in Table 1.

Table 1 | The four bases of DNA. The ‘R’ represents the deoxyribose covalently attached to the base to form the nucleoside named in the third row.
Chemical structure
Base Adenine Guanine Cytosine Thymine
Nucleoside Deoxyadenosine Deoxyguanosine Deoxycytidine Deoxythymidine
Letter A G C T
Type of base Purine Purine Pyrimidine Pyrimidine

As mentioned above, the sugar in RNA is ribose rather than deoxyribose. However, there is another difference between DNA and RNA in the nucleotide composition. RNA contains three of the bases found in DNA (adenine, guanine and cytosine) but thymine is replaced by the related base, uracil. The four bases found in RNA, along with the names of their corresponding nucleosides, are in Table 2.

Table 2 | The four bases of RNA. The ‘R’ represents the ribose covalently attached to the base to form the nucleoside named in the third row.
Chemical structure
Base Adenine Guanine Cytosine Uracil
Nucleoside Adenosine Guanosine Cytidine Uridine
Letter A G C U
Type of base Purine Purine Pyrimidine Pyrimidine

DNA is predominantly found as a double helix: two strands of polynucleotides wind about the same axis to form a right-handed helix. Each nucleotide provides a deoxyribose and a phosphate to the backbone. The bases project towards the centre of the helix, away from the surrounding water. The DNA double helix is shown in Figure 3.

The DNA double helix
Figure 3 | The double-helical structure of DNA. A. DNA shown as a cartoon. B. DNA shown as sticks, with a cyan cartoon highlighting the sugar-phosphate backbone. Green: base pair; grey: carbon; red: oxygen; blue: nitrogen; white: hydrogen; orange: phosphorus.

Two bases (each from a different strand) come together to form a base pair, shown in green in Figure 3A. A base pair is held together by hydrogen bonds between the two bases (cf. Watson-Crick base pairing explained below).

DNA can adopt slightly different kinds of 3D structure, but the majority of the DNA inside a cell at any given point will have the structure shown in Figure 3, called B-DNA. It has 10 base pairs per helical turn and a rise of 3.4Å per base pair.
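The two B-DNA parameters given above are enough to estimate the dimensions of an ideal double helix. The sketch below uses only those two numbers. The function names are hypothetical, and the length is a rough estimate that simply multiplies the number of base pairs by the rise.

```python
BASE_PAIRS_PER_TURN = 10   # B-DNA
RISE_PER_BASE_PAIR = 3.4   # Å


def helix_pitch():
    """Length of one full helical turn, in Å."""
    return BASE_PAIRS_PER_TURN * RISE_PER_BASE_PAIR


def helix_length(n_base_pairs):
    """Approximate length of an ideal B-DNA helix, in Å."""
    return n_base_pairs * RISE_PER_BASE_PAIR


def helical_turns(n_base_pairs):
    """Number of helical turns made by `n_base_pairs` base pairs."""
    return n_base_pairs / BASE_PAIRS_PER_TURN


if __name__ == "__main__":
    n = 12  # an arbitrary short fragment
    print(f"{n} bp: about {helix_length(n):.1f} Å, {helical_turns(n):.1f} turns")
    print(f"pitch: {helix_pitch():.1f} Å per turn")
```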

The double helix shown in Figure 3 can only accommodate two kinds of base pairs, due to the geometry of the bases. Adenine and thymine bases always pair with each other while guanine and cytosine bases always pair with each other. This kind of pairing, called Watson-Crick base pairing, is mediated by hydrogen bonds between the two bases of a pair, as shown in Figure 4.

AT and CG base pairs
Figure 4 | A. Watson-Crick base pairing between deoxyriboadenosine monophosphate and deoxyribothymidine monophosphate. B. Watson-Crick base pairing between deoxyribocytidine monophosphate and deoxyriboguanosine monophosphate. Only the name of the base is given below each nucleotide. The hydrogen bonds are shown by orange dotted lines. Grey: carbon; red: oxygen; blue: nitrogen; white: hydrogen; orange: phosphorus.

Note that an AT base pair is only held by two hydrogen bonds whereas a CG base-pair has three, making the latter more stable.
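The pairing rules and hydrogen-bond counts above translate directly into two lookup tables. The sketch below returns, for each base in a strand, the base it pairs with at the same position, and counts the Watson-Crick hydrogen bonds in the resulting duplex. The names are hypothetical, and the result of `paired_bases` is given position by position without any statement about the direction in which the second strand is read.

```python
# Watson-Crick partner of each DNA base.
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds in the base pair that each base takes part in.
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def paired_bases(strand):
    """Base found opposite each position of `strand`, position by position."""
    return "".join(PARTNER[base] for base in strand.upper())


def count_hydrogen_bonds(strand):
    """Total Watson-Crick hydrogen bonds when `strand` is fully base-paired."""
    return sum(HYDROGEN_BONDS[base] for base in strand.upper())


if __name__ == "__main__":
    for strand in ("AATT", "GGCC"):
        print(strand, paired_bases(strand), count_hydrogen_bonds(strand))
```

A stretch rich in G and C therefore has more hydrogen bonds than an A/T-rich stretch of the same length.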

A strand of DNA is the result of the polymerisation of several nucleotides, with the backbone formed by the deoxyribose sugars and the phosphate groups. Each nucleotide residue (i.e. a nucleotide within a strand of DNA) contains a phosphate group covalently attached to the 5’-carbon of its deoxyribose, but also has its deoxyribose 3’-carbon covalently attached to the phosphate of the next nucleotide residue in the strand. The only exception is the final nucleotide, which does not have a phosphate at its 3’-carbon (of the deoxyribose), but rather a free –OH group. We define this end of the strand as the 3’-end. The very first nucleotide residue, on the other hand, has a free phosphate group attached to its 5’-carbon. We define that end of the strand as the 5’-end.

DNA is always read from the 5’-end to the 3’-end, as shown in Figure 5.

Short stretch of DNA highlighting the directionality of the molecule
Figure 5 | The directionality of DNA. A stretch of 3 nucleotide residues is shown with their 5’- and 3’-carbons numbered. In red are the 5’-end (characterised by a free phosphate group) and the 3’-end (characterised by a free –OH group).

Note that, when studying DNA in the lab, it is common to remove the phosphate at the 5’-end, therefore many experimentally determined structures will actually show an –OH group rather than a phosphate at the 5’-end.

As mentioned above, RNA is made of ribonucleotides rather than deoxyribonucleotides: the 2’-carbon of its ribose is covalently attached to an –OH group. Furthermore, RNA contains the base uracil instead of thymine.

The other main difference between RNA and DNA is that RNA is often single-stranded and does not form the regular double-helical structure of DNA. However, it is quite common for a single RNA strand to fold on itself and to form complex 3D structures, with some helical character. When that is the case, the 3D structure is often stabilised by the same Watson-Crick base-pairing as in DNA, although some deviations may be allowed (often disrupting helices).

The directionality of RNA, however, is the same as that of DNA: the sequence is read from the 5’-end to the 3’-end.
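At the level of the one-letter sequence, the difference between DNA and RNA is simply the replacement of T by U, with both sequences written from the 5’-end to the 3’-end. The sketch below rewrites the letters and checks which alphabet a sequence uses. The function names are hypothetical, and the code only manipulates text: it is not a model of how cells produce RNA.

```python
DNA_LETTERS = set("ACGT")
RNA_LETTERS = set("ACGU")


def dna_to_rna_letters(dna):
    """Rewrite a DNA sequence (5' to 3') with uracil in place of thymine."""
    return dna.upper().replace("T", "U")


def sequence_type(sequence):
    """Return 'DNA', 'RNA', 'DNA or RNA' or 'unknown' based on the letters used."""
    letters = set(sequence.upper())
    is_dna = letters <= DNA_LETTERS
    is_rna = letters <= RNA_LETTERS
    if is_dna and is_rna:
        return "DNA or RNA"  # no T and no U, so the letters alone cannot tell
    if is_dna:
        return "DNA"
    if is_rna:
        return "RNA"
    return "unknown"


if __name__ == "__main__":
    rna = dna_to_rna_letters("ATGGCT")
    print(rna, sequence_type(rna), sequence_type("ATGGCT"))
```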

Below are worksheets and examples of structures highlighting some key features explained above. You will also find examples of figures generated with EzMol.


Function of haemoglobin

The Protein Data Bank (PDB) has a good description of the function of haemoglobin, available here. Using this information and/or other resources of your choice, answer the following questions:

In which tissue is haemoglobin found? In which cells?

What is its function?

What do the ‘oxy-’ and ‘deoxy-’ prefixes mean when referring to oxyhaemoglobin and deoxyhaemoglobin, respectively?

Structure of deoxyhaemoglobin

Open the EzMol start page. To visualise the structure of deoxyhaemoglobin, load PDB 2HHB.

What organism is the protein from?

What is the resolution of the structure? Explain briefly the concept of resolution in protein structure.

Describe the quaternary structure of haemoglobin by answering the following questions:

How many chains are there in the protein? What are the names of those chains?

What is the cofactor of haemoglobin? What metal ion does it contain? (You may use other online resources to find this information).

Colour the different chains of haemoglobin and produce a figure highlighting each chain and the position of each cofactor, as in Figure 1.

Deoxyhaemoglobin chains
Figure 1 | The four chains of deoxyhaemoglobin, each in a different shade of blue. The haem cofactors are shown as spheres.

Colour the surface of the protein according to hydrophobicity, using a colour scheme that conveys the information clearly, such as in Figure 2.

Deoxyhaemoglobin coloured by hydrophobicity
Figure 2 | Deoxyhaemoglobin shown as a surface coloured according to local hydrophobicity. White: high hydrophobicity; light blue: intermediate hydrophobicity; dark blue: low hydrophobicity (hydrophilic patches).

Do you think haemoglobin is found in a hydrophobic or hydrophilic environment?

Describe the secondary structure of deoxyhaemoglobin by answering the following questions:

What secondary structure elements are present in the protein?

How many α-helices are present in the α- and β-chains? How many coil regions?

What interactions stabilise secondary structure elements?

Generate a figure highlighting the secondary structure elements of an α-chain and a β-chain. To do so, hide the other α and β-chains and use different colours for each type of secondary structure.

You can move on to the more advanced worksheet on oxygen binding to haemoglobin. You are advised to keep a tab open with PDB 2HHB as you will need to compare oxyhaemoglobin with the deoxyhaemoglobin structure you have just studied.

Download a PDF version of this worksheet or the answers.


This advanced worksheet requires the completion of the other haemoglobin worksheet first.

Load PDB 1HHO, which contains oxyhaemoglobin, and PDB 2HHB (in another tab), which contains deoxyhaemoglobin. Although PDB 1HHO only contains two chains, oxyhaemoglobin is made up of the same four chains as deoxyhaemoglobin in vivo.

The binding of molecular oxygen (O2) to haemoglobin results in changes in the structure of the protein, termed conformational changes. Conformational changes are key to the function of haemoglobin and determine its affinity for O2 (i.e. how easily O2 binds). Here, we will attempt to identify how those changes occur.

Compare the secondary structure of oxy- and deoxy-haemoglobin by answering the questions below:

Does the secondary structure of the protein change upon O2 binding?

Display the side chains of His-58 and His-87. Those two residues are important for the function of the protein. One of them is called proximal (i.e. nearby) histidine while the other is called distal (i.e. distant) histidine, relative to the haem group.

Which residue is the proximal histidine and which is the distal one?

Examine the position of the O2 molecule.

Which part of the haem binds O2?

The distal histidine is known to bind and stabilise O2.

What type of interaction may be responsible?

Examine the haem in oxy- (1HHO) and deoxyhaemoglobin (2HHB), as shown in Figure 3. One of them is truly planar (i.e. it lies completely flat) while the other is not (you may want to display His-58 and His-87 in deoxyhaemoglobin to make the comparison easier).

Haem from PDB 2HHB and PDB 1HHO
Figure 3 | Binding of O2 to haemoglobin. A. O2-free haem with His-58 and His-87 highlighted (PDB 2HHB). B. O2-bound haem with His-58 and His-87 highlighted (PDB 1HHO).

Which haem group is truly planar (free haem or O2-bound haem)?

Which atom lies out of the plane in the non-planar haem?

Bonus question: explain why a difference in planarity is observed.

Hint: This will require some knowledge of chemistry. Binding of O2 to the iron can be thought of as an oxidation of the iron. What happens to the number of electrons around the iron nucleus? How can this be related to the size of the iron?

When the planarity of the haem changes, what happens to the proximal histidine?

How can that explain large conformational changes throughout the protein?

Generate a figure to show how haemoglobin binds O2. Include at least a view of the O2-bound haem with the two functionally relevant histidine residues (with labels).


PDB collagen-edu.pdb

Collagen is the most abundant protein in humans. It is a structural protein, providing strength to tendons and elasticity to the skin. Its function is intimately linked to the structure of the protein, which you will explore here.

Structure of collagen

For the purpose of this activity, a PDB file was created by modifying the existing structure of a short part of human collagen (original PDB: 1BKV). Download the modified PDB file by right-clicking the following link and clicking on 'Save link as': collagen-edu.pdb. Load the PDB file into EzMol.

How many chains does collagen contain?

Are there any α-helices or β-strands?

Which amino acid residue is the most abundant in the sequence? How often does it come up?

What is the particularity of that residue? How is that relevant to protein structure in general?

Each chain has a polyproline II-like structure. A polyproline II helix is a left-handed helix, which differs from an α-helix and is produced from the repetition of several proline residues.

The last proline in each chain is actually a hydroxyproline (proline with a hydroxyl group attached to its γ-carbon) but is shown here as proline for simplicity. In the complete protein, proline and hydroxyproline are found more often than any other residue in the two positions preceding glycine.

What is the particularity of proline and hydroxyproline? How may that explain the existence of helices that are not α-helical?

Generate a figure showing the surface of each collagen chain in a different colour, as in Figure 4. The structure of the collagen molecule has been described as a triple helix.

Figure 4 | Fragment of collagen with each chain in a different colour.

Explain the term triple helix in the context of collagen structure.

Several collagen triple helices (each of which is made up of three polypeptide chains) are packed together side by side to form fibrils. There are many types of collagen, associated with different functions. In some types of collagen, several fibrils are then packed side by side again to form fibres with very large diameters. You can learn more about collagen structure and function on the RCSB Molecule of the Month website.



Open the EzMol start page. Load PDB 1BNA, containing a fragment of DNA.

DNA structure

What is the resolution of the structure? Briefly define the term resolution.

What are the two DNA ‘chains’ called?

Why is the structure of DNA described as a double helix?

Display the molecule as sticks, with stick heteroatoms coloured by element.

Produce a figure showing a single nucleotide. You can hide unwanted residues by selecting the eraser in ‘Step 5 – Add, colour or hide side-chains’. Specify the name of the nucleotide you have chosen and label the following: base (with its name), deoxyribose, phosphate. Do not use the first nucleotide in the sequence (at the 5'-end) as it lacks a phosphate (explained later in this worksheet).

Display the entire DNA molecule as sticks again. Examine the bases.

What atoms are DNA bases made of? (You can check the colour scheme for atom colouring by hovering over the question mark under the table in ‘Step 2 – Chain style’).

Base pairing

DNA bases can be separated into two categories based on their structure: purines have a double-ring structure while pyrimidines only have one ring.

Describe the base-pairing pattern of DNA (i.e. do pyrimidines only pair up with other pyrimidines, for instance)?

Each nucleotide (or letter in the sequence) pairs with only one specific partner nucleotide. This is called Watson-Crick base pairing.

What are the nucleotide pairs in Watson-Crick base pairing? You may want to colour different nucleotides with different colours or to display labels in order to make it clearer.

Which part of the nucleotide (base, deoxyribose or phosphate group) is responsible for the specificity of Watson-Crick base pairing?

What interactions are responsible for Watson-Crick base pairing?

DNA backbone & directionality

DNA is directional, i.e. it is read from one end to the other. The directionality of DNA is important as the sequence ATG will not ‘mean’ the same as GTA. The direction is defined based on the chemical structure of DNA, and in particular the deoxyribose. The carbon atoms of a deoxyribose molecule are numbered in Figure 5.

Figure 5 | Deoxyribose molecule with its carbon atoms numbered (blue).
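To see why the reading direction matters, the Python sketch below reads the same three letters from either end and looks up what each codon stands for in the standard genetic code. Only the two codons needed for the ATG/GTA example are included, so the table is deliberately incomplete; the names `CODONS` and `read_codon` are illustrative.

```python
# Two entries from the standard genetic code, written 5'->3'
# (DNA letters of the coding strand). Deliberately incomplete.
CODONS = {
    "ATG": "Met",  # methionine, also the usual start codon
    "GTA": "Val",  # valine
}


def read_codon(codon: str) -> str:
    """Look up the amino acid encoded by a codon written 5'->3'."""
    return CODONS[codon.upper()]


if __name__ == "__main__":
    forward = "ATG"
    backward = forward[::-1]  # the same letters read from the other end
    print(forward, "->", read_codon(forward))
    print(backward, "->", read_codon(backward))
```

The same three letters give two different amino acids depending on the end from which they are read, which is why a sequence is always written with a stated direction.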

Which carbon from deoxyribose is covalently attached to the base in a nucleotide?

Which carbon is covalently attached to the phosphate group in a nucleotide?

Which carbon is covalently attached to the phosphate group of the next residue in DNA?

You may want to refer back to your first figure to distinguish between the phosphate group of a given residue and the one from the next residue.

Identify the 5'-end and the 3'-end of a strand. When studying DNA in the lab, it is common to remove the phosphate group at the 5'-end. For this reason, the structure in PDB 1BNA (like many other structures) lacks that 5'-phosphate on either strand.

Justify the names 5'-end and 3'-end based on the chemical structure of DNA.

Identify the two ends of the other DNA strand.

Are the two DNA strands parallel (i.e. both run 5'→3' in the same direction) or antiparallel (i.e. they run 5'→3' in opposite directions)?

What is the charge of the DNA backbone? What gives it that charge?

Download a PDF version of this worksheet or the answers.


In eukaryotes, DNA has several levels of packing, which serve several purposes. Packing allows DNA to take up less space in the nucleus, protects it from physical damage and regulates how accessible it is to proteins. For instance, when DNA is very tightly packed, the proteins that usually ‘read’ it cannot access it, making some genes inactive.

The first level of DNA packing relies on a family of proteins, called histones. Several histones come together and wrap DNA around them to form a structure called the nucleosome.

Open the EzMol start page. Load PDB 1AOI, which contains a complete nucleosome.

Describe the overall structure and shape of the nucleosome.

How many times is DNA wrapped around the histones of a single nucleosome?

Make a figure in which each type of histone protein is shown in a different colour, as in Figure 6.

Figure 6 | A complete nucleosome with the histone proteins displayed as surfaces and the DNA wrapped around them displayed as a cartoon.

How many histone proteins make up the nucleosome?

Make a figure with the surface of the histones coloured according to charge.

Figure 7 | A complete nucleosome with the histones displayed as surfaces coloured according to charge. Dark green: negatively charged residues; light green: neutral residues; yellow: positively charged residues.
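The colour legend of Figure 7 can be thought of as a simple lookup from residue type to colour. The Python sketch below is one way to express that idea; it is not EzMol's actual implementation. It assumes that only Asp and Glu count as negatively charged and only Arg and Lys as positively charged, with histidine treated as neutral. The name `charge_colour` is made up for this example.

```python
# Colour names follow the legend of Figure 7.
NEGATIVE = {"ASP", "GLU"}  # acidic side chains
POSITIVE = {"ARG", "LYS"}  # basic side chains (assumption: His is neutral)


def charge_colour(residue_name: str) -> str:
    """Return the Figure 7 colour for a three-letter residue name."""
    name = residue_name.upper()
    if name in NEGATIVE:
        return "dark green"
    if name in POSITIVE:
        return "yellow"
    return "light green"  # every other residue is treated as neutral


if __name__ == "__main__":
    for residue in ("Asp", "Lys", "Gly"):
        print(residue, "->", charge_colour(residue))
```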

How do histones bind DNA? (Hint: think about the charge of the DNA backbone and look at the figure you have just produced).

Download a PDF version of this worksheet or the answers.


When a DNA sequence needs to be ‘read’ to make new proteins, specific proteins are recruited to the right position in the DNA in order to ‘read’ it. However, for those proteins to be recruited, the right sequence must be recognised. This is the role of the TATA-binding protein (TBP): eukaryotes typically have, before their genes, the sequence T-A-T-A-(T/A)-A-(A/T), where the fifth and seventh positions can each be either T or A (although there are many variations), to which TBP binds. TBP then recruits the machinery that ‘reads’ genes to make more proteins.
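The consensus sequence given above can be written as a short pattern and searched for in a DNA sequence. The Python sketch below does this with a regular expression, reading the consensus as T, A, T, A, then T or A, then A, then A or T. The function name `find_tata_boxes` and the example sequence are made up for illustration, and the sketch ignores the many natural variations mentioned above.

```python
import re

# T-A-T-A-(T or A)-A-(A or T), wrapped in a lookahead so that
# overlapping matches are also reported.
TATA_BOX = re.compile(r"(?=TATA[TA]A[AT])")


def find_tata_boxes(sequence: str) -> list:
    """Return the 0-based start position of every consensus match."""
    return [match.start() for match in TATA_BOX.finditer(sequence.upper())]


if __name__ == "__main__":
    example = "GGCTATAAAAGGC"  # hypothetical sequence, not taken from a real gene
    print(find_tata_boxes(example))
```

Positions are counted from 0 at the 5'-end of the sequence supplied, so a result of 3 means the match starts at the fourth letter.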

Load PDB 1CDW, which contains TBP in complex with DNA.

Describe the secondary structure of TBP (i.e. number of α-helices, β-strands, etc.).

Which secondary structure elements in TBP interact with DNA?

Does DNA keep its double-helical structure when bound to TBP?

Display the side chains of the following residues: Arg-192, Arg-199, Arg-290, Lys-221 and Lys-312.

What role does the structure suggest for these residues? (You can display the DNA as sticks to see more clearly).

Display the side chains of Phe-193, Phe-210, Phe-284 and Phe-301.

What role does the structure suggest for these residues?

Make a figure showing the residues mentioned above (arginine, lysine and phenylalanine residues) to show how their position in the protein is key for their functions, using Figure 8 as an example.

Figure 8 | The TATA-binding protein (TBP). A. The overall protein with β-sheets in cyan and α-helices in yellow. The DNA is shown as a cartoon. B. DNA binding by TBP: the two positively charged residues labelled are in close proximity with the sugar-phosphate backbone of DNA (only one strand is displayed). C. Phe-193 and Phe-210 are inserted between DNA bases, disrupting the double-helical structure.

Download a PDF version of this worksheet or the answers.


Bacteriophages (or simply phages) are viruses that infect bacteria. Some phages have two life cycles inside bacteria: first, they go through the lysogenic life cycle, in which the phage DNA is copied along with that of the bacterial host without killing it. Then, the phages enter the lytic life cycle, where they kill the bacterial host and are released into the environment, ready to infect more bacteria.

Phages need to make sure they switch from the lysogenic to the lytic cycle at the right time, and therefore that switch is tightly regulated. One of the proteins involved in that regulation is the Cro repressor, which binds phage DNA in a specific place.

Load PDB 3CRO, which contains the complex formed by Cro and DNA.

The amino acid sequence of Cro has a slightly peculiar numbering: it is based on the sequence of another, related protein, which is why the first two positions are numbered -1 and 0.
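The offset between a residue's position in the chain and its number can be made explicit with a little arithmetic. The Python sketch below assumes that the numbering is continuous after the first two positions (no gaps or insertions), which should be checked against the structure; the function names are made up for this example.

```python
FIRST_RESIDUE_NUMBER = -1  # the first two positions are numbered -1 and 0


def residue_number(position: int) -> int:
    """Convert a 1-based position in the chain to its residue number."""
    if position < 1:
        raise ValueError("chain positions are counted from 1")
    return FIRST_RESIDUE_NUMBER + (position - 1)


def chain_position(number: int) -> int:
    """Convert a residue number back to its 1-based position in the chain."""
    if number < FIRST_RESIDUE_NUMBER:
        raise ValueError("residue numbers start at -1")
    return number - FIRST_RESIDUE_NUMBER + 1


if __name__ == "__main__":
    for position in (1, 2, 3):
        print(position, "->", residue_number(position))
```

Under this assumption, a residue's number is always two less than its position in the chain.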

How many Cro molecules are there in the Cro/DNA complex? Which level of structure (from primary to quaternary) does this observation correspond to?

Describe the secondary structure of Cro (i.e. number of α-helices, β-strands, etc.).

Colour residues Thr-16 to Ala-36 with a different colour for each secondary structure element.

What secondary structure elements are present in the region of Thr-16 to Ala-36?

This region forms a structural motif commonly found in DNA-binding proteins: a helix-turn-helix. One of the two helices is responsible for binding to specific base pairs, and is called the recognition helix.

Based on the structure, which helix is the recognition helix, able to recognise a specific DNA sequence?

Make a figure showing the structure of the Cro helix-turn-helix motif and how it binds DNA as in Figure 9.

Figure 9 | The phage Cro repressor bound to DNA with one of the two monomers as a surface. The helix-turn-helix motif is highlighted in cyan (helix from Thr-16 to Thr-22), yellow (turn Lys-23 to Lys-27) and green (helix from Gln-28 to Ala-36).

Download a PDF version of this worksheet or the answers.

Teaching Portal by Tomas Voisin, Imperial College London.

EzMol interface © Structural Bioinformatics Group, Imperial College London 2018.

Please cite: Reynolds CR, Islam SA, Sternberg MJE (2018). “EzMol: A web server wizard for the rapid visualisation and image production of protein and nucleic acid structures.” J Mol Biol

EzMol is a software wizard on top of 3Dmol.js incorporating jQuery UI and Spectrum‑Master. EzMol is funded by Imperial College London and the BBSRC.