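A gene is a stretch of DNA written in four nitrogenous bases: adenine (A), thymine (T), cytosine (C), and guanine (G). Alleles are the different versions of a gene (or of a single position in the genome), and they differ in their sequence of these bases. A gene's base sequence is transcribed and translated to make its final product, often a protein.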

The alleles you carry also influence your observable traits, such as curly hair or green eyes. These observable traits are known as your phenotype.

What is a compound heterozygote?

If two parents are carriers for some condition (i.e., each has one functional and one broken copy of a given gene), the child may inherit both parents' broken copies and thus have no functional copy of the gene. When we zoom in to the nucleotide level, however, we sometimes find that the two parents' broken copies are not identical: they carry SNVs at different positions. For example, say the functional version of a gene is ACGTAC, and any SNV in ACGTAC “breaks” the gene. Perhaps the father has one copy of ACGTAC and one copy of AAGTAC, whereas the mother has one copy of ACGTAC and one copy of ACGTAA:

father: ACGTAC / AAGTAC
mother: ACGTAC / ACGTAA
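To make “SNVs at different positions” concrete, here is a minimal Python sketch that compares each broken copy with the functional sequence ACGTAC and reports the 1-based positions where they differ. The function name `snv_positions` is hypothetical, and the sketch assumes the sequences are already aligned and of equal length (no insertions or deletions).

```python
def snv_positions(reference: str, variant: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, variant base) for every mismatch.

    Assumes the two sequences are already aligned and have equal length.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        (i + 1, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference, variant))
        if ref_base != alt_base
    ]


FUNCTIONAL = "ACGTAC"
father_broken = "AAGTAC"
mother_broken = "ACGTAA"

print(snv_positions(FUNCTIONAL, father_broken))  # [(2, 'C', 'A')]
print(snv_positions(FUNCTIONAL, mother_broken))  # [(6, 'C', 'A')]
```

The father's broken copy differs from the functional one at position 2 and the mother's at position 6, so the two broken copies are different alleles of the same gene.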



There is a 1 in 4 chance for the child to inherit both mutations: each parent independently passes on their broken copy with probability 1/2, and 1/2 × 1/2 = 1/4.

In other words, the child has inherited two different recessive alleles of the same gene and is called a compound heterozygote. Note that, for two unaffected parents to have an affected compound heterozygous child, both mutant alleles must be recessive. Also note that this mechanism is comparatively simple to track: each parent carries exactly one of the two causal SNVs, so each variant can be traced back to its parent of origin.
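The 1 in 4 figure can be checked by enumerating the four equally likely combinations of parental copies, a Punnett square in code. This sketch assumes standard Mendelian transmission (each parental copy is passed on with probability 1/2, independently between parents) and treats a child as affected when neither inherited copy is the functional ACGTAC; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

FUNCTIONAL = "ACGTAC"

# Each parent's two copies of the gene (toy sequences from the example above).
father = ("ACGTAC", "AAGTAC")
mother = ("ACGTAC", "ACGTAA")


def child_outcomes(father_copies, mother_copies):
    """Enumerate the four equally likely (father copy, mother copy) combinations."""
    return list(product(father_copies, mother_copies))


def prob_no_functional_copy(father_copies, mother_copies, functional=FUNCTIONAL):
    """Probability that neither inherited copy equals the functional sequence."""
    outcomes = child_outcomes(father_copies, mother_copies)
    affected = [pair for pair in outcomes if functional not in pair]
    return Fraction(len(affected), len(outcomes))


for paternal, maternal in child_outcomes(father, mother):
    status = "affected" if FUNCTIONAL not in (paternal, maternal) else "unaffected"
    print(f"{paternal} / {maternal}: {status}")

print(prob_no_functional_copy(father, mother))  # 1/4
```

Only the AAGTAC / ACGTAA combination lacks a functional copy, which is the compound heterozygous outcome.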

In Massimo’s case, scientists identified a single disease-causing gene called DARS, which encodes aspartyl-tRNA synthetase. This enzyme belongs to a multienzyme complex that attaches amino acids to their cognate transfer RNAs (tRNAs). Each of Massimo’s parents carried a disease-causing mutation in DARS, but the two mutations occurred at different positions in the gene: Massimo’s father had a single copy of a G→T mutation at position 1099, and Massimo’s mother had a single copy of a C→T mutation at position 821. Disruptions in the DARS-encoded protein lead to alterations in cell shape and defects in nerve cell clusters.

Massimo’s parents are healthy because each of them has only one broken copy of the DARS gene. Massimo got unlucky by inheriting two broken copies (one from each parent), thus triggering the disease.
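Because each parent carries exactly one of the two variants, the parent of origin of each of Massimo's variants can be read directly off the trio's genotypes. The sketch below uses a toy encoding (unordered allele pairs keyed by the gene positions from the text) and hypothetical function names; it is not a real variant-calling pipeline. Each parent's genotype at the other parent's variant is assumed to be homozygous for the reference base, following the text's statement that each parent had a single, different mutation.

```python
# Genotypes at two positions of the DARS gene, encoded as unordered allele pairs.
# Positions and bases follow the text; the encoding itself is a toy example.
VARIANTS = {821: ("C", "T"), 1099: ("G", "T")}  # position -> (ref, alt)

trio = {
    "father": {821: ("C", "C"), 1099: ("G", "T")},
    "mother": {821: ("C", "T"), 1099: ("G", "G")},
    "massimo": {821: ("C", "T"), 1099: ("G", "T")},
}


def alt_count(genotype, alt):
    """Number of copies of the alternate allele in a genotype (0, 1, or 2)."""
    return sum(allele == alt for allele in genotype)


def transmitting_parent(position, trio, child="massimo"):
    """Return which parent a heterozygous child's alt allele came from, if unambiguous."""
    _, alt = VARIANTS[position]
    if alt_count(trio[child][position], alt) != 1:
        return None
    father_has = alt_count(trio["father"][position], alt) > 0
    mother_has = alt_count(trio["mother"][position], alt) > 0
    if father_has and not mother_has:
        return "father"
    if mother_has and not father_has:
        return "mother"
    return None  # both or neither parent carry it: cannot tell from genotypes alone


def is_compound_heterozygous(trio, child="massimo"):
    """True if the child's two variants were inherited from different parents."""
    sources = {transmitting_parent(pos, trio, child) for pos in VARIANTS}
    return sources == {"father", "mother"}


for pos in VARIANTS:
    print(pos, transmitting_parent(pos, trio))
print(is_compound_heterozygous(trio))  # True
```

Here the phase comes for free: the variant at 821 must sit on the chromosome Massimo received from his mother and the one at 1099 on the chromosome from his father.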

De novo mutations

Even if both parents have only functional versions of a given gene, a de novo mutation in that gene in one parent's germ cell (sperm or egg) can still harm the child. Say both parents have two copies of the functional version of the gene, ACGTAC. However, a de novo mutation occurs in the sperm cell, so the sperm carries AAGTAC while the egg carries ACGTAC. The child’s genotype is thus AAGTAC / ACGTAC:

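father: ACGTAC / ACGTAC (sperm: AAGTAC, carrying the de novo mutation)
mother: ACGTAC / ACGTAC (egg: ACGTAC)
child: AAGTAC / ACGTAC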

Diseases caused by de novo mutations are typically dominant: for a recessive disease, de novo mutations would have to occur in the same gene in both parents’ germ cells, which is possible but extremely rare. However, unlike in the compound heterozygote example, diseases caused by de novo mutations are much harder to track, because a variant that appears in the child but in neither parent could be either a true de novo mutation or simply a sequencing error.
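A de novo candidate can be flagged with a Mendelian consistency check: a child's genotype is consistent with the parents only if one child allele can come from the father and the other from the mother. The minimal sketch below assumes slash-separated genotype strings and hypothetical function names. As noted above, a failed check only marks a candidate; it cannot by itself distinguish a true de novo mutation from a sequencing error.

```python
from itertools import product


def parse(genotype: str) -> tuple[str, str]:
    """Split a slash-separated genotype such as 'A/C' into its two alleles."""
    first, second = genotype.split("/")
    return first, second


def is_mendelian_consistent(child: str, father: str, mother: str) -> bool:
    """True if one child allele can come from the father and the other from the mother."""
    child_alleles = sorted(parse(child))
    return any(
        sorted((paternal, maternal)) == child_alleles
        for paternal, maternal in product(parse(father), parse(mother))
    )


# Position 2 of the toy gene in the de novo example: both parents are C/C,
# but the child carries the A from the mutated sperm cell.
print(is_mendelian_consistent("A/C", "C/C", "C/C"))  # False -> de novo candidate
# Same position in the compound heterozygote example: the A came from the father.
print(is_mendelian_consistent("A/C", "C/A", "C/C"))  # True
```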

Sam’s disease was caused by a single C→T de novo mutation at position 1824 of the LMNA gene, which encodes lamin A, a protein that provides stability and strength to cells. Surprisingly, this mutation does not change the amino acid encoded at that position! Instead, it activates a cryptic splice site, changing how the gene's RNA is spliced and producing an abnormal version of lamin A called progerin, which is missing 50 amino acids near its end. Unlike lamin A, progerin does not properly integrate into the nuclear lamina, the protein scaffold that lines the nucleus, and this leads to a misshapen nucleus. Interestingly, progerin has also been implicated in normal aging: healthy people produce it in small quantities through sporadic use of the same cryptic splice site.
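Why a single-base change can leave the amino acid unchanged is easiest to see with the genetic code: several codons encode the same amino acid. The sketch below builds the standard codon table and checks whether a substitution is synonymous. It assumes the commonly reported codon context for LMNA c.1824C>T, in which the glycine codon GGC becomes GGT; the function name `is_synonymous` is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ... over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(ref_codon: str, alt_codon: str) -> bool:
    """True if both codons encode the same amino acid ('*' marks a stop codon)."""
    return CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]


# Assumed codon context: GGC (glycine) -> GGT (still glycine).
print(CODON_TABLE["GGC"], CODON_TABLE["GGT"])  # G G
print(is_synonymous("GGC", "GGT"))  # True
print(is_synonymous("GGC", "AGC"))  # False (glycine -> serine)
```

The codon check explains only why the mutation is silent at the amino acid level; the damage comes from splicing. A loss of 50 amino acids corresponds to 50 × 3 = 150 nucleotides, a multiple of three, so the rest of the reading frame after the deleted segment is preserved.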

Genotype matrix

Consider the following trio, represented by a 10 × 3 matrix Genotype, where each row i represents a hypothetical SNV position and each column j represents one individual from the trio. Each element Genotype(i, j) is the genotype of individual j at SNV position i, written as two alleles separated by a slash (e.g., A/C means having an A on one chromosome and a C on the other chromosome of the pair).

Imagine that you are given the genotype matrix above, which contains 10 SNVs, but you are not told which of the three columns represents the child.

STOP and Think: Which of the three columns in the genotype matrix corresponds to the child? Which rows in the genotype matrix correspond to de novo mutations?
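One way to approach this question is to try each column as the child and count the rows where that column cannot be explained by the other two columns under Mendelian inheritance. The true child should fail at only a few rows, and those rows are the de novo candidates. The sketch below is a heuristic under stated assumptions: it uses a small hypothetical matrix (not the 10 × 3 matrix from the text) and hypothetical function names, and with few SNVs the counts for different columns could tie.

```python
from itertools import product

# Hypothetical toy genotype matrix (NOT the 10 x 3 matrix from the text):
# rows are SNV positions, columns are Genome 1, Genome 2, Genome 3.
GENOTYPE = [
    ["A/A", "A/C", "C/C"],
    ["G/T", "T/T", "T/T"],
    ["C/C", "C/G", "G/G"],
    ["A/G", "A/A", "A/G"],
    ["T/T", "C/T", "T/T"],
]


def is_mendelian_consistent(child, parent1, parent2):
    """True if one child allele can come from each parent."""
    child_alleles = sorted(child.split("/"))
    return any(
        sorted((a, b)) == child_alleles
        for a, b in product(parent1.split("/"), parent2.split("/"))
    )


def inconsistent_rows(matrix, child_col):
    """Row indices where the candidate child column is not explained by the other two."""
    parent_cols = [j for j in range(len(matrix[0])) if j != child_col]
    return [
        i
        for i, row in enumerate(matrix)
        if not is_mendelian_consistent(row[child_col], *(row[j] for j in parent_cols))
    ]


def guess_child(matrix):
    """Pick the column whose genotypes violate Mendelian inheritance least often."""
    return min(range(len(matrix[0])), key=lambda j: len(inconsistent_rows(matrix, j)))


child = guess_child(GENOTYPE)
print(f"child is Genome {child + 1}")  # child is Genome 2
print([i + 1 for i in inconsistent_rows(GENOTYPE, child)])  # [5]
```

In this toy matrix, treating Genome 1, 2, or 3 as the child leaves 3, 1, and 2 unexplained rows respectively, so Genome 2 is the most plausible child and SNV 5 is its de novo candidate.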

Note that, although we know each individual’s genotype at each position (e.g., “Genome 3 has a C and a T at SNV 1, a C and a G at SNV 2, etc.”), we don’t yet know the individuals’ haplotypes, i.e., which specific nucleotides are inherited together on the same chromosome. For example, does Genome 3 have a C at SNV 1 and a C at SNV 2 on one chromosome, and a T at SNV 1 and a G at SNV 2 on the other? Or does it have a C at SNV 1 and a G at SNV 2 on one chromosome, and a T at SNV 1 and a C at SNV 2 on the other?
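The two options in the example are the only possible phasings for two heterozygous SNVs. In general, the candidate haplotype pairs can be enumerated by choosing, at every SNV after the first, which allele goes on which chromosome. This sketch assumes every SNV is heterozygous (a homozygous site carries the same base on both chromosomes and does not affect phasing); the function name is hypothetical.

```python
from itertools import product


def possible_phasings(genotypes):
    """List every distinct phasing of heterozygous 'X/Y' genotypes into two haplotypes.

    Assumes every SNV is heterozygous. Swapping the two chromosomes gives the same
    phasing, so the first SNV is kept in a fixed orientation, and n heterozygous
    SNVs yield 2 ** (n - 1) phasings.
    """
    pairs = [tuple(g.split("/")) for g in genotypes]
    if any(a == b for a, b in pairs):
        raise ValueError("this sketch only handles heterozygous SNVs")
    (first_a, first_b), rest = pairs[0], pairs[1:]
    phasings = []
    for flips in product((False, True), repeat=len(rest)):
        hap1, hap2 = [first_a], [first_b]
        for (a, b), flip in zip(rest, flips):
            hap1.append(b if flip else a)
            hap2.append(a if flip else b)
        phasings.append(("".join(hap1), "".join(hap2)))
    return phasings


# Genome 3 at SNV 1 (C/T) and SNV 2 (C/G), as in the text.
for hap1, hap2 in possible_phasings(["C/T", "C/G"]):
    print(hap1, hap2)
# CC TG
# CG TC
```

Because the number of phasings doubles with every additional heterozygous SNV, genotypes alone quickly leave many candidate haplotypes; trio data narrows them down, as in the compound heterozygote example.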