Genetics: Glossary of terms
Authors: Benjamin A Raby, MD, MPH; Robert D Blank, MD, PhD
Section Editor: Anne Slavotinek, MBBS, PhD
Deputy Editor: Jennifer S Tirnauer, MD
INTRODUCTION — One of the greatest obstacles clinicians experience in reading about and understanding genetics is the extensive use of technical language and jargon. In addition, genetic terms are frequently used imprecisely in published clinical literature. The following is a compilation of some of the most important technical terms.
A more extensive discussion of terms can be accessed in standard genetics reference texts [1]. In addition, a guide for the conventions regarding the proper names of genes and alleles in humans can be found at www.genenames.org/guidelines.html.
Glossaries of epidemiological terms and terms that apply to systematic reviews and meta-analyses are presented separately in UpToDate. (See "Glossary of common biostatistical and epidemiological terms" and "Systematic review and meta-analysis", section on 'Glossary of terms'.)
LIST OF TERMS
Allele — An allele is one of a series of alternative forms (genotypes) at a locus (a specific region of a chromosome). At the DNA level, different alleles have different base sequences.
Allelic fraction — The allelic fraction can be defined as the number of times a mutated base is observed, divided by the total number of times any base is observed at the locus [2]. Allelic fraction is generally applied to a single mutation in a tumor and thus is distinct from allelic frequency, which examines the frequency of an allele in a population (see 'Allele frequency' below). Mutation fraction can be defined as the ratio between mutant and wild-type alleles in a tumor sample.
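The two definitions in the entry above can be illustrated with a minimal sketch. The function names and read counts are hypothetical, and the sketch assumes that each sequencing read covering the position counts as one observation of a base.

```python
def allelic_fraction(mutant_reads: int, total_reads: int) -> float:
    """Mutant observations divided by all observations at the locus."""
    if total_reads <= 0 or not 0 <= mutant_reads <= total_reads:
        raise ValueError("need 0 <= mutant_reads <= total_reads and total_reads > 0")
    return mutant_reads / total_reads


def mutation_fraction(mutant_reads: int, wild_type_reads: int) -> float:
    """Ratio of mutant to wild-type observations in the same sample."""
    if wild_type_reads <= 0 or mutant_reads < 0:
        raise ValueError("need mutant_reads >= 0 and wild_type_reads > 0")
    return mutant_reads / wild_type_reads


# Hypothetical tumor sample: 30 mutant reads and 70 wild-type reads at one position.
print(allelic_fraction(30, 30 + 70))  # mutant / all observations
print(mutation_fraction(30, 70))      # mutant / wild-type
```

The two quantities differ only in the denominator, which is why the same sample gives a larger value for the mutant-to-wild-type ratio than for the allelic fraction.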
Allele frequency — The proportion of chromosomes in a population harboring a specific allele. "Minor allele frequency" typically refers to the less common variant at a biallelic locus and is usually used to refer to the frequency of a single nucleotide polymorphism (SNP). This population frequency is distinguished from allelic ratio, which applies to a single person (eg, with a malignancy).
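As an illustration of the population-level calculation, the following sketch counts chromosomes from genotype counts. It assumes a biallelic autosomal locus in diploid individuals; the allele labels (A and a), the function names, and the genotype counts are hypothetical.

```python
def allele_frequencies(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Return (frequency of A, frequency of a) among all chromosomes sampled."""
    total_chromosomes = 2 * (n_AA + n_Aa + n_aa)  # two chromosomes per person
    if total_chromosomes == 0:
        raise ValueError("at least one genotyped individual is required")
    freq_A = (2 * n_AA + n_Aa) / total_chromosomes
    freq_a = (2 * n_aa + n_Aa) / total_chromosomes
    return freq_A, freq_a


def minor_allele_frequency(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Frequency of the less common of the two alleles."""
    return min(allele_frequencies(n_AA, n_Aa, n_aa))


# Hypothetical sample of 1000 people: 360 AA, 480 Aa, 160 aa.
print(allele_frequencies(360, 480, 160))
print(minor_allele_frequency(360, 480, 160))
```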
Allelic heterogeneity — Allelic heterogeneity refers to the common occurrence of multiple pathogenic variants in one gene that all result in the same disease or syndrome. As an example, more than 1500 variants in the cystic fibrosis transmembrane conductance regulator (CFTR) gene cause cystic fibrosis. Note that this term differs from genetic heterogeneity, in which variants in multiple genes can cause the same disease phenotype. (See 'Genetic heterogeneity' below.)
Allelic ratio — Allelic ratio measures the relative abundance of mutated to normal or wildtype alleles within a tumor. Higher allelic ratios (ie, a greater fraction of mutant alleles) have been reported to be associated with poorer prognosis. Unlike allele frequency, which is a characteristic of a population (see 'Allele frequency' above), allelic ratio is a property of cells within a tumor in a single individual. Allelic ratio is of necessity an inexact concept because it is rare (for solid tumors at least) to avoid substantial contamination by non-tumor cells from blood, stroma, and vasculature. Amplification of mutant sequences in a tumor can also have a large impact on allelic ratios.
Aneuploidy — The state of having an abnormal number of chromosomes. A euploid human karyotype has 46 chromosomes (figure 1). Aneuploidy can affect the entire somatic cell population, as in trisomy 21, or it can affect a subset of cells, as in a tumor.
Anticipation — A phenomenon whereby the symptoms of a genetically-based condition appear at an earlier age, or with greater severity, in successive generations. Expansion of trinucleotide repeats is a known molecular cause for specific diseases (such as myotonic dystrophy, fragile X syndrome, Huntington’s chorea) that manifest anticipation.
Association — Genetic association is a property of alleles. It refers to the non-random relationship between an allele and a phenotype in a population. Genetic association between a marker allele and a phenotype can arise because the allele is a direct causal variant, because the allele is in linkage disequilibrium with (ie, segregates with) a causal variant in close proximity, or because of stratification of the population. Association may be determined in a genome-wide association study. (See 'Genome-wide association study (GWAS)' below.)
Autosome — A chromosome other than X or Y. The human genome has 44 autosomes (22 pairs of autosomes) (figure 1).
Autosomal — A gene is autosomal if it is located on an autosome rather than a sex chromosome. A gene's inheritance pattern is also referred to as autosomal if the pattern corresponds to that of known autosomal genes (rather than sex-linked). (See 'Sex-linked' below.)
Benign variant — (See 'Variant' below.)
Biome — Humans are colonized by a multitude of microorganisms, which vary by age and location in the body. The biome (or microbiome) is the totality of colonizing microorganisms in a specific environmental milieu. Biomes may be studied genetically using metagenomics. (See 'Metagenomics' below.)
Carrier — An individual who is heterozygous for a risk or disease allele. The term is typically used to describe someone who is heterozygous for a gene variant that causes autosomal recessive or X-linked recessive disease, but it is also used to describe heterozygotes for risk alleles of complex traits with variable penetrance, regardless of inheritance type.
Carrier rate — The frequency of carriers in a population.
Carrier testing — Clinical method of genotyping at-risk populations or family members to identify individuals, usually asymptomatic, who have a pathogenic or likely pathogenic variant for an autosomal recessive or X-linked disorder. One example is prenatal screening for Tay-Sachs disease-associated variants in people of Ashkenazi Jewish ancestry. (See "Genetic testing" and "Carrier screening for genetic disease in the Ashkenazi Jewish population".)
Centromere — A condensed chromosome region that mediates attachment of chromosomes to the microtubules of the mitotic or meiotic spindle. The centromere is important in preserving normal chromosome number.
Chromatid — One of two replications, or copies, of a chromosome formed prior to cell division and joined together at their centromeres. The centromere is the last portion of a chromosome to replicate during cell division. Sister chromatids are a pair of chromatids attached at the centromere.
Chromatin — A complex structure composed of DNA, RNA, and proteins that facilitates efficient packaging of DNA in cells. The primary structure of chromatin is the nucleosome, consisting of double-stranded DNA coiled around a core of histone proteins. Nucleosomes packed tightly together form a "bead-on-string" configuration, which in turn assembles in hierarchical looping structures to create densely-packaged chromatin. The regulation of gene transcription is governed by the uncoiling of packed chromatin (heterochromatin) into exposed DNA (euchromatin).
Clonal — Arising from a single clone or cell. Examples include a clonal selection of lymphocytes during immune development and clonal origin of leukemia cells or other tumor cells. (See "Immunoglobulin genetics" and "Pathogenesis of acute myeloid leukemia".)
Cloning — Production of a genetically identical copy. Can refer to a single gene or to an entire organism.
Coding region — Portion of a gene that encodes a protein.
Coding mutation or polymorphism — A genetic variation in the open reading frame (protein-encoding region) of a gene. Coding variants that alter the amino acid composition of a protein are called non-synonymous or missense variants (figure 2). Variants that do not alter amino acid composition are called synonymous variants. Nonsense variants are coding variants that result in the introduction of a stop codon (figure 3). A frameshift mutation results from an insertion or deletion of a number of bases not divisible by three, resulting in shifting of the reading frame (figure 4). Variants are also classified according to their pathogenicity. (See 'Variant' below.)
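The distinction between synonymous, missense, and nonsense variants can be sketched for a single-codon substitution as follows. This is a simplified illustration that assumes the standard genetic code and a reference codon that encodes an amino acid; the function name is hypothetical, and the example codons are chosen for illustration only.

```python
# Standard genetic code, written in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as synonymous, missense, or nonsense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == "*":
        raise ValueError("reference codon must encode an amino acid")
    if alt_aa == ref_aa:
        return "synonymous"  # same amino acid
    if alt_aa == "*":
        return "nonsense"    # new stop codon
    return "missense"        # different amino acid


print(classify_substitution("GAG", "GAA"))  # Glu -> Glu
print(classify_substitution("GAG", "GTG"))  # Glu -> Val
print(classify_substitution("GAG", "TAG"))  # Glu -> stop
```

Classification by pathogenicity is a separate question that this sketch does not address.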
Codon — A three-nucleotide sequence that codes either for a specific amino acid or for chain initiation or termination during protein synthesis.
Complementation — The restoration of normal phenotype by gene replacement. The replaced gene can either be an intact copy of a defective gene (direct replacement), or an alternate gene with function that can compensate for the defective gene's aberrant function.
Complex trait/complex disease — Trait or disease in which the phenotype is influenced by more than one gene and/or by environmental factors, including interactions among them.
Compound heterozygote — An individual bearing two different pathogenic variants in the same gene that together are sufficient to manifest an autosomal recessive phenotype. This differs from "homozygote," which refers to an individual in whom both pathogenic variants are the same, and from "double heterozygote," which refers to an individual who is heterozygous for pathogenic variants at two separate genetic loci, which together manifest disease.
Consanguinity — Reproduction between two individuals from the same bloodline (eg, first cousin, second cousin). Consanguineous parentage increases the probability of a rare recessive disease because both parents are more likely to share the same rare deleterious sequence variant.
Copy number variation (CNV) — The most prevalent type of chromosomal structural variation, in which the number of copies of a large chromosomal or DNA segment (usually measuring thousands to millions of bases) varies between individuals. (See "Genomic disorders: An overview", section on 'Copy number variations'.)
Crossing-over — The exchange of chromosome segments through the process of recombination that occurs between two homologous chromosomes during meiosis. The site on the chromosome where the exchange occurs is called a crossover.
Coupling — The presence of two specified alleles at two linked loci on the same homologous chromosome (ie, "in cis"), and the two alternative alleles on the other chromosome. For illustration, in the case of dominant and recessive alleles, the coupling gametes formed are AB and ab (figure 5). In contrast, repulsion refers to the presence of the specified alleles at two linked loci on different chromosomes (ie, in trans). (See 'Repulsion' below.)
De novo mutation — A novel genetic sequence variant introduced by a germline mutation in the proband's DNA. Often used to distinguish familial from sporadic cases of genetic disease.
Digenic inheritance — Diseases caused by co-inheritance of mutations at two distinct genetic loci (ie, in two different genes).
Diploid — Possessing two copies of each autosomal chromosome and two sex chromosomes. Most human cells are diploid. Hepatocytes are frequently polyploid (tetraploid or greater). Gametes are haploid (one copy of each autosome and one sex chromosome) (figure 1). (See 'Haploid' below and 'Ploidy' below.)
DNA barcoding — A collection of methods developed to facilitate the analysis of complex mixtures of pooled samples, whereby short, unique DNA sequences (referred to as tags or barcodes) are added to each of the pooled DNA samples (eg, from distinct individuals). Barcoding is used routinely in next-generation sequencing applications, including single-cell RNA sequencing and exome sequencing.
Barcoding also refers to methods for determining the species of origin of a DNA sample on the basis of the DNA sequence itself. A clinical example is the identification of the ingredients in an herbal preparation.
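The pooled-sample use of barcoding described above can be sketched as a simple demultiplexing step. This is an illustration only: it assumes that the barcode occupies the first bases of each read and must match exactly, and the barcode sequences, sample names, and function name are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_length):
    """Group reads by sample using the barcode at the start of each read."""
    bins = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        insert = read[barcode_length:]  # sequence left after the barcode is removed
        sample = barcode_to_sample.get(barcode, "unassigned")
        bins[sample].append(insert)
    return dict(bins)


# Hypothetical 4-base barcodes for two pooled samples.
barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}
reads = ["ACGTGGGAAA", "TGCATTTCCC", "ACGTCCCGGG", "NNNNAAAA"]
print(demultiplex(reads, barcodes, barcode_length=4))
```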
Dominant negative — Dominant negative alleles are alleles that cause an abnormal phenotype or disease by a mechanism that depends on the presence of an abnormal gene product interfering with the function of the products from a normal gene. In other words, the variant allele confers a loss of function by interfering with the remaining normal allele. In contrast to most loss-of-function variants that confer phenotype only when both alleles are defective (ie, recessive inheritance), dominant-negative mutations act dominantly, meaning that only a single allele with the mutation is sufficient to cause the disease phenotype.
Double heterozygote — An individual who is heterozygous for two mutations at two separate genetic loci that together are sufficient to manifest a phenotype. Differs from compound heterozygote.
Embryonic stem cell — A pluripotential cell derived from the inner cell mass of an early-stage embryo that is capable of differentiating into cells derived from all three germ layers.
Epigenetic change — A modification of a chromosome that does not alter the nucleotide base sequence, but alters the expression of a gene. Epigenetic changes may be stable in an individual, but may be reversed during gametogenesis or development. DNA methylation and histone acetylation are common epigenetic changes. Epigenetic changes form the mechanistic basis of imprinting. Some medications alter epigenetic regulation (eg, histone deacetylase [HDAC] inhibitors). Epigenetic modifications are removed when cells are treated in the laboratory to generate induced pluripotent stem cells (iPSCs). (See 'Induced pluripotent stem cell (iPSC)' below.)
Epistasis — The process by which variations at two or more genetic loci interact to produce phenotypes different from the individual effects of each variant. This process is often referred to as either a gene-gene interaction or a genetic modifier effect.
Exome — The portion of the genome that consists of exons. (See 'Exon' below.)
Exome sequencing — A sequencing strategy that provides the DNA sequence corresponding to all exons (which represent approximately 1 to 2 percent of the genome), excluding introns and non-coding genomic sequence. Though the complete exome includes non-coding 5’ and 3’ untranslated regions (UTRs), most exome sequencing assays are enriched for the coding exons and largely exclude the non-coding regions.
Exon — A segment of DNA that is transcribed and present in mature messenger RNA (mRNA). Many exons encode a portion of a protein, but non-coding exons also exist. This is in contrast to an intron, the DNA sequence between exons that does not become part of mature mRNA. Exons constitute only a small percent of the genome (about 1 to 2 percent).
Expressivity — A parameter used in genetic models that quantifies the degree to which an inherited characteristic is expressed in an organism.
Frameshift mutation — A frameshift mutation is a change in DNA sequence that results from an insertion or deletion of a number of bases that is not divisible by three, resulting in a shift of the reading frame (figure 4) and thus altering synthesis of the protein.
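The "not divisible by three" rule can be shown with a short sketch. The function names and the sequence are hypothetical; the sketch compares only the lengths of the reference and altered alleles and then shows how the downstream codons change after a one-base deletion.

```python
def is_frameshift(ref_allele: str, alt_allele: str) -> bool:
    """True if the net insertion or deletion length is not a multiple of three."""
    return (len(alt_allele) - len(ref_allele)) % 3 != 0


def codons(sequence: str) -> list[str]:
    """Split a coding sequence into complete codons, reading from the first base."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


reference = "ATGGCCAAAGGG"
deleted = reference[:4] + reference[5:]  # remove one base (hypothetical deletion)

print(is_frameshift("C", ""))    # one-base deletion shifts the frame
print(is_frameshift("CCA", ""))  # three-base deletion keeps the frame
print(codons(reference))         # original reading frame
print(codons(deleted))           # every codon after the deletion is altered
```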
Fusion gene — A fusion gene is a functional gene product that results from the fusion of DNA segments from two physically distinct genes. The fusion occurs as a consequence of chromosomal rearrangements such as translocations, inversions, segmental deletions, or duplications. Examples include the BCR-ABL and the FIP1L1-PDGFRA oncogenes.
Gene — A gene is a unit of DNA sequence that encodes a specific function. Classical definitions limit genes to those elements that code for proteins. However, non-protein coding genes (such as non-coding RNAs or pseudogenes) are also genes.
Gene editing — Gene editing refers to the use of nucleases to alter the DNA sequence of a gene, as discussed in more detail below. (See 'Genome editing' below.)
Genetic heterogeneity — Genetic heterogeneity refers to a phenomenon in which variants in different genes result in the same phenotype or disease. Examples include the multiple genetic causes of sensorineural deafness. This differs from allelic heterogeneity, in which multiple variants in the same gene can lead to the same phenotype. (See 'Allelic heterogeneity' above.)
Genetic polymorphism — A genetic polymorphism is a DNA segment for which two or more alternate forms can be found in a population. The common types of polymorphisms include single nucleotide variants (single base pair changes, also called single nucleotide polymorphisms [SNPs]), indels (insertion/deletion polymorphisms), and larger structural changes such as copy number variants. Most commonly, genetic polymorphism refers to a common single base-pair change or single nucleotide polymorphism (SNP). (See 'Polymorphism' below and 'Single nucleotide polymorphism (SNP)' below.)
Genotype — A genotype is the combination of two alleles at one genomic location (locus) or base pair in an individual (figure 5).
Genome editing — Genome editing refers to the use of nucleases to insert or remove DNA from a genome. There are several common technologies that make use of genome editing, including clustered regularly interspaced short palindromic repeats (CRISPR), transcription activator-like effector nucleases (TALENs) and zinc finger nucleases (ZFNs). CRISPR is increasingly employed and is an RNA-guided gene editing method that uses a bacterially-derived protein (Cas9) and a specifically-designed synthetic guide RNA (gRNA; also known as a small guide RNA [sgRNA] or a single guide RNA) to introduce a double-strand break at a precise location in the gene of interest. The sgRNA directs the position of the double-strand break by hybridization to its matching sequence. Genome editing is used as a tool for genetic perturbation in research. Therapeutic applications for the correction of inherited genetic variation are under investigation.
Genome-wide association study (GWAS) — A GWAS (pronounced "gee-wass") is a type of genetic mapping study design that tests for association between genetic variants and heritable traits across the entire genome. Typical studies consist of genotyping hundreds of thousands of common SNPs, using DNA microarrays or other methodologies in large case-control populations, with the goal of identifying specific risk alleles that are more prevalent in cases than in controls. (See "Tools for genetics and genomics: Gene expression profiling".)
Germline — Germline refers to the gametes (ova and spermatozoa and their precursors) that have the capacity to give rise to offspring.
Haploid — Cells or organisms possessing one copy of each autosomal chromosome and one sex chromosome (and therefore effectively one copy of each gene). Gametes (ova and sperm) are haploid. Fertilization of a haploid ovum by a haploid sperm results in formation of a diploid embryo. Many microorganisms are haploid. In contrast, diploid organisms possess two of each autosome and two sex chromosomes. (See 'Diploid' above.)
Haploinsufficiency — Having only a single functional copy of a gene due to inactivation of the second allele by a deleterious variant. In a diploid cell, the single functional copy of the gene does not produce sufficient protein, resulting in disease. All haploinsufficient loci are hemizygous, but not all hemizygous loci are haploinsufficient. (See 'Hemizygous' below.)
Haplotype — The physical combination or sequence of alleles present on a single chromosome. By definition, alleles on one haplotype are in "cis" (figure 5).
Hemizygous — The state of carrying only one copy of a genomic region due to deletion or altered function of the corresponding region on the other chromosome. Carriers of large-scale deletions are hemizygotes. Hemizygosity can confer disease if having one normally functioning copy is insufficient for normal cellular function (haploinsufficiency), but if a single functional copy of the gene is sufficient for normal cellular function, the phenotype may not be abnormal. Hemizygosity can also confer disease if a pathogenic mutation is present within the hemizygous region. (See 'Haploinsufficiency' above.)
Heritability — The proportion of phenotypic variation that is explained by genetic (or in some cases, epigenetic) factors.
Heteroplasmy — The occurrence in a single cell of more than one different population of mitochondrial DNA sequence.
Identity by descent — Alleles are identical by descent if they can be traced back to a common ancestor. Identity by descent is a more stringent classification than identity by state (see 'Identity by state' below). Identity by descent is the basis for establishing linkage.
Identity by state — Alleles are identical by state if the assay being used to distinguish alleles determines that they are identical.
Imprinting — Gamete-specific gene silencing, in which only the allele from the mother or only the allele from the father is expressed, leading to observed parent-of-origin effects in offspring. Examples include the Prader-Willi syndrome and Angelman syndrome locus and a gene involved in pseudohypoparathyroidism. (See "Epidemiology and genetics of Prader-Willi syndrome" and "Congenital cytogenetic abnormalities".)
Indel — A class of common polymorphism or deleterious sequence variant defined by an extra copy or a missing copy of a short genetic or chromosomal sequence. (See "Chromosomal translocations, deletions, and inversions".)
Induced pluripotent stem cell (iPSC) — A pluripotent cell derived by in vitro reprogramming of a somatic cell that is capable of both self-renewal and differentiation to mature lineages. (See "Overview of stem cells", section on 'Induced pluripotent stem (iPS) cells'.)
Intron — A segment of DNA between two exons that is transcribed to pre-mRNA, but is removed through the process of splicing and is therefore not part of mature mRNA. Introns may contain regulatory DNA or serve other functions.
Inversion — A chromosomal rearrangement characterized by rotation and reintegration of a DNA segment, resulting in an inverted orientation of the segment relative to its typical state.
Karyotype — Karyotype refers to the complete set of chromosomes in an organism or tumor. Karyotype is determined by visual examination and counting of condensed chromosomes from several representative cells to determine the number of copies of each chromosome as well as any translocations. Determination of the karyotype of a tumor is also called "cytogenetic analysis." (See "Tools for genetics and genomics: Cytogenetics and molecular genetics".)
Likely benign variant — (See 'Variant' below.)
Likely pathogenic variant — (See 'Variant' below.)
Linkage — The relationship that exists between two loci that violate the Mendelian law of independent assortment and therefore segregate in families in a non-random fashion. Non-independent assortment results because linked loci reside together on the same chromosome (ie, they are syntenic). However, most syntenic loci are not linked due to mandatory recombination during meiosis. Linkage therefore implies the linked loci are in close physical proximity to each other. The genetic linkage distance is expressed as the recombination fraction, which is measured in centiMorgans (cM). Note that this is not necessarily proportional to the physical distance (base pairs) separating the loci.
Linkage analysis — Method of gene mapping that tests for the non-random segregation of disease phenotypes with discrete chromosomal segments. Identification of linked regions implies the existence of disease-causing (pathogenic) variants within or proximal to the linked region. The process of disease-gene identification within this region is termed positional cloning.
Linkage disequilibrium — The non-random association of alleles at two or more loci in a population. Linkage disequilibrium is present when the observed haplotype distribution of two or more markers in a population is significantly different from the expected haplotype distribution (which can be derived from the cross-product of observed allele frequencies) (figure 6).
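The comparison of observed and expected haplotype frequencies can be sketched for two biallelic loci. The expected frequency of the A-B haplotype is the product of the two allele frequencies. The coefficient D and the squared correlation r² are commonly used summary measures that are not named in the entry above, and the function name and frequencies are hypothetical.

```python
def linkage_disequilibrium(p_AB: float, p_A: float, p_B: float) -> tuple[float, float]:
    """Return (D, r_squared) for alleles A and B at two biallelic loci.

    p_AB is the observed frequency of the haplotype carrying both A and B;
    p_A and p_B are the observed allele frequencies.
    """
    if not (0 < p_A < 1 and 0 < p_B < 1):
        raise ValueError("allele frequencies must be strictly between 0 and 1")
    expected = p_A * p_B   # haplotype frequency expected with no association
    d = p_AB - expected    # deviation from that expectation
    r_squared = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, r_squared


# Hypothetical population: both alleles at 50 percent, haplotype A-B seen in 40 percent.
print(linkage_disequilibrium(0.40, 0.5, 0.5))  # excess of A-B haplotypes
print(linkage_disequilibrium(0.25, 0.5, 0.5))  # observed equals expected
```

Whether a nonzero deviation is statistically significant depends on sample size, which this sketch does not evaluate.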
Locus — A locus (plural = loci) is a specific chromosomal or genomic location.
LOD score — The "logarithm of the odds" (LOD) score is a quantitative measure of the statistical evidence of linkage between two genes. The LOD score depends on both the probability of cosegregation of the two genes during meiosis and the size and structure of the population in which the linkage analysis is performed. By convention, LOD scores >3 are considered to be evidence of linkage in human studies. In some studies, the threshold LOD scores for linkage can be established via permutation testing.
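A simplified sketch of the calculation follows. It assumes the simplest textbook case, in which every meiosis is fully informative and can be scored as recombinant or non-recombinant, and it compares the likelihood at a chosen recombination fraction (theta) with the likelihood under no linkage (theta = 0.5). The function name and counts are hypothetical; pedigree-based linkage analysis is considerably more involved.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """log10 of the odds of linkage at theta versus no linkage (theta = 0.5)."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be greater than 0 and at most 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_likelihood_linked = (
        recombinants * math.log10(theta) + non_recombinants * math.log10(1 - theta)
    )
    log_likelihood_unlinked = meioses * math.log10(0.5)
    return log_likelihood_linked - log_likelihood_unlinked


# Hypothetical data: the same recombination rate observed in 10 and then 20 meioses.
print(round(lod_score(1, 10, theta=0.1), 2))
print(round(lod_score(2, 20, theta=0.1), 2))
```

With the same proportion of recombinants, the larger sample crosses the conventional threshold of 3 and the smaller one does not, which illustrates the dependence on population size noted above.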
Lyonization — (See 'X-inactivation' below.)
Manhattan plot — A type of plot used to display results of a GWAS (see 'Genome-wide association study (GWAS)' above). Genomic coordinates are shown on the X-axis and the negative logarithm (typically base 10) of the P-value for each SNP on the Y-axis. SNPs with the strongest association will have the lowest P-values, and hence the tallest profiles. Named for the appearance of the skyline in Manhattan in the United States (figure 7).
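The Y-axis transformation can be sketched as follows, assuming a base-10 logarithm. The chromosome numbers, positions, and P-values are hypothetical, and the sketch computes plot coordinates only; it does not draw the figure.

```python
import math


def manhattan_points(results):
    """Convert (chromosome, position, p_value) rows into (chromosome, position, height)."""
    return [(chrom, pos, -math.log10(p)) for chrom, pos, p in results]


# Hypothetical association results for three SNPs.
results = [(1, 1_200_000, 0.04), (2, 5_400_000, 1e-8), (3, 900_000, 0.5)]
points = manhattan_points(results)

# The SNP with the lowest P-value has the tallest point on the plot.
tallest = max(points, key=lambda point: point[2])
print(tallest)
```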
Marker — A locus with alternative alleles that can be used in genetic mapping experiments.
Meiosis — The cell division process in germline cells by which the chromosomal complement is reduced from the diploid to the haploid number (figure 8).
Mendelian inheritance — A trait is said to have Mendelian inheritance if its genetic transmission can be explained by a Mendelian model of inheritance, such as autosomal dominant, autosomal recessive, or X-linked recessive or dominant inheritance. This is in contrast to non-Mendelian inheritance patterns such as digenic inheritance, or quantitative traits. (See 'Digenic inheritance' above.)
Metagenomics — The study of complex microbial populations (biomes) using genomic approaches. Human tissues such as the skin and gut have multiple heterogeneous populations of microorganisms that differ from each other with respect to phyla composition and abundance in a tissue-specific manner. These abundances can be estimated by sequencing the mixed population of microorganisms, either through targeted sequencing of 16S ribosomal RNA (rRNA) genes (for bacterial characterization) or whole-genome approaches (for bacteria, viruses, fungi, and other organisms).
Methylation — The addition of methyl groups to cytosine in DNA. Methylation followed by deamination is a major pathway for mutation to thymine. Methylation also correlates with reduced gene transcription and is an important mechanism for gene imprinting and X-inactivation. (See 'Imprinting' above and 'X-inactivation' below.)
Micro-RNA (miR) — A small, non-coding RNA that regulates the stability or translation of a set of mRNAs.
Microsatellite — A tandem array of short sequences of DNA (typically two to four bases). Microsatellites are numerous and widely distributed in the genome. There is often polymorphism in their length, making them useful markers in genetic studies, including genome mapping and family-based linkage analysis. Microsatellites are also known as short tandem repeat markers (STRs) or short tandem repeat polymorphisms (STRPs).
Mitochondrial genome — The genetic material carried within mitochondria, known as mitochondrial DNA (mtDNA). At fertilization, all the mitochondria are derived from the egg, so mitochondrial genes display maternal inheritance.
Mitosis — The process of cell division occurring in somatic cells, in which each daughter cell receives a full chromosome complement.
Monogenic trait/monogenic disease — Trait or disease with inheritance that can be explained by a single gene, in contrast to polygenic and complex diseases. (See 'Polygenic trait/polygenic disease' below.)
Mutation, mutant — An altered version of a gene that affects function. These terms are used in several different senses, depending on context:
●In human genetics, a mutant is a genetic variant of low population frequency, in contrast to a polymorphism (often a single nucleotide polymorphism [SNP]) with an allele frequency of 1 percent or greater. Types of gene mutations include nonsense (creates premature stop codon) (figure 3); missense (creates amino acid change) (figure 2); silent (no associated change in protein sequence); and frameshift (shifts the reading frame of the DNA and alters protein translation, resulting in an entirely new protein sequence downstream of the mutation) (figure 4).
●In human disease, mutation implies a change associated with abnormal function (eg, sickle cell mutation of the hemoglobin beta chain). A disease-causing mutation is also called a pathogenic variant. (See 'Variant' below.)
●When used in the context of inheritance, mutation implies a recent sequence change (either germline or somatic), in contrast to inheritance from a carrier parent.
●When used to refer to an organism or population of organisms, a mutant refers to a population that harbors a specific, atypical variant (eg, antibiotic-resistant mutants).
Mutation fraction — Synonymous with allelic fraction or allelic ratio. (See 'Allelic fraction' above and 'Allelic ratio' above.)
Next-generation sequencing — Any of several high-throughput DNA sequencing methods that rely on parallel analysis of multiple DNA fragments (eg, whole genome sequencing, exome sequencing). These methods have resulted in dramatic decreases in the cost and time needed for sequencing projects and are used in some clinical settings. (See "Principles and clinical applications of next-generation DNA sequencing".)
Non-coding variant — Genetic variation that does not map to gene regions that code for protein. These variants can be functional if they reside in and disrupt functional elements, such as non-coding RNA sequences or regulatory sites (eg, promoters, enhancers, suppressors, or splice-sites).
Oncogene — Gene that contributes to the production of cancer. Oncogenes typically act in a dominant manner (ie, an oncogenic mutation at one allele is sufficient to promote tumorigenesis). In contrast, tumor suppressor genes typically act in a recessive manner. (See 'Tumor suppressor gene' below.)
Pathogenic variant — Genetic change associated with disease or strongly suspected of being associated with disease. (See 'Variant' below.)
Pedigree — A diagram or other graphic representation of a family that shows the family relationships, sex of each family member, and presence or absence of one or more diseases in each individual (figure 9).
Penetrance — The probability that an individual harboring a pathogenic variant will develop the associated disease or condition. Incomplete (or variable) penetrance occurs when an individual with a pathogenic variant does not manifest features of the disorder. There are many causes of incomplete penetrance, including absence of environmental or genetic co-factors, epigenetic effects such as imprinting, sex-specific effects, or age-related expression differences.
Phenotype — A characteristic of an organism (as opposed to the organism’s genotype). Phenotypes are sensitive to the assays used to assign or measure them. They may be categorical, such as presence or absence of a disease; or quantitative, such as systolic blood pressure. Further complexities in phenotypic description involve the physiological state of the organism at the time of measurement, age, or use of provocative stimuli. Most phenotypes are variable, and this variability leads to the concepts of penetrance and expressivity. (See "Inheritance patterns of monogenic disorders (Mendelian and non-Mendelian)", section on 'Penetrance and expressivity'.)
Pleiotropy — The association of variant(s) in a single gene with multiple phenotypic effects, often in different tissues or organs. An example is Marfan syndrome, in which mutations in the fibrillin 1 (FBN1) gene can cause cardiac, ocular, and connective tissue findings.
Ploidy — The number of sets of chromosomes present in an organism or cell. Ploidy varies among different organisms, including those that are always haploid (eg, bacteria), either haploid or diploid (eg, Saccharomyces species [yeast]), consistently diploid (eg, mammals) (see 'Diploid' above), or polyploid (eg, hexaploid wheat). Different tissues in multicellular organisms may have different ploidies (eg, mammalian hepatocytes may be tetraploid). The gametes (ova and sperm) are haploid (see 'Haploid' above). The designation of ploidy is based on the predominant ploidy of cells in the organism.
Polymorphism — Polymorphism can refer to a genetic polymorphism. (See 'Genetic polymorphism' above.)
It can also refer to any biologic marker (DNA, RNA, or protein) with two or more states. Protein polymorphisms (varying amino acid sequence) can result from DNA polymorphisms or from differential RNA splicing (different isoforms), which in turn can result from sequence variation, epigenetic phenomena, or temporal/spatial/environmental differences.
Polygenic trait/polygenic disease — In contrast to monogenic diseases, polygenic diseases are those for which the inherited trait(s) is explained by more than one gene. (See 'Monogenic trait/monogenic disease' above.)
Polymerase chain reaction (PCR) — A method of specifically amplifying a unique target sequence (DNA or RNA) in the laboratory. PCR uses specific primers and repeated cycles of heating and cooling with a heat-stable DNA polymerase to replicate the template material exponentially. (See "Tools for genetics and genomics: Polymerase chain reaction".)
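The exponential replication described above can be illustrated with a short calculation. The following is a minimal sketch, not a laboratory model: the function name and the efficiency parameter are assumptions introduced for illustration, and real reactions fall short of perfect doubling and eventually plateau.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Estimate template copies after a number of PCR cycles.

    efficiency is an assumed fraction of templates duplicated per cycle
    (1.0 means ideal doubling; real reactions are lower and plateau).
    """
    if initial_copies < 0 or cycles < 0:
        raise ValueError("initial_copies and cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    # Each cycle multiplies the template count by (1 + efficiency).
    return initial_copies * (1.0 + efficiency) ** cycles


# Under ideal doubling, a single template yields 2**30 copies after 30 cycles.
ideal_copies = pcr_copies(1, 30)
```

Under the ideal-doubling assumption, 30 cycles amplify a single template roughly a billion-fold, which is why PCR can detect very small amounts of starting material.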
Quantitative traits and quantitative trait loci (QTL) — "Quantitative" traits are distinguished from discrete traits. The population varies continuously for quantitative traits and falls into obvious phenotypic classes for discrete traits. Quantitative traits are sometimes referred to as "complex" traits, reflecting the fact that multiple genes, the environment, and gene-environment interactions all contribute to an individual's trait value. Many traits are quantitative, and their inheritance is much more challenging to unravel than that of discrete traits. A quantitative trait locus (QTL) is a genomic region linked or associated with a quantitative trait.
Read depth — In genomic or gene sequencing, the number of independent times each base in a targeted region has been sequenced. Typically expressed as an average X coverage (for example 20X = an average of 20 sequence reads per base). A minimum read depth of 30X is often required for clinical-grade sequencing. (See "Principles and clinical applications of next-generation DNA sequencing".)
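As an illustration of how an average coverage figure is derived, the sketch below computes mean read depth and the fraction of positions meeting a minimum depth. The function names and the example depths are hypothetical and chosen only for illustration; production pipelines typically derive per-base depth from aligned reads with dedicated tools.

```python
def mean_read_depth(per_base_depth):
    """Return the average number of reads covering each base in a region."""
    if not per_base_depth:
        raise ValueError("per_base_depth must contain at least one position")
    return sum(per_base_depth) / len(per_base_depth)


def fraction_at_or_above(per_base_depth, threshold):
    """Return the fraction of positions whose depth meets the threshold."""
    if not per_base_depth:
        raise ValueError("per_base_depth must contain at least one position")
    covered = sum(1 for depth in per_base_depth if depth >= threshold)
    return covered / len(per_base_depth)


# Hypothetical depths for a five-base target region.
example_depths = [18, 22, 20, 19, 21]
example_mean = mean_read_depth(example_depths)  # 20.0, reported as "20X"
```

An average alone can hide poorly covered positions, which is why the second function reports how much of the region reaches a chosen minimum depth.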
Reading frame — The starting point in translating the DNA sequence to protein. Since each codon includes three nucleotides, the reading frame can be initiated at one of three nucleotides. Offsetting the reading frame changes the amino acid composition of the encoded protein.
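The effect of offsetting the reading frame can be shown with a short example. This is a minimal sketch: the sequence is invented, the function names are hypothetical, and the codon table is deliberately partial (it covers only the codons that occur in the example), using standard genetic code assignments.

```python
# Partial codon table (standard genetic code); "*" marks a stop codon.
PARTIAL_CODON_TABLE = {
    "ATG": "M", "GCC": "A", "TAA": "*",
    "TGG": "W", "CCT": "P",
    "GGC": "G", "CTA": "L",
}


def codons_in_frame(sequence, frame):
    """Split a DNA sequence into complete codons starting at offset 0, 1, or 2."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1, or 2")
    seq = sequence.upper()
    # Stop before any trailing fragment shorter than three nucleotides.
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def translate(sequence, frame):
    """Translate codons in the given frame; unknown codons become '?'."""
    return "".join(
        PARTIAL_CODON_TABLE.get(codon, "?")
        for codon in codons_in_frame(sequence, frame)
    )


example = "ATGGCCTAA"
frames = {frame: translate(example, frame) for frame in (0, 1, 2)}
# frames == {0: "MA*", 1: "WP", 2: "GL"}
```

The same nine nucleotides encode three entirely different peptides depending on the starting position, which is why a frameshift variant typically disrupts the protein downstream of the change.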
Recombinant — Recombinant has different meanings in different contexts. For inheritance patterns, recombinant refers to offspring whose genotype and phenotype combinations differ from their parents, implying genetic recombination between the loci under study.
For laboratory techniques, recombinant technologies (also called genetic engineering) are molecular genetic approaches that use the process of homologous recombination to manipulate genotypes for experimental purposes. Examples include transgenic models in which specific genetic loci are either knocked out (removed) or knocked in (introduced) to enable study of the locus; recombinant inbred mouse strains; and recombinant viral transfection for synthesis of protein.
Recombination — The process of exchanging DNA sequence between two homologous chromosome regions. Mandatory recombination occurs at least once per aligned chromosome pair during meiosis. The exchange results in the creation of novel haplotypes that are combinations of the grandparental haplotypes present in a diploid cell. Exchange of unequal sequence content (ie, non-homologous recombination) can introduce DNA gains and losses of thousands or millions of bases. These gains and losses result in structural genetic variation and copy number variants (CNVs). (See 'Copy number variation (CNV)' above.)
Repulsion — The state in which alleles at two distinct loci are on physically opposing chromosomal strands. By definition, these variants are not part of the same haplotype (figure 5). In the example of dominant and recessive alleles, repulsion gametes formed are Ab and aB. The opposite relationship is coupling. (See 'Coupling' above.)
Risk allele — An allele associated with a disease phenotype that typically acts in combination with other genetic or environmental factors. Though the risk allele is often the less common allele (ie, the minor allele), risk alleles associated with some complex traits may be the more common allele.
RNA interference (RNAi) — A ubiquitous intracellular process mediated by small RNA species, whereby specific RNAs are targeted for editing, degradation, or clearance. RNAi has important roles in the regulation of gene expression, developmental processes, cellular defense, and epigenetic effects.
RNAi technology (also called antisense technology) has been used in the laboratory to test the function of a gene by preventing its expression. Its use has been attempted clinically as a means of posttranscriptional gene silencing to reduce the expression of viral or cancer genes, or to lower cholesterol. Early attempts at developing therapeutic applications are ongoing in the fields of hematology, oncology, and neurodegenerative disease. (See "Hemophilia A and B: Routine management including prophylaxis", section on 'Prophylactic therapies under development' and "Treatment of drug-resistant hypercholesterolemia", section on 'Mipomersen'.)
Sequencing — Determination of the nucleotide base sequence of a gene or collection of genes that determines the amino acid sequence of a protein. (See "Principles and clinical applications of next-generation DNA sequencing".)
Sex chromosomes — Refers to the X and Y chromosomes, which are different in females (XX) and males (XY).
Sex-linked — A gene is sex-linked if it is located on a sex chromosome rather than on an autosome. A gene's inheritance pattern is also referred to as sex-linked if the pattern corresponds to that of known sex-linked genes (rather than autosomal genes). (See 'Autosomal' above.)
Silencing — Regulation that prevents the expression of a gene. Mechanisms of silencing include gene methylation (see 'Methylation' above), destruction of messenger RNA, or prevention of protein translation.
Single nucleotide polymorphism (SNP) — A single nucleotide polymorphism (pronounced "snip") is a polymorphism (difference in base pair) that affects a single base pair, with a population frequency of at least 1 percent. Single base pair changes that occur at a lower population frequency are called pathogenic variants or mutations if they cause disease or affect protein function. (See 'Polymorphism' above.)
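The 1 percent frequency criterion can be expressed as a simple calculation on allele counts. The sketch below is illustrative only: the function names and the counts in the example are hypothetical, and the threshold default simply encodes the 1 percent convention stated above.

```python
def minor_allele_frequency(ref_count, alt_count):
    """Return the frequency of the less common allele at a biallelic site."""
    total = ref_count + alt_count
    if ref_count < 0 or alt_count < 0 or total == 0:
        raise ValueError("counts must be non-negative and not both zero")
    return min(ref_count, alt_count) / total


def meets_snp_frequency_threshold(ref_count, alt_count, threshold=0.01):
    """Check whether the minor allele reaches the conventional 1% cutoff."""
    return minor_allele_frequency(ref_count, alt_count) >= threshold


# Hypothetical counts among 1000 sampled chromosomes (500 diploid individuals).
common_site = meets_snp_frequency_threshold(980, 20)  # 2% minor allele
rare_site = meets_snp_frequency_threshold(9995, 5)    # 0.05% minor allele
```

Counts are tallied per chromosome rather than per person, so a sample of 500 diploid individuals contributes 1000 alleles at an autosomal site.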
Somatic — Referring to tissues that are not within the germline. Somatic mutations arise in somatic tissues and are therefore not passed from parent to offspring. Somatic mutations are common in neoplasms.
Structural genetic variation — A term that encompasses a variety of large-scale genomic aberrations, including segmental rearrangements, translocations, or inversions and copy-number variants (CNVs) (see 'Copy number variation (CNV)' above). Large rearrangements or deletions can be visualized through karyotyping. Smaller variants, particularly CNVs, segmental duplications, and interchromosomal interstitial rearrangements, are assessed by array comparative genomic hybridization (array CGH) or SNP arrays.
Syntenic — Describing genetic loci that reside on the same chromosome. As an example, the genes causing Birt-Hogg-Dubé syndrome (Folliculin [FLCN], at chromosome 17p11) and early-onset breast cancer (BRCA1, at chromosome 17q21) are syntenic to each other on chromosome 17. However, because they are far apart from each other, they are not linked. (See 'Linkage' above.)
Telomere — Region at the ends of a chromosome that prevents the loss of genetic material or the accidental fusion of two chromosomes together during cell division. Telomeres of chromosomes in most cells shorten as an individual ages. Telomere length is maintained by the enzyme telomerase. (See 'Telomerase' below.)
Telomerase — Multicomponent enzyme that extends the length of telomeres. Telomerase mutations are seen in some inherited "telomere syndromes." (See "Dyskeratosis congenita and other short telomere syndromes".)
Translocation — A translocation is a structural chromosomal abnormality whereby chromosome segments are exchanged (swapped) between two non-homologous chromosomes. This form of rearrangement can be balanced, when the translocation does not result in any significant loss or gain of genetic material in the resultant gamete or cell; or unbalanced, when there is a gain or loss of genetic material in the resultant gamete or cell. (See "Chromosomal translocations, deletions, and inversions", section on 'Translocations'.)
Tumor suppressor gene — A tumor suppressor gene is a gene that protects against the development or growth of tumors. Tumor suppressor genes typically act in a recessive manner (ie, both normal copies must be lost for a tumor to develop). In contrast, oncogenes typically act in a dominant manner. (See 'Oncogene' above.)
Uniparental disomy — The inheritance of two copies of a chromosome (or part of a chromosome) from one parent, and no copy from the other parent, due to nondisjunction errors during either the first or second phase of meiosis, or to chromosomal alterations in early fetal development. Nondisjunction during the first phase of meiosis (meiosis I) will result in inheritance of each of the grandparental chromosomes from one parent, termed "heterodisomy." In contrast, nondisjunction during meiosis II results in inheritance of two identical copies of one grandparental chromosome, termed "isodisomy."
Variant — The term variant is used to refer to a specific change in either DNA or protein sequence. The American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology have recommended use of a five-tier terminology system for the clinical classification of genetic variants, consisting of the following designations [3]:
●Pathogenic variant – A disease-causing variant, as determined by very strong genetic and experimental evidence, including consistent familial co-segregation with disease and definitive functional studies.
●Likely pathogenic variant – A variant with strong, but not definitive, evidence of pathogenicity based on its similarity to known pathogenic variants, co-segregation with disease in families or populations, and functional evidence.
●Variant of unknown significance – A variant for which the specific criteria for the other four categories are not met, or for which contradictory lines of evidence in support of both benign and pathogenic classifications are present.
●Likely benign variant – A variant with multiple supporting (but not conclusive) lines of evidence to suggest it is not disease-causing.
●Benign variant – A variant with conclusive evidence that it is not disease-causing, as determined typically (but not only) by a high prevalence of the variant in the general (healthy) population, at a prevalence that exceeds that of the suspected disease.
Additional information about this classification and its application to genomic testing is presented separately. (See "Secondary findings from genetic testing", section on 'Definitions and classification of variants'.)
Variant of unknown significance (VUS) — A classification term used in clinical DNA sequencing reports to signify genetic variants for which the pathogenicity (likelihood of causing disease) cannot be determined easily. VUS are variants that cannot be readily classified as "pathogenic," "likely pathogenic," "benign," or "likely benign." (See 'Variant' above.)
Whole genome sequencing — A sequencing strategy that provides the DNA sequence for the entire genome, including exons, introns, and other non-coding sequence. In contrast, exome sequencing only determines the sequence of gene-coding regions.
X-inactivation — An epigenetic process that occurs in all female mammalian cells, whereby one of the two X chromosomes is randomly rendered inactive, such that all subsequent gene expression is derived from the other (active) X chromosome. This is sometimes called lyonization, after Mary Lyon, who did important early work on this phenomenon.
Allelic heterogeneity — Allelic heterogeneity refers to the common occurrence of multiple pathogenic variants in one gene that all result in the same disease or syndrome. As an example, more than 1500 variants in the cystic fibrosis transmembrane conductance regulator (CFTR) gene cause cystic fibrosis. Note that this term differs from genetic heterogeneity, in which variants in multiple genes can cause the same disease phenotype. (See 'Genetic heterogeneity' below.)
Allelic ratio — Allelic ratio measures the relative abundance of mutated to normal or wildtype alleles within a tumor. Higher allelic ratios (ie, a greater fraction of mutant alleles) have been reported to be associated with poorer prognosis. Unlike allele frequency, which is a characteristic of a population (see 'Allele frequency' above), allelic ratio is a property of cells within a tumor in a single individual. Allelic ratio is of necessity an inexact concept because it is rare (for solid tumors at least) to avoid substantial contamination by non-tumor cells from blood, stroma, and vasculature. Amplification of mutant sequences in a tumor can also have a large impact on allelic ratios.
Aneuploidy — The state of having an abnormal number of chromosomes. A euploid human karyotype has 46 chromosomes (figure 1). Aneuploidy can affect the entire somatic cell population, as in trisomy 21, or it can affect a subset of cells, as in a tumor.
Anticipation — A phenomenon whereby the symptoms of a genetically-based condition appear at an earlier age, or with greater severity, in successive generations. Expansion of trinucleotide repeats is a known molecular cause for specific diseases (such as myotonic dystrophy, fragile X syndrome, Huntington’s chorea) that manifest anticipation.
Association — Genetic association is a property of alleles. It refers to the non-random relationship between an allele and a phenotype in a population. Genetic association between a marker allele and a phenotype can result in either because the allele is a direct causal variant, because the allele is in linkage disequilibrium, or segregating with a causal variant in close proximity, or because of stratification of the population. Association may be determined in a genome-wide association study. (See 'Genome-wide association study (GWAS)' below.)
Autosome — A chromosome other than X or Y. The human genome has 44 autosomes (22 pairs of autosomes) (figure 1).
Autosomal — A gene is autosomal if it is located on an autosome rather than a sex chromosome. A gene's inheritance pattern is also referred to as autosomal if the pattern corresponds to that of known autosomal genes (rather than sex-linked). (See 'Sex-linked' below.)
Benign variant — (See 'Variant' below.)
Biome — Humans are colonized by a multitude of microorganisms, which vary by age and location in the body. The biome (or microbiome) is the totality of colonizing microorganisms in a specific environmental milieu. Biomes may be studied genetically using metagenomics. (See 'Metagenomics' below.)
Carrier — An individual who is heterozygous for a risk or disease allele. The term is typically used to describe someone who is heterozygous for a gene variant that causes autosomal recessive or X-linked recessive disease, but it is also used to describe heterozygotes for risk alleles of complex traits with variable penetrance, regardless of inheritance type.
Carrier rate — The frequency of carriers in a population.
Carrier testing — Clinical method of genotyping at risk populations or family members to identify individuals, usually asymptomatic, who have a pathogenic or likely pathogenic variant for an autosomal recessive or X-linked disorder. One example is prenatal screening for Tay-Sachs disease-associated variants in people of Ashkenazi Jewish ancestry. (See "Genetic testing" and "Carrier screening for genetic disease in the Ashkenazi Jewish population".)
Centromere — A condensed chromosome region that mediates attachment of chromosomes to the microtubules of the mitotic or meiotic spindle. The centromere is important in preserving normal chromosome number.
Chromatid — One of two replications, or copies, of a chromosome formed prior to cell division and joined together at their centromeres. The centromere is the last portion of a chromosome to replicate during cell division. Sister chromatids are a pair of chromatids attached at the centromere.
Chromatin — A complex structure composed of DNA, RNA, and proteins that facilitates efficient packaging of DNA in cells. The primary structure of chromatin is the nucleosome, consisting of double-stranded DNA coiled around a core of histone proteins. Nucleosomes packed tightly together form a "bead-on-string" configuration, which in turn assembles in hierarchical looping structures to create densely-packaged chromatin. The regulation of gene transcription is governed by the uncoiling of packed chromatin (heterochromatin) into exposed DNA (euchromatin).
Clonal — Arising from a single clone or cell. Examples include a clonal selection of lymphocytes during immune development and clonal origin of leukemia cells or other tumor cells. (See "Immunoglobulin genetics" and "Pathogenesis of acute myeloid leukemia".)
Cloning — Production of a genetically identical copy. Can refer to a single gene or to an entire organism.
Coding region — Portion of a gene that encodes a protein.
Coding mutation or polymorphism — A genetic variation in the open reading frame (protein-encoding region) of a gene. Coding variants that alter the amino acid composition of a protein are called non-synonymous or missense variants (figure 2). Variants that do not alter amino acid composition are called synonymous variants. Nonsense variants are coding variants that result in the introduction of a stop codon (figure 3). A frameshift mutation results from an insertion or deletion of a number of bases not divisible by three, resulting in shifting of the reading frame (figure 4). Variants are also classified according to their pathogenicity. (See 'Variant' below.)
Codon — A three-nucleotide sequence that codes either for a specific amino acid or for chain initiation or termination during protein synthesis.
Complementation — The restoration of normal phenotype by gene replacement. The replaced gene can either be an intact copy of a defective gene (direct replacement), or an alternate gene with function that can compensate for the defective gene's aberrant function.
Complex trait/complex disease — Trait or disease for which interactions between more than one gene and/or environmental factors also play a role in the phenotype.
Compound heterozygote — An individual bearing two different pathogenic variants in the same gene that together are sufficient to manifest an autosomal recessive phenotype. This differs from "homozygote," which refers to an individual in whom both pathogenic variants are the same, and from "double heterozygote," which refers to an individual who is heterozygous for pathogenic variants at two separate genetic loci, which together manifest disease.
Consanguinity — Reproduction between two individuals from the same bloodline (eg, first cousin, second cousin). Consanguineous parentage increases the probability of a rare recessive disease, resulting from higher probability of both parents sharing the same rare deleterious sequence variant.
Copy number variation (CNV) — The most prevalent type of chromosomal structural variation, in which the number of copies of a large chromosomal or DNA segment (usually measuring thousands to millions of bases) varies between individuals. (See "Genomic disorders: An overview", section on 'Copy number variations'.)
Crossing-over — The exchange of chromosome segments through the process of recombination that occurs between two homologous chromosomes during meiosis. The site on the chromosome where the exchange occurs is called a crossover.
Coupling — The presence of two specified alleles at two linked loci on the same homologous chromosome (ie, "in cis"), and the two alternative alleles on the other chromosome. For illustration, in the case of dominant and recessive alleles, the coupling gametes formed are AB and ab (figure 5). In contrast, repulsion refers to the presence of the specified alleles at two linked loci on different chromosomes (ie, in trans). (See 'Repulsion' below.)
De novo mutation — A novel genetic sequence variant introduced by a germline mutation in the proband's DNA. Often used to distinguish familial from sporadic cases of genetic disease.
Digenic inheritance — Diseases caused by co-inheritance of mutations at two distinct genetic loci (ie, in two different genes).
Diploid — Possessing two copies of each autosomal chromosome and two sex chromosomes. Most human cells are diploid. Hepatocytes are frequently polyploid (tetraploid or greater). Gametes are haploid (one copy of each autosome and one sex chromosome) (figure 1). (See 'Haploid' below and 'Ploidy' below.)
DNA barcoding — A collection of methods developed to facilitate the analysis of complex mixtures of pooled samples, whereby short, unique DNA sequences (referred to as tags or barcodes) are added to each of the pooled DNA samples (eg, from distinct individuals). Barcoding is used routinely in next-generation sequencing applications, including single-cell RNA sequencing and exome sequencing.
Barcoding also refers to methods for determining the species of origin of a DNA sample on the basis of the DNA sequence itself. A clinical example is the identification of the ingredients in an herbal preparation.
Dominant negative — Dominant negative alleles are alleles that cause an abnormal phenotype or disease by a mechanism that depends on the presence of an abnormal gene product interfering with the function of the products from a normal gene. In other words, the variant allele confers a loss of function by interfering with the remaining normal allele. In contrast to most loss-of-function variants that confer phenotype only when both alleles are defective (ie, recessive inheritance), dominant-negative mutations act dominantly, meaning that only a single allele with the mutation is sufficient to cause the disease phenotype.
Double heterozygote — An individual who is heterozygous for two mutations at two separate genetic loci that together are sufficient to manifest a phenotype. Differs from compound heterozygote.
Embryonic stem cell — A pluripotential cell derived from the inner cell mass of an early-stage embryo that is capable of differentiating into cells derived from all three germ layers.
Epigenetic change — A modification of a chromosome that does not alter the nucleotide base sequence, but alters the expression of a gene. Epigenetic changes may be stable in an individual, but may be reversed during gametogenesis or development. DNA methylation and histone acetylation are common epigenetic changes. Epigenetic changes form the mechanistic basis of imprinting. Some medications alter epigenetic regulation (eg, histone deacetylase [HDAC] inhibitors). Epigenetic modifications are removed when cells are treated in the laboratory to generate induced pluripotent stem cells (iPSCs). (See 'Induced pluripotent stem cell (iPSC)' below.)
Epistasis — The process by which variations at two or more genetic loci interact to produce phenotypes different from the individual effects of each variant. This process is often referred to as either a gene-gene interaction or a genetic modifier effect.
Exome — The portion of the genome that consists of exons. (See 'Exon' below.)
Exome sequencing — A sequencing strategy that provides the DNA sequence corresponding to all exons (which represent approximately 1 to 2 percent of the genome), excluding introns and non-coding genomic sequence. Though the complete exome includes non-coding 5’ and 3’ untranslated regions (UTRs), most exome sequencing assays are enriched for the coding exons and largely exclude the non-coding regions.
Exon — A segment of DNA that is transcribed and present in mature messenger RNA (mRNA). Many exons encode a portion of a protein, but non-coding exons also exist. This is in contrast to an intron, the DNA sequence between exons that does not become part of mature mRNA. Exons constitute only a small percent of the genome (about 1 to 2 percent).
Expressivity — A parameter used in genetic models that quantifies the degree to which an inherited characteristic is expressed in an organism.
Frameshift mutation — A frameshift mutation is a change in DNA sequence that results from an insertion or deletion of a number of bases that is not divisible by three, resulting in a shift of the reading frame (figure 4) and thus altering synthesis of the protein.
Fusion gene — A fusion gene is a functional gene product that results from the fusion of DNA segments from two physically distinct genes. The fusion occurs as a consequence of chromosomal rearrangements such as translocations, inversions, segmental deletions, or duplications. Examples include the BCR-ABL and the FIP1L1-PDGFRA oncogenes.
Gene — A gene is a unit of DNA sequence that encodes specific function. Classical definitions limit genes to those elements that code for proteins. However, non-protein coding genes (such as non-coding RNAs or pseudogenes) are also genes.
Gene editing — Gene editing refers to the use of nucleases to alter the DNA sequence of a gene, as discussed in more detail below. (See 'Genome editing' below.)
Genetic heterogeneity — Genetic heterogeneity refers to a phenomenon in which variants in different genes result in the same phenotype or disease. Examples include the multiple genetic causes of sensorineural deafness. This differs from allelic heterogeneity, in which multiple variants in the same gene can lead to the same phenotype. (See 'Allelic heterogeneity' above.)
Genetic polymorphism — A genetic polymorphism is a DNA segment for which two or more alternate forms can be found in a population. The common types of polymorphisms include single nucleotide variants (single base pair changes, also called single nucleotide polymorphisms [SNPs]), indels (insertion/deletion polymorphisms) or larger structural changes like copy number variants. Most commonly, genetic polymorphism refers a common single base-pair change or single nucleotide polymorphism (SNP). (See 'Polymorphism' below and 'Single nucleotide polymorphism (SNP)' below.)
Genotype — A genotype is the combination of two alleles at one genomic location (locus) or base pair in an individual (figure 5).
Genome editing — Genome editing refers to the use of nucleases to insert or remove DNA from a genome. There are several common technologies that make use of genome editing, including clustered regularly interspaced short palindromic repeats (CRISPR), transcription activator-like effector nucleases (TALENs) and zinc finger nucleases (ZFNs). CRISPR is increasingly employed and is an RNA-guided gene editing method that uses a bacterially-derived protein (Cas9) and a specifically-designed synthetic guide RNA (gRNA; also known as a small guide RNA [sgRNA] or a single guide RNA) to introduce a double-strand break at a precise location in the gene of interest. The sgRNA directs the position of the double-strand break by hybridization to its matching sequence. Genome editing is used as a tool for genetic perturbation in research. Therapeutic applications for the correction of inherited genetic variation are under investigation.
Genome-wide association study (GWAS) — A GWAS (pronounced "gee-wass") study is a type of genetic mapping study design that assesses for evidence of association between genetic variants and heritable traits across the entire genome. Typical studies consist of genotyping hundreds of thousands of common SNPs, using DNA microarrays or other methodologies in large case-control populations, with the goal of identifying specific risk alleles that are more prevalent in cases than in controls. (See "Tools for genetics and genomics: Gene expression profiling".)
Germline — Germline refers to the gametes (ova and spermatozoa and their precursors) that have the capacity to give rise to offspring.
Haploid — Cells or organisms possessing one copy of each autosomal chromosome and one sex chromosome (and therefore effectively one copy of each gene). Gametes (ova and sperm) are haploid. Fertilization of a haploid ovum by a haploid sperm results in formation of a diploid embryo. Many microorganisms are haploid. In contrast, diploid organisms possess two of each autosome and two sex chromosomes. (See 'Diploid' above.)
Haploinsufficiency — Having only a single functional copy of a gene due to inactivation of the second allele by a deleterious variant. In a diploid cell, the single functional copy of the gene does not produce sufficient protein, resulting in disease. All haploinsufficient loci are hemizygous, but not all hemizygous loci are haploinsufficient. (See 'Hemizygous' below.)
Haplotype — The physical combination or sequence of alleles present on a single chromosome. By definition, alleles on one haplotype are in "cis" (figure 5).
Hemizygous — The state of carrying only one copy of a genomic region due to deletion or altered function of the corresponding region on the other chromosome. Carriers of large-scale deletions are hemizygotes. Hemizygosity can confer disease if having one normally functioning copy is insufficient for normal cellular function (haploinsufficiency), but if a single functional copy of the gene is sufficient for normal cellular function, the phenotype may not be abnormal. Hemizygosity can also confer disease if a pathogenic mutation is present within the hemizygous region. (See 'Haploinsufficiency' above.)
Heritability — The proportion of phenotypic variation that is explained by genetic (or in some cases, epigenetic) factors.
Heteroplasmy — The occurrence in a single cell of more than one population of mitochondrial DNA sequences.
Identity by descent — Alleles are identical by descent if they can be traced back to a common ancestor. Identity by descent is a more stringent classification than identity by state (see 'Identity by state' below). Identity by descent is the basis for establishing linkage.
Identity by state — Alleles are identical by state if the assay being used to distinguish alleles determines that they are identical.
Imprinting — Gamete-specific gene silencing, in which only the allele from the mother or only the allele from the father is expressed, leading to observed parent-of-origin effects in offspring. Examples include the Prader-Willi syndrome and Angelman syndrome locus and a gene involved in pseudohypoparathyroidism. (See "Epidemiology and genetics of Prader-Willi syndrome" and "Congenital cytogenetic abnormalities".)
Indel — A class of common polymorphism or deleterious sequence variant defined by an extra copy or a missing copy of a short genetic or chromosomal sequence. (See "Chromosomal translocations, deletions, and inversions".)
Induced pluripotent stem cell (iPSC) — A pluripotent cell derived by in vitro reprogramming of a somatic cell that is capable of both self-renewal and differentiation to mature lineages. (See "Overview of stem cells", section on 'Induced pluripotent stem (iPS) cells'.)
Intron — A segment of DNA between two exons that is transcribed to pre-mRNA, but is removed through the process of splicing and is therefore not part of mature mRNA. Introns may contain regulatory DNA or serve other functions.
Inversion — A chromosomal rearrangement characterized by rotation and reintegration of a DNA segment, resulting in an inverted orientation of the segment relative to its typical state.
Karyotype — Karyotype refers to the complete set of chromosomes in an organism or tumor. Karyotype is determined by visual examination and counting of condensed chromosomes from several representative cells to determine the number of copies of each chromosome as well as any translocations. Determination of the karyotype of a tumor is also called "cytogenetic analysis." (See "Tools for genetics and genomics: Cytogenetics and molecular genetics".)
Likely benign variant — (See 'Variant' below.)
Likely pathogenic variant — (See 'Variant' below.)
Linkage — The relationship that exists between two loci that violate the Mendelian law of independent assortment and therefore segregate in families in a non-random fashion. Non-independent assortment results because linked loci reside together on the same chromosome (ie, they are syntenic). However, most syntenic loci are not linked due to mandatory recombination during meiosis. Linkage therefore implies the linked loci are in close physical proximity to each other. The genetic linkage distance is expressed as the recombination fraction, which is measured in centiMorgans (cM). Note that this is not necessarily proportional to the physical distance (base pairs) separating the loci.
Linkage analysis — Method of gene mapping that tests for the non-random segregation of disease phenotypes with discrete chromosomal segments. Identification of linked regions implies the existence of disease-causing (pathogenic) variants within or proximal to the linked region. The process of disease-gene identification within this region is termed positional cloning.
Linkage disequilibrium — The non-random association of alleles at two or more loci in a population. Linkage disequilibrium is present when the observed haplotype distribution of two or more markers in a population is significantly different from the expected haplotype distribution (which can be derived from the cross-product of observed allele frequencies) (figure 6).
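As an illustration of the comparison described under 'Linkage disequilibrium', the following Python sketch derives the expected haplotype frequency from the product of two allele frequencies and compares it with an observed haplotype frequency. The coefficient D and the r-squared statistic are commonly used summary measures that are not defined in this glossary; the function name and the example frequencies are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: observed frequency of the haplotype carrying allele A and allele B
    p_a:  observed frequency of allele A at the first locus
    p_b:  observed frequency of allele B at the second locus
    """
    # Haplotype frequency expected if the two loci were independent
    expected = p_a * p_b
    # D is the deviation of the observed frequency from that expectation
    d = p_ab - expected
    # r_squared rescales D by the allele frequencies (undefined if a locus is monomorphic)
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else float("nan")
    return d, r_squared


# Hypothetical example: both alleles at 50 percent frequency
d_equilibrium, r2_equilibrium = linkage_disequilibrium(0.25, 0.5, 0.5)
d_complete, r2_complete = linkage_disequilibrium(0.5, 0.5, 0.5)
```

In this hypothetical example, an observed haplotype frequency of 0.25 matches the expectation (D = 0), whereas an observed frequency of 0.5 corresponds to complete disequilibrium (r-squared = 1).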
Locus — A locus (plural = loci) is a specific chromosomal or genomic location.
LOD score — The "logarithm of the odds" (LOD) score is a quantitative measure of the statistical evidence of linkage between two genes. The LOD score depends on both the probability of cosegregation of the two genes during meiosis and the size and structure of the population in which the linkage analysis is performed. By convention, LOD scores >3 are considered to be evidence of linkage in human studies. In some studies, the threshold LOD scores for linkage can be established via permutation testing.
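The calculation behind a LOD score can be sketched in Python for the simplest case. This sketch assumes phase-known meioses in which recombinant and non-recombinant offspring can be counted directly; actual linkage analyses compute likelihoods over whole pedigrees. The function name and the example counts are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Log10 odds of linkage at recombination fraction theta versus no linkage.

    recombinants: number of recombinant meioses observed
    meioses:      total number of informative meioses
    theta:        candidate recombination fraction, 0 < theta <= 0.5
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be greater than 0 and at most 0.5")
    non_recombinants = meioses - recombinants
    # Log10 likelihood of the data if the loci are linked at theta
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1 - theta))
    # Log10 likelihood if the loci assort independently (theta = 0.5)
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


# Hypothetical example: 1 recombinant among 10 informative meioses
example_lod = lod_score(recombinants=1, meioses=10, theta=0.1)
```

Because the score is a base-10 logarithm, a value of 3 corresponds to odds of 1000 to 1 in favor of linkage at the tested recombination fraction.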
Lyonization — (See 'X-inactivation' below.)
Manhattan plot — A type of plot used to display results of a GWAS (see 'Genome-wide association study (GWAS)' above). Genomic coordinates are shown on the X-axis and the negative logarithm of the P-value for each SNP on the Y-axis. SNPs with the strongest association will have the lowest P-values, and hence the tallest profiles. The name refers to the plot's resemblance to the skyline of Manhattan in the United States (figure 7).
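The Y-axis transformation used in a Manhattan plot can be sketched in Python as follows. The sketch assumes a base-10 logarithm, which is the usual convention; the function names, the tuple layout, and the example SNP results are hypothetical.

```python
import math


def manhattan_points(results):
    """Convert (chromosome, position, p_value) tuples into plot coordinates.

    The returned height is -log10(p_value), so smaller P-values give taller points.
    """
    return [(chrom, pos, -math.log10(p)) for chrom, pos, p in results]


def strongest_association(results):
    """Return the (chromosome, position, p_value) tuple with the lowest P-value."""
    return min(results, key=lambda r: r[2])


# Hypothetical association results for three SNPs
example_results = [("1", 1_000_000, 0.01), ("2", 500_000, 1e-6), ("7", 42_000, 0.5)]
example_points = manhattan_points(example_results)
example_peak = strongest_association(example_results)
```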
Marker — A locus with alternative alleles that can be used in genetic mapping experiments.
Meiosis — The cell division process in germline cells by which the chromosomal complement is reduced from the diploid to the haploid number (figure 8).
Mendelian inheritance — A trait is said to have Mendelian inheritance if its genetic transmission can be explained by a Mendelian model of inheritance, such as autosomal dominant, autosomal recessive, or X-linked recessive or dominant inheritance. This is in contrast to non-Mendelian inheritance patterns such as digenic inheritance, or quantitative traits. (See 'Digenic inheritance' above.)
Metagenomics — The study of complex microbial populations (biomes) using genomic approaches. Human tissues such as the skin and gut have multiple heterogeneous populations of microorganisms that differ from each other with respect to phyla composition and abundance in a tissue-specific manner. These abundances can be estimated by sequencing the mixed population of microorganisms, either through targeted sequencing of 16S ribosomal RNA (rRNA) genes (for bacterial characterization) or whole-genome approaches (for bacteria, viruses, fungi, and other organisms).
Methylation — The addition of methyl groups to cytosine in DNA. Methylation followed by deamination is a major pathway for mutation to thymine. Methylation also correlates with reduced gene transcription and is an important mechanism for gene imprinting and X-inactivation. (See 'Imprinting' above and 'X-inactivation' below.)
Micro-RNA (miR) — A small, non-coding RNA that regulates the stability or translation of a set of mRNAs.
Microsatellite — A tandem array of short sequences of DNA (typically two to four bases). Microsatellites are numerous and widely distributed in the genome. There is often polymorphism in their length, making them useful markers in genetic studies, including genome mapping and family-based linkage analysis. Microsatellites are also known as short tandem repeat markers (STRs) or short tandem repeat polymorphisms (STRPs).
Mitochondrial genome — The genetic material carried within mitochondria, known as mitochondrial DNA (mtDNA). At fertilization, all the mitochondria are derived from the egg, so mitochondrial genes display maternal inheritance.
Mitosis — The process of cell division occurring in somatic cells, in which each daughter cell receives a full chromosome complement.
Monogenic trait/monogenic disease — Trait or disease with inheritance that can be explained by a single gene, in contrast to polygenic and complex diseases. (See 'Polygenic trait/polygenic disease' below.)
Mutation, mutant — An altered version of a gene that affects function. These terms are used in several different senses, depending on context:
●In human genetics, a mutant is a genetic variant of low population frequency, in contrast to a polymorphism (often a single nucleotide polymorphism [SNP]) with an allele frequency of 1 percent or greater. Types of gene mutations include nonsense (creates premature stop codon) (figure 3); missense (creates amino acid change) (figure 2); silent (no associated change in protein sequence); and frameshift (shifts the reading frame of the DNA and alters protein translation, resulting in an entirely new protein sequence downstream of the mutation) (figure 4).
●In human disease, mutation implies a change associated with abnormal function (eg, sickle cell mutation of the hemoglobin beta chain). A disease-causing mutation is also called a pathogenic variant. (See 'Variant' below.)
●When used in the context of inheritance, mutation implies a recent sequence change (either germline or somatic), in contrast to inheritance from a carrier parent.
●When used to refer to an organism or population of organisms, a mutant refers to a population that harbors a specific, atypical variant (eg, antibiotic-resistant mutants).
Mutation fraction — Synonymous with allelic fraction or allelic ratio. (See 'Allelic fraction' above and 'Allelic ratio' above.)
Next-generation sequencing — Any of several high-throughput DNA sequencing methods that rely on parallel analysis of multiple DNA fragments (eg, whole genome sequencing, exome sequencing). These methods have resulted in dramatic decreases in the cost and time needed for sequencing projects and are used in some clinical settings. (See "Principles and clinical applications of next-generation DNA sequencing".)
Non-coding variant — Genetic variation that does not map to gene regions that code for protein. These variants can be functional if they reside in and disrupt functional elements, such as non-coding RNA sequences or regulatory sites (eg, promoters, enhancers, suppressors, or splice sites).
Oncogene — Gene that contributes to the production of cancer. Oncogenes typically act in a dominant manner (ie, an oncogenic mutation at one allele is sufficient to promote tumorigenesis). In contrast, tumor suppressor genes typically act in a recessive manner. (See 'Tumor suppressor gene' below.)
Pathogenic variant – Genetic change associated with disease or strongly suspected of being associated with disease. (See 'Variant' below.)
Pedigree — A diagram or other graphic representation of a family that shows the family relationships, sex of each family member, and presence or absence of one or more diseases in each individual (figure 9).
Penetrance — The probability that an individual harboring a pathogenic variant will develop the associated disease or condition. Incomplete (or variable) penetrance occurs when an individual with a pathogenic variant does not manifest features of the disorder. There are many causes of incomplete penetrance, including absence of environmental or genetic co-factors, epigenetic effects such as imprinting, sex-specific effects, or age-related expression differences.
Phenotype — A characteristic of an organism (as opposed to the organism’s genotype). Phenotypes are sensitive to the assays used to assign or measure them. They may be categorical, such as presence or absence of a disease; or quantitative, such as systolic blood pressure. Further complexities in phenotypic description involve the physiological state of the organism at the time of measurement, age, or use of provocative stimuli. Most phenotypes are variable, and this variability leads to the concepts of penetrance and expressivity. (See "Inheritance patterns of monogenic disorders (Mendelian and non-Mendelian)", section on 'Penetrance and expressivity'.)
Pleiotropy — The association of variant(s) in a single gene with multiple phenotypic effects, often in different tissues or organs. An example is Marfan syndrome, in which mutations in the fibrillin 1 (FBN1) gene can cause cardiac, ocular, and connective tissue findings.
Ploidy — The number of sets of chromosomes present in an organism or cell. Ploidy varies among different organisms, including those that are always haploid (eg, bacteria), either haploid or diploid (eg, Saccharomyces species [yeast]), consistently diploid (eg, mammals) (see 'Diploid' above), or polyploid (eg, hexaploid wheat). Different tissues in multicellular organisms may have different ploidies (eg, mammalian hepatocytes may be tetraploid). The gametes (ova and sperm) are haploid. (See 'Haploid' above.) The designation of ploidy is based on the predominant ploidy of cells in the organism.
Polymorphism — Polymorphism can refer to a genetic polymorphism. (See 'Genetic polymorphism' above.)
It can also refer to any biologic marker (DNA, RNA, or protein) with two or more states. Protein polymorphisms (varying amino acid sequence) can result from DNA polymorphisms or from differential RNA splicing (different isoforms), which in turn can result from sequence variation, epigenetic phenomena, or temporal/spatial/environmental differences.
Polygenic trait/polygenic disease — In contrast to monogenic diseases, polygenic diseases are those for which the inherited trait(s) is explained by more than one gene. (See 'Monogenic trait/monogenic disease' above.)
Polymerase chain reaction (PCR) — A method of specifically amplifying a unique target sequence (DNA or RNA) in the laboratory. PCR uses specific primers and repeated cycles of heating and cooling with a heat-stable DNA polymerase to replicate the template material exponentially. (See "Tools for genetics and genomics: Polymerase chain reaction".)
Quantitative traits and quantitative trait loci (QTL) — "Quantitative" traits are distinguished from discrete traits. The population varies continuously for quantitative traits and falls into obvious phenotypic classes for discrete traits. Quantitative traits are sometimes referred to as "complex" traits, reflecting the fact that multiple genes, the environment, and gene-environment interactions all contribute to an individual's trait value. Many traits are quantitative, and their inheritance is much more challenging to unravel than discrete traits. A quantitative trait locus (QTL) is a genomic region linked or associated with a quantitative trait.
Read depth — In genomic or gene sequencing, the number of independent times each base in a targeted region has been sequenced. Typically expressed as an average X coverage (for example 20X = an average of 20 sequence reads per base). A minimum read depth of 30X is often required for clinical-grade sequencing. (See "Principles and clinical applications of next-generation DNA sequencing".)
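The average coverage described under 'Read depth' can be sketched in Python. The sketch assumes that a per-base depth has already been obtained for every position in the targeted region; the function names and the example depths are hypothetical, and the default threshold of 30 simply mirrors the 30X figure mentioned above.

```python
def mean_read_depth(per_base_depths):
    """Average number of reads covering each base in the targeted region."""
    return sum(per_base_depths) / len(per_base_depths)


def fraction_at_or_above(per_base_depths, minimum=30):
    """Fraction of bases covered by at least `minimum` reads."""
    covered = sum(1 for depth in per_base_depths if depth >= minimum)
    return covered / len(per_base_depths)


# Hypothetical depths at four positions of a targeted region
example_depths = [10, 30, 50, 29]
example_mean = mean_read_depth(example_depths)          # reported as "X coverage"
example_fraction = fraction_at_or_above(example_depths)
```

Reporting the fraction of bases at or above a threshold alongside the mean is useful because an acceptable average can hide poorly covered positions.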
Reading frame — The starting point in translating the DNA sequence to protein. Since each codon includes three nucleotides, the reading frame can be initiated at one of three nucleotides. Offsetting the reading frame changes the amino acid composition of the encoded protein.
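The three possible reading frames on one strand can be illustrated with a short Python sketch that splits a sequence into codons from each starting offset. The function name and the example sequence are hypothetical, and incomplete trailing codons are simply dropped.

```python
def codons_in_frame(sequence, frame):
    """Split a nucleotide sequence into complete codons for frame 0, 1, or 2.

    frame is the number of bases skipped before the first codon begins.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1, or 2")
    # Stop before any position that cannot supply a full three-base codon
    return [sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3)]


# Hypothetical nine-base sequence read in each of the three frames
example_sequence = "ATGGCCTAA"
example_frames = {frame: codons_in_frame(example_sequence, frame) for frame in (0, 1, 2)}
```

Shifting the starting point by one or two bases produces an entirely different set of codons from the same sequence, which is why the amino acid composition of the encoded protein changes.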
Recombinant — Recombinant has different meanings in different contexts. For inheritance patterns, recombinant refers to offspring whose genotype and phenotype combinations differ from those of their parents, implying genetic recombination between the loci under study.
For laboratory techniques, recombinant technologies (also called genetic engineering) are molecular genetic approaches that use the process of homologous recombination to manipulate genotypes for experimental purposes. Examples include transgenic models where specific genetic loci are either knocked-out (removed) or knocked-in (introduced) to enable study of the locus; recombinant inbred mouse strains; and recombinant viral transfection for synthesis of protein.
Recombination — The process of exchanging DNA sequence between two homologous chromosome regions. Mandatory recombination occurs at least once per aligned chromosome pair during meiosis. The exchange results in the creation of novel haplotypes that are combinations of the grandparental haplotypes present in a diploid cell. Exchange of unequal sequence content (ie, non-homologous recombination) can introduce DNA gains and losses of thousands or millions of bases. These gains and losses result in structural genetic variation and copy number variants (CNVs). (See 'Copy number variation (CNV)' above.)
Repulsion — The state in which alleles at two distinct loci are on physically opposing chromosomal strands. By definition, these variants are not part of the same haplotype (figure 5). In the example of dominant and recessive alleles, repulsion gametes formed are Ab and aB. The opposite relationship is coupling. (See 'Coupling' above.)
Risk allele — An allele associated with a disease phenotype that typically acts in combination with other genetic or environmental factors. Though a risk allele is often the less common allele (ie, the minor allele), risk alleles associated with some complex traits may be the more common allele.
RNA interference (RNAi) — A ubiquitous intracellular process mediated by small RNA species, whereby specific RNAs are targeted for editing, degradation, or clearance. RNAi has important roles in the regulation of gene expression, developmental processes, cellular defense, and epigenetic effects.
RNAi technology (also called antisense technology) has been used in the laboratory to test the function of a gene by preventing its expression. Its use has been attempted clinically as a means of posttranscriptional gene silencing to reduce the expression of viral or cancer genes, or to lower cholesterol. Early attempts at developing therapeutic applications are ongoing in the fields of hematology, oncology, and neurodegenerative disease. (See "Hemophilia A and B: Routine management including prophylaxis", section on 'Prophylactic therapies under development' and "Treatment of drug-resistant hypercholesterolemia", section on 'Mipomersen'.)
Sequencing — Determination of the nucleotide base sequence of a gene or collection of genes that determines the amino acid sequence of a protein. (See "Principles and clinical applications of next-generation DNA sequencing".)
Sex chromosomes — Refers to the X and Y chromosomes, which are different in females (XX) and males (XY).
Sex-linked — A gene is sex-linked if it is located on a sex chromosome rather than on an autosome. A gene's inheritance pattern is also referred to as sex-linked if the pattern corresponds to that of known sex-linked genes (rather than autosomal genes). (See 'Autosomal' above.)
Silencing — Regulation that prevents the expression of a gene. Mechanisms of silencing include gene methylation (see 'Methylation' above), destruction of messenger RNA, or prevention of protein translation.
Single nucleotide polymorphism (SNP) — A single nucleotide polymorphism (pronounced "snip") is a polymorphism (difference in base pair) that affects a single base pair, with a population frequency of at least 1 percent. Single base pair changes that occur at a lower population frequency are called pathogenic variants or mutations if they cause disease or affect protein function. (See 'Polymorphism' above.)
Somatic — Referring to tissues that are not within the germline. Somatic mutations arise in somatic tissues and are therefore not passed from parent to offspring. Somatic mutations are common in neoplasms.
Structural genetic variation — A term that encompasses a variety of large-scale genomic aberrations, including segmental rearrangements, translocations, or inversions and copy-number variants (CNVs) (see 'Copy number variation (CNV)' above). Large rearrangements or deletions can be visualized through karyotyping. Smaller variants, particularly CNVs, segmental duplications, and interchromosomal interstitial rearrangements, are assessed by array comparative genomic hybridization (array CGH) or SNP arrays.
Syntenic — Describing genetic loci that reside on the same chromosome. As an example, the genes causing Birt-Hogg-Dubé syndrome (Folliculin [FLCN], at chromosome 17p11) and early-onset breast cancer (BRCA1, at chromosome 17q21) are syntenic to each other on chromosome 17. However, because they are far apart from each other, they are not linked. (See 'Linkage' above.)
Telomere — Region at the ends of a chromosome that prevents the loss of genetic material or the accidental fusion of two chromosomes together during cell division. Telomeres of chromosomes in most cells shorten as an individual ages. Telomere length is maintained by the enzyme telomerase. (See 'Telomerase' below.)
Telomerase — Multicomponent enzyme that extends the length of telomeres. Telomerase mutations are seen in some inherited "telomere syndromes." (See "Dyskeratosis congenita and other short telomere syndromes".)
Translocation — A translocation is a structural chromosomal abnormality whereby chromosome segments are exchanged (swapped) between two non-homologous chromosomes. This form of rearrangement can be balanced, when the translocation does not result in any significant loss or gain of genetic material in the resultant gamete or cell; or unbalanced, when there is a gain or loss of genetic material in the resultant gamete or cell. (See "Chromosomal translocations, deletions, and inversions", section on 'Translocations'.)
Tumor suppressor gene — A tumor suppressor gene is a gene that protects against the development or growth of tumors. Tumor suppressor genes typically act in a recessive manner (ie, both normal copies must be lost for a tumor to develop). In contrast, oncogenes typically act in a dominant manner. (See 'Oncogene' above.)
Uniparental disomy — The inheritance of two copies of a chromosome (or part of a chromosome) from one parent, and no copy from the other parent, due either to nondisjunction errors during either the first or second phases of meiosis, or to chromosomal alterations in early fetal development. Nondisjunction during the first phase of meiosis (meiosis I) will result in inheritance of each of the grandparental chromosomes from one parent, termed "heterodisomy." In contrast, nondisjunction during meiosis II results in inheritance of two identical copies of one grandparental chromosome, termed "isodisomy."
Variant — The term variant is used to refer to a specific change in either DNA or protein sequence. The American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology have recommended use of a five-tier terminology system for the clinical classification of genetic variants, consisting of the following designations [3]:
●Pathogenic variant – A disease-causing variant, as determined by very strong genetic and experimental evidence, including consistent familial co-segregation with disease and definitive functional studies.
●Likely pathogenic variant – A variant with strong, but not definitive, evidence of pathogenicity based on its similarity to known pathogenic variants, co-segregation with disease in families or populations, and functional evidence.
●Variant of unknown significance – A variant for which the specific criteria for the other four categories are not met, or for which contradictory lines of evidence in support of both benign and pathogenic classifications are present.
●Likely benign variant – A variant with multiple supporting (but not conclusive) lines of evidence to suggest it is not disease-causing.
●Benign variant – A variant with conclusive evidence as not disease-causing, as determined typically (but not only) by a high prevalence of the variant in the general (healthy) population, at a prevalence that exceeds that of the suspected disease.
Additional information about this classification and its application to genomic testing is presented separately. (See "Secondary findings from genetic testing", section on 'Definitions and classification of variants'.)
Variant of unknown significance (VUS) — A classification term used in clinical DNA sequencing reports to signify genetic polymorphisms for which the pathogenicity (likelihood of causing disease) cannot be determined easily. VUS are variants that cannot be readily classified as "pathogenic," "likely pathogenic," "benign," or "likely benign." (See 'Variant' above.)
Whole genome sequencing — A sequencing strategy that provides the DNA sequence for the entire genome, including exons, introns, and other non-coding sequence. In contrast, exome sequencing only determines the sequence of gene-coding regions.
X-inactivation — An epigenetic process that occurs in all female mammalian cells, whereby one of the two X chromosomes is randomly rendered inactive, such that all subsequent gene expression is derived from the other (active) X chromosome. This is sometimes called lyonization, after Mary Lyon, who did important early work on this phenomenon.