Terminology

This is a list of genomic terms that some readers might find helpful. If you're already familiar with bioinformatics or molecular genetics, feel free to skip this page.

ACGT#

A (Adenine) pairs with T (Thymine)

C (Cytosine) pairs with G (Guanine)

The genetic code is the way the four bases of DNA (A, C, G, and T) are strung together so that the cellular machinery, the ribosome, can read them (by way of messenger RNA) and turn them into a protein. In the genetic code, every three consecutive nucleotides form a triplet, called a codon, that codes for a single amino acid. Proteins are often made up of hundreds of amino acids, so the code for a single protein can contain hundreds, sometimes even thousands, of triplets.
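
As a rough illustration of how triplets map to amino acids, the sketch below builds the standard genetic code table and translates a short DNA string codon by codon. The function name `translate` and the choice to read the coding strand directly as DNA (rather than as messenger RNA) are assumptions made for this example, not part of any particular tool.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
# One-letter amino acid codes; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence into a protein string (hypothetical helper).

    Reads the sequence three bases at a time from position 0 and stops
    at the first stop codon or when fewer than three bases remain.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # ATG -> M, GCC -> A, TAA -> stop
```

Here three codons yield two amino acids (M and A) because the final triplet is a stop codon.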

DNA#

DNA is the chemical name for the molecule that carries genetic instructions in all living things. The DNA molecule consists of two strands that wind around one another to form a shape known as a double helix. Each strand has a backbone made of alternating sugar (deoxyribose) and phosphate groups. Attached to each sugar is one of four bases--adenine (A), cytosine (C), guanine (G), and thymine (T). The two strands are held together by bonds between the bases; adenine bonds with thymine, and cytosine bonds with guanine. The sequence of the bases along the backbones serves as instructions for assembling protein and RNA molecules.

DNA bases pair up with each other, A with T and C with G, to form units called base pairs.
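
Because A always pairs with T and C with G, one strand fully determines the other. A minimal sketch of that pairing rule, assuming sequences are plain uppercase strings over A, C, G, and T (the helper name `reverse_complement` is chosen for this example):

```python
# Map each base to its pairing partner: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand):
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return strand.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))  # GCAT
```

The result is reversed because the two strands of the double helix run in opposite directions.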

Gene#

A gene is the basic physical and functional unit of heredity. Genes are made up of DNA. Some genes act as instructions to make molecules called proteins. However, many genes do not code for proteins. In humans, genes vary in size from a few hundred DNA bases to more than 2 million bases. The Human Genome Project estimated that humans have between 20,000 and 25,000 genes.

Every person has two copies of each gene, one inherited from each parent. Most genes are the same in all people, but a small number of genes (less than 1 percent of the total) are slightly different between people. Alleles are forms of the same gene with small differences in their sequence of DNA bases. These small differences contribute to each person’s unique physical features.

Protein#

Proteins are an important class of molecules found in all living cells. A protein is composed of one or more long chains of amino acids, the sequence of which corresponds to the DNA sequence of the gene that encodes it. Proteins play a variety of roles in the cell, including structural (cytoskeleton), mechanical (muscle), biochemical (enzymes), and cell signaling (hormones). Proteins are also an essential part of the diet.

RNA#

Ribonucleic acid (RNA) is a polymeric molecule essential for the coding, decoding, regulation, and expression of genes. RNA and DNA are nucleic acids. Along with lipids, proteins, and carbohydrates, nucleic acids constitute one of the four major macromolecules essential for all known forms of life.

Allele#

An allele is a variant form of a given gene, meaning it is one of two or more versions of a known mutation at the same place on a chromosome. It can also refer to different sequence variations for a several-hundred base-pair or more region of the genome that codes for a protein.

Chromosome#

A chromosome is a DNA molecule with part or all of the genetic material of an organism. Most eukaryotic chromosomes include packaging proteins which, aided by chaperone proteins, bind to and condense the DNA molecule to prevent it from becoming an unmanageable tangle.

A chromosome is the structure housing DNA in a cell. Chromosomes are structurally quite sophisticated, containing elements necessary for processes such as replication and segregation. Each species has a characteristic set of chromosomes with respect to number and organization. For example, humans have 23 pairs of chromosomes--22 pairs of numbered chromosomes called autosomes, 1 through 22, and one pair of sex chromosomes, X and Y. Each parent contributes one chromosome of each pair to an offspring.

Ribosome#

Ribosomes are macromolecular machines, found within all living cells, that perform biological protein synthesis. Ribosomes link amino acids together in the order specified by the codons of messenger RNA molecules to form polypeptide chains.

Genome#

In the fields of molecular biology and genetics, a genome is the genetic material of an organism. It consists of DNA. The genome includes both the genes and the noncoding DNA, as well as mitochondrial DNA and chloroplast DNA. The study of the genome is called genomics.

Prokaryote Cell#

A prokaryote is a cellular organism that lacks an envelope-enclosed nucleus.[1] The word prokaryote comes from the Greek πρό (pro, 'before') and κάρυον (karyon, 'nut' or 'kernel').[2][3] In the two-empire system arising from the work of Édouard Chatton, prokaryotes were classified within the empire Prokaryota.[4] But in the three-domain system, based upon molecular analysis, prokaryotes are divided into two domains: Bacteria (formerly Eubacteria) and Archaea (formerly Archaebacteria). Organisms with nuclei are placed in a third domain, Eukaryota.[5] In the study of the origins of life, prokaryotes are thought to have arisen before eukaryotes.

Eukaryote Cell#

Eukaryotic cells are cells that contain a nucleus and organelles, and are enclosed by a plasma membrane. Organisms that have eukaryotic cells include protozoa, fungi, plants and animals. These organisms are grouped into the biological domain Eukaryota.

Variant#

  • Structural Variants

    This is really an umbrella term, referring to SNP/SNVs, indels, copy number variations, and a number of other variants that change the sequence of base pairs in a genome. These variations, while small compared to a frameshift mutation, are increasingly important in understanding human diseases. In fact, it’s been found that nearly all human tumors have some structural variants (some just a handful, others in the thousands).

  • Single-nucleotide Polymorphisms/Single-nucleotide Variations (SNP/SNVs)

    Known as single-nucleotide polymorphisms (SNPs) in populations and single-nucleotide variations (SNVs) in individuals, these variants are simply exchanges of one nucleotide base pair for another. There are several million SNPs in the average human, and perhaps as many in plants. These have become very important markers for certain diseases, and will no doubt serve as guideposts for the development of personalized treatments. A recent study, in fact, showed that while an individual SNP or two did not appear to correlate with cancers, a group of 77 SNPs did seem to be strongly associated with the development of breast cancer.

  • Indels

    Short for “insertion” and “deletion,” these are added or subtracted base pairs in a segment of DNA. It’s estimated that humans have several million of these. More substantial than SNP/SNVs, indels involve between 1 and 10,000 base pairs. Like SNP/SNVs, they most likely play some role in disease and may play an important role in determining personalized medicine. In the disease cystic fibrosis, for example, indels are responsible for the deletion of a single amino acid that triggers the disease.

  • Copy Number Variations

    This refers to differences in the number of copies of specific genes found in a genome. Conventional teaching held that every genome carries two copies of each gene. However, recent advances have shown that there may be many copies of a gene, or none, and these variations can lead to disease states. Copy number variations may be the most prevalent of all; because of their large size, they may involve three times as many base pairs as SNP/SNVs, the next-most prevalent structural variation.

  • Translocations and Inversions

    These are chromosomal rearrangements of genes (or at least segments of DNA), in which the DNA segments are broken off, and either located at some other point on the chromosome (translocation), or reinserted into the chromosomal DNA in “reverse,” 180 degrees from its previous alignment (inversions). Generally, the larger the segment of DNA that is subject to these rearrangements, the more likely it will cause a change in phenotype.
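
The variant types above can be told apart, in a simplified way, by comparing a reference allele with an alternate allele. The sketch below assumes alleles are given as plain strings, loosely in the style of the REF and ALT columns of a variant file; the function names and the length-based rules are illustrative assumptions, not the logic of any specific variant caller. An inversion is modeled as replacing a segment with its reverse complement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_variant(ref, alt):
    """Label a variant from its reference and alternate alleles (simplified)."""
    if ref == alt:
        return "no change"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"            # one base exchanged for another
    if len(alt) > len(ref):
        return "insertion"      # bases added relative to the reference
    if len(alt) < len(ref):
        return "deletion"       # bases removed relative to the reference
    return "multi-base substitution"


def invert_segment(seq, start, end):
    """Return seq with seq[start:end] replaced by its reverse complement."""
    segment = seq[start:end]
    return seq[:start] + segment.translate(COMPLEMENT)[::-1] + seq[end:]


print(classify_variant("A", "G"))        # SNV
print(classify_variant("A", "ACT"))      # insertion
print(classify_variant("ACT", "A"))      # deletion
print(invert_segment("AAACCTTT", 3, 5))  # AAAGGTTT
```

Copy number variations and translocations are not covered here, since detecting them generally needs read depth or genome-wide position information rather than a single pair of alleles.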

Mutation#

A gene mutation is a permanent alteration in the DNA sequence that makes up a gene, such that the sequence differs from what is found in most people. Mutations range in size; they can affect anywhere from a single DNA building block (base pair) to a large segment of a chromosome that includes multiple genes.

HG Ref#

A reference genome (also known as a reference assembly) is a digital nucleic acid sequence database, assembled by scientists as a representative example of the set of genes in one idealized individual organism of a species. As they are assembled from the sequencing of DNA from a number of individual donors, reference genomes do not accurately represent the set of genes of any single individual organism. Instead, a reference provides a haploid mosaic of different DNA sequences from each donor. There are reference genomes for multiple species of viruses, bacteria, fungi, plants, and animals.

As the cost of DNA sequencing falls, and new full genome sequencing technologies emerge, more genome sequences continue to be generated. Reference genomes are typically used as a guide on which new genomes are built, enabling them to be assembled much more quickly and cheaply than the initial Human Genome Project. Most individuals with their entire genome sequenced, such as James D. Watson, had their genome assembled in this manner.[2][3] For much of a genome, the reference provides a good approximation of the DNA of any single individual. But in regions with high allelic diversity, such as the major histocompatibility complex in humans and the major urinary proteins of mice, the reference genome may differ significantly from other individuals.[4][5][6] Comparison between the reference (build 36) and Watson's genome revealed 3.3 million single nucleotide polymorphism differences, while about 1.4 percent of his DNA could not be matched to the reference genome at all.[7][2] For regions where there is known to be large scale variation, sets of alternate loci are assembled alongside the reference locus.
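
To make the idea of comparing an individual against the reference concrete, the toy sketch below lists the positions where a sample sequence differs from a reference sequence. It assumes the two sequences are already aligned and of equal length, which real genomes are not; the function name `snv_positions` and the example sequences are invented for illustration.

```python
def snv_positions(reference, sample):
    """Return 0-based positions where two pre-aligned sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


reference = "ACGTACGT"  # made-up reference fragment
sample = "ACGAACGC"     # made-up individual fragment
print(snv_positions(reference, sample))  # [3, 7]
```

In practice, sequencing reads are first aligned to the reference by dedicated tools, and insertions, deletions, and unmatched regions have to be handled as well.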

Reference genomes can be accessed online at several locations, using dedicated browsers such as Ensembl or UCSC Genome Browser.

Note: The human reference genome currently used for MASH is hg38 (GRCh38).

References: Genome genetics glossary