Genomics Notes

John M. (Johnny) Adams

JAdams -at-

(949) 922-9786 USA


Last updated June 30, 2014.  




These notes are oriented toward longevity determining genes.

After we identify genes, then what?  Create methods to apply what we learn to solutions!

Contact me with ideas on how we can create aging solutions.


* See useful articles at the end


Partial Hierarchy:


   --> CHROMOSOME: Thread-like DNA structure, found in nuclei of cells, that carries

          hereditary material (genes). 23 pairs of chromosomes (22 pairs of autosomes

          and one pair of sex chromosomes)

   -->  HAPLOTYPE: combination of alleles

         -->  GENE (ALLELE: alternate form of gene / alternative DNA sequences at the

             same physical gene locus)

            -->  DNA SEQUENCE

                -->  NUCLEOTIDE -- SNP/BASE PAIR (codon is series of 3 base pairs)



* genomics: applies recombinant DNA, DNA sequencing methods, and bioinformatics to sequence, assemble, and analyze the function and structure of genomes (the complete set of DNA within a single cell of an organism)

* proteomics: study of proteins, particularly their structures and functions

* transcriptomics: study of RNA molecules, including mRNA, rRNA, tRNA, and other non-coding RNA

* metabolomics: study of chemical processes involving metabolites.

* metabolism: set of life-sustaining chemical transformations within the cells of living organisms. These enzyme-catalyzed reactions allow organisms to grow and reproduce, maintain their structures, and respond to their environments. The word metabolism can also refer to all chemical reactions that occur in living organisms, including digestion and the transport of substances into and between different cells, in which case the set of reactions within the cells is called intermediary metabolism or intermediate metabolism. 

Catabolism breaks down organic matter, for example to harvest energy in cellular respiration.

Anabolism uses energy to construct components of cells such as proteins and nucleic acids.

* metabolite: intermediates and products of metabolism. The term metabolite is usually restricted to small molecules. Metabolites have various functions, including fuel, structure, signaling, stimulatory and inhibitory effects on enzymes, catalytic activity of their own (usually as a cofactor to an enzyme), defense, and interactions with other organisms (e.g. pigments, odorants, and pheromones).


* Epigenomics: the study of the complete set of epigenetic modifications on the genetic material of a cell, known as the epigenome.

(or changes to the genome that do not involve changes in DNA sequence)

The field is analogous to genomics and proteomics, which are the study of the genome and proteome of a cell.  Two of the most characterized epigenetic modifications are DNA methylation and histone modification. Epigenetic modifications play an important role in gene expression and regulation, and are involved in numerous cellular processes such as in differentiation/development and tumorigenesis.


* Epigenome: a) chemical compounds that modify, or mark, the genome in a way that tells it what to do, where to do it and when to do it. The marks, which are not part of the DNA itself, can be passed on from cell to cell as cells divide, and from one generation to the next.

pesticides. As it marks the genome with these chemical tags, the epigenome serves as the intersection between the genome and the environment.

The epigenome marks your genome in two main ways, both of which play a role in turning genes off or on: DNA methylation and histone modification.





* genome: complete set of deoxyribonucleic acid, or DNA, in a cell. DNA carries the instructions for building all of the proteins that make each living creature unique.

* genotype: full complement of genes influencing the phenotype *for a particular trait*.

* chromosome: Thread-like DNA structure, found in nuclei of cells, that carries hereditary material (genes).

* gene: Located on chromosomes.  Unit of heredity. Segment of DNA found at a particular position (locus) on a chromosome, involved in expression of a specific trait.

* base pair: one of TA, AT, CG, GC

* codon: Series of three base pairs, like a word in a sentence

* locus: like "location" Position of a single gene on a chromosome.

* nucleotide: A (adenine), T (thymine), G (guanine), and C (cytosine).  biological molecules  

RNA: uses U (uracil) instead of T (thymine)

  abbrev nt  example gene NADC2 nt 1648

  structure: See

  Purines:  adenine and guanine, Pyrimidines: cytosine, uracil, thymine
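
A minimal Python sketch of the pairing rules above (A with T, C with G, and U replacing T in RNA). The function names are hypothetical, chosen only for illustration.

```python
# Watson-Crick pairing from the notes: A pairs with T, C pairs with G.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def to_rna(coding_strand: str) -> str:
    """Write a DNA coding strand as RNA: same letters, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def is_purine(base: str) -> bool:
    """Purines are adenine and guanine; C, T and U are pyrimidines."""
    return base.upper() in {"A", "G"}
```

For example, reverse_complement("AAGCCTA") gives "TAGGCTT", and to_rna("ATGGCC") gives "AUGGCC".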

* Nucleosome: the basic unit of DNA packaging in eukaryotes, consisting of a segment of DNA wound around a core of eight histone proteins. This structure is often compared to thread wrapped around a spool.

* bp: base pair

* Nucleic acids: large biological molecules essential for all known forms of life. They include DNA (deoxyribonucleic acid) and RNA (ribonucleic acid).

* Association: found to occur together


ε means allele

* allele (not allelle) also designated by ε for example ε3 on APOE:

ALTERNATE FORM OF A GENE possessing a unique nucleotide sequence. Also referred to as an allomorph.

An allele is one of a series of different forms of a gene.

Alternative DNA sequences at the same physical gene locus


* polymorphism: Two or more alleles existing in a population at a particular locus.

Example: Picture of white and black mouse: Genes which control hair colour are polymorphic.


* histone: any of a group of five small basic proteins, occurring in the nucleus of eukaryotic cells, that organize DNA strands into nucleosomes by forming molecular complexes around which the DNA winds.

A chemical code scrawled on histones - the protein husks that coat DNA in every animal or plant cell - determines which genes in that cell are turned on and which are turned off. [Researchers have now] identified characteristic differences in "histone signatures" between stem cells from the muscles of young mice and old mice. The team also distinguished histone-signature differences between quiescent and active stem cells in the muscles of young mice.


* Epigenetics: study of changes in gene expression or cellular phenotype, caused by mechanisms other than changes in the underlying DNA sequence – hence the name epi- (Greek: επί- over, above, outer) -genetics, some of which have been argued to be heritable.  It refers to functionally relevant modifications to the genome that do not involve a change in the nucleotide sequence. Examples of such modifications are DNA methylation and histone modification, both of which serve to regulate gene expression without altering the underlying DNA sequence.


* Haploid

(1) The number of chromosomes in a gamete of an organism, symbolized by n.

(2) A cell or an organism having half of the number of chromosomes in somatic cells.


* Diploid: A cell or an organism consisting of two sets of chromosomes: usually, one set from the mother and another set from the father. In a diploid state the haploid number is doubled, thus, this condition is also known as 2n.


* Haploid cells are gametes (sex cells): sperm and ova.

* Diploid cells are somatic (regular body) cells: neurons, muscle cells, bone cells, etc.

* karyotype: the number and appearance of chromosomes in the nucleus of a eukaryotic cell. The term is also used for the complete set of chromosomes in a species, or an individual organism.


* marker: A genetic marker is a gene or DNA sequence with a known location on a chromosome that can be used to identify individuals or species. It can be described as a variation (which may arise due to mutation or alteration in the genomic loci) that can be observed. A genetic marker may be a short DNA sequence, such as a sequence surrounding a single base-pair change (single nucleotide polymorphism, SNP), or a long one, like minisatellites.

in 22506048.pdf D3S3547 is a marker


* peptide:
"protein fragment"
a compound containing two or more amino acids in which the carboxyl group of one acid is linked to the amino group of the other.

Peptides (from Greek πεπτός, "digested", derived from πέσσειν, "to digest") are short chains of amino acid monomers linked by peptide (amide) bonds.


* biological pathway: A biological pathway is a series of actions among molecules in a cell that leads to a certain product or a change in a cell. Such a pathway can trigger the assembly of new molecules, such as a fat or protein. Pathways can also turn genes on and off, or spur a cell to move.


* signaling pathway: common in signal transduction, refers to a group of molecules that work together to control one or more cell functions.

* genomic pathway: an unusual combination of terms?  Already used sometimes in the literature. May refer to some kind of signaling that involves the genome, e.g. transcription and can be affected by single nucleotide variants.

JA -- or effect of multiple genes

How to find the pathways in which a given gene or protein is involved

22113349.pdf says grouped per pathway or gene region, can …


* insertions and deletions: commonly called indels because it's often impossible to tell if a section of DNA has been inserted in one allele or deleted from the other allele. 


* Copy-number variations (CNVs): a form of structural variation. They are alterations of the DNA of a genome that result in the cell having an abnormal number of copies of one or more sections of the DNA.


* nuclease: an enzyme capable of cleaving the phosphodiester bonds between the nucleotide subunits of nucleic acids. Older publications may use terms such as "polynucleotidase" or "nucleodepolymerase".


* substitution: In A/G, the A and G are DNA bases adenine and guanine. PPARD A/G means that normal people have an A (at some position in PPARD), but in the case group (study group, long lived people for our purposes) they have a G there instead.


--> G1691A means A was substituted for G at position 1691

20720309.pdf  The aim of our study was to verify or contradict the hypothesis of a favourable association between the A allele (A1691) and longevity in the Polish population.
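
A small Python sketch of how a label like G1691A can be read mechanically: original base, position, new base. The function name and the returned keys are hypothetical, and it only handles simple single-base substitutions.

```python
import re


def parse_substitution(label: str) -> dict:
    """Split a label such as 'G1691A' into reference base, position, new base."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", label.strip().upper())
    if match is None:
        raise ValueError(f"not a simple base substitution: {label!r}")
    ref, pos, alt = match.groups()
    return {"ref": ref, "position": int(pos), "alt": alt}
```

parse_substitution("G1691A") returns ref G, position 1691, alt A, which matches the reading "A was substituted for G".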


* [Gene Name] X/Y: Example: PPARD A/G -- In A/G, the A and G are DNA bases adenine and guanine. PPARD A/G means that normal people have an A (at some position in PPARD), but in the case group they have a G there instead.


* SNP, single-nucleotide polymorphism (SNP, pronounced snip; plural snips) is a DNA sequence variation occurring when a single nucleotide — A, T, C or G — in the genome (or other shared sequence) differs between members of a biological species or paired chromosomes in a human. For example, two sequenced DNA fragments from different individuals, AAGCCTA to AAGCTTA, contain a difference in a single nucleotide.
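
The AAGCCTA / AAGCTTA example can be checked with a few lines of Python. This is only a sketch that assumes the two fragments are already aligned and of equal length; the function name is hypothetical.

```python
def differing_positions(seq_a: str, seq_b: str) -> list:
    """Return (1-based position, base in seq_a, base in seq_b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]
```

differing_positions("AAGCCTA", "AAGCTTA") returns [(5, 'C', 'T')]: the fragments differ only at the fifth nucleotide.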


A SNP can have a name.  Example, for the MEFV gene

M694V = rs28940577

V726A = rs28940578

M694I = rs28940479


* Sickle Cell Anemia

The most common allele of rs334 is (A), encoding the Hb A form of (adult) hemoglobin. rs334(T) encodes the sickling form of hemoglobin, Hb S.


* tag SNP, tagSNP is a representative single nucleotide polymorphism (SNP) in a region of the genome with high linkage disequilibrium (the non-random association of alleles at two or more loci).


* rsid: Reference SNP cluster ID -- an accession number used by researchers and databases to refer to specific SNPs.


Example re. relationship between SNP and allele from 19837933:

rs10149689 (G allele) and rs12050077 (A allele)


* Nutrigenomics: a branch of nutritional genomics; the study of the effects of foods and food constituents on gene expression.   For example, Ordovas found that subjects carrying the A allele at the -75 G/A polymorphism show an increase in HDL-C concentrations with increased intakes of PUFA, whereas homozygotes for the more common G allele show the expected lowering of HDL-C levels as the intake of PUFA goes up.


* Wild type (wt): the phenotype of the typical form of a species as it occurs in nature.

* mutation: a change of the nucleotide sequence of the genome of an organism, virus, or extrachromosomal genetic element. Mutations result from unrepaired damage to DNA or to RNA genomes (typically caused by radiation or chemical mutagens), from errors in the process of replication, or from the insertion or deletion of segments of DNA by mobile genetic elements

- so mutant = one that possesses a mutation(?)


* codon: series of 3 base pairs "nucleotide triplet"


* start codon: the first codon of a messenger RNA (mRNA) transcript translated by a ribosome. The start codon always codes for methionine in eukaryotes and a modified Met (fMet) in prokaryotes. The most common start codon is AUG.


* stop codon (or termination codon): a nucleotide triplet within messenger RNA that signals a termination of translation. Proteins are based on polypeptides, which are unique sequences of amino acids. Most codons in messenger RNA (from DNA) correspond to the addition of an amino acid to a growing polypeptide chain, which may ultimately become a protein. Stop codons signal the termination of this process by binding release factors, which cause the ribosomal subunits to dissociate, releasing the amino acid chain.


* open reading frame (ORF): the part of a reading frame that contains no stop codons. The transcription termination pause site is located after the ORF, beyond the translation stop codon, because if transcription were to cease before the stop codon, an incomplete protein would be made during translation.

Normally, inserts which interrupt the reading frame of a subsequent region after the start codon cause frameshift mutation of the sequence and dislocate the sequences for stop codons.
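
The codon, start codon, stop codon and reading frame entries above can be tied together in a short Python sketch. It uses the standard genetic code and simply starts at the first AUG it finds, which is a simplification (real translation initiation is more involved). The function and variable names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...; "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    mrna = mrna.upper().replace("T", "U")
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the open reading frame
            break
        protein.append(amino_acid)
    return "".join(protein)
```

translate("AUGGCCUAA") gives "MA" (Met, Ala, then stop). Inserting a single A after the start codon, translate("AUGAGCCUAA"), gives "MSL": the frame shifts, the codons change, and the original stop codon is no longer read in frame.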


* Frame does relate to the starting position from which DNA translation proceeds, determining the codons represented. --Tim Hunkapiller


* C segment: C means condensed and reflects heterochromatin regions which reflects repetitive sequence composition that are seen as bands which are the mapping coordinates you ask about.  --Tim Hunkapiller


* Heterochromatin: a tightly packed form of DNA, which comes in different varieties. These varieties lie on a continuum between the two extremes of constitutive and facultative heterochromatin. Both play a role in the expression of genes


* autosome: a chromosome that is not an allosome (i.e., not a sex chromosome)

Needs to be studied more to understand


* 22506048 A Genome-Wide Study Replicates Linkage of 3p22-24 to Extreme Longevity in Humans and Identifies Possible Additional Loci

--> 3p22-24

It's Chromosome 3

Every chromosome has 2 arms

  p = shorter

  q = longer

  they are separated from each other only by the centromere

--> So chromosome 3, p arm,

22-24 refers to bands 22 through 24 on that arm.  Band numbers give an approximate location rather than an exact position.

Some chromosomes are "acrocentric", and their p arm is a *lot* shorter.


* banding pattern: We can further divide the chromosomes using special stains that produce stripes known as a banding pattern. Each chromosome has a distinct banding pattern, and each band is numbered to help identify a particular region of a chromosome.


* cytogenetic mapping: This method of mapping a gene to a particular band of the chromosome is called cytogenetic mapping.

For example, the hemoglobin beta gene (HBB) is found on chromosome 11p15.4. This means that the HBB gene lies on the short arm (p) of chromosome 11 and is found at the band labeled 15.4.

So for 3p24.1:

Chromosome 3

p arm

band 24.1
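
The same breakdown can be written as a small Python sketch. The function name and the returned keys are hypothetical, and it only handles single locations such as 3p24.1 or 11p15.4, not ranges such as 3p22-24.

```python
import re


def parse_cytogenetic_location(location: str) -> dict:
    """Split a location such as '11p15.4' into chromosome, arm and band."""
    match = re.fullmatch(r"(\d{1,2}|X|Y)([pq])(\d+(?:\.\d+)?)", location.strip())
    if match is None:
        raise ValueError(f"unrecognised location: {location!r}")
    chromosome, arm, band = match.groups()
    return {
        "chromosome": chromosome,
        "arm": "short (p)" if arm == "p" else "long (q)",
        "band": band,
    }
```

parse_cytogenetic_location("3p24.1") returns chromosome 3, short (p) arm, band 24.1.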


* microsatellites (micro-satellites): Some genetic markers are microsatellites (see short tandem repeat, STR)


* There are different types of measurements on a chromosome

* one is megabases (Mb): millions of bases


* cM = centimorgans.  Stan Primmer's definition: difference between chromosomal locations, i.e. distance.

Wikipedia definition: centimorgan (abbreviated cM): or map unit (m.u.) is a unit for measuring genetic linkage. It is defined as the distance between chromosome positions (also termed, loci or markers) for which the expected average number of intervening chromosomal crossovers in a single generation is 0.01. It is often used to infer distance along a chromosome.
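
A Python sketch of the definition above. The first function follows directly from the definition (1 cM = 0.01 expected crossovers). The second uses Haldane's map function to turn a map distance into a recombination fraction; that function is not from these notes and assumes crossovers occur independently (no interference). Names are hypothetical.

```python
import math


def expected_crossovers(centimorgans: float) -> float:
    """By definition, 1 cM corresponds to 0.01 expected crossovers per generation."""
    return centimorgans / 100.0


def haldane_recombination_fraction(centimorgans: float) -> float:
    """Haldane's map function (an assumption added here; it ignores interference)."""
    morgans = centimorgans / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * morgans))
```

For short distances the recombination fraction is close to the map distance (1 cM is roughly 1% recombination); for long distances it levels off below 50%, which is why cM and percent recombination are not interchangeable over large intervals.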


* telomere: tiny structures at the ends of your chromosomes that keep them from fraying and losing crucial bits of genetic information. When cells divide, their telomeres get shorter. Once they get too short, cells stop dividing and may die. Played out across the whole body, there's mounting evidence that shorter telomeres translate into increased susceptibility to diseases and the gradual wearing out of tissues that is the hallmark of old age.

It's tempting to think of our telomeres as the cellular equivalents of the Grim Reaper's hourglass, counting out our predetermined lifespans. But the hourglass can get periodic refills - thanks to an enzyme called telomerase, which acts to build telomeres back up. And the rise of telomere testing for consumers is also pegged to evidence that telomere length is not just an inherited inevitability but may be influenced by factors such as stress, exercise and nutrition. The thinking is, if you can regularly monitor your telomere length, you'll be more apt to do the right things to slow the rate at which they're burning away.


* telomerase: builds telomeres back up.



From 11485022.pdf

Several sequence variations in the promoter region of the PAI-1 gene, especially the 4G/5G polymorphism, have been described. 4G/5G polymorphism refers to a guanosine insertion/deletion polymorphism in the promoter region of the PAI-1 gene, 675 bp upstream from the start of transcription. In vitro studies suggest that the 4G allele is associated with higher PAI-1 activity than the 5G allele.



* Gel electrophoresis is a method for separation and analysis of macromolecules (DNA, RNA and proteins) and their fragments, based on their size and charge.

It is used in clinical chemistry to separate proteins by charge and/or size (IEF agarose, essentially size independent) and in biochemistry and molecular biology to separate a mixed population of DNA and RNA fragments by length, to estimate the size of DNA and RNA fragments or to separate proteins by charge.

Need to read more:


Another description from Stan Primmer: The electrophoresis separation of particles of different sizes is pretty simple.  Larger particles have greater resistance to movement through the gel and therefore migrate more slowly than smaller particles.


I presume the charge on particles would affect their migration in a similar manner.  An electrical gradient is established across the gel, and the more highly charged particles would move more rapidly than those with a lower charge.


Gel electrophoresis is often performed after amplification of DNA via PCR (polymerase chain reaction).

It may also be used as a preparative technique prior to use of other methods such as mass spectrometry, RFLP, PCR, cloning, DNA sequencing, or Southern blotting for further characterization.


* polymerase chain reaction (PCR): a biochemical technology in molecular biology to amplify a single or a few copies of a piece of DNA across several orders of magnitude, generating thousands to millions of copies of a particular DNA sequence.
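
The "thousands to millions of copies" follows from repeated doubling. A minimal Python sketch, assuming an idealized reaction in which each cycle multiplies the copy number by (1 + efficiency); the function name and the efficiency parameter are hypothetical additions for illustration.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after `cycles` rounds; efficiency=1.0 means perfect doubling each cycle."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles
```

With perfect doubling, a single copy becomes 1,024 copies after 10 cycles and 1,048,576 copies after 20 cycles.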



SIR2: See 16257164.pdf   Increases lifespan in yeast and/or C. elegans.  Nematode homologue is sir-2.1.  A low-calorie diet in yeast upregulates it.

Mammalian homologues are called “sirtuins” (Sirts). 

SIRT1 is human homologue (16257164.pdf says not associated with human longevity)

SIRT1 is most closely related to yeast SIR2.

The human SIRT3 homologue, which encodes a mitochondrial protein (Onyango et al., 2002), has previously been the subject of a longevity association study in Italians from Calabria.



How do genes get their names?  see 22506048.pdf   Table 4 for example

Ex) NEK10: NIMA (never in mitosis gene a)- related kinase 10

SLC4A7 solute carrier family 4, sodium bicarbonate cotransporter, member 7

EOMES eomesodermin

--> No specific convention.  It seems to be based on an acronym for what it does. 

--> Look at GeneCards and OMIM


* NGS: next generation sequencing

* sequence variant (SV): Any change in the gene sequence relative to the human genome reference sequence. Correlagen uses as reference sequence the sequence published by UCSC Genome Bioinformatics in March of 2006 (hg18)


* Homologous chromosomes: (also called homologs or homologues) are chromosome pairs of approximately the same length, centromere position, and staining pattern, with genes for the same characteristics at corresponding loci.  JA-->they're the same


* homozygote: an organism that has the same alleles at a particular gene locus on homologous chromosomes


* Zygosity: the degree of similarity of the alleles for a trait in an organism.


* NECS: New England Centenarian Study – Tom Perls, Boston Medical Center is one of 5 study sites


* LGP: Longevity Genes Project -- Nir Barzilai, Albert Einstein College of Medicine



* Linkage: CLOSE PHYSICAL PROXIMITY of two or more genes on a chromosome, which causes them to tend to be inherited together. (see haplotype)


* linkage analysis: A gene-hunting technique that traces patterns of heredity in large, high-risk families, in an attempt to locate a disease-causing gene mutation by identifying traits co-inherited with it; the formal study of the association between the inheritance of a condition in a family and a particular chromosomal locus; LA is based on certain ground rules of genetics. See Lod score


* Linkage disequilibrium (LD): non-random association of alleles at two or more loci, that may or may not be on the same chromosome.

linkage disequilibrium block, LD block

Example: Linkage disequilibrium between the two SNPs was high (r2 = .95) [rsquared = .95], suggesting interaction between them.

also known as gametic phase disequilibrium, or simply gametic disequilibrium.
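
A Python sketch of how an r2 value like the one quoted above is computed for two biallelic loci, using the usual definitions D = p(AB) - p(A)p(B) and r2 = D^2 / (p(A)(1-p(A)) p(B)(1-p(B))). The function name and argument names are hypothetical, and the frequencies in the usage note are made up for illustration.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> tuple:
    """Return (D, r squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of allele A and allele B on their own
    """
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denominator
```

If the alleles are independent (p_ab = 0.25 with p_a = p_b = 0.5), D and r2 are both 0. If A always travels with B (p_ab = 0.5 with p_a = p_b = 0.5), r2 is 1.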


Linkage disequilibrium (LD) plot: like in 21798861 Figure 1



* sib-pair linkage:

linkage in families/siblings?


The inclusion threshold of 95 years for women and 90 years for men in sibships was selected to balance the gender differential in longevity


* haplotype:

Particular combination of closely linked alleles that tend to be inherited as a unit.

A group of genes that was inherited together from a single parent.

the group of alleles of linked genes, e.g., the HLA complex, contributed by either parent; the haploid genetic constitution contributed by either parent.

A group of alleles

of different genes (as of the major histocompatibility complex)

on a single chromosome

that are closely enough linked to be inherited usually as a unit


Below from :


b. at ADJACENT LOCATIONS (loci) on a chromosome


A haplotype may be one locus, several loci, or an entire chromosome depending on the number of recombination events that have occurred between a given set of loci.

2) A second meaning of the term haplotype is a




Particular combination of closely linked alleles that tend to be inherited as a unit. (see linkage)

Related to haplotype: * When two genes are physically close together (or a variant of the gene) they tend to be inherited together.


* MAF: minor allele frequency.  frequency at which the least common allele occurs in a given population
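
A Python sketch of the MAF calculation for a single biallelic SNP, assuming genotypes are given as two-letter strings (one letter per chromosome copy). The function name and the input format are hypothetical.

```python
from collections import Counter


def minor_allele_frequency(genotypes: list) -> tuple:
    """Return (minor allele, frequency) from genotypes such as ['AG', 'GG', 'AA']."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) != 2:
        raise ValueError("expected exactly two alleles at this site")
    allele, count = min(counts.items(), key=lambda item: item[1])
    return allele, count / sum(counts.values())
```

For four people with genotypes AG, GG, GG and AG there are 8 alleles in total, 2 of them A, so the minor allele is A with MAF 0.25.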


* haplogroup: group of similar haplotypes that share a common ancestor having the same single nucleotide polymorphism (SNP) mutation in all haplotypes. Because a haplogroup consists of similar haplotypes, it is possible to predict a haplogroup from haplotypes.


* phenotype: observable characteristics, determined by the individual's genotype and its environment.


* allele fixation: An allele becomes fixed in a population when it reaches a frequency of 100%, i.e., when every individual in the population has only this allele.


* exon: A sequence of DNA that codes information for protein synthesis that is transcribed to messenger RNA

* intron: A segment of a gene situated between exons that is removed before translation of messenger RNA and does not function in coding for protein synthesis.

* promoter: a region of DNA that initiates transcription of a particular gene. Promoters are located near the genes they transcribe, on the same strand and upstream on the DNA (towards the 3' region of the anti-sense strand, also called template strand and non-coding strand). Promoters can be about 100–1000 base pairs long


* primer: a strand of nucleic acid that serves as a starting point for DNA synthesis. It is required for DNA replication because the enzymes that catalyze this process, DNA polymerases, can only add new nucleotides to an existing strand of DNA. The polymerase starts replication at the 3'-end of the primer, and copies the opposite strand.


* five prime / 5 prime untranslated region (5' UTR): can contain elements for controlling gene expression by way of regulatory elements.

  --> It begins at the transcription start site and ends one nucleotide (nt) before the start codon (usually AUG) of the coding region.

In prokaryotes, the 5' UTR usually contains a ribosome binding site (RBS), also known as the Shine Dalgarno sequence (AGGAGGU).


* three prime / 3 prime untranslated region (3'-UTR): the section of messenger RNA (mRNA) that   

   --> immediately follows the translation termination codon.

An mRNA molecule is transcribed from the DNA sequence and is later translated into protein. Several regions of the mRNA molecule are not translated into protein including the 5' cap, 5' untranslated region, 3' untranslated region, and the poly(A) tail. The 3'-UTR often contains regulatory regions that influence post-transcription gene expression.


* uncoupling protein:  a mitochondrial inner membrane protein

important for thermogenesis in brown adipose tissue. By causing the membrane to leak protons, it abolishes the proton gradient that drives oxidative phosphorylation, so that electron transport results solely in heat production. JA-->so it gets _______ to create energy.

Read more:


* Indy: short for I'm not dead yet, is a gene of the model organism, the fruit fly Drosophila melanogaster. Mutant versions of this gene double the fruit flies' average life span (this is subject to controversy).


* polymerase chain reaction (PCR): a form of analysis

* restriction fragment length polymorphism (RFLP): a form of analysis


* RNA: uses U (uracil) instead of T (thymine)

* mRNA messenger RNA -- transcribed from DNA, (like a "free floating" strand?)

* tRNA transfer RNA:  an adaptor molecule composed of RNA, typically 73 to 94 nucleotides in length, that serves as the physical link between the nucleotide sequence of nucleic acids (DNA and RNA) and the amino acid sequence of proteins. It does this by carrying an amino acid to the protein synthetic machinery of a cell (ribosome) as directed by a three-nucleotide sequence (codon) in a messenger RNA (mRNA). As such, tRNAs are a necessary component of protein translation, the biological synthesis of new proteins according to the genetic code.

* translation: process through which cellular ribosomes manufacture proteins. I.e. converting mRNA into a protein?  Look this up.


* mtDNA: mitochondrial DNA

* Oxidative phosphorylation: OXPHOS is the metabolic pathway in which the mitochondria in cells use their structure, enzymes, and energy released by the oxidation of nutrients to reform ATP.


* heterochromatin: the part of a chromosome that is inactive in gene expression but may function in controlling metabolic activities, transcription, and cell division.   Types: constitutive, facultative

* short tandem repeat (STR): is a type of DNA polymorphism where short sequences of DNA are repeated. STRs are usually considered “junk DNA”


* Variable number of tandem repeats (VNTR): a location in a genome where a short nucleotide sequence is organized as a tandem repeat. These can be found on many chromosomes, and often show variations in length between individuals. Each variant acts as an inherited allele, allowing them to be used for personal or parental identification.
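
Since STR and VNTR alleles differ in how many times the motif is repeated, a simple way to picture them is to count back-to-back copies. A Python sketch; the function name is hypothetical and the GATA motif in the usage note is just an illustrative choice.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)
```

longest_tandem_run("TTGATAGATAGATACC", "GATA") returns 3. Another individual with four copies at the same location would carry a different allele of that repeat.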




* transgene: a gene that is transferred from an organism of one species to an organism of another species by genetic engineering


* Homozygote: A person who has two identical forms of a particular gene, one inherited from each parent.

For example, a girl who is a homozygote for cystic fibrosis (CF) received the cystic fibrosis gene from both of her parents and therefore has cystic fibrosis.


* Heterozygote: The opposite of a homozygote is a heterozygote, a person who has two different forms of a particular gene. For example, the father and mother of a cystic fibrosis child are heterozygotes for cystic fibrosis (CF). Each carries one normal gene and one cystic fibrosis gene but has no signs or symptoms of the disease.
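
The cystic fibrosis example can be worked through as a Punnett square. A Python sketch, assuming simple Mendelian inheritance where each parent passes on one of its two alleles with equal chance; the labels "N" (normal allele) and "c" (CF allele) and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def zygosity(genotype: str) -> str:
    """'NN' or 'cc' is homozygous; 'Nc' is heterozygous."""
    return "homozygous" if genotype[0] == genotype[1] else "heterozygous"


def offspring_genotypes(parent_1: str, parent_2: str) -> dict:
    """Punnett square: each parent passes on one of its two alleles with equal chance."""
    crosses = Counter("".join(sorted(pair)) for pair in product(parent_1, parent_2))
    total = sum(crosses.values())
    return {genotype: Fraction(n, total) for genotype, n in crosses.items()}
```

For two heterozygous parents, offspring_genotypes("Nc", "Nc") gives NN 1/4, Nc 1/2 and cc 1/4: on average one child in four is a homozygote for the CF allele and two in four are carriers like the parents.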


* copy number: the average number of copies of a transgene in a cell.


* EWAS: epigenome-wide association studies


* P element: a transposon that is present specifically in the fruit fly Drosophila melanogaster and is used widely for mutagenesis and the creation of genetically modified flies used for genetic research. The P element gives rise to a phenotype known as hybrid dysgenesis.



* homologous: the same organ in different animals under every variety of form and function

* ontogenesis: (biology) the process of an individual organism growing organically; a purely biological unfolding of events involved in an organism changing gradually from a simple to a more complex level; "he proposed an indicator of osseous development in children"


* methylation: Addition of a methyl group to a molecule


* methyl group: an alkyl derived from methane, containing one carbon atom bonded to three hydrogen atoms — CH3. The group is often abbreviated Me



* Cytokines are crucial for the regulation of inflammation development in humans. Many studies have shown that variations in cytokine genes might play a role in determining human longevity.


* blot

* antigen: a substance that evokes the production of one or more antibodies.


* antibody (Ab) also known as an immunoglobulin (Ig): a large Y-shaped protein produced by B-cells that is used by the immune system to identify and neutralize foreign objects such as bacteria and viruses.


Sequence-specific oligonucleotide probe (SSOP) method. 


* major histocompatibility complex (MHC): numerous genes with immune-related functions: notably the class I and class II human leukocyte antigens (HLA), tumor necrosis factor A and B, the complement genes, and genes that orchestrate the transport (TAP) and processing (LMP) of antigens for presentation.


* human leukocyte antigen (HLA) system: the name of the major histocompatibility complex (MHC) in humans.


In humans, the MHC complex consists of more than 200 genes located close together on chromosome 6. Genes in this complex are categorized into three basic groups: class I, class II, and class III.

class I: Humans have three main MHC class I genes, known as HLA-A, HLA-B, and HLA-C.


class II: There are six main MHC class II genes in humans: HLA-DPA1, HLA-DPB1, HLA-DQA1, HLA-DQB1, HLA-DRA, and HLA-DRB1.


class III: The proteins produced from MHC class III genes have somewhat different functions; they are involved in inflammation and other immune system activities. The functions of some MHC genes are unknown.


Genes are included in the HLA gene family


* Pleiotropy: one gene controls for more than one phenotypic trait


* Antagonistic pleiotropy: evolutionary explanation for senescence. Pleiotropy is the phenomenon where one gene controls more than one phenotypic trait in an organism.  Antagonistic pleiotropy occurs when at least one of these traits is beneficial to the organism's fitness and at least one is detrimental to the organism's fitness.  If a gene caused both increased reproduction in early life and aging in later life, then senescence would be adaptive in evolution. For example, one study suggests that since follicular depletion in human females causes both more regular cycles in early life and loss of fertility later in life through menopause, it can be selected for by having its early benefits outweigh its late costs.


* Serological typing: Identification of MHC molecules expressed on cells using either naturally occurring antibodies in multiparous women or by alloantiserum raised in animals.

* probe: general term for a piece of DNA or RNA corresponding to a gene or sequence of interest, that has been labelled either radioactively or with some other detectable molecule, such as biotin, digoxigenin or fluorescein. As stretches of DNA or RNA with complementary sequences will hybridise (see below), a probe will label viral plaques, bacterial colonies or bands on a gel that contain the gene of interest.

* hybridise: To form a double-stranded nucleic acid from two complementary strands of DNA (or RNA).
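A minimal sketch of what "complementary" means for a probe and a target, assuming DNA only and a perfect match (real hybridisation tolerates some mismatches and depends on temperature and salt). The function names are made up for illustration.

```python
# Base-pairing rule for DNA: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the strand that would pair with seq, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def can_hybridise(probe, target):
    """True if the probe pairs perfectly with some stretch of the target strand.

    Simplifying assumption: exact match only, no mismatches allowed.
    """
    return reverse_complement(probe) in target.upper()


print(reverse_complement("ATGC"))         # GCAT
print(can_hybridise("ATGC", "TTGCATTT"))  # True
print(can_hybridise("ATGC", "TTTTTTTT"))  # False
```

The probe is reversed as well as complemented because the two strands of a double helix run in opposite directions.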


* telomerase reverse transcriptase (abbreviated to TERT, or hTERT in humans): a catalytic subunit of the enzyme telomerase. TERT and the telomerase RNA component (TERC) are the most important components of the telomerase complex.


* telomerase RNA component (TERC)


* transcriptome: set of all RNA molecules, including mRNA, rRNA, tRNA, and other non-coding RNA produced in one or a population of cells. It differs from the exome in that it includes only those RNA molecules found in a specified cell population, and usually includes the amount or concentration of each RNA molecule in addition to the molecular identities.

* exome: the part of the genome formed by exons, the sequences which when transcribed remain within the mature RNA after introns are removed by RNA splicing. It differs from a transcriptome in that it consists of all DNA that is transcribed into mature RNA in cells of any type.


* isoform: any of several different forms of the same protein. Different forms of a protein may be produced from related genes, or may arise from the same gene by alternative splicing. A large number of isoforms are caused by single-nucleotide polymorphisms or SNPs, small genetic differences between alleles of the same gene. These occur at specific individual nucleotide positions within a gene.


* genome-wide association (GWA, GWAS): also known as whole genome association study (WGA study, or WGAS), is an examination of many common genetic variants in different individuals to see if any variant is associated with a trait. GWAS typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major diseases.

*** See below for discussion of GWAS *1


* Candidate gene study:


* Best definition(?) EPIGENETICS: The development and maintenance of an organism is orchestrated by a set of chemical reactions that switch parts of the genome off and on at strategic times and locations. Epigenetics is the study of these reactions and the factors that influence them.


* epigenetic -

Epigenetics is the study of heritable change other than those encoded in DNA sequence. Cytosine methylation of DNA at CpG dinucleotides is the most well-studied epigenetic phenomenon, . . .


1: of, relating to, or produced by epigenesis

2: relating to, being, or involving a modification in gene expression that is independent of the DNA sequence of a gene <epigenetic carcinogenesis> <epigenetic inheritance>

- or -

Study of changes in gene expression or cellular phenotype, caused by mechanisms other than changes in the underlying DNA sequence.  It refers to functionally relevant modifications to the genome that do not involve a change in the nucleotide sequence. Examples of such modifications are DNA methylation and histone modification, both of which serve to regulate gene expression without altering the underlying DNA sequence.

- or -

a : of, relating to, or produced by the chain of developmental processes in epigenesis that lead from genotype to phenotype after the initial action of the genes

b : relating to, being, or involving changes in gene function that do not involve changes in DNA sequence <epigenetic inheritance>



get genetics textbook

many genes have promoters, exons, introns



alt splicing (deals w/ Lamin A): read Kan Cao -- look up PubMed: Collins, telomeres, lamin A

long noncoding parts on

transcribed into messenger rna . . . translated into proteins


* helicase: a general term describing enzymes capable of unwinding the DNA double helix beginning at the replication fork.




* axis:

* case group --

     In A/G, the A and G are DNA bases adenine and guanine. PPARD A/G means that normal people have an A (at some position in PPARD), but in the case group they have a G there instead.


* Case/control: allelic or genotypic frequencies are compared between cases (usually centenarians) and controls (younger people from the same population) to see if significant differences exist between the allelic or genotypic pools.


*1 [19–21].

GWAS studies of longevity must overcome several important challenges. The need to control for a very large number of comparisons compromises power considerably. The studies cited above are all quite small by current GWAS standards, and therefore, have only limited power to identify alleles associated with longevity. Selection of appropriate controls is a particular problem for studies of longevity. Controls are typically selected from the current population according to study design, and are by definition born one or more generations later than the long-lived individuals to which they are compared. Under such circumstances, selection, drift, and migration can affect allele frequencies in ways that might bias intergenerational GWAS results. Additional and potentially more substantial biases can result from behavioral changes across generations and time. Larger and more robust GWASs will undoubtedly contribute importantly to our understanding of the genetics of longevity; but for now, other approaches, including family studies, remain competitive.

Here we report the results of the Fertility, Longevity and Aging (FLAG) study, a genome



Experimental Method and Statistics

* null hypothesis: a general or default position: that there is no relationship between two measured phenomena

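* p-value, p value: the probability of getting a result at least as extreme as the one observed, assuming the null hypothesis is true. It is NOT the probability that the null hypothesis is right.

The SMALLER the P value, the stronger the evidence against the null hypothesis, that is, the hypothesis being tested.

*** --> YOU WANT A SMALL P VALUE to support your alternative hypothesis (it is evidence, not proof).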

One often "rejects the null hypothesis" when the p-value is less than the predetermined significance level which is often 0.05 or 0.01, indicating that the observed result would be highly unlikely under the null hypothesis.

Many common statistical tests, such as chi-squared tests or Student's t-test, produce test statistics which can be interpreted using p-values.
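A rough sketch of how a chi-squared test turns a 2x2 case/control table into a p-value. The counts are invented for illustration, the function names are my own, and this is the plain Pearson statistic with no continuity correction. The p-value formula used here only holds for 1 degree of freedom (a 2x2 table).

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Chance of a statistic at least this large under the null, 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts: rows are cases and controls,
# columns are carriers and non-carriers of some allele.
stat = chi2_2x2(20, 80, 10, 90)
p = p_value_1df(stat)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
print("reject null at 0.05" if p < 0.05 else "do not reject null at 0.05")
```

With these made-up counts the p-value falls just under 0.05, so the null hypothesis of "no difference between cases and controls" would be rejected at the 0.05 level but not at 0.01.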


Significance test

* Ho: there will be no difference

* Ha: there will be a difference. Ex) the placebo will reduce the mean chemical brain response


* (number of) degrees of freedom: the number of values in the final calculation of a statistic that are free to vary.

Finding the degrees of freedom is usually easy: for a single sample of N values (for example, a sample variance or a one-sample t-test) it is N-1.
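A small sketch of where N-1 shows up in practice: the sample variance divides by N-1 rather than N, because once the mean is fixed only N-1 of the deviations are free to vary. The data values are arbitrary.

```python
import statistics


def sample_variance(values):
    """Sample variance: sum of squared deviations divided by N-1 degrees of freedom."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # arbitrary example values
print(len(data) - 1)              # degrees of freedom: 7
print(sample_variance(data))      # hand-rolled, divides by N-1
print(statistics.variance(data))  # stdlib version, also divides by N-1
```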


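* degree of heritability: h2 = 0.32 (h squared) means about 32% of the variation in the trait across a population is attributable to genetic variation. It is a proportion of variance, not a correlation coefficient, so it is not the same as r = .32.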


* Proband, or propositus: a term used most often in medical genetics and other medical fields to denote a particular subject (person or animal) being studied or reported on


logarithm of odds (LOD):

A measure of the likelihood that genes are linked, expressed as the logarithm of the odds that an observed data set from the families is due to linkage at a specific map distance rather than to independent assortment on nonlinked genes.

- or -

Logarithm of the ratio of the probability of obtaining a set of observations, assuming a specified degree of linkage, to the probability of obtaining the same set of observations with independent assortment; used to assess the likelihood of linkage between genes from pedigree data
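A simplified sketch of the second definition above, assuming the simplest textbook case: phase-known meioses where each offspring can be scored as recombinant or non-recombinant. Real pedigree analysis is more involved; the function name and the counts are illustrative only.

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD for linkage at recombination fraction theta vs. independent assortment.

    Assumes phase-known meioses and 0 < theta <= 0.5.
    Independent assortment corresponds to theta = 0.5.
    """
    k, n = recombinants, meioses
    likelihood_linked = theta ** k * (1 - theta) ** (n - k)
    likelihood_unlinked = 0.5 ** n
    return math.log10(likelihood_linked / likelihood_unlinked)


# Hypothetical family data: 1 recombinant out of 10 meioses.
for theta in (0.05, 0.1, 0.2, 0.3, 0.5):
    print(f"theta = {theta:.2f}  LOD = {lod_score(1, 10, theta):.3f}")
```

The LOD is largest near theta = 0.1, which matches the observed fraction of recombinants (1 in 10), and is exactly 0 at theta = 0.5 where the two hypotheses are the same.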


* Odds ratio:

1) The odds ratio can be defined as the ratio of the odds of an event occurring in one group to the odds of it occurring in another group.  For a probability P (the likelihood of an event), the odds is defined as P/(1-P)

2) a measure of effect size, describing the strength of association or non-independence between two binary data values. 

JA-->My definition: strength of relationship

3) An odds ratio (OR) is a measure of association between an exposure and an outcome. The OR represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure.


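Example, using odds = P/(1-P):

Frequency of gene variant in sample cases (ex, Centen's):      .2

Frequency of gene variant in controls (gen'l population):   .065

Ratio of frequencies: .2 / .065 = 3.08 (this is NOT the odds ratio)

Odds ratio: (.2 / .8) / (.065 / .935) = 3.60


HIGH: happens 95% of the time in one group, 5% in another -- so OR = (95/5) / (5/95) = 361

LOW: 50% in one group, 45% in another -- so OR = (50/50) / (45/55) = 1.22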
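A short sketch of the odds ratio using the definition given above (odds = P/(1-P)). Note that dividing the two frequencies directly gives a ratio of frequencies, which is a different quantity from the odds ratio; the two are close only when both frequencies are small. Function names are my own.

```python
def odds(p):
    """Odds of an event with probability p: P / (1 - P)."""
    return p / (1 - p)


def odds_ratio(p_group1, p_group2):
    """Odds of the event in group 1 divided by the odds in group 2."""
    return odds(p_group1) / odds(p_group2)


# Same numbers as the notes: frequency in cases vs. controls.
print(f"{0.2 / 0.065:.2f}")              # 3.08, ratio of frequencies
print(f"{odds_ratio(0.2, 0.065):.2f}")   # 3.60, odds ratio
print(f"{odds_ratio(0.95, 0.05):.2f}")   # 361.00
print(f"{odds_ratio(0.50, 0.45):.2f}")   # 1.22
```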


* Longitudinal study - a study of the natural course of life or disorder in which a cohort of subjects is serially observed over a period of time and no assumptions need be made about the stability of the system.


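* Hardy Weinberg Equilibrium (HWE, Hardy-Weinberg Equilibrium): the state in which allele and genotype frequencies in a population stay constant from generation to generation in the absence of selection, mutation, migration, and drift. Testing markers for departure from HWE is one final step in the quality control analysis of markers in GWAS data.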

Sometimes also used with:

* MQLS: program for case-control association testing with related individuals using genotype or allele dosage data
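A rough illustration of the Hardy-Weinberg check noted above. Under HWE, with allele frequencies p and q = 1 - p, the expected genotype frequencies are p^2, 2pq and q^2; a marker whose observed genotype counts are far from these is flagged in quality control. The genotype counts below are invented, and the function assumes both alleles are present (otherwise an expected count is zero).

```python
def hwe_chi2(n_aa, n_ab, n_bb):
    """Expected genotype counts under HWE and the chi-squared distance from them.

    n_aa, n_ab, n_bb: observed counts of the AA, AB and BB genotypes.
    Assumes both alleles occur at least once.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi2


# Hypothetical markers, 1000 people each, allele A frequency 0.6 in both.
print(hwe_chi2(360, 480, 160))  # counts match HWE, chi2 is essentially 0
print(hwe_chi2(400, 400, 200))  # too few heterozygotes, chi2 is large
```

A large chi-squared value (compared against a chi-squared distribution with 1 degree of freedom) often points to genotyping error rather than real biology, which is why it is used as a filter.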


* Principal component analysis (PCA) (Principal component spectral analysis) is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
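A toy sketch of PCA for just two variables, where the principal components can be found in closed form from the 2x2 covariance matrix. Real genomic data has thousands of variables and would use a numerical library; this only shows the idea. The points are made up.

```python
import math


def pca_2d(points):
    """PCA for 2-variable data via the closed-form eigen-decomposition
    of the 2x2 sample covariance matrix.

    Returns ((var_pc1, var_pc2), (pc1_direction, pc2_direction)).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)

    mid = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)  # direction of greatest variance

    pc1 = (math.cos(angle), math.sin(angle))
    pc2 = (-math.sin(angle), math.cos(angle))     # at right angles to pc1
    return (mid + spread, mid - spread), (pc1, pc2)


# Two perfectly correlated variables: all the variance lies along one axis.
variances, (pc1, pc2) = pca_2d([(1, 1), (2, 2), (3, 3)])
print(variances)  # first component carries all the variance
print(pc1, pc2)   # pc1 points along the diagonal
```

The two returned directions are orthogonal, and the second component's variance is zero here because the two variables carry the same information.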


* Q value: Q value, Q factor, and Q score may refer to: see:




* EP: It looks like “A disease condition selected to evaluate (with respect to a genomic measure)“

22174011.pdf  page 3 Representatively, four highly heterogeneous unadjusted aging-related traits were selected as EPs. They include prevalence of cardiovascular disease (CVD; 1,669 cases) and prevalence of cancer (1,060 cases)


* Epidemiology: the study (or the science of the study) of the patterns, causes, and effects of health and disease conditions in defined populations.


* FASTA: text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes.
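A minimal sketch of reading FASTA text: each record starts with a ">" header line, followed by one or more lines of sequence. The record names and sequences below are invented, and this parser ignores details a real tool would handle (comments, duplicate headers, very large files).

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = """>seq1 made-up nucleotide sequence
ATGGCGTACG
TTAGC
>seq2 made-up peptide sequence
MKVLAAGI
"""

for name, seq in parse_fasta(example).items():
    print(name, len(seq), seq)
```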


* Epithelium is one of the four basic types of animal tissue, along with connective tissue, muscle tissue and nervous tissue.

Epithelial tissues line the cavities and surfaces of structures throughout the body, and also form many glands.


* Factor V (pronounced factor five) (FV): a protein of the coagulation system, rarely referred to as proaccelerin or labile factor.


* Eukaryotic cells: those that make up cattails, apple trees, mushrooms, dust mites, halibut and HUMANS. They have evolved ways to partition off different functions to various locations in the cell.


* cytokine: Any of numerous hormonelike, low-molecular-weight proteins, secreted by various cell types, which regulate the intensity and duration of immune response and mediate cell-to-cell communication.


* antigen: a substance that evokes the production of one or more antibodies.


* De novo: Anew; afresh; beginning again; from the start.



*** Get more current books on molecular biology and genetics/genomics





Look up genes

Click on MapViewer and Epigenomics

* Use dropdown on upper left for useful information


Look up SNPs

Find gene for a SNP

Google it example rs9330200 went here


Click GeneView

Will say

GeneView via analysis of contig annotation: TUBB4B  tubulin, beta 4B class IVb

So it’s TUBB4B

OR do like this

click GeneView – same           



SNPs -- look up rs number  

For example, say you want to find the rsid for the MTTP Q95H variant.

-        Type "MTTP homo sapiens" in the top right corner of the page.

-        Click on Gene, then Human, then Variation Table.

-        Scroll down to the bottom of the page and click on "Show" next to "ALL" so that all gene variants are displayed.

-        Then, if you know either the genomic position or AA position of your variant, you should be able to locate it in the table.

-        For MTTP Q95H, click on the column 'AA coord' to sort by that value, and scroll down to find the Q/H replacement at AA coord 95. From the table you can see that the corresponding rsid is rs61733139.






Big data:


Semantic search seeks to improve search accuracy by understanding searcher intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the Web or within a closed system, to generate more relevant results.


Gene Ontology, or GO, is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species. More specifically, the project aims to:

Maintain and develop its controlled vocabulary of gene and gene product attributes;

Annotate genes and gene products, and assimilate and disseminate annotation data;

Provide tools for easy access to all aspects of the data provided by the project. The GO is part of a larger classification effort, the Open Biomedical Ontologies (OBO).


Gene Ontology for Functional Analysis (GOFFA)

GOFFA is a tool developed for ArrayTrack™ that takes a list of genes and identifies terms in Gene Ontology (GO) associated with those genes. GOFFA provides tools to view/access the following:

GO term hierarchy

Full listing of GO terms annotated with the genes associated with a given term

Fisher's exact test p-value providing the probability of identifying that many genes for a given term by chance alone

Relative enrichment factor (E-value) giving the enrichment of a GO term for genes in the submitted list relative to the frequency of genes assigned to that term from the full set of GOFFA annotated genes for a particular species
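A sketch of the two statistics described above, not GOFFA's actual code. The p-value is the one-sided Fisher's exact test (upper tail of the hypergeometric distribution). The enrichment formula is my reading of the description: the fraction of the submitted list with the term, divided by the fraction of all annotated genes with the term. All counts are invented.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Chance of seeing k or more genes with a GO term in a list of n genes,
    when K of the N annotated genes carry that term (one-sided Fisher's exact test).
    """
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enrichment_factor(k, n, K, N):
    """Assumed formula: (k / n) / (K / N)."""
    return (k / n) / (K / N)


# Hypothetical numbers: 20 annotated genes, 5 carry the term,
# a submitted list of 4 genes contains 2 of them.
print(enrichment_p_value(2, 4, 5, 20))
print(enrichment_factor(2, 4, 5, 20))  # 2.0
```

An enrichment factor of 2 means the term is twice as common in the submitted list as in the background, but with counts this small the p-value shows that could easily happen by chance.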




Multipoint linked regions

zinc finger

wgs read

shotgun sequencing

rRNA  ribosomal rna

book exploring personal genomics

  "citizen scientists"



furthest upstream SNP for each gene. 20569235.pdf




Full articles are often included here

type in the title to search



Studies showing some complexities:

For some you’ll have to read the paper

Specific time windows for gene's effect.


Multiple gene effects


Recycling genes for very different purposes




Copy number variations



Note: I’m looking for more information on complexities, like overlapping genes, the same SNPs coding for different things depending on what’s in front and what’s behind, useful “junk”, SNPs sometimes changing, or operating in networks, sometimes changing the promoter activity, chemical modifications of DNA acting as an extra layer of information in the genome, or working in other ways we are beginning to understand.


Ways to explore and utilize parts of the genome that were previously thought to be junk

see "mobile elements", "cells produce RNA transcripts from a huge portion of the genome, not just for the protein-coding parts", "Those accidentally transcribed pieces of RNA . . ."



Other interesting abstracts

Italian study: . . . complex puzzle of genetic and environmental factors involved in control of lifespan expectancy in humans. 

PLoS ONE article






* Liver RNA from rats exposed to Aflatoxin B1 was analyzed by RNA-Seq or microarray: Reads from the Akr7a3 transcript (7-exon gene) show amount of added information compared to a single microarray probe. [National Institute of Environmental Health Sciences and PLoS One.]


* From David Sinclair’s video

As we get older genes get switched on and off in the wrong way. "Orchestra starts to play willy nilly"

Not due to genes changing -- epigenetic, can be fixed.

Some chemical groups that are stimulating genes to go on that shouldn't be on.

    Sirtuins make proteins that clip off these chemical groups.

    Resveratrol binds to the back of proteins that clip them off, and stimulates them.

   When we don't eat, or do exercise, sirtuins are stimulated -- when we eat a cheeseburger we shut them down.


post-translational modifications


RNA splicing errors