List of probability topics

Overview

This is a list of probability topics, by WikiDoc page.

It overlaps with the (alphabetical) list of statistical topics. See also the list of probabilists and the list of statisticians.

General aspects

Foundations of probability theory

Random variables

Overview

A random variable is an abstraction of the intuitive concept of chance into the theoretical domains of mathematics, forming the foundations of probability theory and mathematical statistics.

Within probability theory, a random variable is defined as a quantity whose values are random and to which a probability distribution is assigned. Formally, a random variable is a measurable function from a sample space to the measurable space of possible values of the variable. The formal definition of random variables places experiments involving real-valued outcomes firmly within the measure-theoretic framework and allows us to construct distribution functions of real-valued random variables.

Examples

A random variable can be used to describe the process of rolling a fair die and the possible outcomes { 1, 2, 3, 4, 5, 6 }. The most obvious representation is to take this set as the sample space, the uniform measure as the probability measure, and the identity function as the random variable.

For a coin toss, a suitable space of possible outcomes is Ω = { H, T } (for heads and tails). An example random variable on this space is

<math>X(\omega) = \begin{cases}0,& \omega = \texttt{H},\\1,& \omega = \texttt{T}.\end{cases}</math>
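Both examples can be sketched in Python by treating a random variable literally as a function from outcomes to numbers. This is a minimal sketch, assuming a finite sample space with a uniform probability measure (so the coin is assumed fair); the helper names `uniform_measure` and `distribution` are illustrative, not from any library. Pushing the measure forward through X gives the distribution of X.

```python
from collections import defaultdict
from fractions import Fraction

def uniform_measure(omega):
    """Assign equal probability to every outcome of a finite sample space."""
    return {w: Fraction(1, len(omega)) for w in omega}

def distribution(P, X):
    """Push the measure P forward through the random variable X."""
    dist = defaultdict(Fraction)  # Fraction() is 0
    for w, p in P.items():
        dist[X(w)] += p
    return dict(dist)

# Fair die: sample space {1, ..., 6}, X is the identity function.
die = uniform_measure(range(1, 7))
die_dist = distribution(die, lambda w: w)

# Coin toss: Omega = {H, T}, X(H) = 0, X(T) = 1 (fair coin assumed).
coin = uniform_measure(["H", "T"])
coin_dist = distribution(coin, lambda w: 0 if w == "H" else 1)

print(die_dist[3])  # 1/6
print(coin_dist)    # {0: Fraction(1, 2), 1: Fraction(1, 2)}
```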

Real-valued random variables

Typically, the measurable space of values is the set of real numbers equipped with its Borel σ-algebra. In this case, let <math>(\Omega, \mathcal{F}, P)</math> be a probability space. Then, the function <math>X: \Omega \rightarrow \mathbb{R}</math> is a real-valued random variable if

<math>\{ \omega : X(\omega) \le r \} \in \mathcal{F} \qquad \forall r \in \mathbb{R}</math>

Distribution functions of random variables

Associating a cumulative distribution function (CDF) with a random variable is a generalization of assigning a value to a variable. If the CDF is a (right-continuous) Heaviside step function, then the variable takes the value at the jump with probability 1. In general, the CDF gives, for each x, the probability that the variable takes a value less than or equal to x.

If a random variable <math>X: \Omega \to \mathbb{R}</math> defined on the probability space <math>(\Omega, \mathcal{F}, P)</math> is given, we can ask questions like “How likely is it that the value of <math>X</math> is bigger than 2?”. This is the probability of the event <math>\{ \omega \in \Omega : X(\omega) > 2 \}</math>, which is often written as <math>P(X > 2)</math> for short.

Recording all these probabilities of output ranges of a real-valued random variable X yields the probability distribution of X. The probability distribution “forgets” about the particular probability space used to define X and only records the probabilities of various values of X. Such a probability distribution can always be captured by its cumulative distribution function

<math>F_X(x) = \operatorname{P}(X \le x)</math>

and sometimes also by a probability density function. In measure-theoretic terms, we use the random variable X to “push-forward” the measure P on Ω to a measure dF on R. The underlying probability space Ω is a technical device used to guarantee the existence of random variables, and sometimes to construct them. In practice, one often dispenses with the space Ω altogether and just puts a measure on R that assigns measure 1 to the whole real line, i.e., one works with probability distributions instead of random variables.
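As a small worked case, the question “How likely is it that X is bigger than 2?” can be answered directly from the CDF, since P(X > 2) = 1 − F_X(2). The sketch below assumes X is the outcome of a fair six-sided die (an illustrative choice, not from the text above) and uses exact fractions.

```python
from fractions import Fraction

# Distribution of a fair die: each face 1..6 has probability 1/6 (assumed fair).
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def cdf(x):
    """F_X(x) = P(X <= x), a right-continuous step function."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

# P(X > 2) is the complement of the event {X <= 2}.
p_greater_than_2 = 1 - cdf(2)
print(p_greater_than_2)          # 2/3
print(cdf(2.5), cdf(0), cdf(6))  # 1/3 0 1
```

Note that the CDF is defined for every real x, not only at the faces of the die; between jumps it is constant.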

Moments

The probability distribution of a random variable is often characterised by a small number of parameters, which also have a practical interpretation. For example, it is often enough to know what its “average value” is. This is captured by the mathematical concept of expected value of a random variable, denoted E[X]. In general, E[f(X)] is not equal to f(E[X]). Once the “average value” is known, one could then ask how far from this average value the values of X typically are, a question that is answered by the variance and standard deviation of a random variable.

Mathematically, this is known as the (generalised) problem of moments: for a given class of random variables X, find a collection {f_i} of functions such that the expectation values E[f_i(X)] fully characterize the distribution of the random variable X.
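For a discrete distribution the expected value is a probability-weighted sum, which makes the remark that E[f(X)] generally differs from f(E[X]) easy to check. This minimal sketch again assumes a fair die and takes f(x) = x², so the gap between E[X²] and (E[X])² is exactly the variance.

```python
from fractions import Fraction

# Fair six-sided die (an assumed example): P(X = k) = 1/6 for k = 1..6.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def expect(f):
    """E[f(X)] for a discrete distribution given as a dict {value: probability}."""
    return sum(f(k) * p for k, p in pmf.items())

mean = expect(lambda x: x)                # E[X]
second_moment = expect(lambda x: x ** 2)  # E[X^2]
variance = second_moment - mean ** 2      # Var(X) = E[X^2] - (E[X])^2

print(mean, second_moment, mean ** 2, variance)  # 7/2 91/6 49/4 35/12
```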

Functions of random variables

If we have a random variable X on Ω and a measurable function f: R → R, then Y = f(X) will also be a random variable on Ω, since the composition of measurable functions is also measurable. The same procedure that allowed one to go from a probability space (Ω, P) to (R, dF_X) can be used to obtain the distribution of Y. The cumulative distribution function of Y is

<math>F_Y(y) = \operatorname{P}(f(X) \le y).</math>

Example 1

Let X be a real-valued, continuous random variable and let Y = X². Then,

<math>F_Y(y) = \operatorname{P}(X^2 \le y).</math>

If y < 0, then P(X² ≤ y) = 0, so

<math>F_Y(y) = 0\qquad\hbox{if}\quad y < 0.</math>

If y ≥ 0, then

<math>\operatorname{P}(X^2 \le y) = \operatorname{P}(|X| \le \sqrt{y}) = \operatorname{P}(-\sqrt{y} \le X \le \sqrt{y}),</math>

so

<math>F_Y(y) = F_X(\sqrt{y}) - F_X(-\sqrt{y})\qquad\hbox{if}\quad y \ge 0.</math>
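The formula for Y = X² can be checked numerically. This sketch assumes, for illustration only, that X is standard normal, so F_X can be written with the error function; it compares F_X(√y) − F_X(−√y) with the fraction of simulated squares that fall at or below y. The seed and sample size are arbitrary choices.

```python
import math
import random

def normal_cdf(x):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cdf_of_square(y):
    """F_Y(y) = F_X(sqrt(y)) - F_X(-sqrt(y)) for y >= 0, and 0 otherwise."""
    if y < 0:
        return 0.0
    r = math.sqrt(y)
    return normal_cdf(r) - normal_cdf(-r)

rng = random.Random(0)  # fixed seed so the check is reproducible
samples = [rng.gauss(0.0, 1.0) ** 2 for _ in range(100_000)]

for y in (0.5, 1.0, 2.0):
    empirical = sum(s <= y for s in samples) / len(samples)
    print(f"y={y}: formula={cdf_of_square(y):.4f}, simulated={empirical:.4f}")
```

For a standard normal X, F_Y(1) = P(−1 ≤ X ≤ 1), which is about 0.683.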

Example 2

Suppose <math>X</math> is a random variable with cumulative distribution function

<math>F_{X}(x) = P(X \leq x) = \frac{1}{(1 + e^{-x})^{\theta}},</math>

where <math>\theta > 0</math> is a fixed parameter. Consider the random variable <math>Y = \log(1 + e^{-X}).</math> Since <math>1 + e^{-X} > 1</math>, Y is always positive, so <math>F_Y(y) = 0</math> for <math>y \le 0</math>. For <math>y > 0</math>,

<math>F_{Y}(y) = P(Y \leq y) = P(\log(1 + e^{-X}) \leq y) = P(X \geq -\log(e^{y} - 1)).</math>

Because <math>F_X</math> is continuous, the last probability equals one minus the cumulative distribution function of <math>X</math> evaluated at <math>-\log(e^{y} - 1)</math>, so

<math>F_{Y}(y) = 1 - F_{X}(-\log(e^{y} - 1))</math>

<math> = 1 - \frac{1}{(1 + e^{\log(e^{y} - 1)})^{\theta}}</math>

<math> = 1 - \frac{1}{(1 + e^{y} - 1)^{\theta}}</math>

<math> = 1 - e^{-y \theta}.</math>

That is, Y is exponentially distributed with rate θ.
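A simulation makes the result of Example 2 concrete. This sketch assumes θ = 2 (any θ > 0 would do) and draws X by inverse-transform sampling: solving u = (1 + e^{−x})^{−θ} gives x = −log(u^{−1/θ} − 1), which is computed with `expm1` for numerical stability near u = 1. The transformed values Y = log(1 + e^{−X}) should then behave like an exponential variable with rate θ.

```python
import math
import random

THETA = 2.0  # assumed value of the parameter theta > 0

def sample_x(rng, theta):
    """Draw X with CDF F_X(x) = (1 + e^{-x})^{-theta} by inverse-transform sampling."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0, where the inverse CDF is -infinity
        u = rng.random()
    # u^{-1/theta} - 1 written as expm1(-log(u) / theta) to avoid cancellation.
    return -math.log(math.expm1(-math.log(u) / theta))

rng = random.Random(1)
ys = [math.log1p(math.exp(-sample_x(rng, THETA))) for _ in range(100_000)]

# Y should be exponential with rate theta: mean 1/theta, F_Y(y) = 1 - e^{-theta y}.
print(sum(ys) / len(ys), 1 / THETA)
for y in (0.25, 0.5, 1.0):
    empirical = sum(v <= y for v in ys) / len(ys)
    print(y, round(empirical, 4), round(1 - math.exp(-THETA * y), 4))
```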

Equivalence of random variables

There are several different senses in which random variables can be considered to be equivalent. Two random variables can be equal, equal almost surely, equal in mean, or equal in distribution.

The precise definitions of these notions of equivalence are given below, in increasing order of strength.

Equality in distribution

Two random variables X and Y are equal in distribution if they have the same distribution functions:

<math>\operatorname{P}(X \le x) = \operatorname{P}(Y \le x)\quad\hbox{for all}\quad x.</math>

If two random variables have moment generating functions that are finite on an open interval around zero and agree there, then the variables have the same distribution. This provides, for example, a useful method of checking equality in distribution of certain functions of independent and identically distributed (i.i.d.) random variables.

To be equal in distribution, random variables need not be defined on the same probability space. The notion of equality in distribution is associated with the following distance between probability distributions,

<math>d(X,Y)=\sup_x|\operatorname{P}(X \le x) - \operatorname{P}(Y \le x)|,</math>

which is the basis of the Kolmogorov-Smirnov test.
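For samples rather than known distributions, the same distance can be computed between two empirical CDFs; this is the two-sample Kolmogorov-Smirnov statistic. The sketch below computes only the statistic, not the test's p-value, and the function names are illustrative. (SciPy's `scipy.stats.ks_2samp`, where available, returns this statistic together with a p-value.)

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical CDF x -> (number of observations <= x) / n."""
    data = sorted(sample)
    n = len(data)
    return lambda x: bisect_right(data, x) / n

def ks_distance(xs, ys):
    """sup_x |F_xs(x) - F_ys(x)| for two empirical CDFs (two-sample KS statistic)."""
    fx, fy = ecdf(xs), ecdf(ys)
    # Both ECDFs are right-continuous step functions that only jump at data
    # points, so the supremum is attained at one of the observed values.
    return max(abs(fx(t) - fy(t)) for t in set(xs) | set(ys))

print(ks_distance([1, 2, 3], [3, 2, 1]))  # 0.0
print(ks_distance([1, 2, 3, 4], [1, 2]))  # 0.5
print(ks_distance([1, 2], [3, 4]))        # 1.0
```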

Equality in mean

Two random variables X and Y are equal in p-th mean if the p-th moment of |X − Y| is zero, that is,

<math>\operatorname{E}(|X-Y|^p) = 0.</math>

Equality in p-th mean implies equality in q-th mean for all q < p. As in the previous case, there is a related distance between the random variables; for p ≥ 1 it is

<math>d_p(X, Y) = \left(\operatorname{E}(|X-Y|^p)\right)^{1/p}.</math>

Almost sure equality

Two random variables X and Y are equal almost surely if, and only if, the probability that they are different is zero:

<math>\operatorname{P}(X \neq Y) = 0.</math>

For all practical purposes in probability theory, this notion of equivalence is as strong as actual equality. It is associated with the following distance:

<math>d_\infty(X,Y)=\sup_\omega|X(\omega)-Y(\omega)|,</math>

where ‘sup’ in this case represents the essential supremum in the sense of measure theory.

Equality

Finally, the two random variables X and Y are equal if they are equal as functions on their probability space, that is,

<math>X(\omega)=Y(\omega)\qquad\hbox{for all}\quad\omega</math>

Convergence

Much of mathematical statistics consists in proving convergence results for certain sequences of random variables; see for instance the law of large numbers and the central limit theorem.

There are various senses in which a sequence (X_n) of random variables can converge to a random variable X. These are explained in the article on convergence of random variables.

Literature

Kallenberg, O. (1986). Random Measures, 4th edition. Academic Press, New York, London; Akademie-Verlag, Berlin. MR0854102. ISBN 0123949602.
Papoulis, Athanasios (1965). Probability, Random Variables, and Stochastic Processes. McGraw-Hill Kogakusha, Tokyo, 9th edition. ISBN 0-07-119981-0.

Conditional probability

Probability distributions

Properties of probability distributions

Applied probability

Empirical findings

Much research involving probability is done under the auspices of applied probability, the application of probability theory to other scientific and engineering domains. However, while such research is motivated (to some degree) by applied problems, it is usually the mathematical aspects of the problems that are of most interest to researchers (as is typical of applied mathematics in general).

Applied probabilists are particularly concerned with the application of stochastic processes, and probability more generally, to the natural, applied and social sciences, including biology, physics (including astronomy), chemistry, computer science and information technology, and economics.

Another area of interest is engineering, particularly problems involving uncertainty and risk.

External links

The Applied Probability Trust.


Stochastic processes

Geometric probability

Statistics

Editor-In-Chief: C. Michael Gibson, M.S., M.D.

Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data. It is applicable to a wide variety of academic disciplines, from the physical and social sciences to the humanities. Statistics are also used for making informed decisions.

Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics. In addition, patterns in the data may be modeled in a way that accounts for randomness and uncertainty in the observations, and then used to draw inferences about the process or population being studied; this is called inferential statistics. Both descriptive and inferential statistics comprise applied statistics. There is also a discipline called mathematical statistics, which is concerned with the theoretical basis of the subject.

The word statistics is also the plural of statistic (singular), which refers to the result of applying a statistical algorithm to a set of data, as in economic statistics, crime statistics, etc.

History

Etymology

The word statistics ultimately derives from the New Latin term statisticum collegium (“council of state”) and the Italian word statista (“statesman” or “politician”). The German Statistik, first introduced by Gottfried Achenwall (1749), originally designated the analysis of data about the state, signifying the “science of state” (then called political arithmetic in English). It acquired the meaning of the collection and classification of data generally in the early 19th century. It was introduced into English by Sir John Sinclair.

Thus, the original principal purpose of Statistik was to provide data for use by governmental and (often centralized) administrative bodies. The collection of data about states and localities continues, largely through national and international statistical services. In particular, censuses provide regular information about the population.

Origins in probability

The mathematical methods of statistics emerged from probability theory, which can be dated to the correspondence of Pierre de Fermat and Blaise Pascal (1654). Christiaan Huygens (1657) gave the earliest known scientific treatment of the subject. Jakob Bernoulli’s Ars Conjectandi (posthumous, 1713) and Abraham de Moivre’s Doctrine of Chances (1718) treated the subject as a branch of mathematics.^[1] In the modern era, the work of Kolmogorov has been instrumental in formulating the fundamental model of probability theory, which is used throughout statistics.

The theory of errors may be traced back to Roger Cotes’s Opera Miscellanea (posthumous, 1722), but a memoir prepared by Thomas Simpson in 1755 (printed 1756) first applied the theory to the discussion of errors of observation. The reprint (1757) of this memoir lays down the axioms that positive and negative errors are equally probable, and that there are certain assignable limits within which all errors may be supposed to fall; continuous errors are discussed and a probability curve is given.

Pierre-Simon Laplace (1774) made the first attempt to deduce a rule for the combination of observations from the principles of the theory of probabilities. He represented the law of probability of errors by a curve. He deduced a formula for the mean of three observations. He also gave (1781) a formula for the law of facility of error (a term due to Lagrange, 1774), but one which led to unmanageable equations. Daniel Bernoulli (1778) introduced the principle of the maximum product of the probabilities of a system of concurrent errors.

The method of least squares, which was used to minimize errors in data measurement, was published independently by Adrien-Marie Legendre (1805), Robert Adrain (1808), and Carl Friedrich Gauss (1809). Gauss had used the method in his famous 1801 prediction of the location of the dwarf planet Ceres. Further proofs were given by Laplace (1810, 1812), Gauss (1823), James Ivory (1825, 1826), Hagen (1837), Friedrich Bessel (1838), W. F. Donkin (1844, 1856), John Herschel (1850), and Morgan Crofton (1870).

Other contributors were Ellis (1844), De Morgan (1864), Glaisher (1872), and Giovanni Schiaparelli (1875). Peters’s (1856) formula for <math>r</math>, the probable error of a single observation, is well known.

In the nineteenth century authors on the general theory included Laplace, Sylvestre Lacroix (1816), Littrow (1833), Richard Dedekind (1860), Helmert (1872), Hermann Laurent (1873), Liagre, Didion, and Karl Pearson. Augustus De Morgan and George Boole improved the exposition of the theory.

Adolphe Quetelet (1796-1874), another important founder of statistics, introduced the notion of the “average man” (l’homme moyen) as a means of understanding complex social phenomena such as crime rates, marriage rates, or suicide rates.

Statistics today

During the 20th century, the creation of precise instruments for public health concerns (epidemiology, biostatistics, etc.) and economic and social purposes (unemployment rate, econometrics, etc.) necessitated substantial advances in statistical practices: the Western welfare states developed after World War I had to possess specific knowledge of the “population”.

Today the use of statistics has broadened far beyond its origins as a service to a state or government. Individuals and organizations use statistics to understand data and make informed decisions throughout the natural and social sciences, medicine, business, and other areas.

Statistics is generally regarded not as a subfield of mathematics but rather as a distinct, albeit allied, field. Many universities maintain separate mathematics and statistics departments. Statistics is also taught in departments as diverse as psychology, education, and public health.

Important contributors to statistics


Conceptual overview

In applying statistics to a scientific, industrial, or societal problem, one begins with a process or population to be studied. This might be a population of people in a country, of crystal grains in a rock, or of goods manufactured by a particular factory during a given period. It may instead be a process observed at various times; data collected about this kind of “population” constitute what is called a time series. For practical reasons, rather than compiling data about an entire population, one usually studies a chosen subset of the population, called a sample. Data are collected about the sample in an observational or experimental setting. The data are then subjected to statistical analysis, which serves two related purposes: description and inference.

Descriptive statistics can be used to summarize the data, either numerically or graphically, to describe the sample. Basic examples of numerical descriptors include the mean and standard deviation. Graphical summarizations include various kinds of charts and graphs.

Inferential statistics is used to model patterns in the data, accounting for randomness and drawing inferences about the larger population. These inferences may take the form of answers to yes/no questions (hypothesis testing), estimates of numerical characteristics (estimation), descriptions of association (correlation), or modeling of relationships (regression). Other modeling techniques include ANOVA, time series, and data mining.
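The difference between description and inference can be seen on a single small sample. In this sketch the ten measurements are made-up illustrative numbers, not real data. The mean and standard deviation describe the sample itself; the confidence interval is an inferential statement about the population mean, here using the t critical value 2.262 for 9 degrees of freedom and assuming roughly normal data.

```python
import math
import statistics

# Hypothetical sample of measurements (illustrative numbers, not real data).
sample = [4.8, 5.1, 5.3, 4.9, 5.6, 5.0, 5.2, 4.7, 5.4, 5.0]

# Descriptive statistics: summarize the sample itself.
mean = statistics.mean(sample)
sd = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
print(f"mean = {mean:.3f}, sd = {sd:.3f}")

# Inferential statistics: a 95% confidence interval for the population mean.
T_CRIT_DF9 = 2.262  # two-sided 95% t critical value for n - 1 = 9 degrees of freedom
se = sd / math.sqrt(len(sample))
lower, upper = mean - T_CRIT_DF9 * se, mean + T_CRIT_DF9 * se
print(f"95% CI for the population mean: ({lower:.3f}, {upper:.3f})")
```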

The concept of correlation is particularly noteworthy. Statistical analysis of a data set may reveal that two variables (that is, two properties of the population under consideration) tend to vary together, as if they are connected. For example, a study of annual income and age of death among people might find that poor people tend to have shorter lives than affluent people. The two variables are said to be correlated. However, one cannot immediately infer the existence of a causal relationship between the two variables; see correlation does not imply causation. The correlated phenomena could be caused by a third, previously unconsidered phenomenon, called a lurking variable.

If the sample is representative of the population, then inferences and conclusions made from the sample can be extended to the population as a whole. A major problem lies in determining the extent to which the chosen sample is representative. Statistics offers methods to estimate and correct for randomness in the sample and in the data collection procedure, as well as methods for designing robust experiments in the first place; see experimental design. The fundamental mathematical concept employed in understanding such randomness is probability. Mathematical statistics (also called statistical theory) is the branch of applied mathematics that uses probability theory and analysis to examine the theoretical basis of statistics.

The use of any statistical method is valid only when the system or population under consideration satisfies the basic mathematical assumptions of the method. Misuse of statistics can produce subtle but serious errors in description and interpretation: subtle in that even experienced professionals sometimes make such errors, and serious in that they may affect social policy, medical practice and the reliability of structures such as bridges and nuclear power plants. Even when statistics is correctly applied, the results can be difficult to interpret for a non-expert. For example, the statistical significance of a trend in the data, which measures the extent to which the trend could be caused by random variation in the sample, may not agree with one’s intuitive sense of its significance. The set of basic statistical skills (and skepticism) needed by people to deal with information in their everyday lives is referred to as statistical literacy.
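The lurking-variable point about correlation can be illustrated by simulation. In this sketch, which is an assumed toy model rather than anything from the studies mentioned, a hidden variable z drives both x and y, and neither x nor y influences the other. They are still clearly correlated (the theoretical correlation is 0.5), while the parts of x and y not explained by z are essentially uncorrelated.

```python
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(42)
n = 10_000
z = [rng.gauss(0, 1) for _ in range(n)]  # lurking variable
x = [zi + rng.gauss(0, 1) for zi in z]   # depends on z only
y = [zi + rng.gauss(0, 1) for zi in z]   # depends on z only

r_xy = pearson_r(x, y)
# Removing the known effect of z leaves only independent noise in each variable.
r_resid = pearson_r([xi - zi for xi, zi in zip(x, z)], [yi - zi for yi, zi in zip(y, z)])
print(round(r_xy, 2), round(r_resid, 2))
```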

Statistical methods

Experimental and observational studies

A common goal for a statistical research project is to investigate causality, and in particular to draw a conclusion on the effect of changes in the values of predictors or independent variables on response or dependent variables. There are two major types of causal statistical studies: experimental studies and observational studies. In both types of studies, the effect of differences in an independent variable (or variables) on the behavior of the dependent variable is observed. The difference between the two types lies in how the study is actually conducted. Each can be very effective.

The basic steps for a study are to:

plan the research including determining information sources, research subject selection, and ethical considerations for the proposed research and method,
design the experiment concentrating on the system model and the interaction of independent and dependent variables,
summarize a collection of observations to feature their commonality by suppressing details (descriptive statistics),
reach consensus about what the observations tell us about the world we observe (statistical inference),
document and present the results of the study.

Experimental studies

An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation may have modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation. Instead, data are gathered and correlations between predictors and the response are investigated.

The famous Hawthorne studies are an example of experimental research; they attempted to test changes to the working environment at the Hawthorne plant of the Western Electric Company. The researchers were interested in whether increased illumination would increase the productivity of the assembly line workers. The researchers first measured productivity in the plant, then modified the illumination in an area of the plant to see if changes in illumination would affect productivity. As it turns out, productivity improved under all the experimental conditions (see Hawthorne effect). However, the study is today heavily criticized for errors in experimental procedures, specifically the lack of a control group and of blinding.

Observational studies

An example of an observational study is a study which explores the correlation between smoking and lung cancer. This type of study typically uses a survey to collect observations about the area of interest and then performs statistical analysis. In this case, the researchers would collect observations of both smokers and non-smokers, perhaps through a case-control study, and then look at the number of cases of lung cancer in each group.

Survey development

Several frameworks help design surveys^[2]:

Promis^[3]
- Promis Measure and Development Research^[4]

The COSMIN checklist can assess the quality of surveys.^[5]

Designing surveys requires special statistical approaches.^[6] The process consists of three main phases:

Phase 1: Item development

1. Identification of the domain(s) and item generation
2. Consideration of content validity

Phase 2: Scale development

This phase turns individual items into a coherent scale that measures the intended construct.

3. Pre-testing questions
4. Sampling and survey administration
5. Item reduction
6. Extraction of latent factors. Rasch IRT can help with this.

Phase 3: Scale evaluation

7. Tests of dimensionality. Rasch IRT can help with this.
8. Tests of reliability
9. Tests of validity

Levels of measurement

There are four types of measurements or measurement scales used in statistics. The four types or levels of measurement (nominal, ordinal, interval, and ratio) have different degrees of usefulness in statistical research. Ratio measurements, where both a zero value and distances between different measurements are defined, provide the greatest flexibility in statistical methods that can be used for analysing the data. Interval measurements have meaningful distances between measurements but no meaningful zero value (such as IQ measurements or temperature measurements in Fahrenheit). Ordinal measurements have imprecise differences between consecutive values but a meaningful order to those values. Nominal measurements have no meaningful rank order among values.

Variables conforming only to nominal or ordinal measurements are sometimes grouped together as categorical variables, since they cannot reasonably be measured numerically, whereas ratio and interval measurements are grouped together as quantitative or continuous variables due to their numerical nature.
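The distinction between interval and ratio scales can be made concrete with temperature. Fahrenheit (like Celsius) is an interval scale with an arbitrary zero, so ratios of readings change when the unit changes, whereas Kelvin has a true zero and is a ratio scale. The two temperatures below are arbitrary illustrative values.

```python
def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

def f_to_k(f):
    """Convert degrees Fahrenheit to kelvin (a scale with a true zero)."""
    return f_to_c(f) + 273.15

hot, cold = 80.0, 40.0  # illustrative temperatures in degrees Fahrenheit

# Ratios are not meaningful on an interval scale: they depend on the unit.
ratio_f = hot / cold                  # 2.0
ratio_c = f_to_c(hot) / f_to_c(cold)  # about 6.0
ratio_k = f_to_k(hot) / f_to_k(cold)  # about 1.08, the physically meaningful ratio
print(ratio_f, ratio_c, ratio_k)

# Differences are meaningful: a 40 F difference is the same as 40 * 5/9 C.
print(f_to_c(hot) - f_to_c(cold), (hot - cold) * 5 / 9)
```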

Statistical techniques

Some well known statistical tests and procedures for research observations are:

Specialized disciplines

Some fields of inquiry use applied statistics so extensively that they have specialized terminology. These disciplines include:

Actuarial science
Applied information economics
Biostatistics
Business statistics
Data mining (applying statistics and pattern recognition to discover knowledge from data)
Economic statistics (Econometrics)
Energy statistics
Engineering statistics
Epidemiology
Geography and Geographic Information Systems, more specifically in Spatial analysis
Demography
Psychological statistics
Quality
Social statistics
Statistical literacy
Statistical surveys
Process analysis and chemometrics (for analysis of data from analytical chemistry and chemical engineering)
Reliability engineering
Image processing
Statistics in various sports, particularly baseball and cricket

Statistics is also a key tool in business and manufacturing. It is used to understand measurement-system variability, to control processes (as in statistical process control, or SPC), to summarize data, and to make data-driven decisions. In these roles it is essential, and perhaps the only reliable tool.
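One common SPC device is the individuals (I-MR) control chart. This sketch assumes a stable baseline period of measurements (the numbers are illustrative): the centre line is the baseline mean, the process standard deviation is estimated as the average moving range divided by the constant d2 = 1.128 (for ranges of two consecutive points), and new observations outside the mean ± 3 sigma limits are flagged. The function names are hypothetical.

```python
from statistics import mean

D2_N2 = 1.128  # bias-correction constant d2 for moving ranges of size 2

def individuals_limits(baseline):
    """Lower limit, centre line and upper limit for an individuals (I-MR) chart."""
    centre = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma_hat = mean(moving_ranges) / D2_N2
    return centre - 3 * sigma_hat, centre, centre + 3 * sigma_hat

def out_of_control(points, limits):
    """Indices of points that fall outside the control limits."""
    lcl, _, ucl = limits
    return [i for i, x in enumerate(points) if x < lcl or x > ucl]

limits = individuals_limits([10, 12, 11, 13, 12])  # illustrative baseline data
print(limits)
print(out_of_control([11, 20, 5], limits))  # [1, 2]
```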

Statistical computing

The rapid and sustained increases in computing power starting from the second half of the 20th century have had a substantial impact on the practice of statistical science. Early statistical models were almost always from the class of linear models, but powerful computers, coupled with suitable numerical algorithms, caused a resurgence of interest in nonlinear models (especially neural networks and decision trees) and the creation of new types, such as generalised linear models and multilevel models.

Increased computing power has also led to the growing popularity of computationally-intensive methods based on resampling, such as permutation tests and the bootstrap, while techniques such as Gibbs sampling have made Bayesian methods more feasible. The computer revolution has implications for the future of statistics, with a new emphasis on “experimental” and “empirical” statistics. A large number of both general and special purpose statistical packages are now available to practitioners.
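A permutation test is a good example of a method that became practical only with cheap computation. The idea: if the group labels do not matter (the null hypothesis), randomly reshuffling them should often produce a difference in means as large as the observed one. This is a minimal Monte Carlo sketch for a two-sided difference in means; the data and the number of permutations are illustrative choices, and the bootstrap and Gibbs sampling mentioned above are not shown.

```python
import random

def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Returns the share of random relabellings whose absolute mean difference
    is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(pa) / len(pa) - sum(pb) / len(pb))
        if diff >= observed - 1e-12:  # tolerance guards against float round-off
            hits += 1
    # Adding 1 to numerator and denominator counts the observed labelling itself.
    return (hits + 1) / (n_perm + 1)

print(permutation_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15]))  # small; exact value is 2/252
print(permutation_test([1, 2, 3], [1, 2, 3]))                    # 1.0
```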

Misuse

There is a general perception that statistical knowledge is all-too-frequently intentionally misused, by finding ways to interpret the data that are favorable to the presenter. A famous quote, variously attributed but thought to be from Benjamin Disraeli,^[7] is “There are three kinds of lies: lies, damned lies, and statistics.” The well-known book How to Lie with Statistics by Darrell Huff discusses many cases of deceptive uses of statistics, focusing on misleading graphs. By choosing (or rejecting, or modifying) a certain sample, results can be manipulated; throwing out outliers is one means of doing so. This may be the result of outright fraud or of subtle and unintentional bias on the part of the researcher. Thus, Harvard President Lawrence Lowell wrote in 1909 that statistics, “like veal pies, are good if you know the person that made them, and are sure of the ingredients.”

As further studies contradict previously announced results, people may become wary of trusting such studies. One might read a study that says (for example) “doing X reduces high blood pressure”, followed by a study that says “doing X does not affect high blood pressure”, followed by a study that says “doing X actually worsens high blood pressure”. Often the studies were conducted on different groups with different protocols, or a small-sample study that promised intriguing results did not hold up to further scrutiny in a large-sample study. However, many readers may not have noticed these distinctions, or the media may have oversimplified this vital contextual information, and the public’s distrust of statistics is thereby increased.

However, deeper criticisms come from the fact that the hypothesis testing approach, widely used and in many cases required by law or regulation, forces one hypothesis to be ‘favored’ (the null hypothesis), and can also seem to exaggerate the importance of minor differences in large studies. A difference that is highly statistically significant can still be of no practical significance.

See also criticism of hypothesis testing and controversy over the null hypothesis.
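The gap between statistical and practical significance follows directly from the arithmetic of a z-test: the standard error shrinks like 1/√n, so even a negligible difference becomes “significant” with enough data. This sketch assumes two equal-sized groups with a common known standard deviation; the effect size and sample sizes are hypothetical.

```python
import math

def two_sample_z(mean_diff, sd, n_per_group):
    """z statistic and two-sided p-value for a difference in means.

    Assumes two groups of equal size with a common, known standard deviation.
    """
    se = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p

# Hypothetical: a difference of 0.01 standard deviations (negligible in practice).
z_big, p_big = two_sample_z(mean_diff=0.01, sd=1.0, n_per_group=1_000_000)
z_small, p_small = two_sample_z(mean_diff=0.01, sd=1.0, n_per_group=100)
print(f"n = 1,000,000 per group: z = {z_big:.2f}, p = {p_big:.1e}")
print(f"n = 100 per group:       z = {z_small:.2f}, p = {p_small:.2f}")
```

The effect is the same tiny 0.01 standard deviations in both cases; only the sample size changes the verdict of the test.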

In the fields of psychology and medicine, especially with regard to the approval of new drug treatments by the Food and Drug Administration, criticism of the hypothesis testing approach has increased in recent years. One response has been a greater emphasis on the p-value over simply reporting whether a hypothesis was rejected at the given level of significance <math>\alpha</math>. Here again, however, this summarises the evidence for an effect but not the size of the effect. One increasingly common approach is to report confidence intervals instead, since these indicate both the size of the effect and the uncertainty surrounding it. This aids in interpreting the results, as the confidence interval for a given <math>\alpha</math> simultaneously indicates both statistical significance and effect size.

Note that both the p-value and confidence interval approaches are based on the same fundamental calculations as those entering into the corresponding hypothesis test. The results are stated in a more detailed format, rather than the yes-or-no finality of the hypothesis test, but use the same underlying statistical methodology.
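The shared calculation can be seen in a short sketch: from one estimate and its standard error, a two-sided z-test and a (1 − α) confidence interval are built from the same normal distribution, and the test rejects at level α exactly when the interval excludes zero. The estimates below are hypothetical, with a standard error of 1 assumed throughout.

```python
from statistics import NormalDist

def z_test_and_ci(estimate, se, alpha=0.05):
    """Two-sided z-test p-value and (1 - alpha) confidence interval from one estimate."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z = estimate / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return p, (estimate - crit * se, estimate + crit * se)

for est in (0.5, 1.5, 2.5):  # hypothetical effect estimates, se = 1
    p, (lo, hi) = z_test_and_ci(est, se=1.0)
    print(f"estimate={est}: p={p:.3f}, CI=({lo:.2f}, {hi:.2f}), "
          f"reject={p < 0.05}, CI excludes 0={not (lo <= 0.0 <= hi)}")
```

The interval also conveys the size of the effect, which the p-value alone does not.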

A truly different approach is to use Bayesian methods. This approach has been criticized as well, however. The desire to see good drugs approved and the desire to see harmful or useless drugs restricted remain in tension (type I and type II errors in the language of hypothesis testing).

In his book Statistics As Principled Argument, Robert P. Abelson articulates the position that statistics serves as a standardized means of settling disputes between scientists who could otherwise each argue the merits of their own positions ad infinitum. From this point of view, statistics is principally a form of rhetoric. This can be taken as a positive or a negative, but as with any means of settling a dispute, statistical methods can succeed only as long as both sides accept the approach and agree on the method to be used.

P-hacking

Repeatedly checking results for statistical significance during data collection (and stopping once significance is reached) can yield false-positive results.^[8]
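The effect of repeated checking can be estimated by simulation. In this sketch the null hypothesis is true by construction (data are drawn from N(0, 1)), so every “significant” result is a false positive. A known-variance z-test at the 5% level is applied either once at n = 100 or after every 10 observations with stopping at the first significant look; the design parameters are illustrative assumptions.

```python
import math
import random

def false_positive_rate(n_sims, looks, seed=0, z_crit=1.96):
    """Share of simulated null experiments declared 'significant'.

    Data are N(0, 1) under a true null, so every rejection is a false positive.
    A known-variance z-test is run at each sample size in `looks`, and the
    experiment stops at the first look with |z| > z_crit.
    """
    rng = random.Random(seed)
    checkpoints = set(looks)
    max_n = max(checkpoints)
    rejections = 0
    for _ in range(n_sims):
        total = 0.0
        for n in range(1, max_n + 1):
            total += rng.gauss(0.0, 1.0)
            # z = sample mean * sqrt(n) = total / sqrt(n) when sigma = 1
            if n in checkpoints and abs(total / math.sqrt(n)) > z_crit:
                rejections += 1
                break
    return rejections / n_sims

single = false_positive_rate(2000, looks=[100])
repeated = false_positive_rate(2000, looks=range(10, 101, 10))
print(single, repeated)  # roughly 0.05 versus a noticeably larger rate
```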

Notes

↑ See Ian Hacking’s The Emergence of Probability for a history of the early development of the very concept of mathematical probability.
↑ Health Measures. Available at https://www.healthmeasures.net/explore-measurement-systems/selecting-among-measurement-systems/compare-measurement-systems
↑ Promis Available at
↑ Promis Measure and Development Research. Available at https://www.healthmeasures.net/explore-measurement-systems/promis/measure-development-research
↑ Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, Bouter LM, de Vet HC (May 2010). “The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study”. Qual Life Res. 19 (4): 539–49. doi:10.1007/s11136-010-9606-8. PMC 2852520. PMID 20169472.
↑ Boateng GO, Neilands TB, Frongillo EA, Melgar-Quiñonez HR, Young SL (2018). “Best Practices for Developing and Validating Scales for Health, Social, and Behavioral Research: A Primer”. Front Public Health. 6: 149. doi:10.3389/fpubh.2018.00149. PMID 29942800.
↑ cf. “Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists” by Joel Best. Professor Best attributes it to Disraeli, rather than Mark Twain or others.
↑ Simmons, Joseph P.; Nelson, Leif D.; Simonsohn, Uri (2011-10-17). “False-Positive Psychology”. Psychological Science. SAGE Publications. 22 (11): 1359–1366. doi:10.1177/0956797611417632. ISSN 0956-7976.

Bibliography

Best, Joel (2001). Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists. University of California Press. ISBN 0-520-21978-3.
Desrosières, Alain (2004). The Politics of Large Numbers: A History of Statistical Reasoning. Trans. Camille Naish. Harvard University Press. ISBN 0-674-68932-1.
Hacking, Ian (1990). The Taming of Chance. Cambridge University Press. ISBN 0-521-38884-8.
Lindley, D.V. (1985). Making Decisions (2nd ed.). John Wiley & Sons. ISBN 0-471-90808-8.
Stigler, Stephen M. (1990). The History of Statistics: The Measurement of Uncertainty before 1900. Belknap Press/Harvard University Press. ISBN 0-674-40341-X.
Tijms, Henk (2004). Understanding Probability: Chance Rules in Everyday life. Cambridge University Press. ISBN 0-521-83329-9.
Volle, Michel (1984). Le métier de statisticien (2nd ed.). Economica. ISBN 2-7178-0824-8.

External links

Other resources

Disputes in statistical analyses a single-page review (PDF).
Materials for the History of Statistics (Univ. of York)
Figures from the History of Probability and Statistics (Univ. of Southampton)
Probability and Statistics on the Earliest Uses Pages (Univ. of Southampton)
Reality Clock – Statistics that reflect the issues facing society and the world today.
Resampling: A Marriage of Computers and Statistics (ERIC Digests)
Resources for Teaching and Learning about Probability and Statistics (ERIC Digests)
Software Reports (by the International Association for Statistical Computing)
Statistics in Sports (Section of the ASA)
The R Project for Statistical Computing (free software for statistical computing)

Applied statistics

Algorithmics

Physics

Percolation theory

Genetics

Editor-In-Chief: C. Michael Gibson, M.S., M.D. [1]

Associate Editor-In-Chief: Cafer Zorkun, M.D., Ph.D. [2]

DNA, the molecular basis for inheritance. Each strand of DNA is a chain of nucleotides, matching each other in the center to form what look like rungs on a twisted ladder.

Genetics, a discipline of biology, is the science of heredity and variation in living organisms.^[1]^[2] Knowledge of the inheritance of characteristics has been implicitly used since prehistoric times for improving crop plants and animals through selective breeding. However, the modern science of genetics, which seeks to understand the mechanisms of inheritance, only began with the work of Gregor Mendel in the mid-nineteenth century.^[3] Although he did not know the physical basis for heredity, Mendel observed that inheritance is fundamentally a discrete process where specific traits are inherited in an independent manner—these basic units of inheritance are now called genes.

Genes correspond to regions within DNA, a molecule composed of a chain of four different types of nucleotides; the sequence of these nucleotides is the genetic information organisms inherit. DNA naturally occurs in a double-stranded form, with nucleotides on each strand complementary to each other. Each strand can act as a template for synthesis of a new partner strand, which is the physical mechanism for the copying and inheritance of genetic information.

The sequence of nucleotides in DNA is used by cells to produce specific sequences of amino acids, creating proteins; this correspondence is known as the genetic code. The sequence of amino acids in a protein determines how it folds into a three-dimensional structure, and this structure is, in turn, responsible for the protein’s function. Proteins carry out almost all the functions needed for cells to live and reproduce. A change to a DNA sequence can change a protein’s structure and behavior, and this can have dramatic consequences in the cell and on the organism as a whole.

Although genetics plays a large role in determining the appearance and behavior of organisms, it is the interaction of genetics with the environment an organism experiences that determines the ultimate outcome. For example, while genes play a role in determining a person’s height, the nutrition and health that person experiences in childhood also have a large effect.

History

Morgan’s observation of sex-linked inheritance of a mutation causing white eyes in *Drosophila* led him to the hypothesis that genes are located upon chromosomes.

Although the science of genetics has its origins in the work of Gregor Mendel in the mid-nineteenth century, various theories of inheritance preceded Mendel. These theories generally assumed that there existed an inheritance of acquired characteristics: the belief that individuals inherit traits that have been strengthened in their parents. Today, the theory is commonly associated with Jean-Baptiste Lamarck, who used this pattern of inheritance to explain the evolution of various traits within species (these changes are now understood to be the product of natural selection rather than a product of soft inheritance).^[4]

Mendelian and classical genetics

The modern science of genetics traces its roots to the observations made by Gregor Johann Mendel, a German-Czech Augustinian monk and scientist who made detailed studies of the nature of inheritance in plants. In his paper “Versuche über Pflanzenhybriden” (“Experiments on Plant Hybridization”), presented in 1865 to the Brünn Natural History Society, Mendel traced the inheritance patterns of certain traits in pea plants and showed that they could be described mathematically.^[5] Although not all features show these patterns of Mendelian inheritance, his work suggested the usefulness of applying statistics to the study of inheritance.

The significance of Mendel’s observations was not understood until early in the twentieth century, after his death, when his research was rediscovered by other scientists working on similar problems. The word genetics itself was coined in 1905 by William Bateson, a significant proponent of Mendel’s work, in a letter to Adam Sedgwick.^[6]^[7] (The adjective genetic, derived from the Greek word genno (γεννώ), “to give birth”, predates the noun and was first used in a biological sense in 1860.^[8]) Bateson publicly promoted and popularized the use of the word genetics to describe the study of inheritance in his inaugural address to the Third International Conference on Plant Hybridization in London, England, in 1906.^[9]

In the decades following the rediscovery and popularization of Mendel’s work, numerous experiments sought to elucidate the physical basis of inheritance. In 1910 Thomas Hunt Morgan argued that genes reside on chromosomes, based on observations of a sex-linked white eye mutation in fruit flies.^[10] In 1913 his student Alfred Sturtevant used the phenomenon of genetic linkage and the associated recombination rates to demonstrate and map the linear arrangement of genes upon the chromosome.^[11]

James D. Watson (*pictured*) and Francis Crick resolved the structure of DNA in 1953.

Molecular genetics

Although chromosomes were known to contain genes and to be composed of both protein and DNA, it was unknown which of the two was critical for heredity or how the process occurred. In 1928, Frederick Griffith published his discovery of the phenomenon of transformation (see Griffith’s experiment); sixteen years later, in 1944, Oswald Theodore Avery, Colin MacLeod and Maclyn McCarty used this phenomenon to isolate and identify the molecule responsible for transformation as DNA.^[12] The Hershey-Chase experiment in 1952 identified DNA (rather than protein) as the genetic material of the viruses that infect bacteria, further evidence that DNA was the molecule responsible for inheritance.^[13]

James D. Watson and Francis Crick solved the structure of DNA in 1953, using the X-ray crystallography work of Rosalind Franklin that indicated the molecule had a helical structure.^[14]^[15] Their double-helix model paired a sequence of nucleotides with complementary nucleotides on the other strand.^[16] This structure not only provided a physical explanation for how information is contained within the order of the nucleotides, but also suggested a simple mechanism for duplication—through separation of strands and the reconstruction of partner strands based on the nucleotide sequences. Although the structure explained the process of inheritance, it was still unknown how DNA influenced the behavior of cells. In the following years scientists sought to understand how DNA controls the process of protein production within ribosomes, eventually discovering the transcription of DNA into messenger RNA and cracking the genetic code, which links the nucleotide sequence of messenger RNA to the amino acid sequence of protein.

With this molecular understanding of inheritance, an explosion of research that applied this new knowledge to biology became possible. One early development was chain-termination DNA sequencing in 1977, which enabled the determination of the nucleotide sequences of DNA.^[17] In 1983 the polymerase chain reaction, developed by Kary Banks Mullis, provided a simple method for isolating and amplifying segments of DNA.^[18] These and other techniques, through the pooled efforts of the Human Genome Project and a parallel private effort by Celera Genomics, culminated in the sequencing of the human genome in 2003.^[19]

Features of inheritance

Discrete inheritance and Mendel’s laws

A Punnett square depicting a cross between two pea plants heterozygous for purple (B) and white (b) blossoms

At its most fundamental level, inheritance in organisms occurs by means of discrete heritable units, called genes.^[20] This property was first observed by Gregor Mendel, who studied the segregation of heritable traits in pea plants.^[5]^[21] In his experiments studying the trait for flower color, Mendel observed that the flowers of each pea plant were either purple or white, and never an intermediate between the two colors. These different, discrete versions of the same gene are called alleles.

In the case of pea plants, each organism has two alleles of each gene, and the plants inherit one allele from each parent.^[22] Many organisms, including humans, have this pattern of inheritance. Organisms with two copies of the same allele are called homozygous, while organisms with two different alleles are heterozygous.

The set of alleles for a given organism is called its genotype, while the observable trait the organism has is called its phenotype. When organisms are heterozygous, often one allele is called dominant as its qualities dominate the phenotype of the organism, while the other allele is called recessive as its qualities recede and are not observed. Some alleles do not have complete dominance and instead have incomplete dominance by expressing an intermediate phenotype, or codominance by expressing both alleles at once.^[23]

When a pair of organisms reproduce sexually, their offspring randomly inherit one of the two alleles from each parent. These observations of discrete inheritance and the segregation of alleles are collectively known as Mendel’s first law or the Law of Segregation.
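The Law of Segregation can be checked by enumerating the Punnett square for the cross in the figure caption above: two pea plants heterozygous for purple (B) and white (b) blossoms. The sketch assumes purple is completely dominant over white; the function names are illustrative.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Offspring genotype counts for a one-gene cross, e.g. cross("Bb", "Bb").

    Each parent passes on one of its two alleles with equal probability (segregation),
    so the four cells of the Punnett square are equally likely.
    """
    genotypes = Counter()
    for a, b in product(parent1, parent2):
        # Sort so that "bB" and "Bb" count as the same genotype (uppercase sorts first).
        genotypes["".join(sorted(a + b))] += 1
    return genotypes


def phenotype(genotype, dominant="B"):
    """Complete dominance: one copy of the dominant allele gives the dominant trait."""
    return "purple" if dominant in genotype else "white"


genotypes = cross("Bb", "Bb")
phenotypes = Counter()
for g, n in genotypes.items():
    phenotypes[phenotype(g)] += n
print(dict(genotypes))   # {'BB': 1, 'Bb': 2, 'bb': 1}
print(dict(phenotypes))  # {'purple': 3, 'white': 1}
```

The 1 : 2 : 1 genotype ratio collapses into a 3 : 1 phenotype ratio because BB and Bb plants look the same.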

Genetic pedigree charts help track the inheritance patterns of traits.

Notation and diagrams

Geneticists use diagrams and symbols to describe inheritance. A gene is represented by a letter (or letters)—the capitalized letter represents the dominant allele and the recessive is represented by lowercase.^[24] Often a “+” symbol is used to mark the usual, non-mutant allele for a gene.

In fertilization and breeding experiments (and especially when discussing Mendel’s laws) the parents are referred to as the “P” generation and the offspring as the “F1” (first filial) generation. When the F1 offspring mate with each other, the offspring are called the “F2” (second filial) generation. One of the common diagrams used to predict the result of cross-breeding is the Punnett square.

When studying human genetic diseases, geneticists often use pedigree charts to represent the inheritance of traits.^[25] These charts map the inheritance of a trait in a family tree.

Interactions of multiple genes

Human height is a complex genetic trait. Francis Galton’s data from 1889 show offspring height as a function of mean parent height. While the two are correlated, the remaining variation in offspring heights indicates that environment is also an important factor in this trait.

Organisms have thousands of genes, and in sexually reproducing organisms these genes generally assort independently of each other. This means that the inheritance of an allele for yellow or green pea color is unrelated to the inheritance of alleles for white or purple flowers. This phenomenon, known as “Mendel’s second law” or the “Law of independent assortment”, means that the alleles of different genes get shuffled between parents to form offspring with many different combinations. (Some genes do not assort independently, demonstrating genetic linkage, a topic discussed later in this article.)
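Independent assortment can be sketched by enumerating every pairing of gametes from two parents heterozygous for both genes. The example assumes, as in Mendel’s peas, that yellow seeds (Y) are dominant over green (y) and purple flowers (P) over white (p); the allele letters and function names are illustrative.

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """Possible gametes of a two-gene genotype such as "YyPp" (one allele per gene).

    Independent assortment: the allele drawn from the first gene does not depend on
    the allele drawn from the second, so all four combinations are equally likely.
    """
    first_gene, second_gene = genotype[:2], genotype[2:]
    return [a + b for a, b in product(first_gene, second_gene)]


def phenotype(gamete1, gamete2):
    """Phenotype of the offspring formed by two gametes, assuming complete dominance."""
    seed_color = "yellow" if "Y" in gamete1[0] + gamete2[0] else "green"
    flower_color = "purple" if "P" in gamete1[1] + gamete2[1] else "white"
    return seed_color, flower_color


# 16 equally likely pairings of the 4 gamete types from each parent
counts = Counter(phenotype(g1, g2) for g1, g2 in product(gametes("YyPp"), repeat=2))
print(counts)
```

The four phenotype classes come out in the 9 : 3 : 3 : 1 ratio expected when each gene independently gives a 3 : 1 split.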

Often different genes can interact in a way that influences the same trait. In the blue-eyed Mary, for example, there exists a gene with alleles that determine the color of flowers: blue or magenta. Another gene, however, controls whether the flowers have color at all: color or white. When a plant has two copies of this white allele, its flowers are white—regardless of whether the first gene has blue or magenta alleles. This interaction between genes is called epistasis, with the second gene epistatic to the first.^[26]
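Epistasis changes the ordinary two-gene ratio. The sketch below crosses two plants heterozygous for both genes in the blue-eyed Mary example. It assumes, only for illustration, that blue (B) is dominant over magenta (b), since the text does not say which color allele is dominant. It takes the colored allele (W) to be dominant over white (w), because two white alleles are needed for white flowers. The allele letters and function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical symbols: B (blue) dominant over b (magenta) is an assumption of this sketch;
# W (colored) is dominant over w (white), since white needs two copies of w.


def flower_color(color_alleles, pigment_alleles):
    if pigment_alleles == "ww":
        return "white"  # epistasis: the second gene masks the first
    return "blue" if "B" in color_alleles else "magenta"


def dihybrid_counts():
    """Flower colors among offspring of two BbWw parents (all 16 gamete pairings)."""
    gametes = [b + w for b, w in product("Bb", "Ww")]
    counts = Counter()
    for g1, g2 in product(gametes, gametes):
        color = "".join(sorted(g1[0] + g2[0]))
        pigment = "".join(sorted(g1[1] + g2[1]))
        counts[flower_color(color, pigment)] += 1
    return counts


print(dict(dihybrid_counts()))  # counts: 9 blue, 3 magenta, 4 white
```

Under these assumptions the usual 9 : 3 : 3 : 1 classes merge into 9 blue : 3 magenta : 4 white, because every ww plant is white whatever its color alleles.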

Many traits are not discrete features (e.g. purple or white flowers) but are instead continuous features (e.g. human height and skin color). These complex traits are the product of interactions of many genes.^[27] The influence of these genes is mediated, to varying degrees, by the environment an organism has experienced. The degree to which an organism’s genes contribute to a complex trait is called heritability.^[28] Measurement of the heritability of a trait is relative, however: in a more variable environment, the environment has a bigger influence on the total variation of the trait. For example, human height is a complex trait with a heritability of 89% in the United States. In Nigeria, however, where people experience more variable access to good nutrition and health care, height has a heritability of only 62%.^[29]
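One common way to make this relativity precise is broad-sense heritability, the share of phenotypic variance attributable to genetic variance, H² = V_G / (V_G + V_E). The sketch below assumes a simple additive model with no gene-environment interaction or covariance, and the variance values are invented for illustration; they are not the data behind the 89% and 62% figures.

```python
def broad_sense_heritability(genetic_variance, environmental_variance):
    """H^2 = V_G / V_P, with V_P = V_G + V_E.

    Assumes genetic and environmental effects simply add, with no
    gene-environment interaction or covariance.
    """
    phenotypic_variance = genetic_variance + environmental_variance
    return genetic_variance / phenotypic_variance


# Hypothetical variances in arbitrary units: the same genetic variation in two environments.
uniform_env = broad_sense_heritability(genetic_variance=8.0, environmental_variance=2.0)
variable_env = broad_sense_heritability(genetic_variance=8.0, environmental_variance=6.0)
print(round(uniform_env, 2), round(variable_env, 2))  # 0.8 0.57
```

The genetic variance is identical in both cases; only the larger environmental variance lowers the heritability.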

Molecular basis for inheritance

DNA and chromosomes

The molecular structure of DNA. Bases pair through the arrangement of hydrogen bonding between the strands.

The molecular basis for genes is deoxyribonucleic acid (DNA). DNA is composed of a chain of nucleotides, of which there are four types: adenine (A), cytosine (C), guanine (G), and thymine (T). Genetic information exists in the sequence of these nucleotides, and genes exist as stretches of sequence along the DNA chain.^[30] Viruses are the only exception to this rule: some viruses use the very similar molecule RNA instead of DNA as their genetic material.^[31]

DNA normally exists as a double-stranded molecule, coiled into the shape of a double-helix. Each nucleotide in DNA preferentially pairs with its partner nucleotide on the opposite strand: A pairs with T, and C pairs with G. Thus, in its two-stranded form, each strand effectively contains all necessary information, redundant with its partner strand. This structure of DNA is the physical basis for inheritance: DNA replication duplicates the genetic information by splitting the strands and using each strand as a template for synthesis of a new partner strand.^[32]
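Because of the pairing rules, one strand’s sequence fully determines its partner’s. The minimal sketch below computes the partner strand. It also reverses the result because the two strands run in opposite directions (they are antiparallel), a convention the paragraph above does not cover. The function name is illustrative.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand):
    """Partner strand for a DNA sequence, using the pairing rules A-T and C-G.

    The two strands run antiparallel, so the partner is conventionally written
    reversed (the "reverse complement") to read in the same 5'-to-3' direction.
    """
    return "".join(PAIRS[base] for base in reversed(strand.upper()))


template = "ATGCGTAC"
partner = complementary_strand(template)
print(partner)                        # GTACGCAT
print(complementary_strand(partner))  # ATGCGTAC, the original strand again
```

Applying the function twice returns the original sequence, which is the redundancy the paragraph describes: either strand is enough to rebuild the other.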

Genes are arranged linearly along long chains of DNA sequence, called chromosomes. In bacteria, each cell has a single circular chromosome, while eukaryotic organisms (which include plants and animals) have their DNA arranged in multiple linear chromosomes. These DNA strands are often extremely long; the largest human chromosome, for example, is about 247 million base pairs in length.^[33] The DNA of a chromosome is associated with structural proteins that organize, compact, and control access to the DNA, forming a material called chromatin; in eukaryotes chromatin is usually composed of nucleosomes, repeating units of DNA wound around a core of histone proteins.^[34] The full set of hereditary material in an organism (usually the combined DNA sequences of all chromosomes) is called the genome.

While haploid organisms have only one copy of each chromosome, most animals and many plants are diploid, containing two of each chromosome and thus two copies of every gene.^[35] The two alleles for a gene are located at identical loci of the two homologous chromosomes, each allele inherited from a different parent.

Walther Flemming‘s 1882 diagram of eukaryotic cell division. Chromosomes are copied, condensed, and organized. Then, as the cell divides, chromosome copies separate into the daughter cells.

An exception exists in the sex chromosomes, specialized chromosomes many animals have evolved that play a role in determining the sex of an organism.^[36] In humans and other mammals the Y chromosome has very few genes and triggers the development of male sexual characteristics, while the X chromosome is similar to the other chromosomes and contains many genes unrelated to sex determination. Females have two copies of the X chromosome, but males have one Y and only one X chromosome; this difference in X chromosome copy number leads to the unusual inheritance patterns of sex-linked disorders.
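The effect of having a single X chromosome can be checked by enumerating a cross for a recessive X-linked allele. This is a sketch under simplifying assumptions: one X-linked gene, a fully recessive disease allele a, and the chromosome labels XA, Xa and Y, which are notation chosen for this example.

```python
from collections import Counter
from itertools import product


def x_linked_cross(mother, father):
    """Offspring outcomes for a recessive X-linked allele 'a'.

    The mother passes one of her two X chromosomes; the father passes either his X
    (giving a daughter) or his Y (giving a son). Each pairing is equally likely.
    """
    outcomes = Counter()
    for from_mother, from_father in product(mother, father):
        if from_father == "Y":
            sex = "son"
            affected = from_mother == "Xa"  # a single X: one recessive copy is enough
        else:
            sex = "daughter"
            affected = from_mother == "Xa" and from_father == "Xa"  # needs two copies
        outcomes[(sex, "affected" if affected else "unaffected")] += 1
    return outcomes


carrier_mother = ("XA", "Xa")
unaffected_father = ("XA", "Y")
print(dict(x_linked_cross(carrier_mother, unaffected_father)))
```

For a carrier mother and an unaffected father, half of the sons are expected to be affected and none of the daughters (half of whom are carriers), which is why such disorders appear far more often in males.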

Reproduction

When cells divide, their full genome is copied and each daughter cell inherits one copy. This is the simplest form of reproduction and is the basis for asexual reproduction. Asexual reproduction can also occur in multicellular organisms, producing offspring that inherit their genome from a single parent. Offspring that are genetically identical to their parents are called clones.

Eukaryotic organisms often use sexual reproduction to generate offspring that contain a mixture of genetic material inherited from two different parents. The process of sexual reproduction generally alternates between forms that contain single copies of the genome (haploid) and double copies (diploid).^[35] Haploid cells fuse and combine genetic material to create a diploid cell with paired chromosomes. Diploid organisms form haploids by dividing, without replicating their DNA, to create daughter cells that randomly inherit one of each pair of chromosomes. Most animals and many plants are diploid for most of their lifespan, with the haploid form reduced to single cell gametes.

Although they do not use the haploid/diploid method of sexual reproduction, bacteria have many methods of acquiring new genetic information. Some bacteria can undergo conjugation, transferring a small circular piece of DNA to another bacterium.^[37] Bacteria can also take up raw DNA fragments found in the environment and integrate them into their genome, a phenomenon known as transformation.^[38] These processes result in horizontal gene transfer, transmitting fragments of genetic information between organisms that would otherwise be unrelated.

Thomas Hunt Morgan‘s 1916 illustration of a double crossover between chromosomes

Recombination and linkage

The diploid nature of chromosomes allows for genes on different chromosomes to assort independently during sexual reproduction, recombining to form new combinations of genes. Genes on the same chromosome would theoretically never recombine, however, were it not for the process of chromosomal crossover. During crossover, chromosomes exchange stretches of DNA, effectively shuffling the gene alleles between the chromosomes.^[39] This process of chromosomal crossover generally occurs during meiosis, a series of cell divisions that creates haploid germ cells that later combine with other germ cells to form child organisms.

The probability of chromosomal crossover occurring between two given points on the chromosome is related to the distance between them. For an arbitrarily long distance, the probability of crossover is high enough that the inheritance of the genes is effectively uncorrelated. For genes that are closer together, however, the lower probability of crossover means that the genes demonstrate genetic linkage—alleles for the two genes tend to be inherited together. The amounts of linkage between a series of genes can be combined to form a linear linkage map that roughly describes the arrangement of the genes along the chromosome.^[40]
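The text does not give a formula linking distance to crossover probability. One standard model, swapped in here as an illustration, is Haldane’s mapping function, which assumes crossovers occur independently of one another (no interference). Distance d is measured in Morgans, the expected number of crossovers between the two loci per gamete, and the function name is illustrative.

```python
import math


def haldane_recombination_fraction(distance_morgans):
    """Probability that two loci end up recombinant, under Haldane's mapping function.

    Crossovers are modeled as a Poisson process with mean d between the loci. Only an
    odd number of crossovers produces a recombinant, giving r = (1 - exp(-2d)) / 2.
    """
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


for d in (0.01, 0.1, 0.5, 1.0, 5.0):
    print(f"{d:5.2f} Morgans -> r = {haldane_recombination_fraction(d):.3f}")
```

For short distances the recombination fraction is roughly equal to d, and as the distance grows it levels off at 0.5, the value for genes inherited independently. This matches the point that very distant genes behave as if unlinked.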

Gene expression

Genetic code

A single amino acid change causes hemoglobin to form fibers.

Genes generally express their functional effect through the production of proteins, which are complex molecules responsible for most functions in the cell.^[41] Proteins are chains of amino acids, and the DNA sequence of a gene (through an RNA intermediate) is used to produce a specific protein sequence. Each group of three nucleotides in the sequence, called a codon, corresponds to one of the twenty possible amino acids in protein—this correspondence is called the genetic code.^[42] The flow of information is unidirectional: information is transferred from nucleotide sequences into the amino acid sequence of proteins, but never from protein back into the sequence of DNA—a phenomenon Francis Crick called the central dogma of molecular biology.^[43]

The dynamic structure of hemoglobin is responsible for its ability to transport oxygen within mammalian blood.

The specific sequence of amino acids results in a unique three-dimensional structure for that protein, and the three-dimensional structures of protein are related to their function.^[44]^[45] Some are simple structural molecules, like the fibers formed by the protein collagen. Proteins can bind to other proteins and simple molecules, sometimes acting as enzymes by facilitating chemical reactions within the bound molecules (without changing the structure of the protein itself). Protein structure is dynamic; the protein hemoglobin bends into slightly different forms as it facilitates the capture, transport, and release of oxygen molecules within mammalian blood.

A single nucleotide difference within DNA can cause a single change in the amino acid sequence of a protein. Because protein structures are the result of their amino acid sequences, some changes can dramatically change the properties of a protein by destabilizing the structure or changing the surface of the protein in a way that changes its interaction with other proteins and molecules. For example, sickle-cell anemia is a human genetic disease that results from a single base difference within the coding region for the β-globin section of hemoglobin, causing a single amino acid change that changes hemoglobin’s physical properties.^[46] Sickle-cell versions of hemoglobin stick to themselves, stacking to form fibers that distort the shape of red blood cells carrying the protein. These sickle-shaped cells no longer flow smoothly through blood vessels, having a tendency to clot or degrade, causing the medical problems associated with the disease.
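The codon-to-amino-acid lookup can be sketched in a few lines. The table below is the standard genetic code, written compactly as one-letter amino acid codes for the 64 codons in TCAG order. The sequence in the example is a made-up fragment, not the real β-globin gene; the only fact taken from sickle-cell anemia is the known single-base change of one codon, GAG (glutamate) to GTG (valine). The function and variable names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code as one-letter amino acids ("*" = stop), listed for the codons
# in TCAG order: TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence):
    """Read a DNA coding sequence three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(coding_sequence) - 2, 3):
        amino_acid = CODON_TABLE[coding_sequence[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical fragment (not the real beta-globin gene); the sickle-cell style change
# replaces the codon GAG (glutamate, E) with GTG (valine, V).
normal = "ATGGAGAAATAA"
sickle = normal.replace("GAG", "GTG", 1)
print(translate(normal), translate(sickle))  # MEK MVK
```

A single changed base alters one codon and therefore one amino acid, while the rest of the protein sequence is untouched.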

Nature vs. nurture

Although genes contain all the information an organism uses to function, the environment plays an important role in determining the ultimate phenotype, a dichotomy often referred to as “nature vs. nurture”. The phenotype of an organism depends on the interaction of genetics with the environment. One example of this is the case of temperature-sensitive mutations. Sometimes a single amino acid change within the sequence of a protein does not change its behavior and interactions with other molecules, but it does destabilize the structure. In a high temperature environment, where molecules are moving more quickly and hitting each other, this results in the protein losing its structure and failing to function. In a low temperature environment, however, the protein’s structure is stable and it functions normally. This type of mutation is visible in the coat coloration of Siamese cats, where a mutation in an enzyme responsible for pigment production causes it to destabilize and lose function at high temperatures.^[47] The protein remains functional in areas of skin that are colder (legs, ears, tail, and face), and so the cat has dark fur at its extremities.

Siamese cats have a temperature-sensitive mutation in pigment production.

Environment also plays a dramatic role in effects of the human genetic disease phenylketonuria.^[48] The mutation that causes phenylketonuria disrupts the ability of the body to break down the amino acid phenylalanine, causing a toxic build-up of an intermediate molecule that, in turn, causes severe symptoms of progressive mental retardation and seizures. If someone with the phenylketonuria mutation is kept on a strict diet that avoids this amino acid, however, they remain normal and healthy.

Gene regulation

The genome of a given organism contains thousands of genes, but not all these genes need to be active at any given moment. A gene is expressed when it is being transcribed into mRNA (and translated into protein), and there exist many cellular methods of controlling the expression of genes such that proteins are produced only when needed by the cell. Transcription factors are regulatory proteins that bind to the start of genes, either promoting or inhibiting the transcription of the gene.^[49] Within the genome of Escherichia coli bacteria, for example, there exists a series of genes necessary for the synthesis of the amino acid tryptophan. However, when tryptophan is already available to the cell, these genes for tryptophan synthesis are no longer needed. The presence of tryptophan directly affects the activity of the genes—tryptophan molecules bind to the tryptophan repressor (a transcription factor), changing the repressor’s structure such that the repressor binds to the genes. The tryptophan repressor blocks the transcription and expression of the genes, thereby creating negative feedback regulation of the tryptophan synthesis process.^[50]

Transcription factors bind to DNA, influencing the transcription of associated genes.

Differences in gene expression are especially clear within multicellular organisms, where cells all contain the same genome but have very different structures and behaviors due to the expression of different sets of genes. All the cells in a multicellular organism derive from a single cell, differentiating into different cell types in response to external and intercellular signals and gradually establishing different patterns of gene expression to create different behaviors. No single gene is responsible for the development of structures within multicellular organisms; these patterns arise from the complex interactions between many cells.

Within eukaryotes there exist structural features of chromatin that influence the transcription of genes, often in the form of modifications to DNA and chromatin that are stably inherited by daughter cells.^[51] These features are called “epigenetic” because they exist “on top” of the DNA sequence and are passed on from one cell generation to the next. Because of epigenetic features, different cell types grown within the same medium can retain very different properties. Although epigenetic features are generally dynamic over the course of development, some, like the phenomenon of paramutation, have multigenerational inheritance and exist as rare exceptions to the general rule of DNA as the basis for inheritance.^[52]

Genetic change

Gene duplication allows diversification by providing redundancy: one gene can mutate and lose its original function without harming the organism.

Mutations

During the process of DNA replication, errors occasionally occur in the polymerization of the second strand. These errors, called mutations, can have an impact on the phenotype of an organism, especially if they occur within the protein coding sequence of a gene. Error rates are usually very low—1 error in every 10–100 million bases—due to the “proofreading” ability of DNA polymerases.^[53]^[54] (Without proofreading error rates are a thousand-fold higher; because many viruses rely on DNA and RNA polymerases that lack proofreading ability they experience higher mutation rates.) Processes that increase the rate of changes in DNA are called mutagenic: mutagenic chemicals promote errors in DNA replication, often by interfering with the structure of base-pairing, while UV radiation induces mutations by causing damage to the DNA structure.^[55] Chemical damage to DNA occurs naturally as well, and cells use DNA repair mechanisms to repair mismatches and breaks in DNA—nevertheless, the repair sometimes fails to return the DNA to its original sequence.
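The quoted error rates translate into an expected number of errors per copy of a genome by simple multiplication. The sketch assumes errors strike each base independently and uses a genome of about 4.6 million base pairs, roughly the size of the Escherichia coli genome, purely as an example size. The function name is illustrative.

```python
def expected_replication_errors(genome_length_bp, errors_per_base):
    """Mean number of copying errors when a genome of this length is replicated once.

    Treats every base as an independent chance of error, so the error count is
    approximately Poisson-distributed with this mean.
    """
    return genome_length_bp * errors_per_base


GENOME_BP = 4.6e6  # assumed example size, roughly an E. coli genome
for rate in (1 / 10e6, 1 / 100e6):  # the "1 error in every 10-100 million bases" range
    with_proofreading = expected_replication_errors(GENOME_BP, rate)
    without_proofreading = expected_replication_errors(GENOME_BP, rate * 1000)
    print(f"rate {rate:.0e}: {with_proofreading:.3f} errors per copy, "
          f"{without_proofreading:.0f} if 1000-fold higher")
```

At these rates a genome of that size picks up well under one replication error per copy on average (about 0.046 to 0.46), while a thousand-fold higher rate without proofreading would mean roughly 46 to 460.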

In organisms that use chromosomal crossover to exchange DNA and recombine genes, errors in alignment during meiosis can also cause mutations.^[56] Errors in crossover are especially likely when similar sequences cause partner chromosomes to adopt a mistaken alignment; this makes some regions in genomes more prone to mutating in this way. These errors create large structural changes in DNA sequence—duplications, inversions or deletions of entire regions, or the accidental exchanging of whole parts between different chromosomes (called translocation).

Natural selection and evolution

Mutations produce organisms with different genotypes, and those differences can result in different phenotypes. Many genetic mutations have a negligible effect on an organism’s phenotype, health, and reproductive fitness. Mutations that do have an effect are often deleterious, but occasionally mutations arise that are beneficial in the current environmental context of the organism.

Population genetics research studies the distributions of these genetic differences within populations and how the distributions change over time.^[57] Changes in the frequency of an allele in a population can be influenced by natural selection, where a given allele’s higher rate of survival and reproduction causes it to become more frequent in the population over time.^[58] Genetic drift can also occur, where chance events lead to random changes in allele frequency.^[59]
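Selection and drift can be sketched together with the Wright-Fisher model, a standard population-genetics model chosen here for illustration and not named in the text. The sketch assumes a haploid population of fixed size with non-overlapping generations and a single allele whose carriers have relative fitness 1 + s; the function name and the parameter values in the usage lines are arbitrary.

```python
import random


def wright_fisher(p0, population_size, generations, s=0.0, seed=None):
    """Allele-frequency trajectory in a haploid Wright-Fisher population.

    Each generation, selection first shifts the expected frequency (carriers have
    relative fitness 1 + s), then genetic drift enters through random sampling of
    the next generation's population_size individuals.
    """
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Deterministic selection step
        expected = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Drift: each new individual carries the allele with probability `expected`
        carriers = sum(rng.random() < expected for _ in range(population_size))
        p = carriers / population_size
        trajectory.append(p)
    return trajectory


neutral = wright_fisher(0.5, population_size=20, generations=200, seed=1)
selected = wright_fisher(0.1, population_size=1000, generations=200, s=0.1, seed=1)
print(f"drift only, N=20: final frequency {neutral[-1]:.2f}")
print(f"s=0.1, N=1000: final frequency {selected[-1]:.2f}")
```

With s = 0 only drift remains, and a small population wanders randomly until the allele is fixed or lost; a positive s in a large population pushes the frequency steadily upward.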

Over many generations, the genomes of organisms can change, resulting in the phenomenon of evolution. Mutations, together with selection for beneficial mutations, can cause a species to evolve into forms better suited to survive in their environment, a process called adaptation.^[60] New species are formed through the process of speciation, which is often caused by geographical separation that allows different populations to diverge genetically.^[61]

As sequences diverge during evolution, the accumulated differences can be used as a molecular clock to estimate the evolutionary distance between them.^[62] Genetic comparisons are generally considered the most accurate method of characterizing the relatedness between species, an improvement over the sometimes deceptive comparison of phenotypic characteristics. The evolutionary distances between species can be combined to form evolutionary trees; these trees represent the common descent and divergence of species over time, although they cannot represent the transfer of genetic material between unrelated species (known as horizontal gene transfer, which is most common in bacteria).
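
As a sketch of how sequence differences can be turned into an evolutionary distance, the code below counts mismatches between two aligned sequences (the p-distance), applies the Jukes-Cantor correction for multiple substitutions at the same site, and converts the result into a divergence time under a constant clock rate. The Jukes-Cantor model, the toy sequences, and the rate of 1e-9 substitutions per site per year are assumptions chosen for illustration, not values from the cited sources.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return diffs / len(seq1)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance (substitutions per site), allowing for repeated hits."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def divergence_time(distance: float, rate_per_site_per_year: float) -> float:
    """Molecular-clock estimate: both lineages accumulate changes, hence the factor of 2."""
    return distance / (2 * rate_per_site_per_year)


a = "ACGTACGTACGTACGTACGT"  # made-up aligned sequences differing at 2 of 20 sites
b = "ACGTACCTACGTACGAACGT"
p = p_distance(a, b)
d = jukes_cantor(p)
t = divergence_time(d, 1e-9)  # hypothetical clock rate: 1e-9 substitutions/site/year
print(f"p = {p:.3f}, corrected distance = {d:.4f}, time = {t:.3g} years")
```

The corrected distance is always a little larger than the raw mismatch fraction, because some sites are expected to have changed more than once.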

Research and technology

The common fruit fly (*Drosophila melanogaster*) is a popular model organism in genetics research.

Model organisms and genetics

Although geneticists originally studied inheritance in a wide range of organisms, researchers began to specialize in studying the genetics of a particular subset of organisms. The fact that significant research already existed for a given organism would encourage new researchers to choose it for further study, and so eventually a few model organisms became the basis for most genetics research.^[63] Common research topics in model organism genetics include the study of gene regulation and the involvement of genes in development and cancer.

Organisms were chosen, in part, for convenience—short generation times and facile genetic manipulation made some organisms popular genetics research tools. Widely used model organisms include the gut bacterium Escherichia coli, the plant Arabidopsis thaliana, baker’s yeast (Saccharomyces cerevisiae), the nematode Caenorhabditis elegans, the common fruit fly (Drosophila melanogaster), and the common house mouse (Mus musculus).

Medical genetics research

Medical genetics seeks to understand how genetic variation relates to human health and disease.^[64] When searching for an unknown gene that may be involved in a disease, researchers commonly use genetic linkage and genetic pedigree charts to find the location on the genome associated with the disease. At the population level, researchers take advantage of Mendelian randomization to look for locations in the genome that are associated with diseases, a technique especially useful for multigenic traits not clearly defined by a single gene.^[65] Once a candidate gene is found, further research is often done on the corresponding gene in model organisms (called an orthologous gene). In addition to studying genetic diseases, the increased availability of genotyping techniques has led to the field of pharmacogenetics: the study of how genotype can affect drug responses.^[66]

Although it is usually not an inherited disease, cancer is also considered a genetic disease.^[67] Cancer develops in the body through a combination of events. Mutations occasionally occur within cells in the body as they divide; while these mutations will not be inherited by any offspring, they can affect the behavior of cells, sometimes causing them to grow and divide more frequently. There are biological mechanisms that attempt to stop this process: inappropriately dividing cells receive signals that should trigger cell death, but sometimes additional mutations cause cells to ignore these messages. An internal process of natural selection occurs within the body, and eventually mutations accumulate within cells to promote their own growth, creating a cancerous tumor that grows and invades various tissues of the body.

*E. coli* colonies on a plate of agar, an example of cellular cloning, often used in molecular cloning

In an effort to speed up the progress of medical research, ten individuals made their genetic information, medical histories, ethnic backgrounds, and other phenotypes publicly available on October 21, 2008. The ten individuals, all of whom have at least the equivalent of a master’s degree in genetics, are the first set of participants in the Harvard-based Personal Genome Project. Project leader George Church hopes to change the public perception that individuals should keep their genetic information private by exploring the consequences of making genetic and medical information publicly accessible. Officials from the project hope to eventually enroll up to 100,000 participants.^[68]

Research techniques

DNA can be manipulated in the laboratory. Restriction enzymes are commonly used enzymes that cut DNA at specific sequences, producing predictable fragments of DNA.^[69] The use of ligation enzymes allows these fragments to be reconnected, and by ligating fragments of DNA together from different sources, researchers can create recombinant DNA. Often associated with genetically modified organisms, recombinant DNA is commonly used in the context of plasmids: short circular DNA fragments with a few genes on them. By inserting plasmids into bacteria and growing those bacteria on plates of agar (to isolate clones of bacterial cells), researchers can clonally amplify the inserted fragment of DNA (a process known as molecular cloning). (Cloning can also refer to the creation of clonal organisms, through various techniques.)
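
How a restriction enzyme produces predictable fragments can be sketched as a string search. The example assumes EcoRI, which recognizes GAATTC and cuts between the G and the first A; it models only one strand of a linear molecule, ignores sticky ends, and uses a made-up input sequence. The function names are hypothetical.

```python
def find_sites(dna: str, site: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) occurrence of site."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


def digest(dna: str, site: str, cut_offset: int) -> list[str]:
    """Cut a linear DNA string at each site, cut_offset bases into the recognition sequence.

    Only the top strand is modelled; overhangs (sticky ends) are ignored.
    """
    cuts = [pos + cut_offset for pos in find_sites(dna, site)]
    fragments = []
    previous = 0
    for cut in cuts:
        fragments.append(dna[previous:cut])
        previous = cut
    fragments.append(dna[previous:])  # the piece after the last cut
    return fragments


# EcoRI recognizes GAATTC and cuts between G and A (G^AATTC).
sequence = "TTGAATTCAAACCCGAATTCGG"  # made-up example sequence
pieces = digest(sequence, "GAATTC", 1)
print(pieces)  # ['TTG', 'AATTCAAACCCG', 'AATTCGG']
```

Because the recognition sequence is fixed, the same input always yields the same fragments, which is what makes restriction digests useful for mapping and cloning.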

DNA can also be amplified using a procedure called the polymerase chain reaction (PCR).^[70] By using specific short sequences of DNA (primers), PCR can isolate and exponentially amplify a targeted region of DNA. Because it can amplify from extremely small amounts of DNA, PCR is also often used to detect the presence of specific DNA sequences.
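
The exponential growth described above follows from each cycle copying the templates already present. A minimal sketch, assuming a constant per-cycle efficiency (1.0 meaning perfect doubling); the function name and the 90% efficiency value are hypothetical.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle copies a fraction `efficiency` of existing templates.

    efficiency=1.0 means perfect doubling every cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1 + efficiency) ** cycles


print(pcr_copies(1, 30))        # 1073741824.0: perfect doubling from a single template
print(pcr_copies(10, 30, 0.9))  # hypothetical 90% efficiency per cycle
```

In practice amplification slows and plateaus in later cycles as reagents are used up, so this idealized formula overestimates yields for long runs.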

DNA sequencing and genomics

One of the most fundamental technologies developed to study genetics, DNA sequencing allows researchers to determine the sequence of nucleotides in DNA fragments. Developed in 1977 by Frederick Sanger and coworkers, chain-termination sequencing is now routinely used to sequence DNA fragments.^[71] With this technology, researchers have been able to study the molecular sequences associated with many human diseases. As sequencing has become less expensive, and with the aid of computational tools, researchers have sequenced the genomes of many organisms by stitching together the sequences of many different fragments (a process called genome assembly).^[72] These technologies were used to sequence the human genome, leading to the completion of the Human Genome Project in 2003.^[19]
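
Genome assembly can be illustrated with a toy greedy overlap algorithm: repeatedly merge the two fragments whose end and start overlap the most. This is a simplified teaching sketch, not the method used by the Human Genome Project; it assumes error-free reads, ignores reads contained inside other reads, and uses made-up fragments. The function names and the `min_length` threshold are hypothetical.

```python
def overlap(a: str, b: str, min_length: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_length)."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_length: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap until none remain."""
    contigs = list(reads)
    while True:
        best = (0, -1, -1)  # (overlap length, index of left fragment, index of right fragment)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_length)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs  # no overlaps left: remaining fragments are the contigs
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)


reads = ["ATTAGACCTG", "CCTGCCGGAA", "GCCGGAATAC"]  # made-up overlapping fragments
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Real assemblers must also cope with sequencing errors, repeated regions, and reads from either DNA strand, all of which this sketch ignores.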

The large number of available sequences has given rise to the field of genomics, research that uses computational tools to search for and analyze patterns in the full genomes of organisms. Genomics can also be considered a subfield of bioinformatics, which uses computational approaches to analyze large sets of biological data.

References

Alberts B, Johnson A, Lewis J, Raff M, Roberts K, and Walter P (2002). Molecular Biology of the Cell (4th ed.). ISBN 0-8153-3218-1.
Griffiths AJF, Miller JH, Suzuki DT, Lewontin RC, and Gelbart WM (2000). An Introduction to Genetic Analysis. New York: W.H. Freeman and Company. ISBN 0-7167-3520-2.
Hartl D, Jones E (2005). Genetics: Analysis of Genes and Genomes, 6th edition. Jones & Bartlett. ISBN 0-7637-1511-5.
Lodish H, Berk A, Zipursky LS, Matsudaira P, Baltimore D, and Darnell J (2000). Molecular Cell Biology (4th ed.). ISBN 0-7167-3136-3.

Notes

[1] Griffiths et al. (2000)
[2] Hartl D, Jones E (2005)
[3] Weiling F (1991). “Historical study: Johann Gregor Mendel 1822–1884”. American Journal of Medical Genetics. 40 (1): 1–25, discussion 26. PMID 1887835.
[4] Lamarck, J-B (2008). In Encyclopædia Britannica. Retrieved from Encyclopædia Britannica Online on 2008-03-16.
[5] Mendel, GJ (1866). “Versuche über Pflanzen-Hybriden”. Verhandlungen des naturforschenden Vereins Brünn. 4: 3–47. (in English in 1901, J. R. Hortic. Soc. 26: 1–32)
[6] genetics, n., Oxford English Dictionary, 3rd ed.
[7] Bateson W. “Letter from William Bateson to Alan Sedgwick in 1905”. The John Innes Centre. Retrieved 2008-03-15.
[8] genetic, adj., Oxford English Dictionary, 3rd ed.
[9] Bateson, W (1907). “The Progress of Genetic Research”. In Wilks, W (editor). Report of the Third 1906 International Conference on Genetics: Hybridization (the cross-breeding of genera or species), the cross-breeding of varieties, and general plant breeding. London: Royal Horticultural Society.
Although the conference was titled “International Conference on Hybridisation and Plant Breeding”, Wilks changed the title for publication as a result of Bateson’s speech.
[10] Moore JA (1983). “Thomas Hunt Morgan—The Geneticist”. American Zoologist. 23 (4): 855–865.
[11] Sturtevant AH (1913). “The linear arrangement of six sex-linked factors in Drosophila, as shown by their mode of association”. Journal of Experimental Zoology. 14: 43–59.
[12] Avery OT, MacLeod CM, and McCarty M (1944). “Studies on the Chemical Nature of the Substance Inducing Transformation of Pneumococcal Types: Induction of Transformation by a Desoxyribonucleic Acid Fraction Isolated from Pneumococcus Type III”. Journal of Experimental Medicine. 79 (1): 137–158.
[13] Hershey AD, Chase M (1952). “Independent functions of viral protein and nucleic acid in growth of bacteriophage”. The Journal of General Physiology. 36: 39–56.
[14] Judson, Horace (1979). The Eighth Day of Creation: Makers of the Revolution in Biology. Cold Spring Harbor Laboratory Press. pp. 51–169. ISBN 0-87969-477-7.
[15] Watson JD, Crick FHC (1953). “Molecular structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid” (PDF). Nature. 171 (4356): 737–738.
[16] Watson JD, Crick FHC (1953). “Genetical Implications of the Structure of Deoxyribonucleic Acid” (PDF). Nature. 171 (4361): 964–967.
[17] Sanger F, Nicklen S, and Coulson AR (1977). “DNA sequencing with chain-terminating inhibitors”. Proceedings of the National Academy of Sciences of the United States of America. 74 (12): 5463–5467.
[18] Saiki RK, Scharf S, Faloona F, Mullis KB, Horn GT, Erlich HA, Arnheim N (1985). “Enzymatic Amplification of β-Globin Genomic Sequences and Restriction Site Analysis for Diagnosis of Sickle Cell Anemia”. Science. 230 (4732): 1350–1354. PMID 2999980.
[19] “Human Genome Project Information”. Human Genome Project. Retrieved 2008-03-15.
[20] Griffiths et al. (2000), Chapter 2 (Patterns of Inheritance): Introduction
[21] Griffiths et al. (2000), Chapter 2 (Patterns of Inheritance): Mendel’s experiments
[22] Griffiths et al. (2000), Chapter 3 (Chromosomal Basis of Heredity): Mendelian genetics in eukaryotic life cycles
[23] Griffiths et al. (2000), Chapter 4 (Gene Interaction): Interactions between the alleles of one gene
[24] Richard W. Cheney. “Genetic Notation”. Retrieved 2008-03-18.
[25] Griffiths et al. (2000), Chapter 2 (Patterns of Inheritance): Human Genetics
[26] Griffiths et al. (2000), Chapter 4 (Gene Interaction): Gene interaction and modified dihybrid ratios
[27] Mayeux R (2005). “Mapping the new frontier: complex genetic disorders”. The Journal of Clinical Investigation. 115 (6): 1404–1407. PMID 15931374.
[28] Griffiths et al. (2000), Chapter 25 (Quantitative Genetics): Quantifying heritability
[29] Luke A, Guo X, Adeyemo AA, Wilks R, Forrester T, Lowe W Jr, Comuzzie AG, Martin LJ, Zhu X, Rotimi CN, Cooper RS (2001). “Heritability of obesity-related traits among Nigerians, Jamaicans and US black people”. Int J Obes Relat Metab Disord. 25 (7): 1034–1041.
[30] Pearson H (2006). “Genetics: what is a gene?”. Nature. 441 (7092): 398–401. PMID 16724031.
[31] Prescott, L (1993). Microbiology. Wm. C. Brown Publishers. ISBN 0-697-01372-3.
[32] Griffiths et al. (2000), Chapter 8 (The Structure and Replication of DNA): Mechanism of DNA Replication
[33] Gregory SG, et al. (2006). “The DNA sequence and biological annotation of human chromosome 1”. Nature. 441: 315–321.
[34] Alberts et al. (2002), DNA and chromosomes: Chromosomal DNA and Its Packaging in the Chromatin Fiber
[35] Griffiths et al. (2000), Chapter 3 (Chromosomal Basis of Heredity): Mendelian genetics in eukaryotic life cycles
[36] Griffiths et al. (2000), Chapter 2 (Patterns of Inheritance): Sex chromosomes and sex-linked inheritance
[37] Griffiths et al. (2000), Chapter 7 (Gene Transfer in Bacteria and Their Viruses): Bacterial conjugation
[38] Griffiths et al. (2000), Chapter 7 (Gene Transfer in Bacteria and Their Viruses): Bacterial transformation
[39] Griffiths et al. (2000), Chapter 5 (Basic Eukaryotic Chromosome Mapping): Nature of crossing-over
[40] Griffiths et al. (2000), Chapter 5 (Basic Eukaryotic Chromosome Mapping): Linkage maps
[41] Some genes are transcribed into RNA, but their RNA products are never used to produce protein. These RNA products may fold into forms with enzymatic properties (e.g. ribosomal RNA and transfer RNA), or they may have a regulatory effect through hybridization interactions with other RNA molecules (e.g. microRNA).
[42] Berg JM, Tymoczko JL, Stryer L, Clarke ND (2002). Biochemistry (5th ed.). New York: W. H. Freeman and Company. I. 5. DNA, RNA, and the Flow of Genetic Information: Amino Acids Are Encoded by Groups of Three Bases Starting from a Fixed Point
[43] Crick, F (1970). “Central Dogma of Molecular Biology” (PDF). Nature. 227: 561–563. PMID 4913914.
[44] Alberts et al. (2002), Proteins: The Shape and Structure of Proteins
[45] Alberts et al. (2002), Proteins: Protein Function
[46] “How Does Sickle Cell Cause Disease?”. Brigham and Women’s Hospital: Information Center for Sickle Cell and Thalassemic Disorders. 2002-04-11. Retrieved 2007-07-23.
[47] Imes DL, Geary LA, Grahn RA, Lyons LA (2006). “Albinism in the domestic cat (Felis catus) is associated with a tyrosinase (TYR) mutation” (Short Communication). Animal Genetics. 37 (2): 175. Retrieved 2006-05-29.
[48] “MedlinePlus: Phenylketonuria”. NIH: National Library of Medicine. Retrieved 2008-03-15.
[49] Brivanlou AH, Darnell JE Jr (2002). “Signal transduction and the control of gene expression”. Science. 295 (5556): 813–818. PMID 11823631.
[50] Alberts et al. (2002), Control of Gene Expression: The Tryptophan Repressor Is a Simple Switch That Turns Genes On and Off in Bacteria
[51] Jaenisch R, Bird A. “Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals”. Nature Genetics. 33 (3s): 245–254.
[52] Chandler VL (2007). “Paramutation: From Maize to Mice”. Cell. 128: 641–645.
[53] Griffiths et al. (2000), Chapter 16 (Mechanisms of Gene Mutation): Spontaneous mutations
[54] Kunkel TA (2004). “DNA Replication Fidelity”. Journal of Biological Chemistry. 279 (17): 16895–16898.
[55] Griffiths et al. (2000), Chapter 16 (Mechanisms of Gene Mutation): Induced mutations
[56] Griffiths et al. (2000), Chapter 17 (Chromosome Mutation I: Changes in Chromosome Structure): Introduction
[57] Griffiths et al. (2000), Chapter 24 (Population Genetics): Variation and its modulation
[58] Griffiths et al. (2000), Chapter 24 (Population Genetics): Selection
[59] Griffiths et al. (2000), Chapter 24 (Population Genetics): Random events
[60] Darwin, Charles (1859). On the Origin of Species (1st ed.). London: John Murray. p. 1. Related earlier ideas were acknowledged in Darwin, Charles (1861). On the Origin of Species (3rd ed.). London: John Murray. p. xiii.
[61] Gavrilets S (2003). “Perspective: models of speciation: what have we learned in 40 years?”. Evolution. 57 (10): 2197–2215. PMID 14628909.
[62] Wolf YI, Rogozin IB, Grishin NV, Koonin EV (2002). “Genome trees and the tree of life”. Trends Genet. 18 (9): 472–479. PMID 12175808.
[63] “The Use of Model Organisms in Instruction”. University of Wisconsin: Wisconsin Outreach Research Modules. Retrieved 2008-03-15.
[64] “NCBI: Genes and Disease”. NIH: National Center for Biotechnology Information. Retrieved 2008-03-15.
[65] Smith GD, Ebrahim S (2003). “‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease?”. International Journal of Epidemiology. 32: 1–22.
[66] “Pharmacogenetics Fact Sheet”. NIH: National Institute of General Medical Sciences. Retrieved 2008-03-15.
[67] Strachan T, Read AP (1999). Human Molecular Genetics 2 (2nd ed.). John Wiley & Sons Inc. Chapter 18: Cancer Genetics
[68] http://rs6.net/tn.jsp?t=cpxyzscab.0.0.g7akgrbab.0&p=http%3A%2F%2Fwww.nytimes.com%2F2008%2F10%2F20%2Fus%2F20gene.html%3Fpagewanted%3D1%26_r%3D2%26hp&id=preview
[69] Lodish et al. (2000), Chapter 7: 7.1. DNA Cloning with Plasmid Vectors
[70] Lodish et al. (2000), Chapter 7: 7.7. Polymerase Chain Reaction: An Alternative to Cloning
[71] Brown TA (2002). Genomes 2 (2nd ed.). ISBN 1-85996-228-9. Section 2, Chapter 6: 6.1. The Methodology for DNA Sequencing
[72] Brown (2002), Section 2, Chapter 6: 6.2. Assembly of a Contiguous DNA Sequence


Historical

The Doctrine of Chances


[1] See Ian Hacking’s The Emergence of Probability for a history of the early development of the very concept of mathematical probability.

[2] Health Measures. Available at https://www.healthmeasures.net/explore-measurement-systems/selecting-among-measurement-systems/compare-measurement-systems

[3] Promis Available at

[4] Promis Measure and Development Research. Available at https://www.healthmeasures.net/explore-measurement-systems/promis/measure-development-research

[5] Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, Bouter LM, de Vet HC (May 2010). “The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study”. Qual Life Res. 19 (4): 539–49. doi:10.1007/s11136-010-9606-8. PMC 2852520. PMID 20169472.

[6] Boateng GO, Neilands TB, Frongillo EA, Melgar-Quiñonez HR, Young SL (2018). “Best Practices for Developing and Validating Scales for Health, Social, and Behavioral Research: A Primer”. Front Public Health. 6: 149. doi:10.3389/fpubh.2018.00149. PMID 29942800.

[7] “Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists” by Joel Best. Professor Best attributes it to Disraeli, rather than Mark Twain or others.

[8] Simmons, Joseph P.; Nelson, Leif D.; Simonsohn, Uri (2011-10-17). “False-Positive Psychology”. Psychological Science. SAGE Publications. 22 (11): 1359–1366. doi:10.1177/0956797611417632. ISSN 0956-7976.

[1] Griffiths et al. (2000)

[Hartl_and_Jones-2] Hartl D, Jones E (2005)

[Weiling-3] Weiling F (1991). “Historical study: Johann Gregor Mendel 1822–1884”. American Journal of Medical Genetics. 40 (1): 1–25, discussion 26. PMID 1887835.

[4] Lamarck, J-B (2008). In Encyclopædia Britannica. Retrieved from Encyclopædia Britannica Online on 2008-03-16.

[mendel-5] 5.0 ^5.1 Mendel, GJ (1866). “Versuche über Pflanzen-Hybriden“. Verhandlungen des naturforschenden Vereins Brünn. 4: 3–47. (in English in 1901, J. R. Hortic. Soc. 26: 1–32) translation available online

[6] tics, n., Oxford English Dictionary, 3rd ed.

[7] Bateson W. “Letter from William Bateson to Alan Sedgwick in 1905”. The John Innes Centre. Retrieved 2008-03-15.

[8] tic, adj., Oxford English Dictionary, 3rd ed.

[bateson_genetics-9] Bateson, W (1907). “The Progress of Genetic Research”. In Wilks, W (editor). Report of the Third 1906 International Conference on Genetics: Hybridization (the cross-breeding of genera or species), the cross-breeding of varieties, and general plant breeding. London: Royal Horticultural Society.
Although the conference was titled “International Conference on Hybridisation and Plant Breeding”, Wilks changed the title for publication as a result of Bateson’s speech.

[10] Moore JA (1983). “Thomas Hunt Morgan—The Geneticist”. American Zoologist. 23 (4): 855–865.

[11] Sturtevant AH (1913). “The linear arrangement of six sex-linked factors in Drosophila, as shown by their mode of association”. Journal of Experimental Biology. 14: 43–59. pdf from Electronic Scholarly Publishing

[Avery_et_al-12] Avery OT, MacLeod CM, and McCarty M (1944). “Studies on the Chemical Nature of the Substance Inducing Transformation of Pneumococcal Types: Induction of Transformation by a Desoxyribonucleic Acid Fraction Isolated from Pneumococcus Type III”. Journal of Experimental Medicine. 79 (1): 137–158.35th anniversary reprint available

[13] Hershey AD, Chase M (1952). “Independent functions of viral protein and nucleic acid in growth of bacteriophage”. The Journal of General Physiology. 36: 39–56.

[14] Judson, Horace (1979). The Eighth Day of Creation: Makers of the Revolution in Biology. Cold Spring Harbor Laboratory Press. pp. 51–169. ISBN 0-87969-477-7. Unknown parameter |middle= ignored (help)

[watsoncrick_1953a-15] Watson JD, Crick FHC (1953). “[[Molecular structure of Nucleic Acids]]: A Structure for Deoxyribose Nucleic Acid” (PDF). Nature. 171 (4356): 737–738. URL–wikilink conflict (help)

[watsoncrick_1953b-16] Watson JD, Crick FHC (1953). “Genetical Implications of the Structure of Deoxyribonucleic Acid” (PDF). Nature. 171 (4361): 964–967.

[sanger_et_al-17] Sanger F, Nicklen S, and Coulson AR (1977). “DNA sequencing with chain-terminating inhibitors”. Nature. 74 (12): 5463–5467.

[saiki_et_al-18] Saiki RK, Scharf S, Faloona F, Mullis KB, Horn GT, Erlich HA, Arnheim N (1985). “Enzymatic Amplification of β-Globin Genomic Sequences and Restriction Site Analysis for Diagnosis of Sickle Cell Anemia”. Science. 230 (4732): 1350–1354. PMID 2999980.

[human_genome_project-19] 19.0 ^19.1 “Human Genome Project Information”. Human Genome Project. Retrieved 2008-03-15.

[20] Griffiths et al. (2000), Chapter 2 (Patterns of Inheritance): Introduction

[21] Griffiths et al. (2000), Chapter 2 (Patterns of Inheritance): Mendel’s experiments

[22] Griffiths et al. (2000), Chapter 3 (Chromosomal Basis of Heredity): Mendelian genetics in eukaryotic life cycles

[23] Griffiths et al. (2000), Chapter 4 (Gene Interaction): Interactions between the alleles of one gene

[24] Richard W. Cheney. “Genetic Notation”. Retrieved 2008-03-18.

[25] Griffiths et al. (2000), Chapter 2 (Patterns of Inheritance): Human Genetics

[26] Griffiths et al. (2000), Chapter 4 (Gene Interaction): Gene interaction and modified dihybrid ratios

[27] Mayeux R (2005). “Mapping the new frontier: complex genetic disorders”. The Journal of Clinical Investigation. 115 (6): 1404–1407. PMID 15931374.

[28] Griffiths et al. (2000), Chapter 25 (Quantitative Genetics): Quantifying heritability

[29] Luke A, Guo X, Adeyemo AA, Wilks R, Forrester T, Lowe W Jr, Comuzzie AG, Martin LJ, Zhu X, Rotimi CN, Cooper RS (2001). “Heritability of obesity-related traits among Nigerians, Jamaicans and US black people”. Int J Obes Relat Metab Disord. 25 (7): 1034–1041. Abstract from NCBI

[Pearson_2006-30] Pearson H (2006). “Genetics: what is a gene?”. Nature. 441 (7092): 398–401. PMID 16724031.

[31] Prescott, L (1993). Microbiology. Wm. C. Brown Publishers. 0-697-01372-3.

[32] Griffiths et al. (2000), Chapter 8 (The Structure and Replication of DNA): Mechanism of DNA Replication

[33] Gregory SG; et al. (2006). “The DNA sequence and biological annotation of human chromosome 1”. Nature. 441: 315–321. free full text available

[34] Alberts et al. (2002), DNA and chromosomes: Chromosomal DNA and Its Packaging in the Chromatin Fiber

[haploid_diploid-35] 35.0 ^35.1 Griffiths et al. (2000), Chapter 3 (Chromosomal Basis of Heredity): Mendelian genetics in eukaryotic life cycles

[36] Griffiths et al. (2000), Chapter 2 (Patterns of Inheritance): Sex chromosomes and sex-linked inheritance

[37] Griffiths et al. (2000), Chapter 7 (Gene Transfer in Bacteria and Their Viruses): Bacterial conjugation

[38] Griffiths et al. (2000), Chapter 7 (Gene Transfer in Bacteria and Their Viruses): Bacterial transformation

[39] Griffiths et al. (2000), Chapter 5 (Basic Eukaryotic Chromosome Mapping): Nature of crossing-over

[40] Griffiths et al. (2000), Chapter 5 (Basic Eukaryotic Chromosome Mapping): Linkage maps

[41] Some genes are transcribed into RNA, but their RNA products are never used to produce protein. These RNA products may fold into forms with enzymatic properties (eg. ribosomal RNA and transfer RNA), or they may have a regulatory effect through hybridization interactions with other RNA molecules (eg. microRNA).

[42] Berg JM, Tymoczko JL, Stryer L, Clarke ND (2002). Biochemistry (5th edition ed.). New York: W. H. Freeman and Company. I. 5. DNA, RNA, and the Flow of Genetic Information: Amino Acids Are Encoded by Groups of Three Bases Starting from a Fixed Point

[crick1970-43] Crick, F (1970): Central Dogma of Molecular Biology (PDF). Nature 227, 561–563. PMID 4913914

[44] Alberts et al. (2002), Proteins: The Shape and Structure of Proteins

[45] Alberts et al. (2002), Proteins: Protein Function

[46] “How Does Sickle Cell Cause Disease?”. Brigham and Women’s Hospital: Information Center for Sickle Cell and Thalassemic Disorders. 2002-04-11. Retrieved 2007-07-23.

[47] Imes DL, Geary LA, Grahn RA, Lyons LA (2006). “Albinism in the domestic cat (Felis catus) is associated with a tyrosinase (TYR) mutation” (Short Communication). Animal Genetics. 37 (2): 175. Retrieved 2006-05-29.

[48] “MedlinePlus: Phenylketonuria”. NIH: National Library of Medicine. Retrieved 2008-03-15.

[49] Brivanlou AH, Darnell JE Jr (2002). “Signal transduction and the control of gene expression”. Science. 295 (5556): 813–818. PMID 11823631.

[50] Alberts et al. (2002), Control of Gene Expression – The Tryptophan Repressor Is a Simple Switch That Turns Genes On and Off in Bacteria

[51] Jaenisch R, Bird A. “Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals”. Nature Genetics. 33 (3s): 245–254.

[52] Chandler VL (2007). “Paramutation: From Maize to Mice”. Cell. 128: 641–645.

[53] Griffiths et al. (2000), Chapter 16 (Mechanisms of Gene Mutation): Spontaneous mutations

[Kunkel-54] Kunkel TA (2004). “DNA Replication Fidelity”. Journal of Biological Chemistry. 279 (17): 16895–16898.

[55] Griffiths et al. (2000), Chapter 16 (Mechanisms of Gene Mutation): Induced mutations

[56] Griffiths et al. (2000), Chapter 17 (Chromosome Mutation I: Changes in Chromosome Structure): Introduction

[57] Griffiths et al. (2000), Chapter 24 (Population Genetics): Variation and its modulation

[58] Griffiths et al. (2000), Chapter 24 (Population Genetics): Selection

[59] Griffiths et al. (2000), Chapter 24 (Population Genetics): Random events

[Darwin-60] Darwin, Charles (1859). On the Origin of Species (1st ed.). London: John Murray. p. 1.. Related earlier ideas were acknowledged in Darwin, Charles (1861). On the Origin of Species (3rd ed.). London: John Murray. pp. xiii.

[Gavrilets-61] Gavrilets S (2003). “Perspective: models of speciation: what have we learned in 40 years?”. Evolution. 57 (10): 2197–2215. PMID 14628909.

[62] Wolf YI, Rogozin IB, Grishin NV, Koonin EV (2002). “Genome trees and the tree of life”. Trends Genet. 18 (9): 472&ndash, 479. PMID 12175808.

[63] “The Use of Model Organisms in Instruction”. University of Wisconsin: Wisconsin Outreach Research Modules. Retrieved 2008-03-15.

[64] “NCBI: Genes and Disease”. NIH: National Center for Biotechnology Information. Retrieved 2008-03-15.

[65] Smith GD, Ebrahim S (2003). “‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease?”. International Journal of Epidemiology. 32: 1–22.

[66] “Pharmacogenetics Fact Sheet”. NIH: National Institute of General Medical Sciences. Retrieved 2008-03-15.

[67] Strachan T, Read AP (1999). Human Molecular Genetics 2 (second edition ed.). John Wiley & Sons Inc.CS1 maint: Extra text (link) Chapter 18: Cancer Genetics

[68] ttp://rs6.net/tn.jsp?t=cpxyzscab.0.0.g7akgrbab.0&p=http%3A%2F%2Fwww.nytimes.com%2F2008%2F10%2F20%2Fus%2F20gene.html%3Fpagewanted%3D1%26_r%3D2%26hp&id=preview

[69] Lodish et al. (2000), Chapter 7: 7.1. DNA Cloning with Plasmid Vectors

[70] Lodish et al. (2000), Chapter 7: 7.7. Polymerase Chain Reaction: An Alternative to Cloning

[71] Brown TA (2002). Genomes 2 (2nd edition ed.). ISBN ISBN 1 85996 228 9 Check |isbn= value: invalid character (help).CS1 maint: Extra text (link) Section 2, Chapter 6: 6.1. The Methodology for DNA Sequencing

[72] Brown (2002), Section 2, Chapter 6: 6.2. Assembly of a Contiguous DNA Sequence


