Introductory Statistics
Section 3.1 - Designing a Study
Observational vs. Experimental
In an observational study, we observe subjects and measure the variables of interest (the explanatory variables and the responses) without intervening. For example, to study the effects that humans have on coral reefs, we could go to several locations where the level of human habitation varies and observe the condition of the reefs in those areas. Even if we observe that reefs near highly populated areas are less healthy than reefs near uninhabited areas, we can't conclude that humans are causing the damage, because other factors that differ between the locations could be responsible. An observational study lets us detect an association between an explanatory variable and a response variable, but it cannot establish a cause-and-effect relationship between them. To establish a cause-and-effect relationship, we need to conduct an experiment, in which the researcher deliberately manipulates the explanatory variable and measures the response.
What do observational studies tell you?
- Observational studies are usually flexible and do not necessarily need to be structured around a hypothesis about what you expect to observe (data is emergent rather than pre-existing).
- The researcher is able to collect a depth of information about a particular behavior.
- Observation can reveal interrelationships among multifaceted dimensions of group interactions.
- Because subjects are observed in natural settings, you can often generalize your results to real-life situations.
- Observational research is useful for discovering what variables may be important before applying other methods like experiments.
- Observational research designs account for the complexity of group behaviors.
What don't observational studies tell you?
- Reliability of the data can be low because observing a behavior over and over again is time-consuming and the observations are difficult to replicate.
- In observational research, findings may only reflect a unique sample population and, thus, cannot be generalized to other groups.
- There can be problems with bias as the researcher may only "see what they want to see."
- Cause-and-effect relationships cannot be determined because nothing is manipulated.
- Sources or subjects may not all be equally credible.
- Any group that is studied is altered to some degree by the presence of the researcher, which can skew the data collected (the observer effect, sometimes loosely likened to the Heisenberg uncertainty principle).
What do experimental studies tell you?
- Experimental research allows the researcher to control the situation. In so doing, it allows researchers to answer the question, “what causes something to occur?”
- Experimental research permits the researcher to identify cause-and-effect relationships between variables and to distinguish placebo effects from treatment effects.
- Experimental research designs make it possible to limit alternative explanations and to infer direct causal relationships in the study.
- This approach provides the highest level of evidence for single studies.
What don't experimental studies tell you?
- The design is artificial, and results may not generalize well to the real world.
- The artificial settings of experiments may alter subject behaviors or responses.
- Experimental designs can be costly if special equipment or facilities are needed.
- Some research problems cannot be studied using an experiment because of ethical or technical reasons.
- It is difficult to apply ethnographic and other qualitative methods to experimentally designed research studies.
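The control that an experiment gives the researcher usually comes from deciding, at random, which subjects receive the treatment. The sketch below shows one way this could be done in Python. The subject labels, the two equal-sized groups, and the function name `randomly_assign` are assumptions made for illustration; they are not part of the course material.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)     # seeded generator so the split can be reproduced
    shuffled = list(subjects)     # copy, so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]   # (treatment group, control group)

# Hypothetical subject IDs S01 ... S20
subjects = [f"S{i:02d}" for i in range(1, 21)]
treatment, control = randomly_assign(subjects, seed=1)
print(len(treatment), len(control))   # prints: 10 10
```

Because chance alone decides who is treated, extraneous factors tend to be balanced between the two groups, which is what lets a difference in the response be attributed to the treatment.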
A longitudinal study is a research study that involves repeated observations of the same variables, on the same subjects, over long periods of time. For example, to find out whether obesity levels in a population are increasing or decreasing, we could draw a random sample of 1,000 people from that population and measure their weight and height on the same day each year for fifteen years. This longitudinal sample gives us information about the population over time, so we can determine whether obesity levels are increasing or decreasing.
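As a rough illustration of how longitudinal measurements could be summarized, the sketch below computes an obesity rate for each measurement year. The data are made up (four subjects and three years instead of 1,000 subjects and fifteen years), and the cutoff of a body mass index (BMI) of 30 kg/m² or more is an assumption based on the common adult definition of obesity; the text above does not specify one.

```python
OBESE_BMI = 30.0  # assumed cutoff: adult BMI of 30 kg/m^2 or more

def bmi(weight_kg, height_m):
    """Body mass index: weight divided by height squared."""
    return weight_kg / height_m ** 2

def obesity_rate(measurements):
    """Fraction of (weight_kg, height_m) pairs at or above the cutoff."""
    obese = sum(1 for w, h in measurements if bmi(w, h) >= OBESE_BMI)
    return obese / len(measurements)

# Hypothetical data: the SAME four subjects, in the same order, in each year
waves = {
    2005: [(70, 1.75), (82, 1.80), (95, 1.70), (60, 1.65)],
    2012: [(74, 1.75), (90, 1.80), (97, 1.70), (63, 1.65)],
    2020: [(80, 1.75), (99, 1.80), (98, 1.70), (66, 1.65)],
}

rates = {year: obesity_rate(data) for year, data in waves.items()}
for year, rate in rates.items():
    print(year, f"{rate:.0%}")
```

In this made-up data the rate is 25% in the first two years and 50% in the last, so the repeated measurements show a change over time that a single measurement could not.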
A cross-sectional study, or a cross section of a population, involves collecting data by observing many subjects at the same point in time, or without regard to differences in time. For example, to measure current obesity levels in a population, we could draw a random sample of 1,000 people from that population (also known as a cross section of that population), measure their weight and height, and calculate what percentage of the sample is categorized as obese. This cross-sectional sample gives us a snapshot of the population at one point in time. Note that a single cross-sectional sample cannot tell us whether obesity is increasing or decreasing.
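The cross-sectional calculation described above can be sketched in a few lines of Python. The population here is synthetic (randomly generated weights and heights standing in for real people), and the obesity cutoff of a BMI of 30 kg/m² or more is an assumption, so the printed percentage says nothing about any real population.

```python
import random

OBESE_BMI = 30.0  # assumed cutoff: adult BMI of 30 kg/m^2 or more

rng = random.Random(42)   # seeded so the example is reproducible

# Synthetic stand-in for a population of 50,000 adults: (weight_kg, height_m)
population = [(rng.gauss(80, 15), rng.gauss(1.70, 0.10)) for _ in range(50_000)]

# One cross section: 1,000 people drawn at random, without replacement
sample = rng.sample(population, 1000)

# Count the sampled people whose BMI (weight / height^2) meets the cutoff
obese = sum(1 for weight, height in sample if weight / height ** 2 >= OBESE_BMI)
percent_obese = 100 * obese / len(sample)
print(f"{percent_obese:.1f}% of the sample is categorized as obese")
```

The result is a single number for a single moment; it takes a second sample at a later time, or a longitudinal design, to say anything about a trend.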
Identify the Variable of Interest
It is important to carefully define the variables to be studied and find appropriate methods to collect the data. Are you looking only at one variable? If so, is it numerical in nature or categorical? If it's numerical, is it continuous or discrete? Are you looking for a relationship between two variables? If so, you need to define the explanatory variable and the response variable.
Collecting Data
You have to decide if an existing data source is sufficient, or whether new data needs to be collected. If you are using existing data, you need to know how that data was collected so that any limitations on use in your study can be identified. Did the data collectors employ methods to minimize the effects of extraneous factors? Were there potential confounding variables? If you are collecting new data, you need to consider how you will go about collecting and recording that data so that it will be both valid and reliable.
Extraneous Factors and Confounding Variables
Extraneous and confounding variables are variables other than the explanatory (independent) variable that may affect the response (dependent) variable. They matter when you design a study because they can distort your results and lead to misinterpretation and flawed conclusions. For example, suppose a study shows a positive correlation between ice cream consumption and the number of drowning deaths in a given period. An evaluator might try to explain this correlation by inferring a causal relationship between the two variables (either that ice cream causes drowning, or that drowning causes ice cream consumption). A more likely explanation is that the relationship is spurious and that a third, confounding variable (the season) influences both: during the summer, warmer temperatures lead to more ice cream consumption and to more people swimming, and thus to more drowning deaths.
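A small simulation can make the ice cream example concrete. In the sketch below, temperature drives both ice cream consumption and drownings, and neither of the two affects the other. All of the numbers (temperature range, slopes, noise levels) are invented for illustration, and both outcomes are treated as continuous quantities for simplicity.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)   # seeded so the example is reproducible

# Hypothetical daily temperatures (deg C) for one year
temperature = [rng.uniform(0, 35) for _ in range(365)]

# Each outcome depends on temperature plus its own noise, never on the other outcome
ice_cream = [20 + 3.0 * t + rng.gauss(0, 10) for t in temperature]
drownings = [1 + 0.2 * t + rng.gauss(0, 1) for t in temperature]

overall = pearson(ice_cream, drownings)

# Hold the confounder roughly constant: keep only days between 20 and 25 deg C
similar_days = [(i, d) for t, i, d in zip(temperature, ice_cream, drownings)
                if 20 <= t < 25]
within = pearson([i for i, _ in similar_days], [d for _, d in similar_days])

print(f"all days:         r = {overall:.2f}")
print(f"20-25 deg C only: r = {within:.2f}")
```

Across all days the two outcomes should be strongly correlated, while among days with similar temperatures the correlation should be much weaker. That pattern is what we expect when a confounding variable, rather than a causal link, produces the association.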