02. SPSS Classroom | Basic Statistical Concepts (P2) | Hypotheses, Errors (Type 1/Type 2), P-Value

Q: Why are null and alternative hypotheses designed to be mutually exclusive and exhaustive?

They must cover all possibilities without overlap. In the bulb example, the claim is about whether average life is at least 1,000 hours. That leads to two clean alternatives: either the average is ≥ 1,000 (H0) or it is < 1,000 (H1). There is no “middle” third option, so the test can make a clear decision about which condition is supported by the sample data.

Q: What does it mean that hypothesis testing never directly “accepts” the alternative hypothesis?

The procedure is framed around H0. The outcome is either rejecting H0 (which supports the alternative) or failing to reject H0 (which means the evidence isn’t strong enough to overturn the status quo). This is why the null hypothesis is treated as the default assumption until the data provide sufficient evidence against it.

Q: How do Type I and Type II errors differ, and why does sample size matter?

A Type I error happens when H0 is rejected even though it is actually true (a false alarm). A Type II error happens when H0 is not rejected even though it is actually false (a missed detection). At a fixed sample size, lowering the chance of one error generally raises the chance of the other, so the practical remedy emphasized is increasing the sample size, which makes the test more sensitive: for the same Type I error rate, a larger sample lowers the Type II error rate.
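The trade-off can be put into numbers with a small sketch. It assumes a lower-tailed z-test of the bulb claim (H0: mean ≥ 1,000 hours) with a known population standard deviation. The function name `type_ii_error`, the true mean of 990 hours, and the standard deviation of 50 hours are illustrative assumptions, not values from the lecture.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def type_ii_error(n, alpha=0.05, mu0=1000.0, true_mean=990.0, sigma=50.0):
    """Type II error rate (beta) of a lower-tailed z-test of H0: mean >= mu0.

    All numeric defaults are illustrative assumptions, not values from the source.
    """
    standard_error = sigma / sqrt(n)
    # Reject H0 when the sample mean falls below this cutoff.
    cutoff = mu0 + STD_NORMAL.inv_cdf(alpha) * standard_error
    # Beta = probability the sample mean stays above the cutoff although H0 is false.
    return 1.0 - NormalDist(true_mean, standard_error).cdf(cutoff)


# Larger samples lower beta while alpha stays at 0.05.
for n in (25, 100, 400):
    print(f"n={n:>3}  alpha=0.05  beta={type_ii_error(n):.3f}")

# At a fixed n, a stricter alpha raises beta: the trade-off between the two errors.
print(f"n=100  alpha=0.01  beta={type_ii_error(100, alpha=0.01):.3f}")
```

Under these assumptions, beta shrinks as n grows, while tightening alpha at a fixed n pushes beta up.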

Q: What exactly does a p-value represent in hypothesis testing?

The p-value is the probability of obtaining the observed sample results (or more extreme results) assuming H0 is true, that is, assuming there is no true difference. For example, a p-value of 0.05 in a t-test means there’s a 5% chance of seeing a t-statistic at least as extreme as the one calculated if the populations are actually equal. That probability is then compared to a significance threshold (often 0.05 in social sciences).
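As a rough illustration, the sketch below computes a p-value for the bulb claim. It swaps the t-test mentioned above for a z-test with an assumed known population standard deviation, because the Python standard library has a normal distribution but no t distribution. The helper name `lower_tail_p_value` and the sample figures (100 bulbs, mean 990 hours, standard deviation 50 hours) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_p_value(sample_mean, mu0, sigma, n):
    """Return (z, p): p is P(sample mean <= observed) if the true mean equals mu0."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return z, NormalDist().cdf(z)


# Hypothetical data: 100 bulbs averaging 990 hours, population sd assumed to be 50.
z, p = lower_tail_p_value(sample_mean=990.0, mu0=1000.0, sigma=50.0, n=100)

alpha = 0.05  # significance threshold chosen before looking at the data
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p = {p:.4f}: {decision}")
```

With these made-up numbers the z-statistic is -2 and the p-value is about 0.023, which is below 0.05, so H0 would be rejected.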

Q: When should a one-tailed test be used instead of a two-tailed test?

Use a one-tailed test for directional hypotheses that specify the direction of difference or effect (e.g., male job satisfaction > female). Use a two-tailed test for non-directional hypotheses that only ask whether a difference exists without specifying direction (e.g., job satisfaction differs between males and females). The transcript links this to whether H1 is directional (one tail) or non-directional (two tails).
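The difference between the two kinds of test shows up directly in the p-value. The sketch below assumes a z-statistic (a normal approximation rather than the t-test itself). The function name `p_values` and the value 1.80 are hypothetical and only serve the illustration.

```python
from statistics import NormalDist


def p_values(z):
    """One-tailed (upper) and two-tailed p-values for a z-statistic."""
    std_normal = NormalDist()
    upper_tail = 1.0 - std_normal.cdf(z)               # H1: group A > group B
    two_tailed = 2.0 * (1.0 - std_normal.cdf(abs(z)))  # H1: group A != group B
    return upper_tail, two_tailed


# Hypothetical test statistic for the job-satisfaction comparison.
one_tailed, two_tailed = p_values(1.80)
print(f"one-tailed p = {one_tailed:.4f}")
print(f"two-tailed p = {two_tailed:.4f}")
```

For a positive statistic the two-tailed p-value is twice the one-tailed one, so the same data can fall below 0.05 under a directional H1 and above it under a non-directional H1. This is why the direction should be fixed by the hypothesis rather than chosen after seeing the result.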

TL;DR

Hypothesis testing decides whether a population claim is supported by sample evidence by rejecting H0 or failing to reject H0.

Briefing

Hypothesis testing is a structured way to decide whether a real-world claim about a population should be accepted or rejected based on sample data. A hypothesis is treated as an educated guess or assumption about a population characteristic—such as an electric bulb company claiming its bulbs last an average of at least 1,000 hours. Testing that claim means collecting data from a sample (e.g., testing 100–200 bulbs by running them for 1,000–2,000 hours) and comparing the results to the population mean implied by the claim.

The decision hinges on setting up two mutually exclusive and exhaustive alternatives: the null hypothesis (H0) and the alternative hypothesis (H1). H0 represents the “status quo” and is presumed correct unless strong evidence appears against it. H1 is the negation of H0. Importantly, hypothesis testing doesn’t “accept” H1 directly; it either rejects H0 or fails to reject H0. In the bulb example, H0 could be “average life is ≥ 1,000 hours,” while H1 would be “average life is < 1,000 hours.” If H0 is rejected, it signals the claim is likely false and corrective action may be needed to improve bulb life. If H0 is not rejected, no corrective action is typically required because the sample does not provide sufficient evidence against the status quo claim.

This framework also formalizes two kinds of mistakes. A Type I error occurs when H0 is rejected even though it is actually true. A Type II error occurs when H0 is not rejected even though it is actually false. At a fixed sample size, reducing the risk of one error generally increases the risk of the other, so the practical lever is increasing the sample size: larger samples make the test more reliable.
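A quick simulation makes the two error rates concrete. This is a minimal sketch, assuming a lower-tailed z-test of the bulb claim with normally distributed lifetimes. The function name `rejection_rate`, the standard deviation of 50 hours, the alternative mean of 990 hours, and the sample sizes are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def rejection_rate(true_mean, n, trials=4000, mu0=1000.0, sigma=50.0,
                   alpha=0.05, seed=1):
    """Fraction of simulated samples in which a lower-tailed z-test rejects H0: mean >= mu0."""
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(alpha)  # about -1.645 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        rejections += z < critical_z
    return rejections / trials


# H0 true (mean exactly 1,000 hours): every rejection is a Type I error.
print("Type I rate:", rejection_rate(true_mean=1000.0, n=30))

# H0 false (mean 990 hours): every non-rejection is a Type II error.
for n in (30, 120):
    print(f"n={n}: Type II rate:", 1 - rejection_rate(true_mean=990.0, n=n))
```

In runs like this the Type I rate should land near the chosen alpha of 0.05 regardless of sample size, while the Type II rate should drop noticeably when the sample grows from 30 to 120 bulbs.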

To decide whether to reject H0, hypothesis testing compares the p-value (P value) with a significance level chosen in advance. The p-value is the probability of reaching the observed (or more extreme) sample results under the assumption that H0 is true, that is, that there is actually no true difference. For instance, in a t-test, a p-value of 0.05 means there is only a 5% chance of getting a t-statistic at least as extreme as the one calculated if the two samples truly come from populations that are equal. In social sciences, 0.05 is often used as a standard threshold: if the obtained p-value is less than 0.05, H0 is rejected; if it is greater than 0.05, H0 is not rejected.

Direction matters for how hypotheses are tested. Directional hypotheses specify the direction of an effect (e.g., male job satisfaction is higher than female), so they use a one-tailed test. Non-directional hypotheses only ask whether a difference exists without specifying direction (e.g., job satisfaction differs between males and females), so they use a two-tailed test. Across business, finance, marketing, human resources, quality control, and research, the goal is typically to find statistically significant evidence—often operationalized as p-values below 0.05—to support H1 while controlling the risk of incorrect conclusions.

Cornell Notes

Hypothesis testing turns real-world claims about populations into decisions based on sample data. Each claim is paired with a null hypothesis (H0) representing the status quo and an alternative hypothesis (H1) as its negation; the process results in either rejecting H0 or failing to reject H0. Two errors are possible: Type I (rejecting H0 when it’s true) and Type II (not rejecting H0 when it’s false). At a fixed sample size the two trade off against each other, and increasing the sample size eases that trade-off. The p-value measures how likely results at least as extreme as those observed would be if there were truly no difference; in social sciences, p < 0.05 typically leads to rejecting H0. Directional questions use one-tailed tests, while non-directional questions use two-tailed tests.

When should a one-tailed test be used instead of a two-tailed test?