
**Course Overview**

1. What is evaluation?

2. Measuring impacts (outcomes, indicators)

3. Why randomize?

4. How to randomize?

**5. Sampling and sample size**

6. Threats and Analysis

7. Cost-Effectiveness Analysis

8. Project from Start to Finish


Sampling, Statistics and Sample Size

1. Sampling, Statistics, Sample Size, Power

2. Course Overview 1. What is evaluation? 2. Measuring impacts (outcomes, indicators) 3. Why randomize? 4. How to randomize? 5. Sampling and sample size 6. Threats and Analysis 7. Cost-Effectiveness Analysis 8. Project from Start to Finish

3. Our Goal in This Lecture: From Sample to Population 1. To understand how samples and populations are related 1. Population: all people who meet certain criteria. Ex: the population of all 3rd graders in India who take a certain exam 2. Sample: a subset of the population. Ex: 1,000 3rd graders in India who take that exam We want the sample to tell us something about the overall population Specifically, we want a sample from the treatment and a sample from the control to tell us something about the true effect size of an intervention in a population 2. To build intuition for setting the optimal sample size for your study This will help us confidently detect a difference between treatment and control

4. Lecture Outline 1. Basic Statistics Terms 2. Sampling variation 3. Law of large numbers 4. Central limit theorem 5. Hypothesis testing 6. Statistical inference 7. Power

5. Lesson 1: Basic Statistics To understand how to interpret data, we need to understand three basic concepts: What is a distribution? What’s an average result? What is a standard deviation?

6. What is a Distribution? A distribution graph or table shows each possible outcome and the frequency that we observe that outcome A probability distribution- same as a distribution but converts frequency to probability

7. Baseline Test Scores [histogram: test scores (x-axis) vs. frequency (y-axis)]

8. What’s the Average Result? What is the “expected result”? (i.e. the average)? Expected Result=the sum of all possible values each multiplied by the probability of its occurrence
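The definition above can be sketched in a few lines of Python; the score-probability pairs are made up for illustration:

```python
# Expected result: the sum of all possible values, each multiplied by the
# probability of its occurrence. The distribution below is hypothetical.
score_probs = {0: 0.1, 20: 0.4, 40: 0.3, 60: 0.2}

expected = sum(value * prob for value, prob in score_probs.items())
print(expected)  # 0*0.1 + 20*0.4 + 40*0.3 + 60*0.2 = 32.0
```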

9. Mean = 26 [histogram: test scores (x-axis) vs. frequency (y-axis), with the mean marked at 26]

10. Population Population mean Mean=26

11. What’s a Standard Deviation? Standard deviation: a measure of dispersion in the population A weighted average distance to the mean, giving more weight to the points furthest from the mean
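Both the mean and the standard deviation of a population can be computed directly with the standard library; the five scores below are hypothetical:

```python
import statistics

# Hypothetical population of test scores.
scores = [6, 16, 26, 36, 46]

mean = statistics.fmean(scores)   # 26.0
sd = statistics.pstdev(scores)    # population standard deviation
print(mean, round(sd, 2))
```

`pstdev` is the population standard deviation (divides by N); `stdev` would be the sample version (divides by N - 1).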

12. Standard Deviation = 20 [histogram: test scores (x-axis) vs. frequency (y-axis), with the mean at 26 and one standard deviation (20) marked]

13. Lecture Outline 1. Basic Statistics Terms 2. Sampling variation 3. Law of large numbers 4. Central limit theorem 5. Hypothesis testing 6. Statistical inference 7. Power

14. Our Goal in This Lecture: From Sample to Population 1. To understand how samples and populations are related 1. Population: all people who meet certain criteria. Ex: the population of all 3rd graders in India who take a certain exam 2. Sample: a subset of the population. Ex: 1,000 3rd graders in India who take that exam We want the sample to tell us something about the overall population Specifically, we want a sample from the treatment and a sample from the control to tell us something about the true effect size of an intervention in a population 2. To build intuition for setting the optimal sample size for your study This will help us confidently detect a difference between treatment and control

15. Sampling Variation: Example We want to know the average test score of grade 3 children in Springfield How many children would we need to sample to get an accurate picture of the average test score?

16. Population: Test Scores of all 3rd Graders Population

17. Mean of Population is 26 (true mean) Population Population mean

18. Pick a Sample of 20 Students: Plot Frequency Population Population mean Sample Sample mean

19. Zooming in on Sample of 20 Students Population mean Sample Sample mean

20. Pick a Different Sample of 20 Students Population mean Sample Sample mean

21. Another Sample of 20 Students Population mean Sample Sample mean

22. Sampling Variation: Definition Sampling variation is the variation we get between different estimates (e.g. mean of test scores) due to the fact that we do not test everyone but only a sample Sampling variation depends on: • The variation in test scores in the underlying population • The number of people we sample
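Sampling variation is easy to see in simulation. The sketch below draws several samples of 20 from a hypothetical population (mean about 26, sd about 20, all numbers made up) and shows that the sample means differ from each other purely by chance:

```python
import random
import statistics

random.seed(0)
# Hypothetical population: 10,000 test scores, roughly mean 26 and sd 20,
# clipped to the 0-100 score range.
population = [min(100, max(0, random.gauss(26, 20))) for _ in range(10_000)]

# Draw several samples of 20 and compare their means: they differ from
# each other (and from the population mean) due to sampling variation.
sample_means = [statistics.fmean(random.sample(population, 20)) for _ in range(5)]
print([round(m, 1) for m in sample_means])
```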

23. What if our Population Instead of Looking Like This… Population Population mean

24. …Looked Like This Population Population mean

25. Standard Deviation: Population I Measure of dispersion in the population [population distribution with the population mean and one standard deviation on either side marked]

26. Standard Deviation: Population II [second population distribution with the population mean and one standard deviation marked]

27. Different Samples of 20 Give Similar Estimates Population mean Sample Sample mean

28. Different Samples of 20 Give Similar Estimates Population mean Sample Sample mean

29. Different Samples of 20 Give Similar Estimates Population mean Sample Sample mean

30. Lecture Outline 1. Basic Statistics Terms 2. Sampling variation 3. Law of large numbers 4. Central limit theorem 5. Hypothesis testing 6. Statistical inference 7. Power

31. Population Population

32. Pick a Sample of 20 Students: Plot Frequency Population Population mean Sample Sample mean

33. Zooming in on Sample of 20 Students Population mean Sample Sample mean

34. Pick a Different Sample of 20 Students Population mean Sample Sample mean

35. Another Sample of 20 Students Population mean Sample Sample mean

36. Let’s Pick a Sample of 50 Students Population mean Sample Sample mean

37. A Different Sample of 50 Students Population mean Sample Sample mean

38. A Third Sample of 50 Students Population mean Sample Sample mean

39. Let’s Pick a Sample of 100 Students Population mean Sample Sample mean

40. Let’s Pick a Different 100 Students Population mean Sample Sample mean

41. Let’s Pick a Different 100 Students: What do we Notice? Population mean Sample Sample mean

42. Law of Large Numbers The more students you sample (so long as the sample is drawn at random), the closer most averages are to the true average (the distribution gets “tighter”) When we conduct an experiment, we can feel confident that on average, our treatment and control groups would have the same average outcomes in the absence of the intervention
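The "tightening" described above can be checked numerically: with a larger sample size, the spread of possible sample means shrinks. This is a minimal sketch with a hypothetical population (mean 26, sd 20):

```python
import random
import statistics

random.seed(1)
# Hypothetical population of 20,000 scores.
population = [random.gauss(26, 20) for _ in range(20_000)]

def spread_of_means(n, draws=200):
    """Sd of `draws` sample means, each computed from a random sample of size n."""
    means = [statistics.fmean(random.sample(population, n)) for _ in range(draws)]
    return statistics.pstdev(means)

# Larger samples -> sample means cluster more tightly around the true mean.
print(round(spread_of_means(20), 2), round(spread_of_means(500), 2))
```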

43. Lecture Outline 1. Basic Statistics Terms 2. Sampling variation 3. Law of large numbers 4. Central limit theorem 5. Hypothesis testing 6. Statistical inference 7. Power

44. Central Limit Theorem If we take many samples and estimate the mean many times, the frequency plot of our estimates (the sampling distribution) will resemble the normal distribution This is true even if the underlying population distribution is not normal

45. Population of Test Scores is not Normal Population

46. Take the Mean of One Sample Population Population mean Sample Sample mean

47. Plot That One Mean Population mean Sample Sample mean

48. Take Another Sample and Plot that Mean Population mean Sample Sample mean

49. Repeat Many Times Population mean Sample Sample mean

50. Repeat Many Times Population mean Sample Sample mean

51. Repeat Many Times Sample mean

52. Repeat Many Times Sample mean

53. Sample mean Repeat Many Times

54. Sample mean Repeat Many Times

55. Sample mean Distribution of Sample Means

56. Normal Distribution

57. Central Limit Theorem The more samples you take, the more the distribution of possible averages (the sampling distribution) looks like a bell curve (a normal distribution) This result is INDEPENDENT of the underlying distribution The mean of the distribution of the means will be the same as the mean of the population The standard deviation of the sampling distribution will be the standard error (SE): se = √(sd²/n) = sd/√n

58. Central Limit Theorem The central limit theorem is crucial for statistical inference Even if the underlying distribution is not normal, IF THE SAMPLE SIZE IS LARGE ENOUGH, we can treat it as being normally distributed
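A simulation sketch of the CLT and of the standard-error formula se = sd/√n: the population below is deliberately non-normal (right-skewed, numbers hypothetical), yet the spread of its sample means closely matches sd/√n:

```python
import math
import random
import statistics

random.seed(2)
# A clearly non-normal (right-skewed) hypothetical population of scores.
population = [random.expovariate(1 / 26) for _ in range(50_000)]

n = 100
# The sampling distribution: 1,000 means of samples of size n.
means = [statistics.fmean(random.sample(population, n)) for _ in range(1_000)]

# CLT: the sd of the sampling distribution approaches sd/sqrt(n).
theoretical_se = statistics.pstdev(population) / math.sqrt(n)
empirical_se = statistics.pstdev(means)
print(round(theoretical_se, 2), round(empirical_se, 2))
```

Plotting `means` as a histogram would show the bell curve, even though the population itself is skewed.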

59. THE Basic Question in Statistics How big does your sample need to be? Why is this the ultimate question? • How confident can you be in your results? We need it to be large enough that both the law of large numbers and the central limit theorem can be applied We need it to be large enough that we could detect a difference in the outcome of interest between the treatment and control samples

60. Samples vs Populations We have two different populations: treatment and comparison We only see the samples: a sample from the treatment population and a sample from the comparison population We want to know if the populations are different from each other We will compare the sample means of treatment and comparison We must take into account that different samples will give us different means (sampling variation)

61. One Experiment, 2 Samples, 2 Means [two sample distributions, with the comparison mean and the treatment mean marked]

62. Difference Between the Sample Means Comparison mean Treatment mean Estimated effect

63. What if we Ran a Second Experiment? Comparison mean Treatment mean Estimated effect

64. Many Experiments Give Distribution of Estimates [histogram: difference between treatment and comparison means (x-axis, -3 to 10) vs. frequency (y-axis); slides 65-69 repeat the chart as more estimates accumulate]

70. What Does This Remind You Of? [the same histogram of estimated differences, now resembling a bell curve]

71. Hypothesis Testing When we do impact evaluations we compare means from two different groups (the treatment and comparison groups) Null hypothesis: the two means are the same and any observed difference is due to chance • H0: treatment effect = 0 Research hypothesis: the true means are different from each other • H1: treatment effect ≠ 0 Other possible tests • H2: treatment effect > 0
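A comparison of two sample means can be sketched as a simple t-statistic computed by hand; the endline scores below are hypothetical:

```python
import math
import statistics

# Hypothetical endline scores from one experiment.
treatment = [31, 28, 35, 30, 29, 33, 27, 34]
comparison = [25, 24, 27, 22, 26, 23, 28, 25]

diff = statistics.fmean(treatment) - statistics.fmean(comparison)  # estimated effect
se = math.sqrt(statistics.variance(treatment) / len(treatment)
               + statistics.variance(comparison) / len(comparison))
t_stat = diff / se  # compare against a critical value (~2 for a 5% two-sided test)
print(round(diff, 3), round(t_stat, 2))
```

If `t_stat` exceeds the critical value, we reject H0 (treatment effect = 0) in favor of H1.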

72. Distribution of Estimates if True Effect is Zero

73. Distributions Under Two Alternatives

74. We Don’t See These Distributions, Just our Estimate β̂

75. Is Our Estimate β̂ Consistent With the True Effect Being β*?

76. If True Effect is β*, we would get β̂ with Frequency A

77. Is it also Consistent with the True Effect Being 0?

78. If True Effect is 0, we would get β̂ with Frequency A’

79. Q: Which is More Likely, True Effect=β* or True Effect=0?

80. A is Bigger than A’, so True Effect=β* is more Likely than True Effect=0

81. But Can we Rule Out that True Effect=0?

82. Is A’ so Small That True Effect=0 is Unlikely?

83. The probability of getting an estimate at least this large if the true effect were 0 is the area under the H0 curve to the right of β̂, over the total area under the curve

84. Critical Value There is always a chance the true effect is zero, however large our estimated effect Recollect that, traditionally, if the probability that we would get β̂ if the true effect were 0 is less than 5%, we say we can reject that the true effect is zero Definition: the critical value is the value of the estimated effect that exactly corresponds to the significance level For a one-sided test of whether the effect is bigger than 0 at the 95% level, it is the value of the estimate where exactly 95% of the area under the H0 curve lies to the left β̂ is significant at 95% if it is further out in the tail than the critical value
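The one-sided 95% critical value can be computed from the normal distribution implied by the CLT; the standard error and estimate below are hypothetical:

```python
from statistics import NormalDist

# Under H0 the estimate is (approximately) normal with mean 0.
se = 1.2  # hypothetical standard error of the estimated effect
critical_value = NormalDist(0, se).inv_cdf(0.95)  # one-sided 95% critical value

beta_hat = 2.5  # hypothetical estimated effect
# Significant at 95% if the estimate is further out in the tail than the critical value.
print(round(critical_value, 3), beta_hat > critical_value)
```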

85. 95% Critical Value for True Effect>0

86. In this Case β̂ is > the Critical Value So….

87. …..We Can Reject that True Effect=0 with 95% Confidence

88. What if the True Effect=β*?

89. How Often Would we get Estimates that we Could Not Distinguish from 0? (if true effect=β*)

90. How Often Would we get Estimates that we Could Distinguish from 0? (if true effect=β*)

91. Chance of Getting Estimates we can Distinguish from 0 is the Area Under Hβ* that is above the Critical Value for H0

92. Proportion of the Area under Hβ* that is above the Critical Value is Power

93. Recap Hypothesis Testing: Power

| Statistical test | Effective (H0 false) | No effect (H0 true) |
| --- | --- | --- |
| Significant (reject H0) | True positive, probability = (1 - κ) | False positive (Type I error), probability = α |
| Not significant (fail to reject H0) | False zero (Type II error, low power), probability = κ | True zero, probability = (1 - α) |

94. Definition of Power Power: If there is a measurable effect of our intervention (the null hypothesis is false), the probability that we will detect an effect (reject the null hypothesis) Reduce Type II Error: failing to reject the null hypothesis (concluding there is no difference) when indeed the null hypothesis is false Traditionally, we aim for 80% power; some people aim for 90% power
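Power is the area under the Hβ* curve beyond the critical value set by H0. A minimal sketch under the normal approximation, with a hypothetical standard error and hypothesized effect:

```python
from statistics import NormalDist

se = 1.0         # hypothetical standard error of the estimated effect
beta_star = 2.8  # hypothesized true effect

# One-sided 5% critical value under H0 (true effect = 0).
critical_value = NormalDist(0, se).inv_cdf(0.95)

# Power: probability the estimate clears the critical value if the truth is beta_star.
power = 1 - NormalDist(beta_star, se).cdf(critical_value)
print(round(power, 3))
```

Moving `beta_star` closer to 0, or increasing `se`, pushes the two curves together and lowers `power`, exactly as the next slides describe.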

95. More Overlap Between the H0 Curve and the Hβ* Curve, the Lower the Power. Q: What Affects Overlap?

96. Larger Hypothesized Effect, Further Apart the Curves, Higher the Power

97. Greater Variance in Population, Increases Spread of Possible Estimates, Reduces Power

98. Power Also Depends on the Critical Value, i.e. the Level of Significance we are Looking For…

99. 10% Significance Gives Higher Power than 5% Significance

100. Why Does Significance Change Power? Q: what trade-off are we making when we change the significance level and increase power? Remember: 10% significance means we’ll make Type I (false positive) errors 10% of the time So moving from 5% to 10% significance means we get more power, but at the cost of more false positives It’s like widening the gap between the goalposts and saying “now we have a higher chance of scoring a goal”

101. Allocation Ratio and Power Definition of allocation ratio: the fraction of the total sample that is allocated to the treatment group Usually, for a given sample size, power is maximized when half the sample is allocated to treatment and half to control

102. Why Does Equal Allocation Maximize Power? Treatment effect is the difference between two means (mean of treatment and mean of control) Adding sample to the treatment group increases the accuracy of the treatment mean; the same holds for control But there are diminishing returns to adding sample size If the treatment group is much bigger than the control group, the marginal person adds little to the accuracy of the treatment group mean, but would add more to the control group mean Thus we maximize the accuracy of the estimated difference when we have equal numbers in treatment and control groups
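The argument above can be made concrete: the variance of the difference in means is σ²/(NP) + σ²/(N(1-P)), which is minimized at P = 0.5. A sketch with hypothetical numbers:

```python
# Var(treatment mean - comparison mean) for total sample N with fraction P treated,
# assuming the same outcome variance sigma2 in both groups.
def var_of_difference(N, P, sigma2=1.0):
    return sigma2 / (N * P) + sigma2 / (N * (1 - P))

# An equal split minimizes the variance of the estimated difference.
print(var_of_difference(100, 0.5), round(var_of_difference(100, 0.3), 4))
```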

103. Summary of Power Factors Hypothesized effect size • Q: A larger effect size makes power increase/decrease? Variance • Q: Greater residual variance makes power increase/decrease? Sample size • Q: A larger sample size makes power increase/decrease? Critical value • Q: A looser critical value makes power increase/decrease? Unequal allocation ratio • Q: An unequal allocation ratio makes power increase/decrease?

104. Power Equation: MDE

MDE = (t_(1-κ) + t_(α/2)) · √(1 / (P(1 - P))) · √(σ² / N)

where MDE is the minimum detectable effect size, σ² the variance, N the sample size, α the significance level, (1 - κ) the power, and P the proportion of the sample in treatment
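A sketch of this calculation, assuming the standard two-sample MDE form with the terms listed on the slide (significance, power, allocation share, variance, sample size); the inputs are hypothetical:

```python
import math
from statistics import NormalDist

def mde(sigma2, N, P=0.5, alpha=0.05, power=0.80):
    """Minimum detectable effect for a two-sided test at level alpha."""
    z = NormalDist()
    t_alpha = z.inv_cdf(1 - alpha / 2)  # significance term, t_(alpha/2)
    t_power = z.inv_cdf(power)          # power term, t_(1-kappa)
    return (t_alpha + t_power) * math.sqrt(1 / (P * (1 - P))) * math.sqrt(sigma2 / N)

# Larger samples shrink the MDE; quadrupling N halves it.
print(round(mde(400, 1000), 2), round(mde(400, 4000), 2))
```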

105. Clustered RCT Experiments Cluster randomized trials are experiments in which social units or clusters, rather than individuals, are randomly allocated to intervention groups The unit of randomization (e.g. the village) is broader than the unit of analysis (e.g. farmers) That is: randomize at the village level, but use farmer-level surveys as our unit of analysis

106. Clustered Design: Intuition We want to know how much rice the average farmer in Sierra Leone grew last year Method 1: Randomly select 9,000 farmers from around the country Method 2: Randomly select 9,000 farmers from one district

107. Clustered Design: Intuition II Some parts of the country may grow more rice than others in general; what if one district had a drought? Or a flood? • i.e. we worry both about long-term correlations and correlations of shocks within groups Method 1 gives the most accurate estimate Method 2 is much cheaper, so for a given budget we could sample more farmers What combination of 1 and 2 gives the highest power for a given budget constraint? It depends on the level of intracluster correlation, ρ (rho)

108. Low Intracluster Correlation [figure: variation in the population, with clusters and sampled clusters marked]

109. HIGH Intracluster Correlation

110. Intracluster Correlation Total variance can be divided into between-cluster variance (τ²) and within-cluster variance (σ²) When the variance within clusters is small and the variance between clusters is large, the intracluster correlation is high (previous slide) Definition of intracluster correlation (ICC): the proportion of total variance explained by between-cluster variance • Note: when within-cluster variance is high, the intracluster correlation is low

icc = ρ = τ² / (σ² + τ²)
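Under the usual convention that τ² is the between-cluster variance and σ² the within-cluster variance, the ICC formula is a one-liner; the variance values below are hypothetical:

```python
def icc(tau2, sigma2):
    """Intracluster correlation: between-cluster variance over total variance."""
    return tau2 / (tau2 + sigma2)

print(icc(1.0, 3.0))  # 0.25: a quarter of total variance is between clusters
```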

111. HIGH Intracluster Correlation

112. Low Intracluster Correlation

113. Power with Clustering

MDE = (t_(1-κ) + t_(α/2)) · √(1 / (P(1 - P))) · √((σ² / N)(1 + (m - 1)ρ))

where m is the average cluster size and ρ the intracluster correlation (ICC); the other terms are as in the individual-level power equation
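A sketch of the clustered version, assuming the standard design-effect form 1 + (m - 1)ρ that inflates the variance term; the inputs are hypothetical:

```python
import math
from statistics import NormalDist

def clustered_mde(sigma2, N, m, rho, P=0.5, alpha=0.05, power=0.80):
    """MDE with clustering: the design effect 1 + (m - 1)*rho inflates the variance."""
    z = NormalDist()
    t = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    design_effect = 1 + (m - 1) * rho
    return t * math.sqrt(1 / (P * (1 - P))) * math.sqrt(sigma2 / N * design_effect)

# With rho = 0, clustering costs nothing; rho > 0 inflates the MDE.
print(round(clustered_mde(400, 1000, 20, 0.0), 2),
      round(clustered_mde(400, 1000, 20, 0.05), 2))
```

Even a modest ρ with large clusters substantially inflates the MDE, which is why the number of clusters matters more than the number of individuals per cluster.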

114. Clustered RCTs vs. Clustered Sampling Must cluster at the level at which you randomize • There are many reasons to randomize at the group level You could randomize by farmer group, village, or district If you randomize one district to treatment and one to comparison, you have too little power, however many farmers you interview • You can never distinguish the treatment effect from possible district-wide shocks If you randomize at the individual level, you don’t need to worry about within-village correlation or village-level shocks, since these affect both treatment and comparison

115. Bottom Line for Clustering If the experimental design is clustered, we now need to consider ρ when choosing a sample size (as well as the other factors) Must cluster at the level of randomization It is extremely important to randomize an adequate number of groups Often the number of individuals within groups matters less than the total number of groups

116. COMMON TRADEOFFS AND RULES OF THUMB

117. Common Tradeoffs Answer one question really well? Or many questions with less accuracy? Large sample size with possible attrition? Or small sample size that we track very closely? Few clusters with many observations? Or many clusters with few observations? How do we allocate our sample to each group?

118. Rules of Thumb A larger sample is needed to detect differences between two variants of a program than between the program and the comparison group. For a given sample size, the highest power is achieved when half the sample is allocated to treatment and half to comparison. The more measurements are taken, the higher the power. In particular, if there is a baseline and an endline rather than just an endline, you have more power. The lower the compliance, the lower the power. The higher the attrition, the lower the power. For a given sample size, we have less power if randomization is at the group level than at the individual level.

1. Sampling, Statistics, Sample Size, Power

2. Course Overview 1. What is evaluation? 2. Measuring impacts (outcomes, indicators) 3. Why randomize? 4. How to randomize? 5. Sampling and sample size 6. Threats and Analysis 7. Cost-Effectiveness Analysis 8. Project from Start to Finish

3. Our Goal in This Lecture: From Sample to Population 1. To understand how samples and populations are related 1. Population- All people who meet a certain criteria. Ex: The population of all 3rd graders in India who take a certain exam 2. Sample- A subset of the population. Ex: 1000 3rd graders in India who take a certain exam We want the sample to tell us something about the overall population Specifically, we want a sample from the treatment and a sample from the control to tell us something about the true effect size of an intervention in a population 2. To build intuition for setting the optimal sample size for your study This will help us confidently detect a difference between treatment and control

4. Lecture Outline 1. Basic Statistics Terms 2. Sampling variation 3. Law of large numbers 4. Central limit theorem 5. Hypothesis testing 6. Statistical inference 7. Power

5. Lesson 1: Basic Statistics To understand how to interpret data, we need to understand three basic concepts: What is a distribution? What’s an average result? What is a standard deviation?

6. What is a Distribution? A distribution graph or table shows each possible outcome and the frequency that we observe that outcome A probability distribution- same as a distribution but converts frequency to probability

7. Baseline Test Scores 500 450 400 350 300 250 200 150 100 50 0 0 7 14 21 28 35 42 49 56 63 70 77 84 91 98 test scores frequency

8. What’s the Average Result? What is the “expected result”? (i.e. the average)? Expected Result=the sum of all possible values each multiplied by the probability of its occurrence

9. 26 500 450 400 350 300 250 200 150 100 50 0 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 test scores frequency mean Mean = 26

10. Population Population mean Mean=26

11. What’s a Standard Deviation? Standard deviation: Measure of dispersion in the population Weighted average distance to the mean gives more weight to those points furthest from mean.

12. Standard Deviation = 20 26 600 500 400 300 200 100 0 500 450 400 350 300 250 200 150 100 50 0 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 test scores frequency sd mean 1 Standard Deviation

13. Lecture Outline 1. Basic Statistics Terms 2. Sampling variation 3. Law of large numbers 4. Central limit theorem 5. Hypothesis testing 6. Statistical inference 7. Power

14. Our Goal in This Lecture: From Sample to Population 1. To understand how samples and populations are related 1. Population- All people who meet a certain criteria. Ex: The population of all 3rd graders in India who take a certain exam 2. Sample- A subset of the population. Ex: 1000 3rd graders in India who take a certain exam We want the sample to tell us something about the overall population Specifically, we want a sample from the treatment and a sample from the control to tell us something about the true effect size of an intervention in a population 2. To build intuition for setting the optimal sample size for your study This will help us confidently detect a difference between treatment and control

15. Sampling Variation: Example We want to know the average test score of grade 3 children in Springfield How many children would we need to sample to get an accurate picture of the average test score?

16. Population: Test Scores of all 3rd Graders Population

17. Mean of Population is 26 (true mean) Population Population mean

18. Pick Sample 20 Students: Plot Frequency Population Population mean Sample Sample mean

19. Zooming in on Sample of 20 Students Population mean Sample Sample mean

20. Pick a Different Sample of 20 Students Population mean Sample Sample mean

21. Another Sample of 20 Students Population mean Sample Sample mean

22. Sampling Variation: Definition Sampling variation is the variation we get between different estimates (e.g. mean of test scores) due to the fact that we do not test everyone but only a sample Sampling variation depends on: • The variation in test scores in the underlying population • The number of people we sample

23. What if our Population Instead of Looking Like This… Population Population mean

24. …Looked Like This Population Population mean

25. Standard Deviation: Population 1 Measure of dispersion in the population 1 Standard 1 Standard deviation deviation Population Population mean 1 Standard deviation

26. Standard Deviation: Population II 1 sd 1 sd Population Population mean 1 Standard deviation

27. Different Samples of 20 Gives Similar Estimates Population mean Sample Sample mean

28. Different Samples of 20 Gives Similar Estimates Population mean Sample Sample mean

29. Different Samples of 20 Gives Similar Estimates Population mean Sample Sample mean

30. Lecture Outline 1. Basic Statistics Terms 2. Sampling variation 3. Law of large numbers 4. Central limit theorem 5. Hypothesis testing 6. Statistical inference 7. Power

31. Population Population

32. Pick Sample 20 Students: Plot Frequency Population Population mean Sample Sample mean

33. Zooming in on Sample of 20 Students Population mean Sample Sample mean

34. Pick a Different Sample of 20 Students Population mean Sample Sample mean

35. Another Sample of 20 Students Population mean Sample Sample mean

36. Lets Pick a Sample of 50 Students Population mean Sample Sample mean

37. A Different Sample of 50 Students Population mean Sample Sample mean

38. A Third Sample of 50 Students Population mean Sample Sample mean

39. Lets Pick a Sample of 100 Students Population mean Sample Sample mean

40. Lets Pick a Different 100 Students Population mean Sample Sample mean

41. Lets Pick a Different 100 Students- What do we Notice? Population mean Sample Sample mean

42. Law of Large Numbers The more students you sample (so long as it is randomized), the closer most averages are to the true average (the distribution gets “tighter”) When we conduct an experiment, we can feel confident that on average, our treatment and control groups would have the same average outcomes in the absence of the intervention

43. Lecture Outline 1. Basic Statistics Terms 2. Sampling variation 3. Law of large numbers 4. Central limit theorem 5. Hypothesis testing 6. Statistical inference 7. Power

44. Central Limit Theorem If we take many samples and estimate the mean many times, the frequency plot of our estimates (the sampling distribution) will resemble the normal distribution This is true even if the underlying population distribution is not normal

45. Population of Test Scores is not Normal Population

46. Take the Mean of One Sample Population Population mean Sample Sample mean

47. Plot That One Mean Population mean Sample Sample mean

48. Take Another Sample and Plot that Mean Population mean Sample Sample mean

49. Repeat Many Times Population mean Sample Sample mean

50. Repeat Many Times Population mean Sample Sample mean

51. Repeat Many Times Sample mean

52. Repeat Many Times Sample mean

53. Sample mean Repeat Many Times

54. Sample mean Repeat Many Times

55. Sample mean Distribution of Sample Means

56. Normal Distribution

57. Central Limit Theorem The more samples you take, the more the distribution of possible averages (the sampling distribution) looks like a bell curve (a normal distribution) This result is INDEPENDENT of the underlying distribution The mean of the distribution of the means will be the same as the mean of the population The standard deviation of the sampling distribution will be the standard error (SE) 푠푒 = 푠푑 2 푛

58. Central Limit Theorem The central limit theorem is crucial for statistical inference Even if the underlying distribution is not normal, IF THE SAMPLE SIZE IS LARGE ENOUGH, we can treat it as being normally distributed

59. THE Basic Questions in Statistics How big does your sample need to be? Why is this the ultimate question? • How confident can you be in your results? We need it to be large enough that both the law of large numbers and the central limit theorem can be applied We need it to be large enough that we could detect a difference in outcome of interest between the treatment and control samples

60. Samples vs Populations We have two different populations: treatment and comparison We only see the samples: sample from the treatment population and sample from the comparison population We will want to know if the populations are different from each other We will compare sample means of treatment and comparison We must take into account that different samples will give us different means (sample variation)

61. Comparison Treatment One Experiment, 2 Samples, 2 Means Comparison mean Treatment mean

62. Difference Between the Sample Means Comparison mean Treatment mean Estimated effect

63. What if we Ran a Second Experiment? Comparison mean Treatment mean Estimated effect

64. Many Experiments Give Distribution of Estimates 100 90 80 70 60 50 40 30 20 10 0 Frequency -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 Difference

65. Many Experiments Give Distribution of Estimates 100 90 80 70 60 50 40 30 20 10 0 Frequency -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 Difference

66. Many Experiments Give Distribution of Estimates 100 90 80 70 60 50 40 30 20 10 0 Frequency -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 Difference

67. Many Experiments Give Distribution of Estimates 100 90 80 70 60 50 40 30 20 10 0 Frequency -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 Difference

68. Many Experiments Give Distribution of Estimates 100 90 80 70 60 50 40 30 20 10 0 Frequency -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 Difference

69. Many Experiments Give Distribution of Estimates 100 90 80 70 60 50 40 30 20 10 0 Frequency -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 Difference

70. What Does This Remind You Of? 100 90 80 70 60 50 40 30 20 10 0 Frequency -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 Difference

71. Hypothesis Testing When we do impact evaluations we compare means from two different groups (the treatment and comparison groups) Null hypothesis: the two means are the same and any observed difference is due to chance • H0: treatment effect = 0 Research hypothesis: the true means are different from each other • H1: treatment effect ≠ 0 Other possible tests • H2: treatment effect > 0

72. Distribution of Estimates if True Effect is Zero

73. Distributions Under Two Alternatives

74. We Don’t See These Distributions, Just our Estimate 휷

75. Is Our Estimate 휷 Consistent With the True Effect Being β*?

76. If True Effect is β*, we would get 휷 with Frequency A

77. Is it also Consistent with the True Effect Being 0?

78. If True Effect is 0, we would get 휷 with Frequency A’

79. Q: Which is More Likely, True Effect=β* or True Effect=0?

80. A is Bigger than A’ so True Effect=β* is more Likely that True Effect=0

81. But Can we Rule Out that True Effect=0?

82. Is A’ so Small That True Effect=0 is Unlikely?

83. Probability true effect=0 is area to the right of A’ over total area under the curve

84. Critical Value There is always a chance the true effect is zero, however, large our estimated effect Recollect that, traditionally, if the probability that we would get 훽 if the true effect were 0 is less than 5% we say we can reject that the true effect is zero Definition: the critical value is the value of the estimated effect which exactly corresponds to the significance level If testing whether bigger than 0 a significant at 95% level it is the level of the estimate where exactly 95% of area under the curve lies to the left 훽 is significant at 95% if it is further out in the tail than the critical value

85. 95% Critical Value for True Effect>0

86. In this Case 휷 is > Critical Value So….

87. …..We Can Reject that True Effect=0 with 95% Confidence

88. What if the True Effect=β*?

89. How Often Would we get Estimates that we Could Not Distinguish from 0? (if true effect=β*)

90. How Often Would we get Estimates that we Could Distinguish from 0? (if true effect=β*)

91. Chance of Getting Estimates we can Distinguish from 0 is the Area Under H β* that is above Critical Value for H0

92. Proportion of Area under H β* that is above Critical Value is Power

93. Recap Hypothesis Testing: Power Underlying truth Effective (H0 false) No Effect (H0 true) Statistical Test Significant (reject H0) True positive Probability = (1 – κ) False positive Type I Error (low power) Probability = α Not significant (fail to reject H0) False zero Type II Error Probability = κ True zero Probability = (1-α)

94. Definition of Power Power: if there is a measurable effect of our intervention (the null hypothesis is false), the probability that we will detect the effect (reject the null hypothesis). Reduce Type II Error: failing to reject the null hypothesis (concluding there is no difference) when in fact the null hypothesis is false. Traditionally, we aim for 80% power; some people aim for 90% power
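The definition above can be turned into a small calculation. A minimal sketch, assuming a one-sided test of a normally distributed estimate with known standard error (function name and numbers are illustrative):

```python
# Sketch: power of a one-sided test, assuming the estimate is normal with
# mean beta_star (the true effect) and a known standard error se.
from statistics import NormalDist

def power(beta_star, se, alpha=0.05):
    """Probability the estimate lands beyond the critical value
    when the true effect is beta_star (one-sided test)."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha) * se  # critical value under H0
    # Area of the H_beta* curve that lies above the critical value:
    return 1 - z.cdf((critical - beta_star) / se)

# A true effect about 2.49 standard errors from zero gives roughly 80% power.
print(round(power(beta_star=2.49, se=1.0), 2))  # ≈ 0.8
```

Larger true effects push the Hβ* curve further from the H0 curve, so the same function returns higher power.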

95. The More Overlap Between the H0 Curve and the Hβ* Curve, the Lower the Power. Q: What Affects Overlap?

96. The Larger the Hypothesized Effect, the Further Apart the Curves, and the Higher the Power

97. Greater Variance in the Population Increases the Spread of Possible Estimates and Reduces Power

98. Power Also Depends on the Critical Value, i.e. the Level of Significance we are Looking For…

99. 10% Significance Gives Higher Power than 5% Significance

100. Why Does Significance Change Power? Q: what trade-off are we making when we change the significance level and increase power? Remember: 10% significance means we’ll make Type I (false positive) errors 10% of the time. So moving from 5% to 10% significance gives more power, but at the cost of more false positives. It’s like widening the gap between the goal posts and saying “now we have a higher chance of scoring a goal”

101. Allocation Ratio and Power Definition of allocation ratio: the fraction of the total sample that is allocated to the treatment group. Usually, for a given sample size, power is maximized when half the sample is allocated to treatment and half to control

102. Why Does Equal Allocation Maximize Power? The treatment effect is the difference between two means (the mean of treatment and the mean of control). Adding sample to the treatment group increases the accuracy of the treatment mean, and the same holds for control. But there are diminishing returns to adding sample. If the treatment group is much bigger than the control group, the marginal person adds little to the accuracy of the treatment-group mean but would add more to the control-group mean. Thus we get the most accurate estimate of the difference when we have equal numbers in treatment and control
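The diminishing-returns argument above shows up directly in the variance of the difference in means, σ²/n_T + σ²/n_C. A minimal sketch with illustrative numbers (not from the lecture):

```python
# Sketch: for a fixed total sample N, the variance of the difference in
# means, sigma^2/n_T + sigma^2/n_C, is smallest at a 50/50 split.
sigma2, N = 1.0, 1000  # illustrative variance and total sample size

def var_of_difference(share_treated):
    n_t = N * share_treated        # treatment group size
    n_c = N - n_t                  # control group size
    return sigma2 / n_t + sigma2 / n_c

for p in (0.5, 0.7, 0.9):
    print(p, round(var_of_difference(p), 5))
# The 50/50 split gives the smallest variance, hence the most power.
```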

103. Summary of Power Factors Hypothesized effect size • Q: does a larger effect size make power increase or decrease? Variance • Q: does greater residual variance make power increase or decrease? Sample size • Q: does a larger sample size make power increase or decrease? Critical value • Q: does a looser critical value make power increase or decrease? Allocation ratio • Q: does an unequal allocation ratio make power increase or decrease?

104. Power Equation: MDE

MDE = (t₍₁₋κ₎ + t_α) × √(1 / (P(1 − P))) × √(σ² / N)

where MDE is the (minimum detectable) effect size, t₍₁₋κ₎ corresponds to the desired power, t_α to the significance level, P is the proportion of the sample in treatment, σ² is the variance, and N is the sample size
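The MDE formula can be evaluated numerically. A minimal sketch, assuming normal approximations for the t critical values and a two-sided test; the inputs are illustrative, not from the lecture:

```python
# Sketch: minimum detectable effect (MDE) for a simple individual-level RCT,
# using normal approximations for the critical values.
from math import sqrt
from statistics import NormalDist

def mde(N, P=0.5, sigma2=1.0, alpha=0.05, power=0.80):
    """N: total sample; P: share in treatment; sigma2: outcome variance."""
    z = NormalDist()
    t_power = z.inv_cdf(power)          # t_(1-kappa), e.g. 0.84 for 80% power
    t_alpha = z.inv_cdf(1 - alpha / 2)  # t_alpha, e.g. 1.96 for two-sided 5%
    return (t_power + t_alpha) * sqrt(1 / (P * (1 - P))) * sqrt(sigma2 / N)

# With N = 1000, a 50/50 split, and unit variance:
print(round(mde(1000), 3))  # about 0.18 standard deviations
```

Quadrupling the sample halves the MDE, since it enters through √(σ²/N).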

105. Clustered RCT Experiments Cluster randomized trials are experiments in which social units or clusters, rather than individuals, are randomly allocated to intervention groups. The unit of randomization (e.g. the village) is broader than the unit of analysis (e.g. farmers). That is: randomize at the village level, but use farmer-level surveys as the unit of analysis

106. Clustered Design: Intuition We want to know how much rice the average farmer in Sierra Leone grew last year. Method 1: randomly select 9,000 farmers from around the country. Method 2: randomly select 9,000 farmers from one district

107. Clustered Design: Intuition II Some parts of the country may grow more rice than others in general; what if one district had a drought? Or a flood? • i.e. we worry both about long-term correlations and correlations of shocks within groups. Method 1 gives the most accurate estimate. Method 2 is much cheaper, so for a given budget we could sample more farmers. What combination of 1 and 2 gives the highest power for a given budget constraint? It depends on the level of intracluster correlation, ρ (rho)

108. Low Intracluster Correlation Variation in the population Clusters Sample clusters

109. HIGH Intracluster Correlation

110. Intracluster Correlation Total variance can be divided into between-cluster variance (τ²) and within-cluster variance (σ²). When the variance within clusters is small and the variance between clusters is large, the intracluster correlation is high (previous slide). Definition of intracluster correlation (ICC): the proportion of total variance explained by between-cluster variance. • Note: when within-cluster variance is high, units within a cluster look less alike, so the within-cluster correlation (and the ICC) is low. icc = ρ = τ² / (σ² + τ²)
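The ICC can be computed directly from the two variance components. A minimal sketch, assuming ρ is the share of total variance that lies between clusters (the standard definition); variable names and numbers are illustrative:

```python
# Sketch: intracluster correlation rho = between / (between + within).
def icc(between_var, within_var):
    """Share of total variance explained by differences between clusters."""
    return between_var / (between_var + within_var)

print(icc(between_var=1.0, within_var=9.0))  # 0.1: clusters explain little
print(icc(between_var=9.0, within_var=1.0))  # 0.9: clusters explain a lot
```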

111. HIGH Intracluster Correlation

112. Low Intracluster Correlation

113. Power with Clustering

MDE = (t₍₁₋κ₎ + t_α) × √(1 / (P(1 − P))) × √((σ² / N) × (1 + (m − 1)ρ))

where m is the average cluster size and ρ is the intracluster correlation (ICC); the effect size, variance, sample size, significance level, power, and proportion in treatment enter as in the simple power equation
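Relative to the simple formula, clustering inflates the MDE by the factor √(1 + (m − 1)ρ), often called the design effect. A minimal sketch with illustrative values:

```python
# Sketch: the clustering penalty ("design effect") on the MDE,
# sqrt(1 + (m - 1) * rho). Values below are illustrative.
from math import sqrt

def design_effect(m, rho):
    """m: average cluster size; rho: intracluster correlation."""
    return sqrt(1 + (m - 1) * rho)

# Even a modest rho inflates the MDE substantially once clusters are large:
print(round(design_effect(m=40, rho=0.05), 2))  # about 1.72
print(round(design_effect(m=40, rho=0.0), 2))   # 1.0: no clustering penalty
```

This is why the number of clusters usually matters more than the number of individuals per cluster: increasing m raises the penalty, while adding clusters does not.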

114. Clustered RCTs vs. Clustered Sampling You must cluster at the level at which you randomize. • There are many reasons to randomize at the group level. You could randomize by farmer group, village, or district. If you randomize one district to treatment and one to control, you have too little power, however many farmers you interview. • You can never distinguish the treatment effect from possible district-wide shocks. If you randomize at the individual level, you don’t need to worry about within-village correlation or village-level shocks, as they affect both treatment and control

115. Bottom Line for Clustering If the experimental design is clustered, we now need to consider ρ when choosing a sample size (as well as the other factors). Cluster at the level of randomization. It is extremely important to randomize an adequate number of groups. Often the number of individuals within groups matters less than the total number of groups

116. COMMON TRADEOFFS AND RULES OF THUMB

117. Common Tradeoffs Answer one question really well? Or many questions with less accuracy? Large sample size with possible attrition? Or small sample size that we track very closely? Few clusters with many observations? Or many clusters with few observations? How do we allocate our sample to each group?

118. Rules of Thumb A larger sample is needed to detect differences between two variants of a program than between the program and the comparison group. For a given sample size, the highest power is achieved when half the sample is allocated to treatment and half to comparison. The more measurements are taken, the higher the power. In particular, if there is a baseline and endline rather than just an endline, you have more power The lower compliance, the lower the power. The higher the attrition, the lower the power For a given sample size, we have less power if randomization is at the group level than at the individual level.
