Measuring Impact

Submitted by jyuan@worldbank.org on Tue, 04/12/2016 - 18:05

Course Overview

1. What is Evaluation?
2. Measuring Impact
3. Why Randomize?
4. How to Randomize?
5. Sampling and Sample Size
6. Threats and Analysis
7. Cost-Effectiveness Analysis
8. Project from Start to Finish

 

Measuring Impact from clearsateam

1. Measuring Impact
2. Course Overview 1. What is Evaluation? 2. Measuring Impact 3. Why Randomize? 4. How to Randomize? 5. Sampling and Sample Size 6. Threats and Analysis 7. Cost-Effectiveness Analysis 8. Project from Start to Finish
3. Women as Policymakers CASE STUDY
4. What was the main purpose of the 73rd Amendment of India’s constitution? A. To reserve leadership positions for women (and caste minorities) B. To formalize local institutions of leadership C. To give women the right to vote in local elections
5. Theory of Change from Reservations to Final Outcomes
[Diagram elements: Reservations for women; More female Pradhans; Women are empowered; Pradhan’s preferences matter; Democracy imperfect; Women have different preferences; Public goods show women’s preferences]
6. Data Used Covers All Aspects of Theory of Change
Purpose of Measurement | Sources of Measurement | Indicators
Reservations Policy and GP Priorities | Administrative Data | Reservation; Budgets; Balance sheets
Preferences of Women and Men | Transcript from village meeting | Who speaks and when (gender); Issues raised
Public Investments | Village Leader Interview | Political experience; Investments undertaken
Public Investments | Village PRA | Village infrastructure + investments; Perception of public good quality; Participation of men and women
 | Household (HH) Survey | Declared HH preference; HH perceptions of quality of public goods and services
7. Results: Reservations Increase Female Leadership
Of Pradhans: | West Bengal, Reserved | West Bengal, Unreserved | Rajasthan, Reserved | Rajasthan, Unreserved
Total number | 54 | 107 | 40 | 60
Proportion female | 100% | 6.5% | 100% | 1.7%
8. Results: Women Raise Different Issues at GP Meetings
Of Pradhans: | West Bengal, Women | West Bengal, Men | Rajasthan, Women | Rajasthan, Men
Drinking water | 31% | 17% | 54% | 43%
Road improvement | 31% | 25% | 13% | 23%
9. Results: Public Investments Show Women’s Preferences
Issue | Investment | West Bengal, Issue (W) | West Bengal, Issue (M) | West Bengal, Investment (Reserved) | Rajasthan, Issue (W) | Rajasthan, Issue (M) | Rajasthan, Investment (Reserved)
Drinking Water | # facilities | 31% | 17% | 9.09 | 54% | 43% | 2.62
Road Improvement | Road condition (0-1) | 31% | 25% | 0.18 | 13% | 23% | -0.08
Irrigation | # facilities | 4% | 20% | -0.38 | 2% | 4% | -0.02
Education | Informal education center | 6% | 12% | -0.16 | 5% | 13% |
10. Why Do You Need Data? Follow Theory of Change
- Our theory and hypothesis help us define the set of outcomes
- Need to find indicators that map the outcomes well
- Characteristics: Who are the people the program works with, and what is their environment?
  • Sub-groups, covariates, predictors of compliance
- Channels: How does the program work, or fail to work?
- Outcomes: What is the purpose of the program? Did it achieve that purpose?
11. Lecture Overview
- Theory of Change: What Do You Want to Measure?
- Theory of Measurement: What Makes a Good Measure?
- Practice of Measurement: Measuring the Immeasurable
- Collecting Data
12. Theory of Measurement WHAT MAKES A GOOD MEASURE
13. The Main Challenge in Measurement: Getting Accuracy and Precision
[Figure axes: more accurate; more precise]
14. Terms “Biased” and “Unbiased” Used to Describe Accuracy
- “Biased”: On average, we get the wrong answer
- “Unbiased”: On average, we get the right answer
15. Terms “Noisy” and “Precise” Used to Describe Precision
- “Noisy”: Random error causes answer to bounce around
- “Precise”: Measures of the same thing cluster together
16. A Noisy and a Precise Measure Can Both Be Biased
- “Noisy”: Random error causes answer to bounce around
- “Precise”: Measures of the same thing cluster together
17. Choices in Real Measurement Often Harder
- “Noisy” but “Unbiased”: Random error causes answer to bounce around
- “Precise” but “Biased”: Measures of the same thing cluster together
18. The Main Challenge in Measurement: Getting Accuracy and Precision
[Figure axes: more accurate; more precise]
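The contrast between a "noisy but unbiased" and a "precise but biased" measure can be illustrated with a small simulation. This is only a sketch: the true value, the bias, the noise levels and the function name `simulate_measure` are all made-up assumptions for illustration, not figures from the lecture.

```python
import random
from statistics import mean, stdev


def simulate_measure(true_value, bias, noise_sd, n, seed):
    """Draw n hypothetical measurements of one true value.

    bias shifts every reading the same way (systematic error);
    noise_sd controls how much readings bounce around (random error).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]


TRUE_VALUE = 100.0  # illustrative "truth", not a number from the slides

# Noisy but unbiased: centered on the truth, widely scattered.
noisy_unbiased = simulate_measure(TRUE_VALUE, bias=0.0, noise_sd=20.0, n=5000, seed=1)
# Precise but biased: tightly clustered, but around the wrong value.
precise_biased = simulate_measure(TRUE_VALUE, bias=10.0, noise_sd=2.0, n=5000, seed=2)

for label, readings in [("noisy but unbiased", noisy_unbiased),
                        ("precise but biased", precise_biased)]:
    print(f"{label}: mean={mean(readings):.1f}, spread (sd)={stdev(readings):.1f}")
```

Averaging many noisy-but-unbiased readings moves the mean toward the true value, while averaging precise-but-biased readings keeps the mean off by roughly the size of the bias.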
19. Is this introducing noise or bias? A. Noise / Random Error B. Bias / Systematic Error
A surveyor doesn’t follow the exact wording of the question:
Surveyor: “You feel insecure in this neighborhood, right?”
Respondent: “Well, sometimes yes and sometimes, well, no…”
Surveyor: “So, it’s more of a yes, right?”
Respondent: “Mmm… well”
Surveyor: “OK, thanks. Next question.” (codes 1 = yes)
20. Accuracy
- In theory:
  • How well does the indicator map to the outcome? (e.g. IQ test → intelligence)
- In practice: Are you getting unbiased answers?
  • Respondent bias
  • Recall bias
  • Anchoring bias
  • Social desirability bias (response bias)
  • Framing effect / Neutrality
21. Precision and Error
- In theory: The measure is precise, but not necessarily accurate
- In practice:
  • Length, fatigue
  • “How much did you spend on broccoli yesterday?” (as a measure of annual broccoli spending)
  • Ambiguous wording (definitions, relationships, recall period, units of question), e.g. definition of terms: ‘household’, ‘income’
  • Recall period / units of question
  • Type of answer: open / closed
  • Choice of options for closed questions
    – Likert (e.g. Strongly disagree, disagree, neither agree nor disagree, . . .)
    – Rankings
    – Quantitative scales: numbers or bins
22. Random Error
- Mistakes in estimation or recall
- Question wording
- Surveyor training/quality
- Data entry
- Length, fatigue of survey
- How do you generalize from certain questions?
23. Which is worse? A. Bias (Low Accuracy) B. Noise (Low Precision) C. Equally bad D. Depends E. Don’t know/can’t say
24. A biased measure will bias our impact estimates A. True B. False C. Depends D. Don’t know
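One way to explore this question is to compare a measurement bias that is the same in treatment and control with one that differs between the two groups. The sketch below uses made-up outcome values and a hypothetical helper, `measured_impact`; it assumes the impact estimate is a simple difference in means and that the bias is additive.

```python
from statistics import mean


def measured_impact(treatment_true, control_true, bias_treatment, bias_control):
    """Difference in measured means when each group's readings carry an additive bias."""
    treatment_measured = [y + bias_treatment for y in treatment_true]
    control_measured = [y + bias_control for y in control_true]
    return mean(treatment_measured) - mean(control_measured)


# Hypothetical true outcomes: the true impact is 15 - 12 = 3.
treatment_true = [13, 15, 17]
control_true = [10, 12, 14]

# Same bias in both groups: it cancels out of the difference.
same_bias = measured_impact(treatment_true, control_true, bias_treatment=5, bias_control=5)

# Bias only in the treatment group (e.g. respondents over-report after
# receiving the program): it carries through to the estimate.
differential_bias = measured_impact(treatment_true, control_true, bias_treatment=5, bias_control=0)

print(same_bias, differential_bias)
```

Under these assumptions, a bias shared by both groups leaves the estimated impact unchanged, while a bias that differs by group distorts it.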
25. Practice of Measurement MEASURING THE UNMEASURABLE
26. What is Hard to Measure? (1) Things people do not know very well (2) Things people do not want to talk about (3) Abstract concepts (4) Things that are not (always) directly observable (5) Things that are best directly observed
27. How much juice did you consume last month? A. <2 liters B. 2-5 liters C. 6-10 liters D. >11 liters
28. 1. Things people do not know very well
- What: Anything to estimate, particularly across time. Prone to recall error and poor estimation
  • Examples: distance to health center, profit, consumption, income, plot size
- Strategies:
  • Frame the question in a way people are likely to know and remember.
  • Consistency checks
    – How much did you spend in the last week on x? How much did you spend in the last 4 weeks on x?
  • Multiple measurements of same indicator
    – How many minutes does it take to walk to the health center? How many kilometers away is the health center?
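The two strategies above (consistency checks and multiple measurements of the same indicator) can be sketched as simple data-quality flags. Everything here is an illustrative assumption: the field names, the sample responses, and the plausible walking-speed range of 2 to 7 km/h are not from the lecture.

```python
def flag_spending_inconsistency(responses):
    """IDs whose 4-week spending is below their last-week spending.

    Assumes the last week falls inside the last 4 weeks, so the 4-week
    total should be at least as large as the 1-week total.
    """
    return [r["id"] for r in responses if r["spent_4_weeks"] < r["spent_1_week"]]


def flag_implausible_walk(responses, min_kmh=2.0, max_kmh=7.0):
    """IDs whose reported minutes and kilometers imply an implausible walking speed."""
    flagged = []
    for r in responses:
        hours = r["walk_minutes"] / 60
        speed = r["walk_km"] / hours if hours > 0 else float("inf")
        if not (min_kmh <= speed <= max_kmh):
            flagged.append(r["id"])
    return flagged


# Hypothetical survey responses.
responses = [
    {"id": "hh01", "spent_1_week": 50, "spent_4_weeks": 180, "walk_minutes": 30, "walk_km": 2.0},
    {"id": "hh02", "spent_1_week": 80, "spent_4_weeks": 60, "walk_minutes": 45, "walk_km": 3.0},
    {"id": "hh03", "spent_1_week": 20, "spent_4_weeks": 90, "walk_minutes": 10, "walk_km": 5.0},
]

print(flag_spending_inconsistency(responses))
print(flag_implausible_walk(responses))
```

A flag does not say which answer is wrong; it only marks a response for follow-up or for a sensitivity check in the analysis.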
29. How many glasses of juice did you consume yesterday? A. 0 B. 1-3 C. 4-6 D. >6
30. What is Hard to Measure? (1) Things people do not know very well (2) Things people do not want to talk about (3) Abstract concepts (4) Things that are not (always) directly observable (5) Things that are best directly observed
31. How frequently do you yell at your spouse? A. Daily B. Several times per week C. Once per week D. Once per month E. Never
32. 2. Things people don’t want to talk about
- What: Anything socially “risky” or something painful
  • Examples: sexual activity, alcohol and drug use, domestic violence, conduct during wartime, mental health
- Strategies:
  • Don’t start with the hard stuff!
  • Consider asking question in third person
  • Always ensure comfort and privacy of respondent
33. Asking Sensitive Questions: List Randomization
- Randomly ask part of the sample the question with / without a sensitive option
- Respondents report only a count, not which specific statements are true
List with the sensitive option: How many of these statements are true for you?
  • This morning I took a shower.
  • My nearest bank branch office is within walking distance.
  • I have tea every day.
  • I use my loan for non-business expenses.
List without the sensitive option: How many of these statements are true for you?
  • This morning I took a shower.
  • My nearest bank branch office is within walking distance.
  • I have tea every day.
34. Asking Sensitive Questions: List Randomization
- List with the sensitive option: average number of true statements: 2.8
- List without the sensitive option: average number of true statements: 2.1
- 2.8 – 2.1 = 0.7 more TRUE statements (on average)
- 70% used their loan for non-business purposes.
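The arithmetic on this slide can be sketched as a difference in mean counts between the two randomly assigned groups. The individual responses below are invented so that the group averages match the slide (2.8 and 2.1); the function name `list_randomization_estimate` is likewise hypothetical.

```python
from statistics import mean


def list_randomization_estimate(counts_with_item, counts_without_item):
    """Estimated share for whom the sensitive statement is true.

    Each input is a list of per-respondent counts of TRUE statements.
    Because the groups are randomly assigned, the difference in mean
    counts is attributed to the one extra (sensitive) statement.
    """
    return mean(counts_with_item) - mean(counts_without_item)


# Hypothetical responses chosen to reproduce the slide's averages.
with_sensitive_item = [3, 3, 3, 2, 3, 3, 2, 3, 3, 3]      # mean 2.8 (4 statements)
without_sensitive_item = [2, 2, 3, 1, 2, 3, 2, 2, 2, 2]   # mean 2.1 (3 statements)

share = list_randomization_estimate(with_sensitive_item, without_sensitive_item)
print(f"Estimated share using loan for non-business expenses: {share:.0%}")
```

No individual answer reveals whether that respondent used the loan for non-business expenses; only the group-level share is recovered.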
35. List Randomization Shows Big Differences in Some Real Cases
36. Asking Sensitive Questions: List Randomization
- Some questions are sensitive and people are hesitant to answer truthfully
- List randomization is a way to get the answer on average without knowing confidential information on any one person
37. What is Hard to Measure? (1) Things people do not know very well (2) Things people do not want to talk about (3) Abstract concepts (4) Things that are not (always) directly observable (5) Things that are best directly observed
38. “I feel more empowered now than last year” A. Strongly disagree B. Disagree C. Neither agree nor disagree D. Agree E. Strongly agree
39. 3. Abstract concepts
- What: Potentially the most challenging and interesting type of difficult-to-measure indicators
- Examples: empowerment, bargaining power, social cohesion, risk aversion
- Strategies:
  • Three key steps when measuring “abstract concepts”
    – Define what you mean by your abstract concept
    – Choose the outcome that you want to serve as the measurement of your concept
    – Design a good question to measure that outcome
- Often a choice between a self-reported measure and a behavioral measure – both can add value!
40. “I am involved in the decision to send my child to private vs public school” A. Strongly disagree B. Disagree C. Neither agree nor disagree D. Agree E. Strongly agree
41. How “Socially Connected” do You Feel to the Other People in this Room? A B C D E [Figure: options A to E, labeled “You” and “Everyone else in this room”]
42. How likely are you to take a taxi with a driver you don’t know after dark? A. Very unlikely B. Unlikely C. Likely D. Very likely
43. How likely are you to take a taxi with a driver you don’t know after dark? A. Very unlikely B. Unlikely C. Likely D. Very likely
44. Perceptions and Attitudes
- Ask directly
  • “How effective is your leader?” (ineffective, somewhat effective, effective, very…)
- Indirect approaches often have better accuracy
  • Listen to a Vignette (Male v. Female)
  • Revealed preference – voting behavior
  • Implicit Association tests
45. Implicit Association Test
- People simplify the world for efficiency
  • Use rules of thumb to draw connections
  • May not even be aware themselves
- For some important outcomes, may be worth trying to measure these indirectly
  • Implicit association is one technique
46. Implicit Association Test: Match on Left or Right?
47. Implicit Association Test: Match on Left or Right? Parliament
48. Implicit Association Test: Match on Left or Right? Home
49. Implicit Association Test: Match on Left or Right? Parliament
50. Implicit Association Test
- People simplify the world for efficiency
  • Use rules of thumb to draw connections
  • May not even be aware themselves
- Actually based on response time, not accuracy
  • Are respondents faster to select “Parliament” when associated with a man than a woman?
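The response-time comparison can be sketched as a difference in mean latency between the two pairings. This is a deliberately simplified illustration, not the standard scoring procedure used for Implicit Association Tests; the trial data, pairing labels and the function `latency_gap_ms` are all hypothetical.

```python
from statistics import mean


def latency_gap_ms(trials, slower_pairing, faster_pairing):
    """Mean response time for one pairing minus the other, in milliseconds.

    trials is a list of (pairing_label, response_time_ms) tuples.
    A positive gap means respondents were slower on slower_pairing.
    """
    by_pairing = {}
    for pairing, response_ms in trials:
        by_pairing.setdefault(pairing, []).append(response_ms)
    return mean(by_pairing[slower_pairing]) - mean(by_pairing[faster_pairing])


# Hypothetical trials: time taken to sort the word "Parliament"
# when it shares a side of the screen with "man" vs with "woman".
trials = [
    ("parliament+man", 600), ("parliament+man", 650), ("parliament+man", 700),
    ("parliament+woman", 750), ("parliament+woman", 800), ("parliament+woman", 850),
]

gap = latency_gap_ms(trials, "parliament+woman", "parliament+man")
print(f"Respondents were {gap} ms slower when Parliament was paired with a woman")
```

In this made-up data, the slower responses for the "parliament+woman" pairing would be read as a stronger implicit association between Parliament and men.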
51. What is Hard to Measure? (1) Things people do not know very well (2) Things people do not want to talk about (3) Abstract concepts (4) Things that are not (always) directly observable (5) Things that are best directly observed
52. What proportion of race X people are denied jobs due to racial discrimination? A. 0% B. 1-20% C. 21-40% D. 41-60% E. >60%
53. 4. Things that aren’t Directly Observable
- What: You may want to measure outcomes that you can’t ask directly about or directly observe
  • Examples: corruption, fraud, discrimination
- Strategies:
  • Audit studies, e.g. Rajasthan police experiment to register cases, Delhi Driver’s Licenses, Delhi doctors
  • Multiple sources of data, e.g. inputs of funds vs. outputs received by recipients, pollution reports by different parties
  • Don’t worry – there have already been lots of clever people before you – so do literature reviews!
54. 5. Things that are Best Directly Observed
- What: Behavioral preferences, anything that is more believable when done than said
- Strategies:
  • Develop detailed protocols
  • Ensure data collection of behavioral measures is done under the same circumstances for all individuals
  • Example: how often do you wash your hands?
- Strategy: collect the data directly while disrupting participants’ lives as little as possible
- E.g., put movement sensors in soap, measure the weight of the soap
55. The Problem With the Following Questions…
56. Question: After the last time you used the toilet, did you wash your hands? (The problem with this indicator is….) A. Accuracy B. Precision C. Both D. Neither
57. Outcome: Gender Bias Question: How effective are women leaders? (ineffective, somewhat effective, effective, very…) A. Accuracy B. Precision C. Both D. Neither
58. Examples
- Study: Double-Fortified Salt (Duflo et al. forthcoming)
  • Where: Bihar, India
  • Intervention: selling low-cost iron-fortified salt to at-risk-for-anemia families.
  • Indicators:
    – Physical fitness (directly observable; step test)
    – Physical development (directly observable; anthro measures)
    – Cognitive development (indirectly observable; puzzles, tests)
    – Health history (indirectly observable; surveying on immunizations received, etc.)
59. COLLECTING DATA
60. When to Collect Data
- [ Baseline ]: Before you start
- During the intervention
- End line: After you’re done
- [ Scale-up, intervention ]
61. Methods of Data Collection: Not Just Surveys
- Surveys: household/individual
- Administrative data
- Logs/diaries
- Qualitative, e.g. focus groups, RRA
- Games and choice problems
- Observation
- Health/Education tests and measures
62. Where can we get Data? The good . . . . and the bad
- Administrative data: Collected by a government or similar body as part of operations
  • The good: May already be collected and thus free; Can be extremely accurate (e.g. electricity bills)
  • The bad: May not exist or not answer the question you want; May itself change behavior (e.g. taxes)
- Other secondary data: Collected for research or other purposes not admin
  • The good: May already be collected and thus free; Can inform the larger context of a project
  • The bad: May not exist or not answer the question you want; Dubious quality
- Primary data: Collected by researchers for study
  • The good: Address the exact question of interest; Cover channels and assumptions
  • The bad: Very costly and time consuming; May be biased answers
63. Primary Data Collection
- Surveys
  • Questions
  • Exams, tests, games, vignettes, etc.
- Direct Observation
  • Personal or technological (e.g. smoke sensor, vibration sensor)
- Diaries/Logs
64. Common Survey Modules Can be Adapted for a Particular Project
- Demographics
- Economic
  • Income, consumption, expenditure, time use
  • Yields, production, etc.
- Beliefs
  • Expectations or assumptions
  • Bargaining power, patience, risk
- Anthropometric
- Cognitive, Learning
65. Primary Data Collection Considerations
- Quality
  • Reliability and validity of the data
- Costs / Logistics
  • Surveyor recruitment and training
  • Field work and transport, interview time
  • Electronic vs. paper
  • Data entry, reconciliation, cleaning, etc.
- Ethics
  • Human subjects, data security
66. Reliability of Data Collection
- The process of collecting “good” data requires a lot of effort and thought
- Need to make sure that the data collected is precise and accurate
  • Avoid false conclusions, for any research design
- The survey process: Design of questionnaire → Survey printed on paper/electronic → Filled in by enumerator interviewing the respondent → Data entry → Electronic dataset
- Where can this go wrong?
67. Things to Take Away:
- Theory of change guides measurement
  • Want to measure each step
- Data collection is all about trade-offs
  • Quality and cost
  • Bias (accuracy) and noise (precision)
- Creative techniques can sometimes help
  • Think about what outcomes are most important
68. Thank you!