Straws-in-the-wind, Hoops and Smoking Guns: What can Process Tracing Offer to Impact Evaluation?

This CDI Practice Paper by Melanie Punton and Katharina Welle explains the methodological and theoretical foundations of process tracing, and discusses its potential application in international development impact evaluations. It draws on two early applications of process tracing for assessing impact in international development interventions: Oxfam Great Britain (GB)'s contribution to advancing universal health care in Ghana, and the impact of the Hunger and Nutrition Commitment Index (HANCI) on policy change in Tanzania. In a companion to this paper, Practice Paper 10 Annex describes the main steps in applying process tracing and provides some examples of how these steps might be applied in practice.


What is process tracing?
Process tracing is part of a wider effort in the social sciences to systematise qualitative methods (Collier 2011), by adopting a generative perspective of causality. The strengths of qualitative methods are that they can assist in explaining how a given input (resource, activity) led to an observed effect (ibid.), an aspect that is often lacking in quantitative methods. Box 1 compares different perspectives on causality.
As a social science research method, process tracing is relatively recent, and its application still requires further development and refinement. It was originally used to provide theoretical explanations of historical events (Falleti 2006). In the social sciences, process tracing has been used by scholars who want to go beyond identifying statistical correlations, for example to better understand the relationship between democracy and peace (Beach and Pedersen 2013). Early contributions to the articulation of process tracing in political science stem from Alexander George and Andrew Bennett (Bennett and George 1997, 2005; Bennett 2008, 2010). Process tracing was further elaborated by David Collier (2011), and a recent book by Derek Beach and Rasmus Brun Pedersen (2013) provides a detailed articulation of the theoretical groundings of process tracing as well as step-by-step guidance on its application.² Beach and Pedersen emphasise the potential of process tracing as a qualitative method for assessing causal inference through the analysis of causal mechanisms in a single-case design. In our summary of the process tracing methodology we mainly draw on Beach and Pedersen's approach, and discuss its practical application in impact evaluation practice.
Process tracing involves articulating the steps between a hypothesised cause (for example, a development intervention) and an outcome. This involves unpacking the causal mechanism that explains what it is about a cause that leads to an outcome: the causal force or power that links cause A with outcome B (Beach and Pedersen 2013). The concept of 'causal mechanism' is central to the generative framework underpinning process tracing (see Box 1), but can cause confusion as there is no clear consensus in the literature on what exactly a mechanism is (Shaffer 2014). In Beach and Pedersen's description of process tracing, a mechanism is the causal chain or story linking event A with outcome B. A mechanism is made up of a number of 'parts' composed of entities (for example, people, organisations, systems) that engage in activities (for example, protesting, researching, campaigning); and each part of the mechanism is necessary to give rise to the subsequent part. This differs from the definition of mechanism used elsewhere, for example in realist evaluation approaches where a researcher may examine multiple different mechanisms within a single case (Westhorp 2014).
There is also a strong diagnostic element to process tracing. A causal chain linking cause A and outcome B is developed, and Bayesian probability logic is followed in order to assess the strength of the evidence of each part of the chain. Contrary to statistical methods, the quality of the evidence is not judged by sample size (the number of observations) but rather the probability of observing certain pieces of evidence. Assessments of probability in process tracing are not necessarily quantitative. Rather, the nature and assessment of evidence has parallels to a law court: evidence consists of empirical observations combined with knowledge of contextual factors (such as prior knowledge, timing, and the ways in which facts emerge) (Befani and Mayne 2014). The investigator works in a similar way to a detective (literature on process tracing often references Sherlock Holmes), looking for evidence to increase confidence that an outcome was caused in a particular way. Using probability logic, the investigator then systematically assesses the evidence in order to test hypotheses at each stage of the theory, including hypotheses representing alternative causal explanations.
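This Bayesian logic can be sketched in a few lines of code. The illustration below is ours, not drawn from the literature cited: 'certainty' is read as the probability of observing a piece of evidence if the hypothesis is true, and 'uniqueness' as the improbability of observing it if the hypothesis is false; Bayes' rule then turns a prior degree of confidence into a posterior.

```python
def update_confidence(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' rule: posterior confidence in hypothesis H after observing evidence E.

    p_e_given_h     -- 'certainty': how likely is E if H is true?
    p_e_given_not_h -- inverse of 'uniqueness': how likely is E if H is false?
    """
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

# An invented 'smoking gun' observation: very unlikely unless the hypothesis
# is true (high uniqueness), but not guaranteed even if it is (low certainty).
posterior = update_confidence(prior=0.5, p_e_given_h=0.4, p_e_given_not_h=0.02)
print(round(posterior, 2))  # prints 0.95
```

On this reading, a single highly unique observation can move confidence a long way even when it was far from certain to appear, which is what makes 'smoking gun' evidence so valuable.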
One difference to the workings of a detective is that more than one causal chain may contribute to the effect under investigation (Bennett and George 2005). For example, evidence might give us high confidence that our advocacy intervention caused a policy change, but this does not rule out the possibility that other factors external to the intervention also contributed to the outcome. This has an important repercussion for the use of process tracing in impact evaluation: it allows for judgements on contribution rather than attribution.

Box 1 Different perspectives on causality

Counterfactual
Based on Mill's method of difference, the counterfactual perspective of causal inference uses a control group to isolate the effect of an intervention. A comparison is made between two otherwise identical cases in which one received an intervention or treatment and the other one did not. This framework is frequently used in clinical trials for medical research. However, counterfactual causal inference does not explain how a specific effect came about.

Regularity
Originating from Mill's method of agreement, regularity frameworks use the frequency of association between two observations to assess an effect. Regularity is the basis for making causal claims in many statistical approaches to evaluation. However, regularity frameworks do not identify the direction of change (which observation is the cause and which the effect), and cannot answer questions about how and why change happens. Their application is also problematic in complex situations where it is difficult to single out specific cause and effect factors.

Configurational
Drawing on the concepts of necessity and sufficiency, configurational frameworks describe a number of causes that lead to a specific effect, and identify specific configurations of causal factors that are associated with it. The configurational view of causation recognises that more than one constellation of causes can lead to the same effect, and that similar constellations can lead to different, even opposite effects. Configurational frameworks are used in Qualitative Comparative Analysis. The sets of conditions identified in these frameworks go some way in answering 'how' a specific effect occurred.

Generative
The distinctive feature of generative frameworks is that they provide a detailed description of a causal mechanism that led to a specific effect, and by doing so demonstrate the causal relation. Through a fine-grained explanation of what happens between a cause and an effect, generative mechanisms help to explain 'why' a certain effect occurred.

Applying process tracing tests
In assessing the probability that the hypothesised causal chain led to an isolated effect, the investigator compares alternative causal sequences, through:
A. Reviewing the evidence under the assumption that the hypothesised causal sequence holds: cause A led to outcome B in the theorised way.
B. Reviewing the evidence under the assumption that the hypothesised causal sequence does not hold: an alternative causal sequence explains the outcome.
The investigator examines the available evidence to test the inferential weight of evidence for each of these sequences. Four 'tests' have been developed to assist with this process: 'straw-in-the-wind' tests, 'hoop' tests, 'smoking gun' tests and 'doubly decisive' tests (Bennett 2010; Collier 2011; Van Evera 1997). These tests are based on the principles of certainty and uniqueness; in other words, whether a given piece of evidence is necessary and/or sufficient for confirming a hypothesis. Tests with high uniqueness help to strengthen the confirmatory evidence for a particular hypothesis, by showing that a given piece of evidence was sufficient to confirm it. Tests with high certainty help to rule out alternative explanations by demonstrating that a piece of evidence is necessary for the hypothesis to hold (Beach and Pedersen 2013; Befani and Mayne 2014). The four tests are illustrated in Box 2.
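As a hypothetical illustration (ours, not part of the method as published), the four process tracing tests and the inferential consequence of a piece of evidence passing or failing each one can be tabulated; the wording of the consequences below is shorthand for the standard descriptions in Box 2.

```python
# The four process tracing tests, characterised by certainty (is the evidence
# necessary for the hypothesis?) and uniqueness (is it sufficient?).
# The 'fail' entry for the doubly decisive test is simplified; see Box 2.
TESTS = {
    "straw-in-the-wind": {"certainty": False, "uniqueness": False,
                          "pass": "hypothesis slightly strengthened",
                          "fail": "hypothesis slightly weakened"},
    "hoop":              {"certainty": True, "uniqueness": False,
                          "pass": "hypothesis survives but is not confirmed",
                          "fail": "hypothesis disconfirmed"},
    "smoking-gun":       {"certainty": False, "uniqueness": True,
                          "pass": "hypothesis strongly confirmed",
                          "fail": "hypothesis barely weakened"},
    "doubly-decisive":   {"certainty": True, "uniqueness": True,
                          "pass": "hypothesis confirmed, rivals eliminated",
                          "fail": "hypothesis disconfirmed"},
}

def weigh_evidence(test: str, passed: bool) -> str:
    """Return the inferential consequence of evidence passing or failing a test."""
    return TESTS[test]["pass" if passed else "fail"]

# A watertight alibi means the hypothesis fails a hoop test:
print(weigh_evidence("hoop", passed=False))  # prints "hypothesis disconfirmed"
```

One design point this makes visible is the asymmetry of the tests: hoop tests do most of their work when failed, smoking gun tests when passed.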

Applications of process tracing within international development impact evaluation
Impact evaluation designs based on counterfactual and regularity frameworks frequently encounter limitations when applied in the field of international development.
For example, they may not be appropriate to measure initiatives that aim to achieve change through advocacy and policy influence, because the pathways of change are usually unpredictable, highly dependent on changing circumstances and often need to respond to changing goalposts (Tsui, Hearn and Young 2014).
While process tracing has predominantly been applied as a social science research method, the approach is currently being explored in several international development impact evaluations. A recent workshop on process tracing in impact evaluation organised by the Centre for Development Impact (CDI) brought together a number of evaluators who currently apply or intend to apply this method (Barnett and Munslow 2014).

Case study 1: Oxfam GB's contribution to advancing universal health care in Ghana
An effectiveness review of the campaign was commissioned by Oxfam GB and conducted in 2012-13 (Stedman-Bryce 2013). The review is based on a process tracing protocol developed by Oxfam GB (2011), which incorporates elements of both process tracing and contribution analysis.⁴ The protocol focuses on elaborating and testing a small number of (not necessarily directly connected) outcomes within a larger project theory of change. It involves three elements:
1. Shortlisting one or more evidenced explanations for the outcome in question (which may or may not include the intervention).
2. Ruling out alternative competing explanations incompatible with the evidence.
3. If more than one explanation is supported by the evidence, estimating the level of influence each had on bringing about the change in question.
In line with this protocol, the effectiveness review compares alternative causal sequences and attempts to weigh evidence, but it does not explicitly apply process tracing tests. It also does not specify or test a full causal chain. This means that, although the evaluation establishes a degree of confidence in the contribution of the campaign to a number of distinct outcomes, it does not demonstrate whether the programme as a whole contributed to the final outcome.

The evaluation used an existing project theory of change, revised following conversations between the evaluator and project staff. The evaluator stressed that translating project staff's informal understanding about how change happened into an explicit, formal theory was a crucial and challenging aspect of the evaluation. The analysis drew on 21 key informant interviews, mainly with members of the campaign itself and with government representatives. It also drew on project documentation and data (for example, news articles) accessed online. In particular, the evaluator highlighted the value of a Facebook page created by the campaign in helping to reconstruct a timeline of events and in accessing online evidence.

Data were analysed by assessing their explanatory power in relation to two rival causal sequences for the identified outcomes, followed by an assessment of the contribution of each causal sequence to the observed change.

An example of the analytical process used to weigh alternative causal sequences
One of the outcomes examined was: 'the current National Health Insurance Scheme (NHIS) is shown to be an ineffective vehicle to deliver free universal health care in Ghana' (Stedman-Bryce 2013: 4). An important milestone related to this outcome was a highly controversial report published by the campaign, which contended that the number of people enrolled under the NHIS was inaccurate and needed to be revised downwards. Several months after the report was published, the government department responsible for the NHIS (the National Health Insurance Authority, or NHIA) revised its approach to counting NHIS membership, resulting in a decrease in official statistics on membership from 67 per cent to 34 per cent.
The two rival causal sequences examined in the evaluation were:
1. The methodology revision occurred as a result of pressure exerted by the campaign.
2. The revisions occurred based on the NHIA's own plans and timetable.
The evidence used to evaluate these alternative causal sequences was as follows:
1. Evidence regarding the level of attention the campaign's report received, for example quotations from key informant interviews, media articles and blogs (including several published responses by the department refuting the report's claims). The evidence suggested that the report did indeed dominate the health sector debate in Ghana for some time.
2. Testimonies from campaign members affirming that the NHIA revised its methodology based on the public uproar caused by the report. This evidence is enhanced by consideration of the context, particularly the additional pressure on the government, exerted by forthcoming elections, to respond to allegations of corruption and inefficiency.
3. A statement by the Ghana delegation at an international meeting on Universal Health Care in Geneva, confirming that the campaign's report 'was very helpful and prompted us to revise our figures'. Although the review did not explicitly apply the process tracing tests, this is a clear example of a 'smoking gun'. It is highly unlikely that the delegation would make this statement if the report had not influenced them, particularly since the NHIA had dismissed the report during the national health sector debate that ensued in Ghana after its publication. The evidence therefore has high uniqueness, and significantly increases confidence in causal sequence 1.
The evaluator then goes on to test the rival causal sequence: that the methodology was revised based on the NHIA's own plans and timetable. He finds that there is no convincing evidence to that end, and infers from the timing of the campaign, and the NHIA's own contestation of any flaws in its methodology only weeks before the revision, that this rival sequence does not hold.
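The weighing of evidence in this case study can be caricatured numerically. All figures below are invented for illustration and are not taken from the Stedman-Bryce review: each piece of evidence is scored by how probable it would be under each rival causal sequence, and the accumulated likelihood ratio indicates which sequence the body of evidence favours overall.

```python
# Invented likelihoods P(observation | sequence 1) vs P(observation | sequence 2)
# for the three pieces of evidence described above (media attention,
# campaign testimonies, the delegation's statement in Geneva).
evidence = [
    (0.8, 0.4),   # report dominated the debate: likely either way, mildly favours 1
    (0.6, 0.2),   # campaign testimonies: plausible self-reporting bias kept modest
    (0.5, 0.02),  # 'smoking gun' statement: very unlikely under sequence 2
]

ratio = 1.0
for p_seq1, p_seq2 in evidence:
    ratio *= p_seq1 / p_seq2  # accumulate the likelihood ratio

print(f"evidence favours sequence 1 by a factor of {ratio:.0f}")  # prints factor of 150
```

The point of the caricature is that no single observation needs to be decisive: several imperfect pieces of evidence, each more probable under one explanation than the other, compound into a strong overall judgement.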

Case study 2: Framing hunger and nutrition as political commitment: an intervention assessment of HANCI in Tanzania
The Hunger and Nutrition Commitment Index (HANCI) ranks governments in 45 countries on their political commitment to tackling hunger and undernutrition.⁵ One of HANCI's main aims is to reshape the debate around nutrition, in order to frame the solution to hunger as political rather than purely technical. At the time of writing this paper, researchers at the Institute of Development Studies (IDS) were conducting an intervention assessment of HANCI's policy impact, using process tracing methods. Programme staff selected process tracing for its potential to trace the ways in which HANCI contributed to policy change, given the recognition that change cannot be attributed to a single programme in this complex area.

Specifying the outcome(s)
The assessment aimed to examine the contribution of HANCI to the framing of nutrition policy in Tanzania.
However, specifying what this outcome would look like in empirical terms was difficult, given that the project was ongoing, and that the nature and framing of policy discussions are emergent and unpredictable. This means that at the beginning of the assessment, programme staff did not know what the outcome they wanted to test would look like, which complicated data collection. As a result, the theory was eventually split into two outcomes, which will be considered in two separate forthcoming papers. The intermediate outcome was that partners find evidence generated by HANCI credible, and use it in their policy advocacy. The final outcome was that HANCI influenced the framing of nutrition problems and solutions during the drafting of political party manifestos in the run-up to elections in September 2015.
Developing the theory
HANCI programme staff developed a theory of change to posit how the HANCI intervention was likely to influence the outcomes. From this, a causal mechanism was developed to provide a plausible explanation describing the link between the intervention and the outcome. The mechanism was constructed using an iterative approach, through two parallel processes:

A substantial literature review was conducted to identify theories exploring the role of advocacy in promoting policy change. These theories helped inform the creation of the causal mechanism. The literature review also collated empirical evidence from other interventions which supported or challenged the HANCI causal mechanisms. Distilling evidence from the wider literature in this way proved time consuming, given the large number of potentially relevant theories, the indistinct boundaries and overlaps between them, and the subsequent difficulty in classifying evidence in order to use it to support or challenge the HANCI mechanism.

Pre-existing evidence (collected throughout the programme) was examined -including workshop reports and surveys, interview transcripts and country learning reports produced by partners.This both helped develop the theory of how HANCI led to change and provided evidence to test this theory.
Applying the four process tracing tests to the causal mechanism proved challenging. The main constraint was that the evidence was vast and included findings at organisational and national levels, for example strategy documents and national media releases. Evidence for alternative causal sequences (in which HANCI did not lead to changes in framing at the national level) was also hard to come by, and given the scope of the study (national-level policy change) it was difficult to test and eliminate all possible alternative explanations. The tests were therefore applied in a limited way, in that only empirical observations that passed the 'hoop' test were considered (observations that were necessary for the causal mechanism to hold).

Reflections on the application of process tracing in impact evaluation
This section discusses some of the emerging reflections and lessons on the practical application of process tracing in impact evaluation.

Advantages of process tracing
Process tracing offers a rigorous approach to assess causal change, including through an ex post design without a control group. Time and resources are major obstacles to many organisations wishing to measure the impact of their work. Methods based on a counterfactual causal framework (involving baseline and endline data collection, and comparing change across beneficiary and counterfactual groups) can be time consuming and expensive. Process tracing therefore offers great potential as a rigorous method appropriate for ex post evaluations, without the requirement for baseline or counterfactual data (although it is possible to use both, for example within process tracing tests).
Process tracing offers potential for examining causality in programmes where attribution is difficult. Process tracing focuses on breaking down an intervention into its constituent causal parts, investigating each part and exploring how and why they link together. This approach is particularly relevant to development interventions where the pathways of change are not always predictable or certain; and where multiple processes occur in parallel and so change cannot be easily attributed to a particular cause.
As the case studies demonstrate, policy and advocacy interventions are particularly conducive to this approach. Another advantage of process tracing is that it provides evidence on how and why an intervention led to change.
This is particularly relevant in new or complex interventions in which the causal pathways are not well known.
There is potential to combine aspects of process tracing with other theory-based evaluation approaches. Process tracing can provide a degree of confidence in a particular causal sequence, but cannot demonstrate how important a particular cause was to the achievement of the outcome relative to other causes. In other words, process tracing alone does not provide evidence on the degree or weight of contribution. However, the potential of combining process tracing with contribution analysis is discussed by Befani and Mayne (2014), who argue that the Bayesian logic of inference within process tracing (i.e. the process tracing tests) can complement an assessment of the relative contribution of different causal factors within an overarching contribution analysis framework, resulting in stronger inferences than either process tracing or contribution analysis can provide alone. This is similar to (although more in-depth and systematic than) the method suggested in Oxfam GB's process tracing protocol.
There may also be potential to apply the process tracing tests to systematically and transparently weigh and test qualitative evidence within other theory-based qualitative evaluation approaches, such as realist evaluation. Although time consuming to apply, the tests have the advantage of being fairly intuitive, perhaps given the long exposure many of us have to Sherlock Holmes stories or courtroom dramas. This is demonstrated above in the Oxfam GB case study, in which certain evidence can be retrospectively linked to various tests.

Challenges of process tracing
Process tracing can be time intensive. Developing a causal mechanism takes significant time and may require considerable stakeholder involvement and/or review of secondary literature, as emphasised in the two case studies above. Similarly, collecting the right amount and type of information to construct various tests requires considerable knowledge and understanding of the project, and sufficient capacity and time to analyse the data. For example, a key limitation identified in the Oxfam GB assessment was the small number of interviews with government representatives, given the difficulty of accessing these stakeholders in the time available. This highlights the risk that process tracing may provide inconclusive results if the evidence collected cannot fully support a causal sequence. To thoroughly test alternative hypotheses, the evaluator needs to have access to a range of stakeholders and to published and unpublished material.
There are challenges in applying process tracing where an outcome is not fully known. In Beach and Pedersen's description of process tracing, the outcome is known in advance. This poses challenges in the context of impact evaluation, where the outcome is not known until the end of an (often multi-year) evaluation process and where there are multiple outcomes (which may remain somewhat uncertain), as was the case in the Oxfam GB evaluation. It also proved challenging in the HANCI evaluation, where it took time to specify evidence that might support the outcome of policy change while this change was still unfolding. Despite these challenges, it seems plausible that process tracing could be used as part of a mixed-methods evaluation design, applied at the end or at a mid-point of the evaluation to provide evidence of how and why the intervention led to a particular outcome (which might be established and verified through other methods).
It may also be possible to conduct process tracing alongside an intervention, by developing a causal mechanism before or during the project, and then collecting evidence to test parts of the mechanism as the intervention unfolds. However, this represents a potentially major risk. In complex interventions (such as policy and advocacy initiatives) objectives can be fluid, and the final outcome is frequently quite different to that initially envisaged. A clear causal mechanism may be difficult to develop in a situation where multiple factors combine and accumulate to lead to tipping points; or where feedback loops mean that later events reinforce earlier events and processes (Ramalingam 2013), as was the case in HANCI. This means that it is highly likely that the mechanism developed at the beginning of the project would change over time, and this could mean that data collected during earlier stages of an evaluation is not the right evidence to test the revised mechanism at the end.

Conclusion
Process tracing has major potential as a rigorous method for qualitative impact evaluation, using probability tests based on Bayesian logic of inference to establish confidence in how and why an effect occurred. It may be of particular interest as a method appropriate for ex post evaluations which lack baseline data or a control group, although it certainly does not offer a quick or easy evaluation solution.
The process tracing tests (straw-in-the-wind, hoop, smoking gun and doubly decisive) are a particularly intriguing aspect of the method, drawing on relatively intuitive concepts of uniqueness and certainty to systematically and transparently weigh and test qualitative evidence. So far the applications of process tracing within impact evaluation in international development are limited, although a number of ongoing evaluations are attempting to apply the method. The two case studies discussed in this paper illustrate some of the challenges faced and choices made along the way. There are still unanswered questions around the utility of process tracing within impact evaluation; for example, in relation to evaluating complex interventions, or applying it in circumstances where the outcome is not yet known. However, overall process tracing represents a valuable methodological approach to add to the evaluator's toolbox.

3 Another example of a discussion on the application of process tracing is a recent journal article by Befani and Mayne (2014). This, however, explores a hypothetical case study that combines process tracing and contribution analysis.
4 Contribution analysis assesses the contribution of an intervention to observed results, through verifying the theory of change behind the intervention (using logic, secondary evidence and empirical evidence) and considering the role of other influencing factors (see Befani and Mayne 2014; Mayne 2008).
5 See www.hancindex.org and te Lintelo et al. (2014).

Centre for Development Impact (CDI)
The Centre is a collaboration between IDS (www.ids.ac.uk) and ITAD (www.itad.com). The Centre aims to contribute to innovation and excellence in the areas of impact assessment, evaluation and learning in development. The Centre's work is presently focused on:
(1) Exploring a broader range of evaluation designs and methods, and approaches to causal inference.
(2) Designing appropriate ways to assess the impact of complex interventions in challenging contexts.
(3) Better understanding the political dynamics and other factors in the evaluation process, including the use of evaluation evidence.

This CDI Practice Paper was written by Melanie Punton and Katharina Welle. The opinions expressed are those of the authors and do not necessarily reflect the views of IDS or any of the institutions involved. Readers are encouraged to quote and reproduce material from issues of CDI Practice Papers in their own publication. In return, IDS requests due acknowledgement and quotes to be referenced as above.

Institute of Development Studies, Brighton BN1 9RE, UK
T +44 (0) 1273 915637 F +44 (0) 1273 621202 E ids@ids.ac.uk W www.ids.ac.uk
© Institute of Development Studies, 2015
ISSN: 2053-0536 AG Level 2 Output ID: 313

Box 2 Illustrations of the four process tracing tests

Straw-in-the-wind test (low uniqueness, low certainty). This is the weakest of the four tests, neither necessary nor sufficient to confirm a hypothesis.
Example hypothesis: John shot Mary because he discovered her having an affair.
Evidence constituting this type of test: evidence that an affair was taking place, for example a hotel receipt or suggestive text messages.
What happens if the hypothesis passes the test (i.e. reliable evidence of this type exists)? It slightly strengthens confidence in the hypothesis, but this is not enough to conclusively prove it or to disprove alternative hypotheses. However, straw-in-the-wind tests can provide a valuable benchmark, and if a hypothesis passes multiple tests this can add up to important evidence.
What happens if the hypothesis fails the test (i.e. reliable evidence of this type does not exist)? This slightly raises doubts about the truth of the hypothesis, but is not enough to rule it out.

Hoop test (high certainty: necessary to confirm hypothesis).
Example hypothesis: John shot Mary.
Evidence constituting this type of test: John lacks a good alibi for the night of the murder; for example, he claims he was alone.
What happens if the hypothesis passes the test? It does not significantly raise the investigator's confidence that the hypothesis is true. John lacking a good alibi is not enough on its own to prove the hypothesis.
What happens if the hypothesis fails the test? It disconfirms the hypothesis. If John has a watertight alibi, we can be confident that he did not shoot Mary. Because of this, hoop tests are often used to exclude alternative hypotheses.

Smoking gun test (high uniqueness: sufficient to confirm hypothesis).
Example hypothesis: John shot Mary.
Evidence constituting this type of test: John was found holding a smoking gun over Mary's body.
What happens if the hypothesis passes the test? The investigator can be confident that the hypothesis is true: John did indeed shoot Mary.
What happens if the hypothesis fails the test? It does not significantly decrease confidence in the hypothesis. John may have shot Mary and escaped undetected.

Doubly decisive test (high certainty, high uniqueness). This is the most demanding test, both necessary and sufficient to confirm a hypothesis.
Example hypothesis: John shot Mary.
Evidence constituting this type of test: John was caught on a high-resolution, tamper-proof CCTV camera committing the crime.
What happens if the hypothesis passes the test? We can be confident that the hypothesis is true, and that all alternative hypotheses are false. John did indeed shoot Mary.
What happens if the hypothesis fails the test? It depends on the nature of the test. If someone else was caught on CCTV committing the crime, it would disconfirm the hypothesis. But if there simply was not a camera, it does nothing to increase or decrease our confidence in the hypothesis.

Source: Beach and Pedersen (2013) and Collier (2011).

Notes
1 Impacts are defined by the Organisation for Economic Co-operation and Development (OECD 2010: 24) as 'positive and negative, primary and secondary long-term effects produced by a development intervention, directly or indirectly, intended or unintended'. Impact evaluations attempt to identify a clear link between causes and effects and explain how the intervention worked and for whom.
2 A new book on process tracing was published in November 2014: Process Tracing: From Metaphor to Analytic Tool (Bennett and Checkel 2014). This was not available to the authors at the time of writing this paper and therefore is not taken into consideration here.

CDI Practice Paper 10, April 2015, www.ids.ac.uk/cdi