The Other Pay for Success: The Promise and Peril of Paying for Outcomes

Social impact bonds and pay-for-success are often viewed as synonymous. They shouldn’t be. There is another version of pay-for-success that has been around much longer, is far more widespread, and may be the future of social impact bonds.

Outcomes-based funding – payments to social service and other providers based at least in part on the results they achieve – is an idea that has been around for decades. Unfortunately, the history of this other version of pay-for-success has been far from universally positive.

If the futures of these two very different forms of pay-for-success become increasingly intertwined, it will be important to understand their respective strengths and weaknesses. While the two ideas can be highly synergistic, there is also a substantial risk of recreating the failures of the past.

Two Approaches: Similar, but Different

On the surface, social impact bonds (SIBs) and more traditional outcomes-based funding systems are very similar. Both involve governments paying other (often nonprofit) entities for services, with payments based at least partly on results.

But there are also a number of important differences. SIBs, at least as they are commonly understood in the United States, typically involve third-party private financing and rely on rigorous external evaluations to determine their impact. By contrast, outcomes-based funding systems usually lack third-party financing and rely on validated administrative data to determine payments.

There is also a substantial difference in scale. While SIBs are growing quickly, fewer than a dozen are actually operational in the United States and only two have shown any results so far (in Rikers Island and Salt Lake County). By contrast, outcomes-based funding systems are already operating at scale, affecting perhaps hundreds of billions of dollars in annual public funding, including in education, job training, social services, and health care, the last of which by itself comprises a substantial portion of the U.S. economy.

While these differences are significant, they may not be destined to last. As SIBs grow, there are already signs that they may become more like their larger, more established sibling. First, the current financing mechanism for SIBs may not be scalable. Despite the increased availability of social investment capital, the risk-reward ratio may make them financially unattractive to profit-motivated investors, particularly after the failure of the Rikers Island project. Foundation funds, which have provided the bulk of the financing to date, may be too limited to provide sufficient scale. Second, the external impact evaluations used to judge success may be too expensive and impractical once SIBs advance beyond the demonstration stage.

If these limitations prove insurmountable, SIBs may become more like traditional performance-based contracts. At least one national SIB expert foresees such a future, while others have begun to raise questions. If such a convergence does occur, the history and lessons of outcomes-based funding will become increasingly important to understand.

The Promise and Peril of Outcomes-based Funding

At first glance, outcomes-based funding has a lot going for it. It creates financial incentives for greater effort and continuous improvement. It can also induce market effects, allowing better-performing providers to thrive and grow, while lower-performing competitors are forced to improve or close their doors.

By incentivizing greater performance, it can also promote the use of evidence-based practices and help bring evidence-based programs to scale. However, because it is focused on outcomes rather than compliance or process measures, it can also provide organizations the flexibility they need to innovate, make rapid adjustments, and further improve upon those evidence-based programs.

Moreover, unlike social impact bonds, which typically feature complex contractual arrangements, outcomes-based funding systems are usually much simpler. Because incentive payments are commonly structured as bonuses or penalties, they rarely require third-party financing that might inhibit bringing them to scale. When such financing is necessary, it is left to the providers themselves, who often have substantial experience negotiating their own lines of credit.

Given these many advantages, it is not difficult to understand why some social impact bond proponents might view outcomes-based funding as a natural next step. Unfortunately, despite its potential, its actual track record has been mixed to poor.

Examples can be drawn from across the policy spectrum:

K-12 Education: Perhaps the most obvious example of an outcomes-focused policy gone wrong is No Child Left Behind, where unrealistic educational outcomes-based metrics have left states scrambling for waivers to escape federally-imposed penalties and left Congress puzzling over how to replace it. One of the principal experiments with outcomes-based funding in K-12 education is merit-based pay, but a recent Mathematica evaluation of the federal Teacher Incentive Fund found that it had negligible effects on educational achievement.
Job Training: A longer track record can be found in the workforce arena, where the Job Training Partnership Act of 1982 arguably launched the modern era of outcomes-based funding. Data from its subsequent national evaluation, however, showed no relationship between the program’s outcome measures and its actual impact on employment and earnings. A similar national evaluation of Job Corps, a well-regarded program that featured bonuses and penalties for high- and low-performing providers, also found no relationship between better-rated providers (as determined by their outcomes) and their true impact.
Health: Earlier this year, the Department of Health and Human Services announced that it plans to tie 85 percent of Medicare payments to quality or value by 2016 through programs such as the Hospital Value Based Purchasing Program. But an interim report released by GAO in October found no change in hospital quality after the program’s first three years. A similar study of the Nursing Home Value Based Purchasing Demonstration found it did not affect quality or lower costs.

Despite these failures, outcomes-based funding has also scored some apparent successes. In both higher education and child welfare, for example, state-initiated performance incentives appear to have driven improvements in outcomes for students and children in foster care, respectively. However, even here the actual impacts are uncertain because neither has been subjected to a rigorous national evaluation.

And therein lies a central problem for these (and probably all) outcomes-based funding initiatives: positive outcomes do not guarantee positive impacts.

What’s the difference? Outcomes can tell you whether a job training program has placed someone in a job. But such outcomes can also be substantially influenced by many other factors (pre-existing job readiness, the state of the local economy, unrelated changes in public policy). By contrast, impacts isolate the value-add of the program itself.

Outcomes and impacts are often unrelated, which is why a program that seems to be producing improved outcomes may actually be producing no impact at all. Even worse, sometimes they run in opposite directions, as can happen when a program works with harder-to-serve populations, resulting in seemingly worse outcomes, but higher value-add and greater impact.
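
To make the distinction concrete, here is a minimal sketch in Python. All of the figures are invented purely for illustration, and the names (Program, placement_rate, counterfactual_rate) are hypothetical. In practice the counterfactual rate cannot be observed directly and would have to be estimated from a comparison group.

```python
from dataclasses import dataclass


@dataclass
class Program:
    name: str
    placement_rate: float       # observed outcome: share of participants placed in jobs
    counterfactual_rate: float  # assumed share who would have found jobs anyway

    @property
    def impact(self) -> float:
        # Impact is the value-add: the observed outcome minus what
        # would have happened without the program.
        return self.placement_rate - self.counterfactual_rate


# Hypothetical numbers, chosen only to illustrate the point.
job_ready = Program("serves job-ready clients",
                    placement_rate=0.80, counterfactual_rate=0.75)
hard_to_serve = Program("serves hard-to-serve clients",
                        placement_rate=0.45, counterfactual_rate=0.20)

for p in (job_ready, hard_to_serve):
    print(f"{p.name}: outcome={p.placement_rate:.0%}, impact={p.impact:+.0%}")
```

Under these assumed numbers, the first program posts the better outcome but the smaller impact, so a payment system that rewarded the placement rate alone would favor the program adding less value.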

The divergence of outcomes and impact is a primary reason why outcomes-based funding systems fail. Nearly every shortcoming (discussed below) can be traced back to this simple idea.

Why not just measure impact instead of outcomes? Unfortunately, measuring impact on an ongoing basis is neither easy nor practical. It usually requires an expensive evaluation with a comparison group, ideally a randomized controlled trial or similarly credible design. When such studies are done, however, they often undercut the claims of outcomes-based funding proponents.

Outcomes-based Funding 2.0?

Given this sorry track record, what can be done? Is the promise of paying for outcomes simply out of reach? Not necessarily. Outcomes-based funding may simply be following the usual pattern of innovation, in which a record of significant failure comes before finally getting it right.

What would a successful outcomes-based system look like? In short, it must attempt to become a proxy for impact – approximating the value-add of the program while controlling for other effects. As any skeptical evaluator will point out, this is not an easy task.

Nevertheless, a closer look at the history of outcomes-based funding reveals the following relatively consistent set of pitfalls – as well as possible solutions. It is possible that the barriers between outcomes and impact might be overcome by better understanding the obstacles and by better design.

Avoiding Basic Design Flaws: Some outcomes-based funding systems are so poorly designed that they not only fail to drive improvements in impact, they also fail to drive any significant change in outcomes. For example, performance measures and data may be invalid or unreliable. The measures may be too difficult to understand or the financial incentives too low. The intended recipient may also have insufficient control over outcomes (one criticism leveled at No Child Left Behind). Such flaws can often be avoided by soliciting stakeholder input during the initial design phase, a process that can also bring greater stakeholder buy-in.
Avoiding Gaming and Fraud: Some providers, when faced with difficult-to-achieve performance standards, may take the easy way out and game the numbers or engage in outright fraud. It is a phenomenon widely enough known that it has its own name: Campbell’s Law. History suggests that such efforts can be countered through simple awareness, experience, better design, independent data audits, and appropriate criminal and civil penalties. It can also be addressed by recognizing when performance standards are unreasonable – a major cause of gaming and fraud – and providing the necessary capacity-building to make legitimate improvement possible.
Cream Skimming: Cream skimming occurs when program staff intentionally recruit the easiest-to-serve individuals, a process that makes outcomes standards much easier to attain. While cream skimming has received significant attention in the literature, the evidence suggests that it is often countered by strong cultural norms among frontline staff. However, eligibility rules and low program awareness among target populations can often generate similar results even when they are unintentional. Overall, such effects can often be countered by disaggregating the data and providing correspondingly greater financial incentives to work with harder-to-serve subgroups. Incentives that are tied to improvement over historic baselines can also mitigate geographic or population differences.
Value-add Designs: Another way to isolate a program’s impact is to compare outcomes before and after it has been implemented. Studies of teacher effectiveness have shown that such value-add measures can be valid indicators of impact. They may not be practical for every program, however. Moreover, pre-post measures do not alone constitute proof of impact (which is why separate impact studies were needed to validate them for teachers). But where they are feasible, such measures may be a good starting point.
Adding Intermediate Measures: The rhetoric of paying for outcomes often rules out intermediate process or “output” measures, but given the inherent limitations of outcome metrics, such intermediate standards may actually be better aligned with real impact, especially when they are rooted in rigorous evidence. When Congress recently reauthorized the nation’s workforce laws, for example, it instituted a new standard for measurable skill gains for low-income workers to shore up some of the known shortcomings of the existing outcome measures.
Adjusting for External Influences: Outcomes can sometimes be heavily influenced by external factors, such as varying local poverty rates or unemployment. The adoption of intermediate measures (as described above) can help address this. External factors can also sometimes be mitigated by simply comparing the outcomes for similar providers in the same local jurisdiction or serving the same population, all of whom should be subject to the same external effects.
Accounting for Superficial and Short-lived Outcomes: Even when they are real, many outcomes (and even impacts) can be short-lived and suffer fade-out effects, particularly if the initial intervention was relatively superficial. This can often be countered with the addition of longer-term incentives that assess outcomes a year or more after the intervention ends.
Avoiding Tunnel Vision: It is sometimes said that “what gets measured gets done” and “you get what you pay for.” If these aphorisms are true, they are doubly so for outcomes-based funding. Unfortunately, while such focus can be beneficial, it can also lead to tunnel vision, with providers ignoring anything that does not directly contribute to the relevant outcome measures. Where it exists, this tendency can be checked by retaining other regulatory requirements that ensure continued attention to desired secondary outcomes. Additional performance metrics can also be added, although too many metrics can distract attention from the primary outcomes and create a system that is administratively unwieldy.
Continuous Improvement and Evaluation: Nearly every review of outcomes-based funding suggests that learning and continuous improvement are an expected component of design. Some of this learning comes from practical experience. Other learning, however, should come from rigorous impact evaluations, which alone are capable of ensuring that seemingly well-designed performance standards are actually delivering on their promise.
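
Two of the design ideas above, paying for improvement over historic baselines and offering larger incentives for harder-to-serve subgroups, can be combined in a simple payment rule. The sketch below is one possible illustration in Python, not a description of any actual program. The function name, subgroup labels, weights, and dollar rate are all hypothetical.

```python
def incentive_payment(results, baselines, weights, rate_per_point=1000.0):
    """Illustrative bonus calculation (all parameters are hypothetical).

    results, baselines: subgroup -> success rate between 0 and 1
    weights: subgroup -> multiplier, larger for harder-to-serve groups
    rate_per_point: dollars paid per percentage point of improvement
    """
    total = 0.0
    for subgroup, rate in results.items():
        # Disaggregate: measure each subgroup against its own historic baseline.
        improvement_points = (rate - baselines[subgroup]) * 100
        # This sketch pays only for gains; a real design might also apply penalties.
        gain = max(improvement_points, 0.0)
        total += gain * weights.get(subgroup, 1.0) * rate_per_point
    return total


# Invented example figures.
baselines = {"job_ready": 0.70, "long_term_unemployed": 0.20}
results = {"job_ready": 0.72, "long_term_unemployed": 0.30}
weights = {"job_ready": 1.0, "long_term_unemployed": 2.0}

payment = incentive_payment(results, baselines, weights)
print(f"Bonus: ${payment:,.0f}")
```

With these made-up inputs, most of the bonus comes from the ten-point gain among the long-term unemployed, even though that subgroup's success rate remains far below the job-ready group's. As the text cautions, a rule like this is still only a proxy and would need a rigorous impact evaluation to confirm that it tracks real value-add.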

Will these ideas work? Maybe. At the moment, more is known about what does not work than what does. Given the nascent state of evidence-based policy in general, that is not unusual, but it should change.

This general need for more rigorous research also contains the seeds of real synergy between outcomes-based funding and social impact bonds. While evaluations of outcomes-based funding mechanisms are relatively rare, they are common for SIBs.

As currently constructed, most SIBs are already laboratories of innovation. Most of the attention is given to the interventions they are testing. But if SIBs are destined to become more like traditional performance-based contracts, then they should also incorporate and test the outcomes-based metrics that may be their eventual path to scale.

Outcomes-based funding may yet hold significant potential for delivering improved performance and innovation. If so, SIBs may be the crucial ingredient that helps it deliver not just outcomes, but true impact.
