A new contracting language
Contracts across outsourced public services are generally evaluated on the basis of price and quality. This drives a limited and limiting approach from contractor and procurer. Adopting the language used to evaluate examinations could open up service design, facilitate more informed and holistic contract awarding, and enable ongoing service improvement – subsuming considerations of price and quality.
Test/examination strength is a consequence of: V + R + I.
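To make the additive nature of the model concrete, here is a minimal sketch of scoring a contract on the three dimensions. The 0–10 scale, the function name and the example scores are all invented for illustration; nothing here comes from a real procurement framework.

```python
# Hypothetical illustration of the V + R + I model.
# The 0-10 scale and the example scores are assumptions for this sketch.

def contract_strength(validity: float, reliability: float, impact: float) -> float:
    """Combine the three dimensions into a single strength score."""
    for score in (validity, reliability, impact):
        if not 0 <= score <= 10:
            raise ValueError("each dimension is scored on a 0-10 scale")
    return validity + reliability + impact

# An outcome-funded service might score high on validity but, if delivered
# unevenly across regions, lower on reliability.
print(contract_strength(9, 5, 7))  # 21 out of a possible 30
```

The point of the additive form is that no single dimension can be traded away to zero without visibly dragging down the whole score.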
‘V’ stands for ‘validity’. A test will be valid if it tests what it is supposed to test. It will have ‘face validity’ if it seems to the testee to be valid – if it looks like it is doing what it is supposed to. For example, does the history examination actually test your knowledge of history or your ability to spell well, or perhaps your skill at guessing accurately between multiple choice options? Would a swimming test be or feel valid if you just had to demonstrate your mastery of different strokes while lying on dry land?
‘R’ is ‘reliability’ – or fairness. If you take the same test on consecutive occasions (putting aside the practice you would have had), do you get the same result? If two people of equal ability take the test, do they score equally? For example, a driving test would not be reliable if three times as many people passed on sunny days as on wet days.
Validity and reliability were generally viewed at one time as the only two characteristics that testers should consider in their test design and evaluation. But this view was later enriched by the addition of ‘I’ for ‘impact’. One obvious impact of examination design and delivery is on the nature of teaching: if tests simply require the regurgitation of facts, then learning by rote is likely to dominate.
How then might V + R + I apply to contracts for public services and how would this new language remove current constraints?
Considering the validity of a contract would evaluate the extent to which the contract actually incentivised delivery of the desired service and, above all else, achievement of the desired outcome (for the individual and for society). For example, if further education is funded on the basis of the number of bums on college seats, the number of hours they attend and the number of paper qualifications they pass, this may have very low validity – if the purpose of that further education is the development of vocational skills to meet the needs of the labour market.
In fact, whenever payments are attached to inputs, the validity of that contract may be reduced by definition, because the service and its objective are pulled apart. Paying for inputs covers things like hours of teaching, numbers of speed cameras, time spent with a patient, or beds occupied. Outputs often (though not always) have a slightly higher validity – certainly a higher face validity – and include things like the number of degrees awarded, speeding tickets issued or referrals made to specialist interventions.
An outcome-based employment service such as the Work Programme could be viewed as highly valid, since a majority of payments are tied directly to the stated objective of the service. In the case of the Work Programme that means employment sustained over months, if not years, with a concomitant reduction in benefit spend and an increase in rates of employment.
It is, of course, not always possible to tie funding and objectives together in this way. However, where it is not possible, proposed inputs and outputs should have to demonstrate their validity in relation to the objective AND set out how this will be evident to the service user. This statement of validity would add value both to the service specification produced by the procurer/commissioner and to the subsequent proposal submitted in the tender.
A contract will demonstrate reliability if it is fair. The way the standard of service is defined will obviously vary between contracts but could encompass: the nature of the service; its frequency; the duration of each interaction; the qualifications/skills of the staff; the follow up; the responsiveness to individual need; the facilities on offer; the opening hours; the accessibility of the service to local transport links; and the chances of achieving a satisfactory outcome.
The contract will be reliable if this same standard is available across time – it isn’t better in July than in December. It will be reliable if it is the same across geographies – rural and urban, North and South. And it will be reliable if it does not discriminate on the basis of irrelevant service user characteristics – age, disability, ethnicity, gender, sexual orientation and so on.
Many features of this reliability are captured under the usual application of ‘quality’, but it entails far more than that. At the level of the commissioner, consideration must be given to the reliability, for example, of using multiple as opposed to single contractors. Awarding contracts on the basis of price competition will reduce the overall reliability of the service, possibly even to the point of breaking equal opportunities legislation. Deploying different contractors in different regions may also reduce reliability, again exacerbated if price competition leads to a postcode lottery.
Maintaining high reliability will demand a clear statement by the contractor of the guaranteed standards against which reliability can be measured. Where there are multiple contractors, these standards will have to be couched in comparable terms. It is down to the procurer to determine these terms, and also the baseline or framework for these standards across all contractors, below which reliability might be at risk. Transparent performance data – tracking inputs, outputs and outcomes – can monitor ongoing reliability and indicate any contract failure.
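The monitoring step described above can be sketched as a simple check of each contractor’s reported performance against the procurer’s baseline standards. The contractor names, metrics and baseline figures below are invented for illustration only.

```python
# Hypothetical sketch: flagging contractors whose reported performance falls
# below the procurer's baseline standards. All names and figures are assumed.

BASELINE = {"sessions_per_week": 3, "outcome_rate": 0.4}

performance = {
    "Contractor A": {"sessions_per_week": 4, "outcome_rate": 0.45},
    "Contractor B": {"sessions_per_week": 2, "outcome_rate": 0.42},
}

def below_baseline(perf: dict, baseline: dict) -> list:
    """Return the contractors failing any guaranteed standard."""
    return [
        name for name, metrics in perf.items()
        if any(metrics[key] < minimum for key, minimum in baseline.items())
    ]

print(below_baseline(performance, BASELINE))  # ['Contractor B']
```

Because every contractor is measured against the same baseline in the same terms, the comparison itself becomes evidence of reliability (or its absence) across providers.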
The types of public services we choose to deliver, how we deliver them and how we fund them often have an impact beyond that day-to-day delivery. This impact may or may not be intended. It may at times appear to be a consequence when there is no real causal link – but, nonetheless, the ripples must be considered. Or, indeed, the lack of desired ripples.
The Work Programme contracts were competed on price. I have argued elsewhere that this price competition drives providers to concentrate on the jobseekers who are easiest to help: those people who are furthest from work, with the most complex needs, will be ‘parked’. These are precisely the jobseekers who might otherwise have been referred to specialist subcontractors. The providers of such specialist services are often voluntary sector organisations with a strong local focus, such as LEAP in Harlesden. Charities like LEAP are threatened with closure – an impact of the Work Programme contracting.
The nature of our prisons is determined by the way we design and pay for that provision. Nearly 60% of people leaving prison will reoffend and be incarcerated again. An extreme thesis might suggest that the prisons themselves cause higher rates of recidivism. If this is not the case, it is at least true that our current prison regimes are ineffective at reducing reoffending. Assuming we accept that this represents an extraordinary cost to our society, then the impact must be considered in prison commissioning/procurement.
A prison will be valid if it provides a secure, safe environment in which offenders are separated from society. Prisoners, their families, their victims and wider society will see it as such – it will have face validity. Its reliability means that it will be experienced in the same way by all prisoners of all characteristics/backgrounds, in all locations and over time. We may choose to determine and measure its impact in terms of the numbers of ex-prisoners who do not reoffend. We may also decide we need an impact on our state purse – i.e. that this prison must be delivered more cheaply than it was before.
It is crucial, however, that cost saving, or price, is viewed as an impact. It is one element of an equation that, as a whole, defines your service. It cannot be viewed in isolation from the validity or reliability of the contract, or, indeed, from other impacts.