2. Develop the evaluation brief

The evaluation brief is a document used to gain agreement on an evaluation and to develop either a Request for Tender (RFT) to commission an external evaluation or agreements for an internal evaluation.

It is the same concept as the program evaluation plan referred to in the NSW Government Evaluation Guidelines (PDF, 543 KB), which may be developed during the program design phase. The details and scope will differ from evaluation to evaluation. It may also be called the terms of reference for the evaluation. The brief is the basis for developing the evaluation design.

A brief for a program evaluation sets out:

  • Purpose of evaluation — formative or summative
  • Type of evaluation needed — process, outcome and/or economic
  • Scope and focus of the evaluation
  • Key stakeholders
  • Key evaluation questions
  • What is already known about the program?
  • Reporting and communication
  • Decide on balance of internal and/or external evaluation
  • Develop an evaluation strategy (for large programs)
  • The investment in the evaluation
  • Governance mechanisms and stakeholder engagement strategy

Considerations for the evaluation brief include:

  • A program with external funding may have specific requirements in terms of the evaluation focus, methods, timing and scale.
  • High-profile programs or those with significant risks may need more extensive evaluation that will provide early warning of any problems.
  • Pilot initiatives are likely to need more extensive evaluation to provide information not just on whether they work, but how they work, so they can be replicated or scaled up.
  • A program with multiple stakeholders may need more resourcing to support their involvement in negotiating the evaluation focus and methods and communicating findings.

For larger evaluations, you may also progress the development of the evaluation design at this stage, so that an outline of the evaluation design can be included in the brief. This is particularly important when the brief is developed during the program design phase.

Purpose of evaluation – formative or summative

The starting question for planning a program evaluation is "Why do this evaluation?"

The two main evaluation purposes

  1. Formative evaluation for program improvement, learning and decisions about incremental changes.
  2. Summative evaluation for accountability and decisions about whether or not to continue or expand a program.

Formative and summative evaluations may use some of the same evaluation methods.

The classic comparison, by Professor Robert Stake, is "When the cook tastes the soup, that's formative; when the customer tastes it, that's summative".

Formative evaluation refers to evaluation conducted to inform decisions about improvement.  It can provide information on how the program might be developed (for new programs) or improved (for both new and existing programs). It is often done during program implementation to inform ongoing improvement, usually for an internal audience. Formative evaluations use process evaluation but can also include outcome evaluation, particularly to assess interim outcomes.

Summative evaluation refers to evaluation to inform decisions about continuing, terminating or expanding a program.  It is often conducted after a program is completed (or well underway) to present an assessment to an external audience.  Although summative evaluation generally reports when the program has been running long enough to produce results, it should be initiated during the program design phase. Summative evaluations often use outcome evaluation and economic evaluation but could use process evaluation, especially where there are concerns or risks around program processes.

The purpose of a program evaluation will inform (and be informed by) audience needs, reporting requirements and intended users and uses. It will also be shaped by program characteristics including:

  • significance to government, size of investment, risks, sensitivities and decision-making needs
  • the stage and maturity of program implementation
  • the readiness of the program for evaluation including the extent and quality of administrative data.

In some cases, evaluations are required by legislation or policy. Each cluster within NSW Government will have a rolling 12-month evaluation schedule, which must be prepared and submitted to the Expenditure Review Committee (ERC) for approval, beginning in the 2013-2014 financial year. Schedules should include:

  • A list of programs planned for evaluation and review and their expected completion date
  • Who will evaluate or review listed programs
  • The governance processes for the schedule, including internal monitoring and reporting
  • When the schedule will be reviewed and updated.

Type of evaluation needed – process, outcome and/or economic

The most common types of program evaluation within government are process evaluation, outcome evaluation and economic evaluation. Process evaluation is mainly, but not solely, used for formative purposes, while outcome evaluation and economic evaluation are used mainly for summative purposes.

Other evaluation tools (needs assessment, program logic, evaluability assessment) may be used in preparing a program evaluation brief (see below) or to inform program planning.

Types of evaluation

  • Process evaluation: Investigates how the program is delivered, including efficiency, quality and customer satisfaction. May consider alternative delivery procedures. It can help to differentiate ineffective programs from failures of implementation. As an ongoing evaluative strategy, it can be used to continually improve programs by informing adjustments to delivery.
  • Outcome evaluation (or impact evaluation): Determines whether the program caused demonstrable effects on specifically defined target outcomes. Identifies for whom, in what ways and in what circumstances the outcomes were achieved. Identifies unintended impacts (positive and negative). Examines the ways the program contributed to the outcomes, and the influence of other factors.
  • Economic evaluation: Addresses questions of efficiency by standardising outcomes in terms of their dollar value to answer questions of value for money, cost-effectiveness and cost-benefit. These types of analyses can also be used in formative stages to compare different options.
  • Needs assessment: As part of program planning, assesses the level of need in the community and what might work to meet that need. For an existing program, assesses who needs the program and how great the need is.
  • Program logic: Used for program planning and for framing a program evaluation to ensure there is a clear picture of how and why the program will produce the expected outcomes.
  • Evaluability assessment: Used in developing a program evaluation brief to determine whether a program evaluation is feasible and how stakeholders can help shape its usefulness. This is useful if implementation has commenced without an evaluation plan.

Scope and focus of the evaluation

All program evaluations should be as rigorous as possible, aim to produce valid and reliable findings, and reach sound conclusions.

The evaluation brief needs to consider an evaluation design that addresses rigour, utility, feasibility and ethical safeguards.

Whenever feasible and appropriate, program evaluation should aim to measure program outcomes. Planning for rigorous outcome evaluations should begin as early as possible to allow for a strong evaluation design that can include comparison groups for quasi-experimental or experimental evaluation approaches, and arrangements for collecting the required program data.
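One common quasi-experimental approach that relies on a comparison group is a difference-in-differences estimate. The sketch below is a purely hypothetical illustration of why such a comparison strengthens an outcome evaluation; the groups and outcome scores are invented and do not come from any actual program.

```python
# Hypothetical illustration: outcome scores before and after a program, for a
# participant group and a comparison group that did not receive the program.
# All numbers are invented for this sketch.

program_before, program_after = 52.0, 61.0
comparison_before, comparison_after = 50.0, 55.0

# Naive estimate: attributes the whole change in participants to the program.
naive_effect = program_after - program_before

# Difference-in-differences: subtracts the change that occurred anyway
# (as observed in the comparison group) from the change in participants.
did_effect = (program_after - program_before) - (comparison_after - comparison_before)

print(f"Naive before/after change: {naive_effect:.1f}")          # 9.0
print(f"Difference-in-differences estimate: {did_effect:.1f}")   # 4.0
```

In this invented example, a naive before-and-after reading would credit the program with the full 9-point change, while the comparison group suggests about 5 points of that change would have happened anyway.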

It is never feasible or appropriate to try to evaluate every aspect of a program. Any evaluation project needs boundaries in its scope and a focus on key issues. For example, a program evaluation might look at how a program has been implemented in the past 3 years, rather than since it began, or could look at its performance in particular regions or sites rather than across the whole state.  An outcome evaluation may focus on outcomes at particular levels of the program logic or for particular components of the program. A process evaluation may focus on the activities of particular stakeholders, such as frontline staff, or interagency coordination.

Key stakeholders

Key stakeholders are likely to include senior management in the agency, the Strategic Centre, program managers, program partners, service providers, and peak interest groups (representing industries, program beneficiaries and so on).

In developing the evaluation brief you should consider the questions that significant stakeholders will have of the program, when they need answers, and how they will use this information. One method is to map significant stakeholders and their actual or likely questions.

Stakeholders will also have expectations about the most credible evidence to answer those questions. They will have different degrees of knowledge of the program, of the extent to which it can be evaluated, and of the suitability of different evaluation designs and methods. You need to be clear on their interests and understanding of the program, decide how these should be reflected in the evaluation, and consider how their expectations can be managed throughout the evaluation.

Key evaluation questions

A program evaluation should focus on only a small set of key questions. These are not questions asked in an interview or questionnaire, but high-level research questions that will be answered by combining data from several sources.

Key evaluation questions for the three main types of evaluation

Process evaluation
  • How is the program being implemented?
  • How appropriate are the processes compared with quality standards?
  • Is the program being implemented correctly?
  • Are participants being reached as intended?
  • How satisfied are program clients? For which clients?
  • What has been done in an innovative way?

Outcome evaluation (or impact evaluation)
  • How well did the program work?
  • Did the program produce the intended outcomes in the short, medium and long term?
  • For whom, in what ways and in what circumstances?
  • What unintended outcomes (positive and negative) were produced?
  • To what extent can changes be attributed to the program?
  • What were the particular features of the program and context that made a difference?
  • What was the influence of other factors?

Economic evaluation (cost-effectiveness analysis and cost-benefit analysis)
  • What is the most cost-effective option?
  • Has the intervention been cost-effective (compared to alternatives)?
  • Is the program the best use of resources?
  • What has been the ratio of costs to benefits?
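To make the efficiency questions above concrete, the sketch below shows one common way a cost-benefit analysis is summarised: yearly costs and benefits are discounted to present values, and a net present value and benefit-cost ratio are reported. All dollar figures, the four-year horizon and the 7% discount rate are invented for illustration only and are not drawn from any NSW program or guideline.

```python
# Hypothetical illustration of a benefit-cost summary. All figures are invented.

yearly_costs = [1_200_000, 400_000, 400_000, 400_000]    # $ per year, year 0 first
yearly_benefits = [0, 600_000, 900_000, 1_100_000]       # $ per year
discount_rate = 0.07                                      # illustrative rate only

def present_value(cash_flows, rate):
    """Discount a series of yearly cash flows back to year 0."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))

pv_costs = present_value(yearly_costs, discount_rate)
pv_benefits = present_value(yearly_benefits, discount_rate)

net_present_value = pv_benefits - pv_costs
benefit_cost_ratio = pv_benefits / pv_costs

print(f"Present value of costs:    ${pv_costs:,.0f}")
print(f"Present value of benefits: ${pv_benefits:,.0f}")
print(f"Net present value:         ${net_present_value:,.0f}")
print(f"Benefit-cost ratio:        {benefit_cost_ratio:.2f}")
```

A ratio above 1 (or a positive net present value) indicates the discounted benefits exceed the discounted costs under the stated assumptions; a full economic evaluation would also test how sensitive that result is to the discount rate and to how benefits are valued.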

Appropriateness, effectiveness and efficiency

In this Toolkit, we use three broad categories of key evaluation questions to assess whether the program is appropriate, effective and efficient.

Organising key evaluation questions under these categories allows an assessment of the degree to which a particular program in particular circumstances is appropriate, effective and efficient. Suitable questions under these categories will vary with the different types of evaluation (process, outcome or economic).

Typical key evaluation questions

Appropriateness
  • To what extent does the program address an identified need?
  • How well does the program align with government and agency priorities?
  • Does the program represent a legitimate role for government?

Effectiveness
  • To what extent is the program achieving the intended outcomes, in the short, medium and long term?
  • To what extent is the program producing worthwhile results (outputs, outcomes) and/or meeting each of its objectives?

Efficiency
  • Do the outcomes of the program represent value for money?
  • To what extent is the relationship between inputs and outputs timely, cost-effective and to expected standards?

While you can use different processes to develop evaluation questions, these should emerge as you consider the different activities associated with this step (the purpose of the evaluation, the type of evaluation, stakeholder interests, and preliminary assessments). Where evaluations are mandated in legislation or in arrangements such as National Partnership Agreements, there may be formal, and at times general, evaluation questions that must be addressed.

To clarify the purpose and objectives of an evaluation, there should be a limited number of higher-order evaluation questions (roughly 3 to 5) with sub-questions underneath each higher-order question. The higher-order questions can be grouped under the categories of appropriateness, effectiveness and efficiency.
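As a hypothetical illustration of this structure (not a template), the sketch below arranges three invented higher-order questions under the appropriateness, effectiveness and efficiency categories, each with its own sub-questions.

```python
# Hypothetical sketch of a small key-evaluation-question hierarchy.
# The questions are invented examples, loosely echoing those listed above.

key_evaluation_questions = {
    "appropriateness": {
        "To what extent does the program address the identified need?": [
            "Has the level or nature of the need changed since the program was designed?",
        ],
    },
    "effectiveness": {
        "To what extent is the program achieving its intended outcomes?": [
            "For which participant groups are outcomes strongest and weakest?",
            "What unintended outcomes (positive or negative) have emerged?",
        ],
    },
    "efficiency": {
        "Do the outcomes of the program represent value for money?": [
            "How do delivery costs compare with those of similar programs?",
        ],
    },
}

# Quick check against the 'roughly 3 to 5' guidance for higher-order questions.
total = sum(len(questions) for questions in key_evaluation_questions.values())
print(f"{total} higher-order questions defined")
```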

A way to test the validity and scope of evaluation questions is to ask: when the evaluation has answered these questions, have we met the full purpose of the evaluation?

What is already known about the program?

You can prepare for a program evaluation by conducting preliminary investigations into the program and the scope for evaluation. In some cases an evaluation is required irrespective of the state of the program; in other cases work can be done to make the program more able to be evaluated, or alternatively to demonstrate that it is not worth evaluating. Three methods to prepare for an evaluation and inform an evaluation brief are:

  • Review program logic
  • Use evaluability assessment to assess readiness for evaluation
  • Identify what is already known about the program

Review the program logic

Reviewing or developing the program logic is an important prelude to an evaluation. It should provide a useful description of the program and its intended outcomes that will help shape the evaluation questions and data collection methods.

Key evaluation questions for program logic analysis include:

  • What is the problem the program is trying to solve or outcomes it is trying to achieve?
  • How plausible is it that the program activities will achieve the intended outcomes?
  • How appropriate is the program in relation to government policy?

Program logic can also be used to assess whether the program is still appropriate, and if not, provide a basis for discontinuation without the need for further evaluation. For example, program logic analysis can show whether the intended outcomes are still appropriate and link to government priorities. Program logic can also determine whether the program activities and immediate outcomes can plausibly be linked to the intended outcomes, either logically or using evidence from the research literature.

Use evaluability assessment to assess readiness for evaluation

Evaluability assessment is used to determine whether a program evaluation is feasible and, if so, in what form. It will also help identify what will make a program more able to be evaluated, such as refining the program logic or improving the collection of monitoring data.

An evaluability assessment is particularly important if implementation has commenced without an evaluation plan. For example, an evaluability assessment may find that no data on program outcomes is being collected, pointing to data collection design work that is needed prior to conducting an outcome evaluation. The findings from an evaluability assessment should inform the design and feasibility of the program evaluation (see Step 4. Manage development of the evaluation design).

Questions for evaluability assessment include:

  • Does the program have a plausible program logic?
  • Is there a clear purpose and objectives for the evaluation?
  • Can you clearly identify an audience for the evaluation and how the findings will be used?
  • Are there sufficient resources to conduct an evaluation?
  • Is there suitable data from program implementation and/or monitoring, or is it possible to collect data?
  • Can a comparison group be identified to better determine program impacts and outcomes?

Identify what is already known that is relevant to answering key evaluation questions

It is a waste of effort to conduct an evaluation when answers can be extracted from existing data. Before considering program evaluation, analyse performance monitoring data, and scan for evidence about comparable programs.

An analysis of available program monitoring data should reveal trends, patterns and issues with program implementation, and in some cases program outcomes. This analysis can answer some questions about the program, and point to other questions that the program evaluation should address.
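As a minimal sketch of this kind of preliminary analysis, the example below assumes a hypothetical monitoring extract (a CSV with quarter, region, referrals and completions columns; the file and column names are invented for illustration) and summarises the trend in completion rates by quarter and the variation across regions.

```python
# Minimal sketch of summarising hypothetical program monitoring data.
# Assumes an extract "monitoring.csv" with columns: quarter, region,
# referrals, completions. File and column names are illustrative only.
import pandas as pd

df = pd.read_csv("monitoring.csv")

# Trend over time: completion rate by quarter across all regions.
by_quarter = df.groupby("quarter", as_index=False).agg(
    referrals=("referrals", "sum"),
    completions=("completions", "sum"),
)
by_quarter["completion_rate"] = by_quarter["completions"] / by_quarter["referrals"]

# Variation across regions: flag regions well below the overall completion rate.
overall_rate = df["completions"].sum() / df["referrals"].sum()
by_region = df.groupby("region", as_index=False).agg(
    referrals=("referrals", "sum"),
    completions=("completions", "sum"),
)
by_region["completion_rate"] = by_region["completions"] / by_region["referrals"]
low_regions = by_region[by_region["completion_rate"] < 0.8 * overall_rate]

print(by_quarter)
print(low_regions)
```

A summary like this can answer some process questions directly (for example, whether participation is trending up or down) and highlight where the evaluation itself should dig deeper.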

A scan for existing evidence about effectiveness of comparable programs in other jurisdictions or internationally can point to expected outcomes, standards and issues. These can also inform the development of evaluation questions, the evaluation design, methods of data collection, and standards for assessing performance.

Reporting and communication

Evaluation reports are usually the most significant product of a program evaluation project. The final report, either in full or summary form, needs to reach the intended audiences through formats and channels that are meaningful to them. You need to consider which stakeholders will be the audience for the evaluation report or reports, and how they might use those reports. Evaluations may be designed to inform decisions in the budget and policy cycle, meaning that reports are required at specific times.

Paying attention to reporting needs when developing an evaluation brief can help clarify expectations about when information from a program evaluation is needed, and the time it will take for the evaluation to produce reliable and robust findings. In some cases, interim evaluation reports can be timetabled to provide preliminary findings to decision makers.

The planning stages of a program evaluation should consider the practice principle "Evaluation processes should be transparent and open to scrutiny" in the NSW Government Evaluation Guidelines. You should consider how the evaluation findings, methods and data might be shared within government. An evaluation report on any program delivering services to the public should be publicly released in a timely manner, except where there is an overriding public interest against disclosure.

During this step, you should set out the reporting requirements, which will be further developed in the workplan:

  • Identify suitable reporting for key audiences.
  • Consider issues of length, structure, style, and whether to publish – particularly if tendering services from an external agency.
  • Develop a timeframe for reporting to meet evaluation purposes.

Reporting and communication about the evaluation can have an important influence on how the findings will be used.

Decide on balance of internal and/or external evaluation

One of the principles in the NSW Government Evaluation Guidelines is that evaluations should be conducted with the right mix of expertise and independence. In deciding who conducts the evaluation, issues to consider are knowledge of the program or policy, the evaluators' expertise in program evaluation, their perceived and actual independence from the program, and their credibility in the eyes of the intended audience.

For many evaluation projects, a partnership between the internal managers and the external evaluators may be effective and provide good value for money.  The degree of partnership will vary depending upon capacity, logistics and the need for independence.  The internal team is often best suited to manage the overall process and governance arrangements, and also to provide data from administrative systems and coordinate intra-government arrangements such as organising stakeholder interviews. A well-managed partnership approach can bring flexibility to the evaluation, reduce delays, be cost-effective and promote learning about evaluation within program management.

Some possible scenarios for internal and/or external evaluation:

  • Evaluation all done internally: An evaluation can be designed and managed internally where the program is a small to moderate investment and a low risk (tier 1 or tier 2 in the NSW Government Evaluation Guidelines), the evaluation is limited in scale, and internal staff have skills and resources for systematic data collection and analysis.
  • Hybrid — combination of internal and external: External service providers can contribute to hybrid evaluations in different ways:
      • Supporting internal staff to conduct an evaluation through facilitation and/or coaching.
      • Undertaking one or more components (e.g. specialist data collection or analysis, or reporting).
      • Providing an external review of a process or product (e.g. evaluation design, data collection instruments, evaluation report).
  • External — smaller scale evaluation project, designed internally: For some small evaluations, or where an evaluation is repeating a previous design, the evaluation can be designed internally and an RFT then used to engage an external group to implement it.
  • External — larger scale evaluation project, designed by external evaluators: In many cases external expertise will be useful to design the evaluation. This can either be done as part of proposals for the evaluation, or as a separate project (if the evaluation is very large and complicated).

Develop an evaluation strategy (for large programs)

The scale of the evaluation should be proportionate to the size or significance of a program, as set out in the NSW Government Evaluation Guidelines. For large programs this may involve a series of evaluation projects and related activities. This can be the case for programs that are large-scale (significant investment, extended reach), run for three or more years, and/or are complicated (multiple sub-programs, across agencies or whole of government).

Such programs may warrant an evaluation framework and strategy that sets out a series of evaluation projects and activities for data development and evaluation capacity building over the period of the program. This will allow you to build in process, outcome and economic evaluations at key times that match the developing maturity of the program and meet the needs for information for formative and summative purposes.

The evaluation framework and strategy can be developed at the time of the program design and reviewed at milestones, such as after the delivery of each evaluation report.

The investment in the evaluation

Like any project, the evaluation requires an investment of financial and staffing resources commensurate with the scale of the program and the evaluation. In the program design stage, a proportion of budget (and/or internal staff time) should be allocated to cover evaluation activities.

The cost of an evaluation project will be shaped by the scope of the evaluation activities, whether they are to be carried out internally or by external consultants, and the extensiveness of additional data collection, analysis and report writing.

While the detailed tasks and scope of an evaluation project will not be clear until the design step, the budget allocated when commissioning an evaluation will indicate the extent and depth of work that can be undertaken within it.

Governance mechanisms and stakeholder engagement strategy

Ideally, a governance mechanism such as a steering committee or advisory group will be established to provide direction or advice at various stages of the evaluation. The benefits include a greater range of perspectives and expertise, as well as greater ownership of the evaluation process by key stakeholders.

You should consider a governance group that matches the purpose and the scale of the evaluation. The group may be entirely within government, or include government and external stakeholders.

In most cases, membership should extend beyond the program itself to include relevant people from elsewhere in the agency or from partner agencies. For significant evaluations (tier 3 or 4), you should consider including a representative from the Centre for Program Evaluation. External stakeholders can include key academics who research the program area, representatives of peak groups for program clients, industry bodies, and program service providers.

Product
Evaluation brief with purpose, scope, key evaluation questions, governance arrangements, budget and timelines.