Chapter 5: Creating and improving exams
Designing an assessment has four stages:
- making a blueprint of the test (a schematic overview)
- writing the test itself
- writing an answer model
- getting feedback on steps 1, 2 and 3 from stakeholders
For exams and assignments, the processes are very similar:
Table 14. Comparison of assignments and exams
| Step | Assignment | Exam |
| 1. Blueprint | Consistency check table | Assessment matrix |
| 2. Test | Assignment description, including knock-out criteria | Exam, including front page |
| 3. Grading guide | Rubric (and/or assessment sheet) | Answer model |
| 4. Feedback from stakeholders | Experienced and new assessors, students | Experienced and new assessors |
5.1. Exam blueprint: assessment matrix
An assessment matrix is a blueprint that helps you check whether your assessment covers the learning objectives you set and whether you test at the right level of thinking skills (the validity of your course). You can make an assessment matrix at course level and at test level. This section explains in detail how to make an assessment matrix for a single test. Assessment matrices can be used for exams that consist of individually graded questions, such as written exams, oral exams, or practicals in which students have to answer a fixed set of questions (as opposed to writing a report).
The course you teach has a set of learning objectives or learning outcomes. This manual assumes that you use Bloom’s taxonomy to define those learning objectives. If your programme uses another taxonomy, that is fine, too, of course. The learning objectives for a course could, for example, look like this:
On successful completion of the course, you will be able to:
- List and define basic reliability, availability, maintainability and supportability (RAMS) concepts and measures.
- Describe the main elements necessary to perform maintenance modelling and analysis for aerospace applications.
- Identify common assumptions in maintenance modelling and analysis.
- Select appropriate modelling and/or analysis techniques for given problems in the aerospace domain through analysis of problem characteristics.
- Apply modelling and/or analysis techniques for given problems in the aerospace maintenance domain by:
- Formulating and solving stochastic time-to-failure models to determine aircraft system and component reliability characteristics.
- Formulating and solving time series techniques and stochastic demand arrival models to determine and predict aircraft system and component supportability characteristics.
- Evaluate the benefits and drawbacks of available options for modelling and analysis of a given problem in the aerospace maintenance domain.
An objective specifies a topic or a bit of content (such as RAMS concepts, or stochastic time-to-failure models) as well as what the student should be able to do with that topic (list, describe, apply). The verb indicates the level of Bloom’s taxonomy that the objective aims at. In this example, the first objective (list/define) is aimed at the bottom level (remember), whereas the final objective is aimed at the highest level used here (evaluate).
To develop an assessment (an exam or an assignment) that is representative of these objectives, these two aspects, topic and level, both need to be taken into account. This is where the assessment matrix comes in. Basically, it is a table in which the two aspects of the objectives are related to the parts of the test, yielding a convenient overview of the composition of the test.
The matrix shows how the test is composed. What is the contribution of each objective towards the final mark? And to what extent are the different levels of Bloom’s taxonomy tested? This is convenient for the person creating the test (does it match my intentions?) and also a quick way of communicating the composition of your test to someone else.
An example of an existing exam whose assessment matrix was reverse engineered is given below.

Table 15. Example assessment matrix for an existing exam, based on the learning objectives listed previously. Q = (sub)question number, P = points per (sub)question. Dark blue cells indicate the level that the learning objectives in the left column describe.
| Learning objective | Remember: Q (P) | Understand: Q (P) | Apply: Q (P) | Analyse: Q (P) | Total points (% of total score) |
| 1 | 1a (3), 1b (4), 3a (3) | | | | 10 |
| 2 | | 1c (5), 4 (5) | | | 10 |
| 3 | 1e (3) | 2a (5) | | | 8 |
| 4 | | 1e (5), 2b (5) | 2c (5) | | 15 |
| 5a | 1d (7) | | 1e (5), 4 (5) | | 17 |
| 5b | | 3 (5) | 3 (10) | 3 (5) | 20 |
| 6 | | | | 3 (10), 4 (10) | 20 |
| Total | 20 | 30 | 25 | 25 | 100 |
By following the steps below, you will first design an assessment matrix which shows how you would like to construct the next exam. Then, you will analyse an existing exam and investigate to what extent it matches your “ideal” matrix.
Step 1: List the learning outcomes
Start by listing the learning outcomes in the left-hand column of the assessment matrix. If there is only one summative assessment (e.g. a final exam), then all of the learning outcomes of the course need to be included. If the course is assessed in multiple ways (for example, a group-work project and an exam), then you need to select the learning outcomes that you want to test in the exam.

Step 2: Determine the weight of each learning outcome
Now that you have listed the learning outcomes that will be tested, the next step is to decide what weight you would like each learning outcome to have. In other words, what percentage of the total score should each learning outcome represent? Are they all equally important? Or do you want some outcomes to have more weight in the exam? Complete the final column of the matrix by filling in the weighting of each learning outcome.

Step 3: Determine how each learning outcome will be tested
Now that you have decided the weighting of the learning outcomes, you can complete each row of the matrix by deciding at which cognitive levels you want to test each outcome. If formulated correctly, a learning outcome indicates what level of cognitive skill is intended. For example, suppose the second outcome in the matrix is the learning outcome “Apply modelling and/or analysis techniques for given problems in the aerospace maintenance domain”. This outcome is at the level of application, and you have decided that it should count for 30% of the total score. What are your options for completing this row of the assessment matrix? You definitely need to allocate a proportion of the weight to the “application questions” cell, or you would not be testing this learning outcome properly. You cannot test at levels above the application level; that would not be fair. You could decide to test this outcome only at the application level and put 30% there. However, there are also good reasons for testing a learning outcome explicitly at the level or levels below it. One reason is that this gives you and the student feedback on what level of skill they have reached: some students might answer the application-level questions incorrectly, but have no difficulty with the comprehension questions that relate to the same learning outcome. Another reason may be that you want to build up the question in steps: first recall the facts required, then apply them to a new case. So, in this example you might decide to allocate 10% to comprehension questions and 20% to application questions. Or 15%-15%. Or 5% reproduction, 5% comprehension and 15% application. Or some other combination – it is up to you.

Step 4: Check and adjust the totals for each level
After step 3, add up the percentages in each column to complete the totals in the bottom row. When you have done this, check whether you are happy with the result. You may find that you want to make some adjustments. For example, if in step 3 you allocated a percentage to the reproduction level for every learning outcome, you may now find that the total for this column is higher than you would want. If you are happy with the totals in each column, then you are done designing your assessment matrix. If not, adjust the cells until you are happy. If you are designing a new exam, for a new or redesigned course, the next step is to construct questions that match the matrix. If you have designed a matrix for an existing exam, it is interesting to check how well that exam matches the matrix you have just constructed.

The assessment matrix will now look something like this (see Table 16):
Table 16. Example assessment matrix for a new exam. The dark blue cells indicate the level of Bloom that the learning objectives in the left column describe.
| Learning objective | Remember (recall basic information) | Understand (explain ideas and concepts) | Apply (apply information in a new way) | Analyse (distinguish components) | Evaluate (justify a stand or position) | Create (create a new product) | Percentage of total score |
| LO 1 | 5% | 5% | | | | | 10% |
| LO 2 | 5% | 5% | 20% | | | | 30% |
| LO 3 | | 20% | | | | | 20% |
| LO 4 | | 5% | 10% | | | | 15% |
| LO 5 | | | 25% | | | | 25% |
| Total | 10% | 35% | 55% | | | | 100% |

In this example, the number of questions in each cell has not yet been specified. This can be done while you are making the exam, or you can do it now.
For clarity, you can delete the columns you are not using.
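If you prefer to check the bookkeeping automatically, the matrix lends itself well to a few lines of code. The sketch below is a minimal, hypothetical example in plain Python (not part of any standard tool), using the numbers from Table 16, that verifies the row and column totals add up.

```python
# Minimal sketch: the assessment matrix of Table 16 as a dictionary,
# with a check that the weights add up to 100% of the score.
matrix = {
    # learning objective: {Bloom level: percentage of total score}
    "LO 1": {"remember": 5, "understand": 5},
    "LO 2": {"remember": 5, "understand": 5, "apply": 20},
    "LO 3": {"understand": 20},
    "LO 4": {"understand": 5, "apply": 10},
    "LO 5": {"apply": 25},
}

# Row totals: the weight of each learning objective in the exam.
for lo, cells in matrix.items():
    print(f"{lo}: {sum(cells.values())}% of the total score")

# Column totals: how heavily each Bloom level is tested.
level_totals = {}
for cells in matrix.values():
    for level, pct in cells.items():
        level_totals[level] = level_totals.get(level, 0) + pct

print(level_totals)  # {'remember': 10, 'understand': 35, 'apply': 55}
assert sum(level_totals.values()) == 100  # the whole exam is accounted for
```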
Analysing an existing exam
To what extent does the existing exam match the blueprint that you have just constructed? To figure this out, go through the questions in the exam and decide, for each (sub)question, in which cell of the matrix it belongs. This means that you need to decide which learning outcome it relates to and what the level of the question is in terms of Bloom’s taxonomy. Write down the question number and the number of points that can be earned with this question in the appropriate cell. You can add this information to the matrix you have constructed, or you can complete a new one. Here is a template for an assessment matrix. When you have done this, add up the points, convert them into percentages and check to what extent the exam matches the new matrix. If there are differences, what are they? What are the main areas you would want to change (if any)?
Additionally, by adding an extra column to the table with the total time students spend on a particular learning objective, you can compare the percentage of points to the percentage of hours. Here, ‘hours’ means 28 hours × the number of ECTS credits of your course, i.e. the total time students are supposed to spend on it. Consider an extreme example in which students spend 50% of their time practising LO1, while it earns them only 10% of the points on the final exam. Even if they performed very well on LO1 during the course, this will not have a big influence on their final grade. Moreover, students might choose not to study LO1 at all, since it will not earn them many points. It is therefore wise to align the time spent on a learning objective with the points awarded for it, as the sketch below illustrates.
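The following sketch (Python, with made-up numbers for a hypothetical 5 EC course) compares the share of study time per learning objective with the share of exam points; all names and values are assumptions for the example.

```python
# Illustrative sketch: compare study time per learning objective (LO)
# with the exam points awarded for it, to spot misalignment.
ects = 5
total_hours = 28 * ects  # study load: 28 hours per ECTS credit -> 140 hours

hours_per_lo = {"LO 1": 70, "LO 2": 30, "LO 3": 40}   # hypothetical time spent
points_per_lo = {"LO 1": 10, "LO 2": 50, "LO 3": 40}  # % of exam points

for lo, hours in hours_per_lo.items():
    time_share = 100 * hours / total_hours
    print(f"{lo}: {time_share:.0f}% of study time, {points_per_lo[lo]}% of points")
# LO 1: 50% of study time, 10% of points -> misaligned, as in the example above
```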
A few final words about assessment matrices:
An assessment matrix is useful because it provides an overview of the test. Many people find that when they fit an existing test (made without using a matrix) into a matrix, the result does not exactly match their intentions, especially with respect to the level of the questions. Often the test turns out to have more lower-level questions (especially at reproduction level) than intended. At the same time, it is good to remember that the assessment matrix is an abstraction. It is only meaningful to the extent that the test actually matches the matrix. Making sure that you construct tasks (questions or assignments) that elicit the desired behaviour at the intended level of cognitive skill is therefore paramount.
Number of exam questions
There are some rules of thumb for deciding on the number of exam questions.
- The number of questions per learning objective should represent the importance of the learning objective.
- It can be better to have multiple small questions on a learning objective than one big question, because you then take multiple ‘samples’ from the learning objective instead of a single one, which improves reliability. On the other hand, for LOs at higher Bloom levels, splitting one long question into several short ones might reduce the difficulty or even lower the Bloom level actually tested, so one long question might be better for such a learning objective.
- The number of points for an exam question must be a good indication of the amount of time students will need to answer it. Students will try to get the highest grade possible, and will skip questions that are very difficult but yield only a few points.
- Exam duration: there are some guidelines on how much time it takes a student to answer questions, but this differs quite a lot between types of questions. The best way to determine this is to ask a colleague who teaches a similar course.
- Consider the total number of points in your exam and think about how much the grade will change if a student misses a subquestion. Will her grade drop from a 10 to an 8? Is that desirable, or is the drop too coarse? If the steps are too coarse, add more questions to make them smaller (see the sketch below).
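To see how coarse the grade steps are, you can compute the effect of one missed subquestion. The sketch below assumes a linear scale from 1.0 to 10.0 (grade = 1 + 9 × points/total); if your faculty uses a different formula, substitute it.

```python
# Hedged sketch: grade granularity under an assumed linear 1-10 scale.
def grade(points: float, total: float) -> float:
    return 1 + 9 * points / total

# Exam with only 20 points: missing one 4-point subquestion costs 1.8 grade points.
print(grade(20, 20), grade(16, 20))    # 10.0 8.2

# Same exam with 100 points and smaller subquestions: a 4-point miss costs only 0.36.
print(grade(100, 100), grade(96, 100)) # 10.0 9.64
```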
Number of closed questions
There are rules of thumb for the number of closed questions you need to get a reliable exam. The ‘problem’ with closed questions is that students can guess a correct answer, without knowing the subject thoroughly enough.
The rules of thumb are:
Single, high-stakes exam (around 100% of the final grade); required Cronbach’s alpha: 0.8; around 160 answer options in total:

| MCQ with 4 options | 40 questions |
| MCQ with 3 options | 53 questions |
| MCQ with 2 options / true-false questions | 80 questions |

Midterm (e.g. 40-50% of the final grade); required Cronbach’s alpha: 0.7; around 120 answer options in total:

| MCQ with 4 options | 30 questions |
| MCQ with 3 options | 40 questions |
| MCQ with 2 options / true-false questions | 60 questions |

For a multiple-choice exam of 40 questions with 4 answer options per question, students will only score higher than a 1.0 if they answer more than 10 questions correctly. This is because students will on average (some are lucky, some are unlucky) guess 10 questions correctly without studying for the test. As a result of the guessing correction, the first 10 correctly answered questions do not increase the grade. Each correctly answered question after that increases the grade by 9/30 = 0.30.
For an exam with 40 true/false questions (2 answer options per question), students will only score higher than a 1.0 if they answer more than 20 questions correctly. Starting with the 21st correctly answered question, each correct answer increases the grade by 9/20 = 0.45. With 80 true/false questions, the step size would be 0.23 (see the sketch below).
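The guessing correction above can be written as a small formula: the expected guessing score is n_questions / n_options, and the grade runs linearly from 1.0 at (or below) that score to 10.0 at the maximum. The sketch below reproduces the numbers from the two examples; it assumes this linear scheme, which your faculty’s rules may refine.

```python
# Sketch of the guessing correction described above (assumed linear 1-10 scale).
def corrected_grade(correct: int, n_questions: int, n_options: int) -> float:
    guess = n_questions / n_options        # expected score from blind guessing
    if correct <= guess:
        return 1.0                         # at or below chance level
    return 1 + 9 * (correct - guess) / (n_questions - guess)

print(corrected_grade(10, 40, 4))  # 1.0  (10 correct = chance level)
print(corrected_grade(11, 40, 4))  # 1.3  (each extra question adds 9/30 = 0.30)
print(corrected_grade(21, 40, 2))  # 1.45 (each extra true/false adds 9/20 = 0.45)
```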
Exam with open and closed questions
If an exam combines open and closed questions and the closed questions count for less than 50% of the exam, make sure the closed questions offer at least 80 answer options in total, in order to get relevant information from them.
5.2. Assessing exams: answer model and grading instructions
Answer model
Before discussing the answer model, you must realise that there is a difference between a model answer and an answer model. A model answer is the ideal answer, which you might want to publish for your students. The answer model is a tool that helps you and your fellow graders decide how to add or subtract points for individual students in a consistent and objective way. It indicates how many points are awarded per correct step or correct part of the answer if the model is based on addition, and/or how many points are deducted for expected errors if the model is based on deduction (subtraction).
An answer model can probably never cover all the creative answers that students will come up with. Therefore, you also need an instruction for graders that tells them what to do in these cases. It is advisable to have a meeting in which you discuss difficulties in grading ‘creative’ or otherwise unexpected solutions, and to adjust the answer model accordingly. This might mean redoing the grading of some of the subquestions. Earlier in this manual, issues that diminish the objectivity of grading, and hence the reliability of the assessment, were described. An answer model enables you to assess answers as objectively as possible and so avoid those issues. The following checklist shows what the answer model should contain:
Checklist 5. Checklist for answer models
Include the correct / an ideal answer
▢ Are all possible answers included?
▢ Are guidelines included on how many of these possible answers are required to earn points?
▢ Are instructions included on the process to handle correct student answers that are not (yet) included in the answer model?
Include the maximum number of points
▢ Are the max points included both for main and subquestions?
▢ Are the max points reasonable for the required amount of student work?
Description of how divergent answers are marked
▢ Is it clear which answers are considered fully/half/not correct?
▢ Is it clear how many points the various half-correct answers will receive?
Be clear on how interrelated subquestions are marked
▢ Is it clear for assessors how points can be earned for interrelated subquestions? If the first subquestion is incorrect, can students still earn points for follow-up questions based on the incorrect value?
Following the checklist when developing answer models can help you avoid potential disputes and increase the overall quality of the assessment. Developing the answer model at the same time as formulating the question also serves as a check on whether the phrasing of the question is specific enough: it is a tool that can make the formulation of the question more pointed, so that the quality of the question is enhanced. If the answer model contains a large number of possible answers, this usually means the formulation of the question is not specific enough.
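To make this concrete, an answer model entry can be captured in a structured form, for example as in the hypothetical Python snippet below. The question, points and rules are invented for illustration; the fields simply mirror the checklist above.

```python
# Illustrative only: one answer model entry as structured data, so every
# grader applies the same points and the same rules for divergent answers.
answer_model_2b = {
    "question": "2b",
    "max_points": 5,
    "model_answer": "MTBF = operating time / number of failures = 1000 h / 4 = 250 h",
    "partial_credit": {
        "correct formula, arithmetic error": 3,
        "correct numeric answer, derivation missing": 2,
    },
    "carry_through": "If 2a was answered incorrectly, grade 2b using the student's value from 2a.",
    "unexpected_answers": "Collect, discuss with co-graders, then update this entry.",
}
```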
Instructions for graders
If several assessors will be grading the same assessment, the answer model should include general rules for the assessment. Some of these were also mentioned in the previous section:
- How to handle subquestions that are mutually dependent (scoring method)?
- What to do when a given answer is not included in the answer model, or when you are uncertain about the correctness of an answer (for example, because the lecture about this topic was given by someone else)? Will you discuss this with your colleagues? How will you add it to the answer model?

The instruction for graders might also describe other measures you take to increase reliability, for example:
- Assess the answers per question (instead of the full examination per student).
- Change the sequence of the students per question.
- Give the students anonymity by having them state only their student number on the answer sheets and not their names.
- Use several assessors per question.
- Divide the different questions over the different assessors, instead of dividing the students over the assessors. In this way, the assessor differences average each other out.
- Grade the first couple of exams together and have a meeting in which you discuss differences between grades and adjust the answer model.
Although it might seem like a lot of extra work, investing time in this can greatly improve the quality of your assessment.
5.3. Checklist for exams
The most important hint is to write the exam questions together with the answer model, and to ask a colleague or another stakeholder to review them. Let your colleague check whether the question is likely to lead to the answer in the answer model, whether the question needs clarification, and whether additional instructions are needed.
Below, you will find checklists for the cover page of an exam, for writing exam questions and specific checklists for writing closed and open exam questions, that will help you to formulate and improve your questions and those of your colleagues.
Cover page
Some faculties have a standard cover page which is used for all exams. If your faculty does not use one, you can make your own using this checklist. Note that not all items may be relevant for your exam.
A cover page may prevent unnecessary stress and loss of points for some students. It lets them check whether pages or questions are missing from their exam booklet, whether it makes sense for them to write long answers in the hope of including the correct one, and whether there is anything else they might not be aware of that could diminish their grade.

Checklist 6. Checklist for exam cover pages
Are the following details included?
General information
▢ Number of pages
▢ Number of questions
▢ Duration of the exam/start and end time
▢ Course name
▢ Exam date and location
▢ Examiner’s name
▢ Name of the second reader/reviewer
Grade information
▢ Total number of points
▢ Exam grade calculation and/or cut-off point [minimum points to get a minimum pass grade (6.0)]
▢ In case the minimum grade for this exam is different, for example, 5.0, also mention the number of points needed for this minimum grade
▢ General rating information (if applicable), for example: if (and when) (minor) spelling and grammar mistakes will influence the grade
▢ how you will rate a question in case of multiple answers which are (partly) incorrect
▢ how you will rate a question in case of redundant information which is (partly) incorrect
Instructions
▢ Resources allowed
▢ use of books, readers, notes, slides
▢ use of (graphic) calculator, mobile telephones, etc.
▢ Whether name, student number, and programme should be written on all sheets/pages that the student hands in
▢ Whether the number of sheets that the student hands in should be written down (and where)
▢ Any additional information, for example, if certain questions should be answered on separate sheets
▢ Whether students can take the questions, answer sheets or scrap paper with them
General checklist for exam questions
There will almost always be a trade-off between the quality requirements for assessment, but there are some basics that need to be in place regardless. Furthermore, if you have your answer model ready, make sure that each question will lead to the answer in the answer model. This sounds obvious, but all too often there is a misalignment between what the students should be able to answer or demonstrate and what the question actually requires them to answer or demonstrate. It is easier to pick up on this type of misalignment when you have a complete answer model.
Checklist 7. Checklist for validity, reliability and transparency
Test only one learning objective at a time (validity)
▢ Do not try to cover more than one learning objective in the same question.
Relevance of each question (validity)
▢ Is it clear what knowledge or skill is being tested?
▢ Is this knowledge or skill absolutely necessary in order to answer the question?
▢ Is the answer model in line with what the test questions ask?
Language (reliability)
▢ Are there any spelling errors or typos?
▢ Is the question unambiguous and is it clear what is being asked?
▢ Have double negatives been avoided? Is the question concisely formulated?
Presentation
▢ Is the layout clear?
▢ Are the figures clear?
Transparency of grading
▢ Before taking the test/assignment, do students know what will be on the test, both in structure and in content?
▢ Before taking the test/assignment, did your students get experience with the types of questions with which you will be testing?
▢ During the test/assignment, are the points that can be earned with each question or subquestion indicated? This way, students can budget their time where it is most impactful for them, and avoid spending a lot of time on a question that will not earn them many points.
▢ After getting the grade and feedback, does the student get information on how her grade has been calculated, and on how she can improve her performance, for example per learning objective, criterion or subquestion?
Closed-ended questions
Closed test questions can be true/false questions, multiple choice questions, 'fill in the blanks' and pairing questions.
Checklist 8. Checklist for closed-ended test questions
Dos
▢ Do all questions end in a question mark? Students should be able to answer the question without looking at the answer options.
▢ Do all distractors seem just as plausible as the correct answer?
▢ Are all options roughly of the same length?
▢ Are the right answers distributed randomly over A, B, C, D, etc.?
Don’ts
▢ Does no question inadvertently provide the answer to another question?
▢ Are there no grammatical clues to indicate the right answer?
▢ Are there no questions that start with ‘Which of the following statements are true/false?’
Asking a question like ‘Which of the following statements are true/false?’ could test more than one thing at a time; if it were an open question, you would have asked about and graded each statement separately, with partial points. All distractors should be equally plausible. Constructing distractors is a time-consuming process, and it is better to have more questions with fewer distractors than to use distractors that are not plausible. As a guideline, use 3 options (i.e. 1 correct answer and 2 distractors). When constructing them, think of the answers that weak students would give if it were an open question.
Open-ended questions
Open-ended questions are any questions where the student has to write a free-form answer. The answer can consist of single words, phrases, bullet points, a few sentences or even an entire report.
Checklist 9. Checklist for open-ended test questions
Use a 3-part structure
▢ Context (optional)
▢ Question (assignment)
▢ Directions for answering, for example, ‘Motivate your answer, showing which formulas you used. Write no more than 3 sentences’.
Be specific
▢ Use imperative sentences (“List three characteristics of X” rather than “What are the characteristics of X”).
▢ Specify what you expect in the answer (e.g. “List the three characteristics of X”).
▢ Avoid “anything goes” formulations such as “What do you think…”
Context and question
▢ Make sure the context is relevant for the question. If not, delete it.
▢ If the question can be answered without using the context, then change/remove the context OR change the question. Unless a learning objective is to filter out irrelevant information, of course.
Check for copy/paste errors
▢ For example, between old and new questions
Interrelated subquestions
▢ Can a student continue calculating with an assumed set of numbers if a first subquestion was answered incorrectly?
▢ If so, are students instructed on what value(s) to use?
Make sure to have a rubric or answer sheet for grading open-ended questions. This will also help you to keep your assessment aligned with your LOs.
References
For a list of references used in creating this manual, please visit this page.