Higher Education

Ethical AI Use in Assessments: Leverage, Don’t Fear, the Machine

Discover techniques that both leverage AI tools and ensure they are used responsibly

Generative AI can empower students to be more creative, more expressive, and more cogent. It creates efficiencies for instructors, supports neurodivergent students’ needs, and allows for more valid assessment. But it can be misused to create unreliable assessments. Attend this session about AI and leave with techniques that both leverage AI tools and diminish the chance students use AI tools inappropriately.



As a National Board-certified high school teacher, Joe Kennedy taught in three states and three departments; served as a Technology Leader, Assessment Coach, and curriculum writer; and sponsored the LumberJack (and Jill) Club. At the post-secondary level, Joe is an intermittent adjunct faculty member and full-time Instructional Designer, working to help 200 faculty members become even better through the appropriate and effective inclusion of technology tools, and serving as a co-lead of Concordia’s AI response team. His current research focus is on the LMS as the inadvertent driver of pedagogy and andragogy. Joe has been a GoReact user and administrator since 2013.


Joseph Kennedy:

So I don’t know how many of you like to watch the Marvel Cinematic Universe, but this is Ultron. Ultron is an artificially intelligent construct created by Tony Stark, and Tony Stark created Ultron with lots of hopes and goals, much like Sam Altman, Suleiman and others in the AI industry have created artificial intelligence. But within 10 minutes of Ultron’s creation and access to the internet, he decided that humanity should be wiped out. So that is the fear that people have. However, AI doesn’t always get it right.

Well, back in May of 2023, I asked an AI tool, DALL-E, to show a group of people excited about being together for training, and that’s what DALL-E generated. It’s probably good that AI doesn’t always get it right because recently a journalist who was interacting with ChatGPT found out that ChatGPT sees itself as Ultron. So there are people who fear it, but I’m not one of those people who fear AI. I believe we can leverage AI and we can leverage AI to be better educators. Today I’m specifically talking about assessment and how we can leverage AI for assessment because generative AI can empower students in multiple ways, and generative AI can help the whole educational process in multiple ways, but it can be misused. And it’s not just that AI can be misused by students, AI can be misused by instructors. We can accidentally create unreliable assessments, invalid assessments just as much as our students can use AI to game assessments.

So acknowledging all of that, I’m hoping you leave this session believing that AI is present and it’s powerful. It is not a flash in the pan. It’s not something we’re going to spend a lot of time worrying about and then forget about in two years. And it is not inconsequential. It is indeed a powerful learning tool. I’d like you to leave this session with some techniques to leverage AI to improve assessment and some tools to diminish inappropriate student AI use. And Jessica introduced me, but in terms of establishing ethos, I’d like to let you know that one of the things I do, along with Dr. Darin Ulness, is co-lead the AI response groups at Concordia College. We were one of the first colleges to establish an actual policy about AI as well as guidance about AI, and we’re working from the bottom up to do that.

My primary role as an instructional designer and academic technologist is to help people use technology better. So this falls in my bailiwick. In terms of GoReact, we’re going to talk about some specific ways this particular tool is useful when dealing with AI. I’ve been a GoReact user and administrator for a decade. I teach graduate and undergraduate classes about once every other semester as well as serving as the instructional designer. And my research foci fall right into what AI lands on.

We’d like to think AI will empower us. Just so you know, this is art created by Imagine, the Imagine.art AI, with an input of a human speaking and output of lightning bolts, and this is what it thought I wanted. I’m going to have several AI pictures in here. It’s one of the ways to show off what AI does. Some of the next slides may be information you already know. If so, please accept my apologies. I’m not trying to talk down to you. I just don’t know what everybody’s base level understanding of generative AI is. And the slides that I present are going to talk about the way AI functions in a manner that’s germane to understanding how best to use it to better assess our students.

We don’t think we’re ready for AI. And by we, I mean educators and higher education in particular. So this is data that is pulled from over 1,200 institutions when Cengage last fall put out a survey. You can see the results right there. What it really boils down to is nobody really thinks the institution is prepared for AI-related changes. And that is in part because it moves so fast. And that is in part because it initially seemed to come out of nowhere. We talked about AI. We talked about AI. There were lots of people who were saying it would be readily available in five or 10 years. And then all of a sudden, less than a year and a half ago, OpenAI said, “Boom. Now everybody has access to AI as long as they have access to the internet.”

So it is different than most technology changes because they were phased in. Graphing calculators, which were a huge disruption in math and science, took a decade from the point where we really had to start dealing with them to the point where they were affordable enough that many students could have one with them. And not only has everything that’s going on with AI happened so quickly, it’s happening in the context of some decisions we’ve made where we have ceded power to ed tech tools without always thinking about it: your Blackboard or your Sakai, your Brightspace, your Moodle, your learning management system. The decisions that are made about how to implement it impact every instructor’s pedagogy and assessment choices, sometimes at a minor level and sometimes profoundly. And we as institutions haven’t fully dealt with that already, and along comes AI.

If you didn’t already know, you should know that AI is viewed as a black box. The people who invented AI and the people who maintain it say week after week in the popular media and in the academic press, in peer-reviewed journals and on your local news station, that they themselves don’t know how some of the decisions that AI makes are being made. So that is a little hard to wrap our heads around. It’s not like AI is a person in a box, but it’s also not completely random. Because it is a black box, we have to teach students how to prompt, and we have to teach students how to guide its progress, and we have to teach students how to contextualize its responses. And that’s the key to using AI tools for better assessment: we make sure that we and our students know the right questions to ask, how to move it along, and then how to contextualize its responses.

So a less than one minute overview of what generative AI is. So when I’m talking about generative AI, think Claude, think ChatGPT, think Grok. Any of the tools that we’re seeing right now that are readily available fall into generative AI. They’re based upon large language models. They predict things, they don’t know things, they don’t process. So in particular, they can’t do higher math very well. They can’t even necessarily multiply two large numbers. Ask ChatGPT to multiply a nine-digit number by a nine-digit number and it can’t do it because it doesn’t process. It just predicts.

The models are pre-trained by reading billions of pieces of data, but they’re not authoritative. So if you ask ChatGPT who the first President of the United States was, that AI doesn’t know, in the sense that we think of knowing, that George Washington was the first President of the United States. What it does know is that in the billions of pieces of data in its training corpus, “first president,” “starting president,” “initial president,” “number one,” “United States,” and “George Washington” appear next to each other anytime the subject is raised, with very few exceptions. So it is certain that if it plays the odds and says George Washington is the first President of the United States, that’s the output the human is looking for. But if we could somehow go into that corpus of training data and say that Samuel Adams was the first President of the United States enough times, ChatGPT might start telling us that’s what the fact is. So these models are pre-trained, but they’re not authoritative.
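As a rough illustration of the “playing the odds” idea described above, here is a minimal Python sketch. It is a toy that simply counts which continuation follows a prompt most often in a tiny made-up corpus; it is not how ChatGPT or any real large language model is implemented, and the function name `predict_next` and the sample sentences are hypothetical.

```python
from collections import Counter


def predict_next(corpus, prompt):
    """Return the continuation seen most often after `prompt` in `corpus`.

    Toy stand-in for prediction: it counts, it does not "know" anything.
    """
    counts = Counter(
        sentence[len(prompt):].strip()
        for sentence in corpus
        if sentence.startswith(prompt)
    )
    return counts.most_common(1)[0][0]


prompt = "The first President of the United States was"

# Hypothetical training text: nine sentences name Washington, one names Adams.
corpus = [f"{prompt} George Washington"] * 9 + [f"{prompt} Samuel Adams"]
print(predict_next(corpus, prompt))  # George Washington

# Add enough wrong sentences and the "most likely" answer flips.
skewed = corpus + [f"{prompt} Samuel Adams"] * 20
print(predict_next(skewed, prompt))  # Samuel Adams
```

The second call returns the wrong name only because the wrong sentence now outnumbers the right one, which is the point of the Samuel Adams thought experiment.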

They generate content, rather than just identifying or classifying information. They don’t just replicate what’s already out there. And that’s the big change. That’s what makes people view these tools almost like they’re human. These models transform text input into symbolic representations. Those symbolic representations have meaning to the algorithms, but these models cannot make sense of the input intuitively. So for example, if you ask a generative AI photo tool to create a picture of a cave, the tool has no concept of what a cave is, but it does know that there are certain differences in pixelation from this position in a picture to this one that show up every time there’s a cave. And then it tries to generate something based upon that knowledge.

All of this is what’s going on behind the scenes, but that is not at all what most users think. Most users think they can ask a generative AI tool for information and they will get an authoritative answer. So in this sense, AI is no different than the internet when our students rely on the internet for an answer. And teaching our students how to be critical thinkers is key. Generative AI apps can do some things that are unhelpful. And the first is they don’t always follow the user’s explicit instructions and they can give hallucinations. They’re not always easy to interpret. The humans don’t know how the model arrived at a decision, and a model can include toxic or biased content.

So a very quick example of that, the algorithms that are used in airports when they use millimeter wave detection as you pass through them, those algorithms are based upon representations of humans. But the people who set all of the algorithms up and fed the data to this algorithm to begin with tended to be men, and they tended to be white men or men of South Asian descent. So Black women who went through these machines in the first iteration almost always got flagged, and it was almost always because of their hair, because all of the data that had been fed to these machines reflected the lived experiences of the programmers, and it did not include a lot of hair that is different from what white men and South Asian men have. It’s a very crude example and it’s a lot more nuanced how toxic and biased information can sink in, but it can. So unsurprisingly, people want guidance.

So this data is from a survey at my college, Concordia College, but it turned out that 20% of the respondents (students, faculty, and staff), even though it was an anonymous survey, said, “Please contact me for individual follow up.” And then they followed the link where they could send us their contact information. And in terms of surveys, it’s rare to get that many people in an anonymous survey who want to be contacted. Among all of our faculty, the most common response, from two-thirds of them, is we need more guidance. It was that clear. And when we talked to the students specifically about whether AI is being used ethically, you can see that only roughly a third of our students felt that other students were using it ethically. And when we went deep into that data, it was because so many students said they don’t know. They don’t know what the parameters are. They might not even use AI because they’re afraid of being accused of cheating. But students do expect a lot of AI.

You can see this is a poll from 2023 by the company Anthology. The link is in the PDF if you’d like to follow it, but it’s a fairly large sample size given the number of different countries and institutions. And you can see students expect that AI can do a lot of things. And in this particular case, their expectations are not divorced from reality. We as educators can use AI to do all of these things, but it’s going to be hard because the faculty aren’t receiving training. 71% of K-12 teachers said no training has been provided to them. And as a higher ed staff member responsible for training faculty, I can assure you the numbers are roughly the same. It’s going to vary from institution to institution, but it’s new. It can be overwhelming. We’re coming off the heels of a pandemic and a lot of faculty members are just recovering from that exhaustion, and it’s unfair, but we do have to train ourselves and each other.

In the United States, higher ed leaders see a need for more training. That’s from a June 2023 Ellucian survey. Worldwide, in the survey from Chegg, a third of students want clear guidelines on when it’s acceptable to use genAI. And we’ll leave the jokes aside about the irony that students who went to Chegg are saying they want better guidelines about ethical use of AI. In fact, that might make that number even more. So there are three ways to focus on student needs. We can teach students how to use genAI tools, and we should. We can teach students how to interpret the results. But today we’re talking about what’s the appropriate use of AI in assessments. But first, we should acknowledge that only some of the frustration with AI and how it impacts assessments and how students might use it to unethically respond to assessments is about the AI itself.

At the higher education level, we’ve not done a very good job of making sure that faculty are well-trained in how to match particular assessment types, mechanisms and tools to their learning goals for the students. So there are a lot of poor assessment practices that have occurred at the higher education level because the system itself is not encouraging faculty members to get better, and that comes in as well. GenAI has brought this into clarity, and 65% of students in the Chegg survey have said genAI means we need to change the way we assess students. So even if it’s not genAI that is the problem all the time, genAI has brought the problem to the fore.

So there is a tool we are using at my college that we believe is going to become systemic, with every department adopting it next year. I say “believe” because it’s still going through the process, although many departments and instructors are already using it. It is the Artificial Intelligence Assessment Scale. This was developed by a group of faculty members in Southeast Asia and Australia. This particular instrument is adapted and used with permission from Professors Perkins, Furze, Roe, and MacVaugh. It is available and they would be happy for other people to adapt it if they want. The link is in the PDF. The AI Assessment Scale says, let’s tell students exactly what level of AI involvement is acceptable on each assessment, starting from level one, where no AI is knowingly used at all, and going up to level five, which is where essentially you and an AI tool are co-creating content.

At level two, students are encouraged to use an AI tool to help them with structure, idea generation, brainstorming. At level three, the AI is used to help them edit. At both levels two and three, the best use of this instrument includes debriefing with students: one-on-one debriefs, group discussions, reflective writing, all sorts of assessment mechanisms. But the idea is to get students to identify what they did and how they used this tool to make it better. At level four, the student is going to explicitly use AI to do a lot of the work for them, but then they’re going to evaluate what the AI did. So as you can see, in addition to moving up in terms of how much work an AI does, this scale also moves its way up instruments like Bloom’s Taxonomy. And then at the fifth level, the student is using the AI somewhat as a copilot.
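For anyone who wants to record these levels in a course tool or a syllabus generator, the five levels described here could be sketched as a small Python enumeration. This is only an illustration: the member names paraphrase the talk and are not the official AIAS labels, the helper `is_permitted` is hypothetical, and treating the scale as cumulative (an assignment set at one level also allows every lower level) is an assumption the talk does not state.

```python
from enum import IntEnum


class AIASLevel(IntEnum):
    """Five levels of permitted AI involvement, paraphrased from the talk."""

    NO_AI = 1                      # no AI knowingly used at all
    IDEAS_AND_STRUCTURE = 2        # brainstorming, idea generation, structure
    AI_ASSISTED_EDITING = 3        # AI helps the student edit their own work
    AI_WORK_STUDENT_EVALUATES = 4  # AI does much of the work; student evaluates it
    CO_CREATION = 5                # student and AI co-create content


def is_permitted(assignment_level, use_level):
    """Assumption: a use is allowed if it sits at or below the assignment's level."""
    return use_level <= assignment_level


# Example: an essay set at level three allows brainstorming but not co-creation.
essay = AIASLevel.AI_ASSISTED_EDITING
print(is_permitted(essay, AIASLevel.IDEAS_AND_STRUCTURE))  # True
print(is_permitted(essay, AIASLevel.CO_CREATION))          # False
```

Stating the level next to each assignment, in whatever form, is what gives students the clear parameters the scale is meant to provide.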

Let’s move into some examples of how this is used. So we have a religion professor at Concordia who dove right into this this semester and was surprised that it only took her an hour and a half across all three classes to revise the assignment directions, to revise the syllabi, and to use class time to explain to students the acceptable use of generative AI. According to this professor, the quality of the papers that she has received this semester is clearly better than that of the average papers. And unlike the fall semester, she hasn’t received any papers that make her think a student used AI to write the whole thing, and she was getting some of those last semester. And actually, right before this presentation began, a colleague of mine sent an email saying, “Help. I think I got my first genAI paper.” This religion professor has found a way, working at levels two and three on the AIAS, to avoid that problem.

The original authors also present some extra examples from their own universities of how the AIAS was used successfully. So here’s business management at multiple levels. Here are level three examples. This is where the students are specifically using the AI tool to help them refine what they’ve already done. So there are ways to use the AIAS just to communicate directly to students what we expect of them. And at the beginning I said I hope that you come away with some techniques to leverage the tool. Here we’re leveraging AI tools to help students at their weak points. Students who have a hard time starting out, they get writer’s block, they don’t know how to brainstorm: show them how to use AI at level two. Students who have great ideas but really struggle to present those ideas in written form in a way that is deemed professional or academic can use AI at level three.

At my institution, we have numerous international students who speak 5, 6, 7 languages fluently, but English is the fourth or fifth language they’ve learned. AI can help those students show how much they know and how much they can do without letting the fact that English is their fifth language be the thing that prevents us from knowing. Also, these clear guidelines clearly diminish inappropriate behavior. Every institution that has set forth clear guidelines, whether with this tool or a different one, has reported that the faculty are reporting fewer and fewer suspected violations of academic integrity due to AI. So just using a clear communication tool and talking to students about how to use AI leverages this technology to make assessment better, more valid and more reliable, and diminishes inappropriate use. But we can also move on to come up with some methods of assessing students that are harder for AI to hack. Create scaffolded writing assignments. Ask students to create products that represent group knowledge, like wikis, databases, glossaries, websites. Ask students to use oral presentations.

So you knew GoReact was going to come in at some point. As a GoReact user and administrator for 10 years, I’ll tell you, if your students are going to do oral presentations, a tool like GoReact is an absolute must-have because of the amount of time it will save you. Plus, honestly, it makes students reflect more. And if your learning goal is not “write a paper” and writing a paper is just a way for students to show what they know, an oral presentation is a great way to do it. At Concordia, our English and CSTA departments, communication science and theater, are preparing short how-to sheets for other faculty. The English department is saying how to prompt students to write better in an AI age. And the communication science department is presenting how to ask students to orally present material even if you don’t feel you should be critiquing a speech. Here, we’re going to leverage the environment AI has created to ensure more authentic assessment takes place to begin with, and we’re going to diminish possible bad actions by students by using tools where AI is less helpful.

You can also use unique prompts. Ask students to write about a local myth or legend or a local news story. Ask students to incorporate publicly available facts about themselves or the instructor. Ask students to compare and contrast things that are new. Percival Everett’s book James, which has only been out in general release for a couple of weeks, is not going to be in the corpus of data any of the large language models studied. Media and film professors, ask students to critique coverage of this week’s local news story. They can still use AI tools to help them refine their writing, but it is very unlikely those AI tools will know anything about a local news story. And above all, use technology tools to invite students in, not to call them out.

It’s so important because no tools exist to accurately detect AI. That was the first policy decision that my college made in February of 2023. We will not spend money or resources playing the whack-a-mole game trying to find a tool that will identify when students have used AI, because every time one of those tools comes along, within a month it is outdated and it cannot accurately and consistently identify unethical use of AI. And many faculty members can see patterns, especially if they have had their students do other writing in class. They can tell this is a very different voice, a very different organizational scheme, but even our own innate sense of what’s right and what’s wrong is not infallible. So we want to use this technology to invite students into the conversation about what is good writing, what is good research, what is appropriate, what is ethical, what is inspirational, and what is the point of all of it in the first place, rather than trying to find tech tools that will call them out.

Again, here is an example that uses GoReact. At my institution, we use a familiar tech tool, our learning management system [inaudible 00:30:04], along with GoReact as an unintrusive proctoring tool. And we just have students record themselves while they’re taking assessments, and nobody ever looks at the recording unless there are other red flags. And then the faculty members have a recording that they can go to. It’s doing a screen capture. It captures the student’s audio and video. And our students have reported they view this as no more intrusive than taking an assessment in a class where the professor is there. So we’re trying to use the technology in a low-key way. It does help us monitor for academic integrity, but really it’s inviting students in, asking students to be a part of the conversation about assessment.

It all comes together in this quote. This is an anonymous respondent to Concordia College’s January and February 2024 Ethical AI Use survey of faculty, staff and students. I wish I knew who this respondent is because they bring up lots of stuff all in one place. And that last sentence, “This is the sort of sea change we should not treat lightly,” is important because if we start to over rely on AI or if we fear AI, we’re not using it well. And three days ago when I gave AI the same prompt that I did in the image I showed at the beginning, it demonstrated that AI hasn’t necessarily gotten much better. I think that leaves us 10 minutes for questions and answers, and so I am open for this.

Jessica Hurdley:

Thank you so much, Joe. There is a question in the Q&A. I think what the question is getting at is more will your faculty have some sort of training or are you using your faculty orientation to allow them the chance to explore what you’ve shared with us today? As well as kind of go through what you’ve shared with us today, how are you implementing this as far as training goes for your faculty?

Joseph Kennedy:

In multiple ways. Last summer we, and by we, we have two groups on campus. One is a group focused on the practical, like how do we implement AI in our assessments today? There’s also a group that asks the critical questions. They are asking about bigger issues, such as what impact will these generative AI tools have on our sociological understanding of how we create culture? It is not a question we can answer today or even in a year, but we should be asking it.

Those groups together created faculty training sessions that were offered at multiple points last summer. We’re going to be doing that again this summer and a lot more faculty have told us they plan to attend. I have one-on-one discussions with faculty as the instructional designer and with departments upon request. There will be videos that will be made available for asynchronous training. We freed up some money so that faculty members could attend the AAC&U series on artificial intelligence in the classroom and assessment. And we are going to have some time at the fall training, although it’s not going to be enough to really go in depth. I hope that answers Ibi’s question.

Jessica Hurdley:

If there are any other questions, feel free to enter those into the Q&A as well. We’ll give it a few more seconds just to see if anybody else has any questions. I thought the examples, Joe, that you provided were just wonderful on how faculty are embracing it and implementing those methods to make it harder for AI to really hack those assignments and student responses. I really liked the idea of unique prompts and leveraging that to connect with the community ideas and things that they would have to be present and aware, rather than using generative AI to produce.

Joseph Kennedy:

I said early on that we have to acknowledge that part of the problem is that we have not encouraged faculty to learn how to assess well. There are many faculty who do, and there are some faculty who are trained in assessment, but the majority of higher ed faculty in the US are not required to be trained in assessment in order to be educators. So if they assess well, it is because of their own personal growth and professional development or they just have a good talent for it. But genAI has really thrust in our face that systemically we do not support good assessment. I do see that there is a question from Chris.

Yeah, Microsoft Copilot, ChatGPT, DALL-E, Claude from Anthropic, being aware of tools like Grok, knowing how different tools that are designed to draw pictures work. Why can I not remember? Midjourney. Midjourney is a tool as well possibly, Chris. What’s going to end up happening in the next couple years is we’re going to see lots and lots of APIs that tap into large language models to develop very specific tools. And then we have to start worrying about equity because right now, any student can use ChatGPT 3.5, but the students who pay can use ChatGPT 4. And in two years, there will be the students who pay $3.99 a month for this app that focuses on math equation solving and $7 a month for this app that focuses on historical analysis. That’ll be a whole other issue that’s coming. Ken says-

… am I being afraid of progress? Students do use it as a crutch. You’re absolutely right, Ken, you’re not wrong to say that. It’s like any tool that’s out there: the easiest way to use the tool is the way everyone’s going to use it first. And right now, the easiest way to use genAI is to say, “Write me a five paragraph essay on this.” But I will say that’s no different than when graphing calculators came out. That’s no different than when the internet came out. That’s no different than when students discovered tools like Chegg could be used to cheat rather than to help them. So I don’t think you’re being afraid of progress. I think you’re aware that every time there’s progress, it takes tremendous work by educators to teach people how to use the tools well, rather than just as a quick given.

Jessica Hurdley:

We have a couple more minutes, Joe, if you wanted to check out any of the other questions in the Q&A. One is, are you looking to train an internal AI for your institution, or do faculty simply type prompts into Gemini, ChatGPT, et cetera?

Joseph Kennedy:

Well, this is where equity issues come in. We have neither the personnel capacity nor the money to create an internal AI that would actually be worth anything. We could create one. It wouldn’t be that good. We’re just not large enough. It’s a matter of scale.

Jessica Hurdley:

And then one last question. As someone who is more familiar with multiple genAI platforms, what are your thoughts or concerns about the ethics of disability accessibility on such platforms?

Joseph Kennedy:

In terms of disability accessibility, Erin wants to take that.

Jessica Hurdley:

Oh no, you can go ahead. She’s just moving them over to answered.

Joseph Kennedy:

Oh, okay. Got it. In terms of disability accessibility, even though it’s not always appropriate to do this, I’m going to lump that together with inclusivity in general, because the ways to access an AI are by typing or by speaking to it. So any disability that prevents a person from easily typing or speaking is going to also be a problem with accessing AI. However, many of the disabilities that relate to writing and speaking have solutions that are efficient and relatively inexpensive, though not all do. So some of those issues aren’t going to affect the AI tool any more than they would any other EdTech tool.

The problem is going to be that when we train generative AI and we look at a corpus of data, that corpus reflects to the AI tool what a person is like, what a lived experience is like. So it’s not going to reflect the lived experience of people with certain disabilities, in part because some disabilities are relatively rare, and a generative model needs thousands of data points to be able to make good predictions. But also in part because the lived experience of the people who programmed it and trained it is less likely to include people with certain disabilities than others. So I’m a little more worried about the hidden impact on accessibility for people with disabilities than I am about the technological and mechanistic impact, because we do have solutions for many of those already.