Publion

Authentic Assessment in AI-Infused Learning Environments: An Evidence-Centered Design Framework and Rubric Toolkit for Academic Integrity

Arben Hoxha¹, Elira Leka²

¹ University of Tirana, Tirana, Albania

² Aleksandër Moisiu University of Durrës, Durrës, Albania

Published: Jun 04, 2026

Abstract

Generative AI tools have destabilized traditional take-home assessment by lowering the cost of producing fluent text, code, and problem solutions. Institutional responses often oscillate between prohibition and permissive use, yet both approaches fail when assessment design does not specify what counts as credible evidence of learning. This article proposes a practical framework for assessment integrity in AI-infused learning environments that shifts attention from detection to design. Using an integrative synthesis of research on authentic assessment, constructive alignment, academic integrity, and emerging guidance on generative AI, we develop an evidence-centered assessment design workflow and a rubric toolkit that make acceptable AI use transparent while preserving the core purpose of assessment: eliciting student thinking. The framework operationalizes five design decisions: defining outcome-relevant evidence, setting AI-use boundary conditions, embedding process traces and checkpoints, using rubric criteria that reward disclosure and reasoning, and adding verification moments such as oral defense or short in-class microtasks. We present a model (Figure 1) and a rubric matrix (Table 1) that can be adapted across disciplines for essays, projects, laboratory reports, and portfolios. The contribution is an implementation-ready package that reduces incentives for misuse, supports equity through clear rules and scaffolding, and enables program-level quality assurance through calibration. We conclude with implications for policy, staff development, and future research on learning outcomes in hybrid human–AI work practices.

Keywords

Authentic Assessment, Generative AI, Academic Integrity, Evidence-Centered Design, Constructive Alignment

Introduction

Generative AI has intensified an existing assessment challenge because many high-stakes tasks measure polished final products more than the learning processes behind them. When students can use AI systems to draft essays, generate code, summarize texts, or solve problems quickly, traditional output-only assessment becomes less reliable as evidence of student learning.

The article argues that the main problem is not simply the availability of generative AI, but the weak alignment between intended learning outcomes and the evidence collected through assessment. If assessment tasks do not require students to show reasoning, disciplinary judgment, and ownership of decisions, then academic integrity becomes difficult to maintain whether AI is banned or allowed.

Academic integrity is presented as a system-level issue rather than only an individual moral issue. The article draws on research showing that misconduct is influenced by opportunity structures, assessment conditions, and institutional culture. Generic, under-scaffolded, final-product assessments increase the risk of plagiarism, contract cheating, and inappropriate AI use.

The article also emphasizes that integrity improves when assessment tasks are contextualized, include formative feedback, and require students to make personal judgments over time. These features align with authentic assessment, which values meaningful performance tasks embedded in disciplinary practices.

In AI-infused learning environments, authentic assessment must clarify the role of AI within disciplinary work. Students need transparent guidance about what AI may support, what must remain their own work, and how their process and decisions will be evaluated.

The article critiques detection-first institutional responses. AI detection tools have technical limitations, can produce false positives, and may create inequities, particularly for multilingual students or students who use accessibility tools. A surveillance-focused approach may also weaken trust and distract from learning.

Instead of relying primarily on detection, the article proposes redesigning assessment so that the most important evidence is difficult to outsource and easy to verify. This approach is linked to constructive alignment because assessment strongly shapes student learning behavior.

The introduction establishes the article’s purpose: to develop an evidence-centered assessment design framework and a rubric toolkit for academic integrity in AI-infused environments. The proposed package supports different institutional AI-use regimes, including prohibition, constrained permission, and encouraged transparent use, while keeping the central focus on credible evidence of student learning.

Research Method

The study used an integrative synthesis and design science approach to develop practical assessment artifacts grounded in theory and empirical evidence. This method was chosen because assessment integrity in AI-infused environments spans several bodies of literature, including authentic assessment, constructive alignment, feedback, assessment for learning, academic integrity, contract cheating, generative AI in education, and responsible technology use.

The analysis followed three main steps. First, sources were thematically coded to identify design claims about reducing misconduct opportunities and strengthening the validity of assessment evidence. Second, these claims were translated into a workflow of instructor decisions using an evidence-centered assessment design model. Third, the model was operationalized into rubric dimensions and performance indicators. The toolkit was then refined through expert review by experienced instructors and academic integrity practitioners across disciplines, with emphasis on clarity, usability, and transparency of assumptions.

Results and Discussion

The article presents the Elevate Assessment Integrity Design framework and a rubric toolkit for authentic assessment in AI-infused learning environments. These outputs are designed to connect learning outcomes, acceptable AI-use conditions, process evidence, and verification mechanisms into a practical assessment design package.

The framework begins with learning outcomes and constructive alignment. Instructors are expected to identify the knowledge, skills, and dispositions that an assessment must evidence before deciding the task format or AI-use rules.

From the learning outcomes, instructors formulate an evidence claim. This claim explains what the assessor must be able to observe in order to judge mastery, such as argument quality, methodological justification, debugging reasoning, or the ability to connect claims with credible sources.

This evidence-centered approach shifts assessment away from topic descriptions alone. Rather than asking only what students should submit, the framework asks what learning evidence must be visible and interpretable in the submitted work and associated process.

The article argues that this shift makes assessment more resilient in AI-infused contexts. When tasks require observable reasoning, disciplinary judgment, and ownership of decisions, misuse becomes less attractive and less effective.

The second major design decision concerns AI-use boundary conditions. The article rejects a simple binary distinction between allowing and banning AI, instead recommending that instructors specify what AI may do and what students must do themselves.

Examples of acceptable AI support may include brainstorming, language polishing, or debugging assistance. However, students may still be required to create the final structure, justify decisions, verify claims, and demonstrate understanding of the submitted work.
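
One way to make such boundary conditions explicit is to record them as a small per-task policy. The following is a minimal sketch, not part of the article's toolkit: the class name `AIUsePolicy`, its fields, and the "unspecified" fallback are all assumptions introduced for illustration, and the activity labels simply reuse the examples named above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AIUsePolicy:
    """Hypothetical per-task record of AI-use boundary conditions."""

    permitted: frozenset[str]     # activities AI may support
    student_only: frozenset[str]  # activities that must remain the student's own work

    def classify(self, activity: str) -> str:
        """Label an activity; student-only rules take precedence over permissions."""
        if activity in self.student_only:
            return "student-only"
        if activity in self.permitted:
            return "permitted"
        # Assumption: anything not listed is flagged for the instructor to decide.
        return "unspecified"


# Illustrative policy built from the examples mentioned in the text.
essay_policy = AIUsePolicy(
    permitted=frozenset({"brainstorming", "language polishing", "debugging assistance"}),
    student_only=frozenset({"final structure", "justifying decisions", "verifying claims"}),
)

for activity in ("brainstorming", "verifying claims", "translation"):
    print(activity, "->", essay_policy.classify(activity))
```

Returning "unspecified" rather than silently permitting or forbidding an unlisted activity is a design choice of this sketch; it surfaces gaps in the task rules instead of resolving them implicitly.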

The framework also emphasizes process traces and checkpoints. Drafts, notebooks, logs, checkpoint submissions, and feedback responses can help show development over time and provide evidence that the student engaged meaningfully with the task.

Verification moments are another important feature. These may include oral defenses, brief in-class microtasks, peer-teacher review, or short follow-up explanations that allow assessors to confirm whether students can explain and defend their work.
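
The five design decisions named in the abstract can be treated as a checklist that an assessment draft either fills in or leaves open. The sketch below is illustrative only: the key names in `DESIGN_DECISIONS`, the dictionary layout, and the function `missing_decisions` are assumptions, not artifacts from the article.

```python
from __future__ import annotations

# Hypothetical keys for the five design decisions described in the framework.
DESIGN_DECISIONS = (
    "evidence_claim",
    "ai_use_boundaries",
    "process_traces",
    "rubric_criteria",
    "verification_moments",
)


def missing_decisions(design: dict[str, list[str]]) -> list[str]:
    """Return the design decisions that a draft leaves absent or empty."""
    return [name for name in DESIGN_DECISIONS if not design.get(name)]


# Example draft: two decisions have not been made yet.
draft = {
    "evidence_claim": ["methodological justification"],
    "ai_use_boundaries": ["brainstorming permitted", "final structure is student-only"],
    "process_traces": ["draft", "checkpoint submission"],
    "rubric_criteria": [],
}

print(missing_decisions(draft))
```

A check like this only confirms that each decision has been addressed; whether the entries are adequate remains a matter of instructor judgment and calibration.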

The rubric toolkit translates the framework into assessment criteria. It includes dimensions such as outcome-relevant reasoning, traceability and sourcing, process evidence and checkpoints, AI-use disclosure, verification readiness, original contribution and personalization, and ethical and responsible practice.

The article’s Table 1 shows that each rubric dimension is connected to a guiding question and example indicators. For example, outcome-relevant reasoning asks whether the work demonstrates disciplinary thinking beyond surface fluency, while AI-use disclosure asks whether tool use is documented clearly and honestly according to task rules.
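
To illustrate how the rubric dimensions might be applied in practice, the sketch below scores a submission across the seven dimensions listed above. The 0 to 3 performance scale, the equal weighting, and the function name `score_submission` are assumptions made for this example; the article's Table 1 may define levels and weights differently.

```python
from __future__ import annotations

# The seven rubric dimensions named in the text.
RUBRIC_DIMENSIONS = (
    "outcome-relevant reasoning",
    "traceability and sourcing",
    "process evidence and checkpoints",
    "AI-use disclosure",
    "verification readiness",
    "original contribution and personalization",
    "ethical and responsible practice",
)

MAX_LEVEL = 3  # Assumption: each dimension is rated on a 0..3 scale.


def score_submission(levels: dict[str, int]) -> float:
    """Return the equally weighted share of available rubric points (0.0 to 1.0)."""
    missing = [d for d in RUBRIC_DIMENSIONS if d not in levels]
    if missing:
        raise ValueError(f"unrated dimensions: {missing}")
    for dimension in RUBRIC_DIMENSIONS:
        if not 0 <= levels[dimension] <= MAX_LEVEL:
            raise ValueError(f"level out of range for {dimension!r}")
    total = sum(levels[d] for d in RUBRIC_DIMENSIONS)
    return total / (MAX_LEVEL * len(RUBRIC_DIMENSIONS))


# Hypothetical ratings for one submission.
example_levels = dict(zip(RUBRIC_DIMENSIONS, (3, 2, 2, 3, 1, 2, 1)))
print(round(score_submission(example_levels), 2))
```

Requiring a rating for every dimension, rather than defaulting unrated ones to zero, keeps an incomplete marking pass from being mistaken for a weak submission.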

The discussion stresses that language quality should be separated from reasoning quality. This distinction helps prevent students who use AI for writing polish from gaining an unfair advantage in demonstrating core learning outcomes. The framework and toolkit aim to support “integrity-by-design,” making assessment systems more resistant to outsourcing while promoting deeper student learning.

Conclusion

This article proposed a rights-respecting governance framework for learning analytics that translates ethical and legal principles into actionable controls, documentation artifacts, and an adoption roadmap. By organizing governance across the analytics lifecycle and emphasizing transparency and contestability, the framework supports institutions and vendors in implementing analytics as a support system rather than a surveillance apparatus.

The framework is intentionally pragmatic. It provides a principle-to-control mapping and a maturity model that can be used for institutional policy, procurement due diligence, and program evaluation. Future work should empirically evaluate the framework in diverse institutional contexts, including resource-constrained universities and cross-border EdTech arrangements, and should develop measurement tools for learner trust and perceived legitimacy of analytics interventions. Ultimately, learning analytics will be sustainable only if it remains legitimate in the eyes of learners and the public. Rights-respecting governance provides a path to that legitimacy by embedding accountability, intelligibility, and due process into the everyday routines of data-intensive education.

