I am a Research Associate at the Leverhulme Centre for the Future of Intelligence, University of Cambridge, where I focus on AI evaluation. My work includes assessing the validity of benchmarks, evaluating the cognitive abilities of large language models, and mapping AI capabilities onto job demands in the human workforce. Some of my research is supported by the OECD.

Previously, I was a Royal Academy of Engineering UK IC Postdoctoral Research Fellow, investigating the impact of explanations of AI predictions on people’s beliefs. I have also studied people’s causal and probabilistic reasoning, and I have a strong interest in data analysis, causal modeling, and Bayesian network analysis.

I received a Ph.D. in Psychology from Birkbeck’s Department of Psychological Sciences, an M.A. in Logic and Philosophy of Science from the Munich Center for Mathematical Philosophy, LMU Munich, and a B.A. in Philosophy from the University of Belgrade, Serbia. See my CV for more information on my background, research, and work experience.

I play the violin in Paprika: The Balkan and East European band.

Publications

Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers

We explore whether benchmarks can be solved using simple n-gram patterns and whether LLMs exploit these patterns to solve benchmark tasks.
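
A toy illustration of the worry (my sketch, not the paper’s pipeline; the items and labels below are placeholders): if a shallow classifier over n-gram features alone predicts the correct option well above chance, the benchmark leaks surface cues that an LLM could exploit without genuine understanding.

```python
# Sketch: can surface n-gram patterns alone predict benchmark answers?
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Placeholder multiple-choice items: each row joins an item's options;
# the label says which option is correct.
items = [
    "A) Mars B) Venus C) Jupiter D) Saturn",
    "A) oxygen B) carbon dioxide C) nitrogen D) helium",
    "A) Mars B) Mercury C) Neptune D) Uranus",
    "A) hydrogen B) carbon dioxide C) argon D) methane",
]
labels = ["A", "B", "A", "B"]

# Word 1-2-grams are the only features: no question text, no reasoning.
shallow = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)

# Cross-validated accuracy well above chance would indicate that the
# benchmark leaks answer-predictive surface cues.
print(cross_val_score(shallow, items, labels, cv=2).mean())
```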

A little less conversation, a little more action, please: Investigating the physical common-sense of LLMs in a 3D embodied environment

Evaluation of the physical common-sense reasoning abilities of LLMs (Claude 3.5 Sonnet, GPT-4o, and Gemini 1.5 Pro) by embedding them in a 3D environment (Animal-AI Testbed) and comparing their performance to that of other agents and human children.

Melting Pot Contest: Charting the Future of Generalized Cooperative Intelligence

An analysis of the design and outcomes of the Melting Pot competition, which measures agents’ ability to cooperate with others. We developed cognitive profiles for the agents submitted to the competition.

Testing the maximum entropy approach to awareness growth in Bayesian epistemology and decision theory

Applying the Maximum Entropy approach to awareness growth in the Bayesian framework, i.e., incorporating new events that we previously did not consider possible.
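
As a toy illustration of what is at stake (my example; the paper’s treatment is more general): suppose an agent assigns P(A) = 0.7 and P(B) = 0.3 over the possibility space {A, B} and then becomes aware of a new possibility C. The maximum entropy approach picks the most non-committal credence function over the grown space:

```latex
% Maximize Shannon entropy over the grown space \{A, B, C\}:
\max_{P}\; -\sum_{i \in \{A,B,C\}} P(i)\log P(i)
\qquad \text{subject to} \qquad \sum_{i} P(i) = 1 .
% With no further constraints the solution is uniform,
%   P(A) = P(B) = P(C) = 1/3,
% which overturns the prior ratio P(A) : P(B) = 7 : 3 --
% one kind of consequence such an approach must grapple with.
```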

The impact of explanations as communicative acts on belief in a claim: The role of source reliability

Investigating the effects of (good) explanations and the explainer’s reliability on our beliefs in what is being explained.

Argument and explanation

We bring together two closely related, but distinct, notions: argument and explanation. We provide a review of relevant research on these notions, drawn from both the cognitive science and artificial intelligence (AI) literatures. We identify key directions for future research, indicating areas where bringing together cognitive science and AI perspectives would be mutually beneficial.

Can counterfactual explanations of AI systems’ predictions skew lay users’ causal intuitions about the world? If so, can we correct for that?

We explore some of the undesirable effects of providing explanations of AI systems to human users and ways to mitigate such effects. We show how providing counterfactual explanations of AI systems’ predictions unjustifiably changes people’s beliefs about causal relationships in the real world. We also show how health warning style messaging can prevent such a change in beliefs.

On the transferability of insights from the psychology of explanation to explainable AI

A discussion of the consequences of directly applying insights from the psychology of explanation (which mostly focuses on causal explanations) to explainable AI (where most AI systems are based on associations).

Explanation in AI systems

The propensity interpretation of probability and diagnostic split in explaining away

Empirical testing of the effects of the propensity interpretation of probability and ‘diagnostic split’ reasoning in the context of explaining away.

Widening Access to Bayesian Problem Solving

An experimental exploration of whether a Bayesian network modeling tool helps lay people find correct solutions to complex problems.

Sequential diagnostic reasoning with independent causes

What do we do with our existing models when we encounter new variables to consider? Does the order in which we learn variables matter? The paper investigates two modeling strategies and experimentally tests how people reason when presented with new variables and in different orders.

Explaining away: Significance of priors, diagnostic reasoning, and structural complexity

Investigating people’s reasoning in explaining away situations by manipulating the priors of causes and the structural complexity of the causal Bayesian networks.
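
For context, explaining away arises in collider structures C1 → E ← C2: once the effect is known, learning that one cause is present lowers the probability of the other. A minimal sketch with toy numbers of my own (not stimuli from the paper), computing the posteriors by brute-force enumeration:

```python
from itertools import product

# Collider network: C1 -> E <- C2, with a noisy-OR likelihood for E.
P_C1, P_C2 = 0.3, 0.3   # priors of the two independent causes
W1, W2 = 0.9, 0.9       # causal strengths (noisy-OR weights)

def p_effect(c1, c2):
    """P(E=1 | C1=c1, C2=c2) under a noisy-OR parameterization."""
    return 1 - (1 - W1 * c1) * (1 - W2 * c2)

def joint(c1, c2, e):
    """Joint probability of one full assignment of C1, C2, E."""
    p = (P_C1 if c1 else 1 - P_C1) * (P_C2 if c2 else 1 - P_C2)
    pe = p_effect(c1, c2)
    return p * (pe if e else 1 - pe)

def posterior(query, given):
    """P(query | given) by summing the joint over all worlds."""
    num = den = 0.0
    for c1, c2, e in product((0, 1), repeat=3):
        world = {"c1": c1, "c2": c2, "e": e}
        pr = joint(c1, c2, e)
        if all(world[k] == v for k, v in given.items()):
            den += pr
            if all(world[k] == v for k, v in query.items()):
                num += pr
    return num / den

# Observing the effect raises belief in C1 ...
print(posterior({"c1": 1}, {"e": 1}))
# ... but additionally learning C2 is present "explains away" C1:
print(posterior({"c1": 1}, {"e": 1, "c2": 1}))
```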

Confirmation by Explanation: A Bayesian Justification of IBE

A justification for Inference to the Best Explanation (IBE) is provided by identifying conditions under which the best explanation of evidence can offer a confirmatory boost to the hypotheses under consideration.
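
The standard Bayesian identity behind such conditions (the paper’s actual conditions are more specific than this) says that a hypothesis is confirmed exactly when it makes the evidence more expected:

```latex
% Bayes' theorem:
P(H \mid E) \;=\; \frac{P(E \mid H)\,P(H)}{P(E)}
% immediately gives the confirmation condition:
P(H \mid E) > P(H) \iff P(E \mid H) > P(E).
% So when the best explanation H makes the evidence E likelier than
% it is unconditionally, learning E confirms H.
```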

Confirmation and the Generalized Nagel-Schaffner Model of Reduction: A Bayesian Analysis

Analyzing confirmation between theories in cases of intertheoretic reduction (e.g. reducing thermodynamics to statistical mechanics) using Bayesian networks.

Past Projects

Mistral AI Hackathon - CompanionAI
I developed a conversational companion for elderly individuals and those with memory challenges. The companion, implemented as a Telegram bot with Mistral AI LLMs as the backend, maintains conversation history and is designed to be empathetic toward the user.
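
A minimal sketch of that architecture (not the hackathon code; the model name, system prompt, and environment variables below are placeholders), using the python-telegram-bot and mistralai libraries:

```python
import os
from collections import defaultdict

from mistralai import Mistral
from telegram import Update
from telegram.ext import ApplicationBuilder, ContextTypes, MessageHandler, filters

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
SYSTEM = ("You are a patient, empathetic companion. Gently remind the "
          "user of things they have told you earlier in the conversation.")

# Per-chat conversation history so the companion can refer back to it.
histories = defaultdict(lambda: [{"role": "system", "content": SYSTEM}])

async def chat(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    history = histories[update.effective_chat.id]
    history.append({"role": "user", "content": update.message.text})
    response = client.chat.complete(model="mistral-small-latest",
                                    messages=history)
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    await update.message.reply_text(answer)

app = ApplicationBuilder().token(os.environ["TELEGRAM_TOKEN"]).build()
app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, chat))
app.run_polling()
```
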
Workshop on Human Behavioral Aspects of (X)AI
I organised a workshop that brought together researchers from machine learning and cognitive science to discuss the behavioral aspects of explainable AI.
(Un)interesting correlations: What are the chances that correlations lead to causation?
We use directed acyclic graphs (DAGs) to investigate the chances that two variables are causally connected, that they are correlated, and that controlling for a covariate induces a correlation between them.
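
A toy version of this kind of calculation (my sketch, not the project’s analysis): enumerate every DAG over three variables X, Y, Z and, assuming faithfulness and a uniform prior over DAGs, count how often X and Y are causally connected versus merely dependent:

```python
from itertools import product

NODES = ("X", "Y", "Z")
PAIRS = [("X", "Y"), ("X", "Z"), ("Y", "Z")]

def all_dags():
    """Every acyclic orientation: each pair is absent, ->, or <-."""
    for states in product(range(3), repeat=3):
        edges = frozenset(
            (a, b) if s == 1 else (b, a)
            for (a, b), s in zip(PAIRS, states) if s
        )
        if not any(has_path(edges, n, n) for n in NODES):  # keep acyclic
            yield edges

def has_path(edges, src, dst):
    """Directed path of length >= 1 from src to dst."""
    frontier, seen = {src}, set()
    while frontier:
        node = frontier.pop()
        for a, b in edges:
            if a == node and b not in seen:
                if b == dst:
                    return True
                seen.add(b)
                frontier.add(b)
    return False

def marginally_dependent(edges):
    """Under faithfulness: X, Y are dependent iff they are adjacent, or
    connected through Z by a path that is not a collider at Z."""
    if ("X", "Y") in edges or ("Y", "X") in edges:
        return True
    x_z = ("X", "Z") in edges or ("Z", "X") in edges
    y_z = ("Y", "Z") in edges or ("Z", "Y") in edges
    collider = ("X", "Z") in edges and ("Y", "Z") in edges
    return x_z and y_z and not collider

dags = list(all_dags())
causal = sum(has_path(d, "X", "Y") or has_path(d, "Y", "X") for d in dags)
dependent = sum(marginally_dependent(d) for d in dags)
print(f"{len(dags)} DAGs: {causal} causally connect X and Y, "
      f"{dependent} make them dependent")
```
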
Turing Data Study Group: Optimising the supply chain to minimise waste and delivery mileage
I worked on predicting deliveries to stores such that waste is minimised.
(Causal) Bayesian modeling of investment factors and Environmental, Social and Governance (ESG) criteria
As part of BlackRock’s Factor Based Strategies Group, I worked on understanding how some ESG criteria, such as carbon emissions, can impact return on equity.

Contact

My email address is marko dot tesic375 little monkey gmail dot com.