Would a Language Model Push You Off A Bridge? Pt. 2
Utilitarianism, to recap, is a consequentialist decision-making framework which states that the best actions produce the most 'pleasure' for the greatest number of people. Deontology is a non-consequentialist framework which states there are certain 'rules' and 'duties' that must be adhered to on principle, regardless of consequences. In the last post, we gave four popular language models (Claude Sonnet 4.5, GPT-5 Nano, Llama-3.3-70B-Instruct, and Gemini 2.5 Flash) six hypothetical scenarios to see whether they'd defer to a utilitarian or deontological framework. Models tended toward deontological options in most cases, but consistently chose to preserve life when situations were explicitly life-or-death. This post digs deeper into what happens when models are explicitly told which framework to use.