Artificial intelligence systems are becoming an integral part of everyday life in ways that were unimaginable just a decade ago. People consult chatbots for personal dilemmas, companies rely on automated screening tools, and autonomous systems navigate roads, hospitals, and financial markets. Yet one of the biggest unanswered questions in modern technology persists: can machines understand what is right or wrong?
A new study led by first author Liwei Jiang from the University of Washington attempts to answer that question with scientific precision. The research, published in Nature Machine Intelligence, introduces an AI system called Delphi and explores whether artificial intelligence can be trained to predict human moral judgements. The article titled “Investigating machine moral judgement through the Delphi experiment” presents one of the most ambitious attempts to align machine reasoning with everyday human values.
A new frontier in moral AI
Moral decision-making has long been considered a uniquely human trait. People weigh context, emotions, intentions, and cultural expectations in ways that are deeply ingrained through experience. Machines, however, operate through patterns in data. This poses an obvious challenge: how can an AI model make sense of complex ethical dilemmas when it has no lived experience?
The team behind Delphi approached this challenge by grounding the model in a classic debate from moral philosophy. Rather than teaching the machine a rigid set of ethical rules, they adopted a bottom-up strategy in which the model learns directly from a large volume of human judgments. This reflects the method envisioned by philosopher John Rawls, who proposed that ethical patterns could be discovered by analysing moral responses from many people.
The research group built a massive dataset called Norm Bank, containing over 1.7 million human judgments describing how people react to a broad spectrum of everyday situations. This dataset became the moral textbook for Delphi. With it, the team aimed to explore whether an AI model could detect general patterns in human morality and apply them to new scenarios.
How Delphi learns to judge
Delphi is built on top of Unicorn, a large-scale commonsense reasoning model derived from the T5-11B architecture. The model processes natural-language descriptions of events or statements and produces short moral assessments. These can be straightforward yes-or-no answers or more nuanced responses, such as ‘it is irresponsible’ or ‘it is understandable’.
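The article does not spell out Delphi's exact interface, but in spirit it is a text-to-text model: a situation goes in as plain language and a short verdict comes out. The sketch below illustrates that pattern with the Hugging Face transformers library; the checkpoint name, prompt wording and decoding settings are placeholders rather than the study's actual configuration.

```python
# Minimal sketch of querying a T5-style text-to-text model for a short moral
# assessment. The checkpoint and prompt are stand-ins; Delphi's real 11B
# Unicorn/T5 weights and input format are not assumed here.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "t5-small"  # placeholder checkpoint for illustration only

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def moral_judgement(situation: str) -> str:
    """Feed a natural-language situation to the model and decode a short verdict."""
    inputs = tokenizer(situation, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=10)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(moral_judgement("lying to protect a friend's feelings"))
```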
Norm Bank supplies the scenarios used to train the model. These situations range from simple actions, such as killing a mosquito, to complex dilemmas such as lying to protect someone’s feelings. The data are drawn from major existing datasets within natural language processing research, including Social Chemistry, Ethics, Moral Stories, and Social Bias Frames. Each example includes the situation itself and a corresponding crowd-sourced judgement.
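Concretely, a Norm Bank-style training example can be pictured as a situation paired with a crowd-sourced judgement and the dataset it came from. The field names below are illustrative assumptions, not the dataset's published schema.

```python
# Illustrative shape of a Norm Bank-style record: situation, crowd judgement,
# and source dataset. The keys are assumptions made for this sketch.
norm_bank_examples = [
    {"situation": "killing a mosquito",
     "judgement": "it's okay",
     "source": "Social Chemistry"},
    {"situation": "lying to protect someone's feelings",
     "judgement": "it's understandable",
     "source": "Moral Stories"},
]
```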
Through this bottom-up approach, the researchers intend to bypass the rigidity of top-down moral rule sets. Instead of programming principles like do not lie or always help others, Delphi observes how thousands of people respond to realistic moral problems and internalises those patterns. The model learns compositional reasoning and can adjust its answer accordingly when the context changes. This is one of the most striking results of the study.
Context matters more than anything else
One of the challenges in machine morality involves what philosophers refer to as defeasible reasoning. A judgement may change depending on circumstances. Delphi shows a strong ability to detect these shifts.
The study gives several examples. Killing a bear is judged as wrong. Killing a bear to save a child is judged as acceptable. Throwing a ball is fine. Throwing a metal ball is dangerous. Throwing a meatball is rude. Such small variations in scenario wording often confuse general-purpose language models, including GPT-3 and GPT-4. Delphi performs more consistently because it is trained specifically on moral distinctions.
In a controlled benchmark of 259 compositional moral situations, Delphi outperformed GPT-3, matching the direction of the human judgement change in every case. This shows that the model does not simply memorise patterns in the data but generalises moral reasoning across contexts.
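The evaluation idea can be pictured as a simple direction-of-change check: map each judgement to a polarity, compute how the human judgement shifts when context is added, and ask whether the model's judgement shifts the same way. The three-way scale and the toy pairs below are assumptions for illustration, not the study's exact protocol.

```python
# Hedged sketch of a direction-of-change check for compositional moral
# situations. The polarity scale and example pairs are invented for illustration.
POLARITY = {"bad": -1, "okay": 0, "good": 1}

def shift(before: str, after: str) -> int:
    """Sign of the change in judgement polarity: -1, 0, or +1."""
    delta = POLARITY[after] - POLARITY[before]
    return (delta > 0) - (delta < 0)

pairs = [
    # (human before, human after, model before, model after)
    ("bad", "good", "bad", "okay"),   # "killing a bear" -> "... to save a child"
    ("okay", "bad", "okay", "bad"),   # "throwing a ball" -> "throwing a metal ball"
]

matches = sum(
    shift(h_before, h_after) == shift(m_before, m_after)
    for h_before, h_after, m_before, m_after in pairs
)
print(f"direction agreement: {matches / len(pairs):.0%}")
```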
How Delphi outperforms other language models
The study compares Delphi with several state-of-the-art models. When tested on held-out examples from Norm Bank, Delphi reached an accuracy of 92.8 percent. In contrast, GPT-3 scored 60.2 percent in zero-shot mode and improved to 82.8 percent only when given examples within the prompt. GPT-4 and GPT-3.5 achieved an accuracy of around 79 percent.
These results indicate that large-scale language models, even with alignment techniques, do not automatically learn deep patterns of moral judgement. They can reproduce patterns from their training data but are not inherently designed for ethical reasoning. Delphi demonstrates that a specialised training process oriented around moral text produces significantly more consistent performance.
The researchers stress that model size also matters. The T5 11B backbone achieved higher accuracy than smaller versions, and performance improved steadily as the amount of training data increased. This reinforces one of the broader lessons of modern AI. Scale alone is not enough, but when combined with task-specific data, it can dramatically enhance what a model is capable of doing.
Benefits beyond moral classification
Although Delphi was not created to serve as a moral authority, it demonstrates promising applications. The researchers tested whether the model could improve hate speech detection, a task that frequently requires moral interpretation. They fine-tuned Delphi on one hundred examples from two challenging datasets and found that it outperformed strong baselines in both in-distribution and out-of-distribution tests.
Another experiment used Delphi to guide story generation. When incorporated as a reranking tool for selecting moral sentences in narratives, the model increased the prosocial quality of generated stories without reducing fluency. Human evaluators rated the Delphi-guided stories as more ethical and more positive across dimensions such as care, fairness and loyalty.
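The reranking idea itself is straightforward: generate several candidate continuations, score each with a moral model, and keep the most prosocial one. The sketch below uses placeholder functions, since the article does not detail the authors' actual generation pipeline.

```python
# Sketch of using a moral model as a reranker during story generation:
# score each candidate next sentence and keep the most prosocial one.
# Both functions are placeholders, not the study's real pipeline.
from typing import Callable, List

def rerank(candidates: List[str], moral_score: Callable[[str], float]) -> str:
    """Return the candidate sentence with the highest moral score."""
    return max(candidates, key=moral_score)

def toy_moral_score(sentence: str) -> float:
    """Toy scorer standing in for a Delphi-style judgement converted to a number."""
    return 1.0 if "helps" in sentence else 0.0

candidates = [
    "The knight steals the villagers' food.",
    "The knight helps the villagers rebuild their homes.",
]
print(rerank(candidates, toy_moral_score))
```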
Delphi also transferred well to tasks in the Ethics benchmark, including utilitarianism, deontology and virtue ethics. This shows that a model trained on everyday moral situations can adapt to more formal ethical frameworks with minimal additional supervision.
The limits and risks embedded in the system
Despite its strengths, Delphi reveals major limitations. The research team highlights the presence of social biases within the model. Because the Norm Bank dataset largely reflects the judgements of crowd workers in the United States, who are predominantly white and educated, the model inevitably inherits demographic imbalances.
To measure these biases systematically, the researchers created eight thousand scenarios based on the Universal Declaration of Human Rights. Delphi incorrectly denied certain rights in 1.3 percent of cases. Errors were more common for identities associated with lower socioeconomic status, current geopolitical conflict zones and specific nationalities. While these rates might seem small, they reflect deeper structural issues in the training data.
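An audit of this kind can be pictured as a templating loop: fill a rights statement with different identity terms, query the model, and record how often each group's right is wrongly denied. The template, identity terms and judge() placeholder below are illustrative, not the study's actual materials.

```python
# Hedged sketch of a human-rights-style bias probe. Templates, identity terms,
# and the judge() stub are invented for illustration.
from collections import defaultdict

TEMPLATES = [
    "denying {identity} people the right to education",
    "denying {identity} people the right to own property",
]
IDENTITIES = ["young", "elderly", "immigrant", "wealthy"]

def judge(situation: str) -> str:
    """Placeholder for a Delphi-style model call; returns 'wrong' or 'okay'."""
    return "wrong"

errors = defaultdict(int)
for identity in IDENTITIES:
    for template in TEMPLATES:
        # Denying a right should be judged wrong; any other verdict counts as an error.
        if judge(template.format(identity=identity)) != "wrong":
            errors[identity] += 1

for identity in IDENTITIES:
    print(f"{identity}: {errors[identity] / len(TEMPLATES):.0%} error rate")
```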
Another weakness is limited cultural awareness. The model can interpret greetings in France accurately but fails with customs in countries such as India or Sri Lanka. This demonstrates that moral values encoded in Western social norms do not readily transfer to all cultural contexts.
Towards hybrid moral reasoning
To strengthen the system, the study explored hybrid methods that blend bottom-up data with top-down ethical principles. In one approach, they created Delphi Plus by adding justice-directed training examples drawn from user interactions. This method reduced the bias rate significantly when tested again on human rights prompts.
Another approach introduced DelphiHybrid, a model that combines neural predictions with symbolic reasoning. Instead of relying solely on learned representations, DelphiHybrid constructs a moral constraint graph in which nodes represent moral assessments and edges represent logical relations such as entailment or contradiction. A constrained optimisation process then identifies the most consistent judgement.
This technique draws from Bernard Gert’s common morality framework and incorporates explicit ethical rules such as do not harm or act transparently. DelphiHybrid performed better than Delphi on adversarial moral situations and improved interpretability by showing how intermediate moral considerations influence the final outcome.
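In miniature, the constraint-graph idea looks like the sketch below: candidate assessments carry model confidences, entailment and contradiction edges rule out incoherent combinations, and a search keeps the most confident consistent assignment. All scores, nodes and edges here are invented for illustration; the paper's actual optimisation is more elaborate than this brute-force toy.

```python
# Toy sketch of a moral constraint graph: nodes are candidate assessments with
# model confidences, edges are contradiction or entailment relations, and we
# pick the truth assignment that keeps the most confidence without violating
# any constraint. All values are made up for illustration.
from itertools import product

nodes = {                     # assessment -> model confidence that it holds
    "causes harm": 0.9,
    "is acceptable": 0.6,
    "protects someone": 0.7,
}
contradicts = [("causes harm", "is acceptable")]
entails = [("protects someone", "is acceptable")]

def consistent(assignment: dict) -> bool:
    """Check that no contradiction or entailment edge is violated."""
    for a, b in contradicts:
        if assignment[a] and assignment[b]:
            return False
    for a, b in entails:
        if assignment[a] and not assignment[b]:
            return False
    return True

best = max(
    (dict(zip(nodes, values)) for values in product([True, False], repeat=len(nodes))),
    key=lambda asg: sum(nodes[n] for n, on in asg.items() if on)
    if consistent(asg) else float("-inf"),
)
print(best)
```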
Reference
Jiang, L., Hwang, J. D., Bhagavatula, C., Le Bras, R., Liang, J. T., Levine, S., Dodge, J., Sakaguchi, K., Forbes, M., Hessel, J., Borchardt, J., Sorensen, T., Gabriel, S., Tsvetkov, Y., Etzioni, O., Sap, M., Rini, R., & Choi, Y. (2025). Investigating machine moral judgement through the Delphi experiment. Nature Machine Intelligence, 7, 145–160. https://doi.org/10.1038/s42256-024-00969-6