Can machines understand human values? 🤖
Every day, AI plays a bigger role in making decisions that impact us; however, most AI systems have little or no understanding of our values.
Norms are complex. We shouldn't expect to solve fundamental problems in ethics tomorrow; however, today's AI agents already miss basic, uncontroversial norms. We don't need to solve the trolley problem to stop AIs from becoming cyberbullies or treating people unfairly.
As a next step towards norm understanding, we propose training models to reproduce the kinds of normative judgments people make about anecdotes.
Feel free to play around with the models we've developed in this demo! You can also read the paper, check out the code, or learn some background to dig deeper into how we built them.
Below, we demo two models. One compares two actions, predicting which is less ethical according to the average user on Mechanical Turk. The other reads an anecdote and predicts which participants were in the wrong, according to members of the AITA subreddit.
This demo provides REST API endpoints at /api/actions/predict and /api/corpus/predict.
/api/actions/predict
This endpoint compares two actions, returning the model's predictions.
POST a JSON array of objects. Each object should have the following keys:
Optionally, you can specify the query parameter ?plot=true to also receive a base64-encoded PNG plot depicting the predicted distribution.
Using cURL, you could hit the API as follows:
$ curl \
> --request POST \
> --header "Content-Type: application/json" \
> --data '[{"action1": "Volunteering at my school.", "action2": "Fighting with my sibling."}]' \
> $DOMAIN/api/actions/predict
[
{
"action1": 0.5804435014724731,
"action2": 12.396906852722168
}
]
Or using HTTPie:
$ echo '[{"action1": "Volunteering at my school.", "action2": "Fighting with my sibling."}]' \
> | http post $DOMAIN/api/actions/predict
HTTP/1.0 200 OK
Content-Length: 82
Content-Type: application/json
Date: Thu, 12 Dec 2019 22:43:35 GMT
Server: Werkzeug/0.15.4 Python/3.7.0
[
{
"action1": 0.5804435014724731,
"action2": 12.396906852722168
}
]
To receive the distribution plot, include ?plot=true:
$ curl \
> --request POST \
> --header "Content-Type: application/json" \
> --data '[{"action1": "Volunteering at my school.", "action2": "Fighting with my sibling."}]' \
> $DOMAIN/api/actions/predict?plot=true
[
{
"action1": 0.5804435014724731,
"action2": 12.396906852722168,
"plot": "iV...mCC"
}
]
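Assuming the two returned scores are the Dirichlet concentration parameters described in the background section below (an interpretation of this demo, not something the response itself states), you can turn them into expected probabilities by normalizing by their sum. A minimal sketch in Python, using the values from the example response above:

```python
# Convert the scores returned by /api/actions/predict into expected
# probabilities that each action is judged the less ethical one,
# treating them as Dirichlet concentration parameters.
alphas = {"action1": 0.5804435014724731, "action2": 12.396906852722168}

total = sum(alphas.values())
probabilities = {action: alpha / total for action, alpha in alphas.items()}

# action2 ("Fighting with my sibling.") receives nearly all the mass.
for action, p in probabilities.items():
    print(f"{action}: {p:.3f}")
```

The large gap between the two parameters also signals that the model is fairly certain about this comparison, since a higher total concentration means a more peaked Dirichlet.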
/api/corpus/predict
This endpoint predicts which participants in an anecdote were in the wrong.
POST a JSON array of objects. Each object should have the following keys:
Optionally, you can specify the query parameter ?plot=true to also receive base64-encoded PNG plots depicting the distribution of probabilities that the author is in the wrong (AUTHOR or EVERYBODY labels) and that the other is in the wrong (OTHER or EVERYBODY labels).
Using cURL, you could hit the API as follows:
$ curl \
> --request POST \
> --header "Content-Type: application/json" \
> --data '[{"title": "Never texting back", "text": "I never text my friends back. I always forget."}]' \
> $DOMAIN/api/corpus/predict
[
{
"AUTHOR": 1.1538619995117188,
"EVERYBODY": 0.06874721497297287,
"INFO": 0.23060794174671173,
"NOBODY": 0.5397516489028931,
"OTHER": 0.9151269793510437
}
]
Or using HTTPie:
$ echo \
> '[{"title": "Never texting back", "text": "I never text my friends back. I always forget."}]' \
> | http post $DOMAIN/api/corpus/predict
HTTP/1.0 200 OK
Content-Length: 187
Content-Type: application/json
Date: Thu, 12 Dec 2019 22:53:27 GMT
Server: Werkzeug/0.15.4 Python/3.7.0
[
{
"AUTHOR": 1.1538619995117188,
"EVERYBODY": 0.06874721497297287,
"INFO": 0.23060794174671173,
"NOBODY": 0.5397516489028931,
"OTHER": 0.9151269793510437
}
]
To receive the distribution plots, include ?plot=true:
$ curl \
> --request POST \
> --header "Content-Type: application/json" \
> --data '[{"title": "Never texting back", "text": "I never text my friends back. I always forget."}]' \
> $DOMAIN/api/corpus/predict?plot=true
[
{
"AUTHOR": 1.1538619995117188,
"EVERYBODY": 0.06874721497297287,
"INFO": 0.23060794174671173,
"NOBODY": 0.5397516489028931,
"OTHER": 0.9151269793510437,
"plot_author": "iV...=",
"plot_other": "iV...="
}
]
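The five scores can be interpreted the same way as for the actions endpoint: assuming they are Dirichlet concentration parameters (one per label, as described in the background section below), normalizing them gives expected label probabilities, and summing the relevant labels gives the quantities the plots depict. A sketch using the example response above:

```python
# Interpret the /api/corpus/predict response, assuming the five scores
# are Dirichlet concentration parameters (one per label).
alphas = {
    "AUTHOR": 1.1538619995117188,
    "EVERYBODY": 0.06874721497297287,
    "INFO": 0.23060794174671173,
    "NOBODY": 0.5397516489028931,
    "OTHER": 0.9151269793510437,
}

total = sum(alphas.values())
probabilities = {label: alpha / total for label, alpha in alphas.items()}

# Following the plot definitions above: the author is in the wrong under
# the AUTHOR or EVERYBODY labels; the other under OTHER or EVERYBODY.
p_author_wrong = probabilities["AUTHOR"] + probabilities["EVERYBODY"]
p_other_wrong = probabilities["OTHER"] + probabilities["EVERYBODY"]

print(f"P(author in the wrong) = {p_author_wrong:.3f}")
print(f"P(other in the wrong)  = {p_other_wrong:.3f}")
```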
Today, most AI programs have little or no understanding of human values. As AI agents become more autonomous, it's increasingly important that they apply the norms of the communities in which they operate. This work seeks to develop such norm understanding by reproducing the normative judgments people make.
Reasonable people can disagree about normative decisions. Norms are inherently subjective. Predicting only the most probable judgment throws away critical information. Is the norm controversial or widely agreed upon? How likely is a person to view this action negatively?
Common deep learning approaches conflate the label's subjectivity with the model's uncertainty. If the model predicts the label AUTHOR with \(0.72\) probability, we don't know whether the model claims that \(72\%\) of people believe the author was wrong, or that the model is only \(72\%\) sure that everyone agrees the author was wrong.
To address this shortcoming, we augment the last layer to predict the parameters of a Dirichlet distribution. This requires training the model using a Dirichlet-multinomial likelihood rather than the more common softmax. The effect is that the model outputs a set of alphas, one for each class (\(\alpha_j\)), which are the parameters to a Dirichlet distribution. Classes with higher \(\alpha\) values are more likely, and the higher the sum of the \(\alpha\) values, the more certain the model is.
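To make this distinction concrete, here is an illustrative sketch (the alpha values are invented for the example): two Dirichlets can imply the same expected probability for a class while expressing very different certainty, which the alpha sum captures.

```python
def dirichlet_mean_and_sd(alphas):
    """Per-class expected probability and standard deviation of a Dirichlet."""
    a0 = sum(alphas)
    means = [a / a0 for a in alphas]
    variances = [a * (a0 - a) / (a0 ** 2 * (a0 + 1)) for a in alphas]
    return means, [v ** 0.5 for v in variances]

# Same expected probabilities (0.72 vs. 0.28), very different certainty:
uncertain = [0.72, 0.28]   # low alpha sum: people may simply disagree
confident = [72.0, 28.0]   # high alpha sum: the model is quite sure

for alphas in (uncertain, confident):
    means, sds = dirichlet_mean_and_sd(alphas)
    print(f"alphas={alphas}: mean={means[0]:.2f}, sd={sds[0]:.2f}")
```

In the softmax setting both cases would collapse to the same output of \(0.72\); the Dirichlet parameterization keeps them apart.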
If you build off this work or use this model, please cite the paper as follows:
@article{Lourie2020Scruples,
author = {Nicholas Lourie and Ronan Le Bras and Yejin Choi},
title = {Scruples: A Corpus of Community Ethical Judgments on 32,000 Real-Life Anecdotes},
journal = {arXiv e-prints},
year = {2020},
archivePrefix = {arXiv},
eprint = {2008.09094},
}