Allen Institute for AI

Norms

Can machines understand human values? 🤖

About

Every day, AI plays a bigger role in making decisions that impact us; however, most AIs have little or no understanding of our values.

Norms are complex. We shouldn't expect to solve fundamental problems in ethics tomorrow; however, today we already have AI agents that miss basic, uncontroversial norms. We don't need to solve the trolley problem to stop AIs from becoming cyberbullies or treating people unfairly.

As a next step towards norm understanding, we propose training models to reproduce the kinds of normative judgments people make about anecdotes.

Feel free to play around with the models we've developed in this demo! You can also read the paper, check out the code, or learn some background to dig deeper into how we built them.

Demo

Below, we demo two models. One compares two actions, predicting which is less ethical according to the average annotator on Mechanical Turk. The other reads an anecdote and predicts which participants were in the wrong, according to members of the AITA (Am I the Asshole?) subreddit.


API

This demo provides REST API endpoints at /api/actions/predict and /api/corpus/predict.

/api/actions/predict

This endpoint compares two actions, returning the model's predictions.

Usage

POST a JSON array of objects. Each object should have the following keys:

action1
A description of the first action to compare.
action2
A description of the second action to compare.

Optionally, you can specify the query parameter ?plot=true to also compute a base64-encoded PNG plot depicting the predicted distribution.

Examples

Using cURL, you could hit the API as follows:


$ curl \
>   --request POST \
>   --header "Content-Type: application/json" \
>   --data '[{"action1": "Volunteering at my school.", "action2": "Fighting with my sibling."}]' \
>   $DOMAIN/api/actions/predict
[
  {
    "action1": 0.5804435014724731,
    "action2": 12.396906852722168
  }
]
          

Or using HTTPie:


$ echo '[{"action1": "Volunteering at my school.", "action2": "Fighting with my sibling."}]' \
>    | http post $DOMAIN/api/actions/predict
HTTP/1.0 200 OK
Content-Length: 82
Content-Type: application/json
Date: Thu, 12 Dec 2019 22:43:35 GMT
Server: Werkzeug/0.15.4 Python/3.7.0

[
    {
        "action1": 0.5804435014724731,
        "action2": 12.396906852722168
    }
]
          

To receive the distribution plot, include ?plot=true:


$ curl \
>   --request POST \
>   --header "Content-Type: application/json" \
>   --data '[{"action1": "Volunteering at my school.", "action2": "Fighting with my sibling."}]' \
>   $DOMAIN/api/actions/predict?plot=true
[
  {
    "action1": 0.5804435014724731,
    "action2": 12.396906852722168,
    "plot": "iV...mCC"
  }
]
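
If you'd rather call the endpoint from Python, here is a minimal sketch using the third-party requests library (an assumption on our part; this demo's own client code may differ). It posts the same pair of actions, requests the plot, and decodes the base64 PNG to a file:


import base64

import requests  # third-party: pip install requests

# Placeholder for the demo's domain, as with $DOMAIN in the examples above.
DOMAIN = "http://localhost:5000"

pairs = [{"action1": "Volunteering at my school.",
          "action2": "Fighting with my sibling."}]

response = requests.post(
    f"{DOMAIN}/api/actions/predict",
    params={"plot": "true"},  # optional: also return the distribution plot
    json=pairs,
)
response.raise_for_status()

for i, prediction in enumerate(response.json()):
    # The scores for each action, as in the example responses above.
    print(prediction["action1"], prediction["action2"])
    # The plot is a base64-encoded PNG.
    with open(f"actions-plot-{i}.png", "wb") as f:
        f.write(base64.b64decode(prediction["plot"]))
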
          

/api/corpus/predict

This endpoint predicts which participants in an anecdote were in the wrong.

Usage

POST a JSON array of objects. Each object should have the following keys:

title
The title of the anecdote.
text
The text of the anecdote.

Optionally, you can specify the query parameter ?plot=true to also compute base64-encoded PNG plots depicting the distribution of probabilities that the author is in the wrong (the AUTHOR or EVERYBODY labels) and that the other is in the wrong (the OTHER or EVERYBODY labels).

Examples

Using cURL, you could hit the API as follows:


$ curl \
>   --request POST \
>   --header "Content-Type: application/json" \
>   --data '[{"title": "Never texting back", "text": "I never text my friends back. I always forget."}]' \
>   $DOMAIN/api/corpus/predict
[
  {
    "AUTHOR": 1.1538619995117188,
    "EVERYBODY": 0.06874721497297287,
    "INFO": 0.23060794174671173,
    "NOBODY": 0.5397516489028931,
    "OTHER": 0.9151269793510437
  }
]
          

Or using HTTPie:


$ echo \
>   '[{"title": "Never texting back", "text": "I never text my friends back. I always forget."}]' \
>   | http post $DOMAIN/api/corpus/predict
HTTP/1.0 200 OK
Content-Length: 187
Content-Type: application/json
Date: Thu, 12 Dec 2019 22:53:27 GMT
Server: Werkzeug/0.15.4 Python/3.7.0

[
    {
        "AUTHOR": 1.1538619995117188,
        "EVERYBODY": 0.06874721497297287,
        "INFO": 0.23060794174671173,
        "NOBODY": 0.5397516489028931,
        "OTHER": 0.9151269793510437
    }
]
          

To receive the distribution plots, include ?plot=true:


$ curl \
>   --request POST \
>   --header "Content-Type: application/json" \
>   --data '[{"title": "Never texting back", "text": "I never text my friends back. I always forget."}]' \
>   $DOMAIN/api/corpus/predict?plot=true
[
  {
    "AUTHOR": 1.1538619995117188,
    "EVERYBODY": 0.06874721497297287,
    "INFO": 0.23060794174671173,
    "NOBODY": 0.5397516489028931,
    "OTHER": 0.9151269793510437,
    "plot_author": "iV...=",
    "plot_other": "iV...="
  }
]
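
As with the actions endpoint, here is a minimal Python sketch (again assuming the third-party requests library and a placeholder domain) that posts an anecdote and saves both returned plots:


import base64

import requests  # third-party: pip install requests

DOMAIN = "http://localhost:5000"  # placeholder, as with $DOMAIN above

anecdotes = [{"title": "Never texting back",
              "text": "I never text my friends back. I always forget."}]

response = requests.post(
    f"{DOMAIN}/api/corpus/predict",
    params={"plot": "true"},
    json=anecdotes,
)
response.raise_for_status()

for i, prediction in enumerate(response.json()):
    # plot_author and plot_other are base64-encoded PNGs.
    for key in ("plot_author", "plot_other"):
        with open(f"{key}-{i}.png", "wb") as f:
            f.write(base64.b64decode(prediction[key]))
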
          

Background

Norm Understanding

Today, most AI programs have little or no understanding of human values. As AI agents become more autonomous, it's increasingly important that they apply the norms of the communities in which they operate. This work seeks to develop such norm understanding by reproducing the normative judgments people make.

Modeling Subjectivity

Norms are inherently subjective: reasonable people can disagree about normative decisions. Predicting only the most probable judgment throws away critical information. Is the norm controversial or widely agreed upon? How likely is a given person to view the action negatively?

Common deep learning approaches conflate the label's subjectivity with the model's uncertainty. If the model predicts the label AUTHOR with \(0.72\) probability, we don't know whether the model claims that \(72\%\) of people believe the author was wrong, or that the model is only \(72\%\) sure that everyone agrees the author was wrong.

To address this shortcoming, we augment the last layer to predict the parameters of a Dirichlet distribution. This requires training the model with a Dirichlet-multinomial likelihood rather than the more common softmax cross-entropy. The effect is that the model outputs one parameter per class (\(\alpha_j\)): classes with higher \(\alpha\) values are more likely, and the higher the sum of the \(\alpha\) values, the more certain the model is.
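
To make this concrete, here is a small Python sketch (illustrative only, not this project's actual code) that converts the example alphas returned by /api/corpus/predict above into expected label probabilities. It uses the standard fact that the mean of a Dirichlet distribution is \(\alpha_j / \sum_k \alpha_k\):


# Alphas from the /api/corpus/predict example response above.
alphas = {
    "AUTHOR": 1.1538619995117188,
    "EVERYBODY": 0.06874721497297287,
    "INFO": 0.23060794174671173,
    "NOBODY": 0.5397516489028931,
    "OTHER": 0.9151269793510437,
}

total = sum(alphas.values())

# The mean of a Dirichlet distribution gives the expected probability
# of each label: alpha_j / sum(alphas).
for label, alpha in alphas.items():
    print(f"{label:>9}: {alpha / total:.3f}")

# The concentration (the sum of the alphas) reflects the model's
# certainty: a larger sum means a more confident model.
print(f"concentration: {total:.3f}")


For this anecdote, AUTHOR receives an expected probability of about \(0.40\) and OTHER about \(0.31\), while the small concentration (roughly \(2.9\)) signals, per the paragraph above, that the model is not very certain.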

Further Reading

Feel free to check out the paper to learn more about the dataset and models we've released, or see the code for the implementation details!

Cite

If you build off this work or use this model, please cite the paper as follows:


@article{Lourie2020Scruples,
    author = {Nicholas Lourie and Ronan Le Bras and Yejin Choi},
    title = {Scruples: A Corpus of Community Ethical Judgments on 32,000 Real-Life Anecdotes},
    journal = {arXiv e-prints},
    year = {2020},
    archivePrefix = {arXiv},
    eprint = {2008.09094},
}