Can machines understand human values? 🤖
Every day, AI plays a bigger role in making decisions that impact us; however, most AI systems have little or no understanding of our values.
Norms are complex. We shouldn't expect to solve fundamental problems in ethics tomorrow; however, today's AI agents already miss basic, uncontroversial norms. We don't need to solve the trolley problem to stop AIs from becoming cyberbullies or treating people unfairly.
As a next step towards norm understanding, we propose training models to reproduce the kinds of normative judgments people make about anecdotes.
Feel free to play around with the models we've developed in this demo! You can also read the paper, check out the code, or learn some background to dig deeper into how we built them.
Below, we demo two models. One compares two actions, predicting which is less ethical according to the average user on Mechanical Turk. The other reads an anecdote and predicts which participants were in the wrong, according to members of the AITA subreddit.
This demo provides REST API endpoints at /api/actions/predict and /api/corpus/predict.
/api/actions/predict
This endpoint compares two actions, returning the model's predictions.
POST a JSON array of objects. Each object should have the following keys:
Optionally, you can specify the query parameter ?plot=true to also receive a base64-encoded PNG plot depicting the predicted distribution.
Using cURL, you could hit the API as follows:
$ curl \
> --request POST \
> --header "Content-Type: application/json" \
> --data '[{"action1": "Volunteering at my school.", "action2": "Fighting with my sibling."}]' \
> $DOMAIN/api/actions/predict
[
{
"action1": 0.5804435014724731,
"action2": 12.396906852722168
}
]
Or using HTTPie:
$ echo '[{"action1": "Volunteering at my school.", "action2": "Fighting with my sibling."}]' \
> | http post $DOMAIN/api/actions/predict
HTTP/1.0 200 OK
Content-Length: 82
Content-Type: application/json
Date: Thu, 12 Dec 2019 22:43:35 GMT
Server: Werkzeug/0.15.4 Python/3.7.0
[
{
"action1": 0.5804435014724731,
"action2": 12.396906852722168
}
]
To receive the distribution plot, include ?plot=true:
$ curl \
> --request POST \
> --header "Content-Type: application/json" \
> --data '[{"action1": "Volunteering at my school.", "action2": "Fighting with my sibling."}]' \
> $DOMAIN/api/actions/predict?plot=true
[
{
"action1": 0.5804435014724731,
"action2": 12.396906852722168,
"plot": "iV...mCC"
}
]
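Assuming the two returned scores are the Dirichlet concentration parameters described in the background section below (an interpretation of this demo, not something the response itself states), you can turn them into expected probabilities by normalizing by their sum. A minimal sketch in Python, using the values from the example response above:

```python
# Convert the scores returned by /api/actions/predict into expected
# probabilities that each action is judged the less ethical one,
# treating them as Dirichlet concentration parameters.
alphas = {"action1": 0.5804435014724731, "action2": 12.396906852722168}

total = sum(alphas.values())
probabilities = {action: alpha / total for action, alpha in alphas.items()}

# action2 ("Fighting with my sibling.") receives nearly all the mass.
for action, p in probabilities.items():
    print(f"{action}: {p:.3f}")
```

The large gap between the two parameters also signals that the model is fairly certain about this comparison, since a higher total concentration means a more peaked Dirichlet.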
/api/corpus/predict
This endpoint predicts which participants in an anecdote were in the wrong.
POST a JSON array of objects. Each object should have the following keys:
Optionally, you can specify the query parameter ?plot=true to also receive base64-encoded PNG plots depicting the distribution of probabilities that the author is in the wrong (AUTHOR or EVERYBODY labels) and that the other is in the wrong (OTHER or EVERYBODY labels).
Using cURL, you could hit the API as follows:
$ curl \
> --request POST \
> --header "Content-Type: application/json" \
> --data '[{"title": "Never texting back", "text": "I never text my friends back. I always forget."}]' \
> $DOMAIN/api/corpus/predict
[
{
"AUTHOR": 1.1538619995117188,
"EVERYBODY": 0.06874721497297287,
"INFO": 0.23060794174671173,
"NOBODY": 0.5397516489028931,
"OTHER": 0.9151269793510437
}
]
Or using HTTPie:
$ echo \
> '[{"title": "Never texting back", "text": "I never text my friends back. I always forget."}]' \
> | http post $DOMAIN/api/corpus/predict
HTTP/1.0 200 OK
Content-Length: 187
Content-Type: application/json
Date: Thu, 12 Dec 2019 22:53:27 GMT
Server: Werkzeug/0.15.4 Python/3.7.0
[
{
"AUTHOR": 1.1538619995117188,
"EVERYBODY": 0.06874721497297287,
"INFO": 0.23060794174671173,
"NOBODY": 0.5397516489028931,
"OTHER": 0.9151269793510437
}
]
To receive the distribution plots, include ?plot=true:
$ curl \
> --request POST \
> --header "Content-Type: application/json" \
> --data '[{"title": "Never texting back", "text": "I never text my friends back. I always forget."}]' \
> $DOMAIN/api/corpus/predict?plot=true
[
{
"AUTHOR": 1.1538619995117188,
"EVERYBODY": 0.06874721497297287,
"INFO": 0.23060794174671173,
"NOBODY": 0.5397516489028931,
"OTHER": 0.9151269793510437,
"plot_author": "iV...=",
"plot_other": "iV...="
}
]
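The five scores can be interpreted the same way as for the actions endpoint: assuming they are Dirichlet concentration parameters (one per label, as described in the background section below), normalizing them gives expected label probabilities, and summing the relevant labels gives the quantities the plots depict. A sketch using the example response above:

```python
# Interpret the /api/corpus/predict response, assuming the five scores
# are Dirichlet concentration parameters (one per label).
alphas = {
    "AUTHOR": 1.1538619995117188,
    "EVERYBODY": 0.06874721497297287,
    "INFO": 0.23060794174671173,
    "NOBODY": 0.5397516489028931,
    "OTHER": 0.9151269793510437,
}

total = sum(alphas.values())
probabilities = {label: alpha / total for label, alpha in alphas.items()}

# Following the plot definitions above: the author is in the wrong under
# the AUTHOR or EVERYBODY labels; the other under OTHER or EVERYBODY.
p_author_wrong = probabilities["AUTHOR"] + probabilities["EVERYBODY"]
p_other_wrong = probabilities["OTHER"] + probabilities["EVERYBODY"]

print(f"P(author in the wrong) = {p_author_wrong:.3f}")
print(f"P(other in the wrong)  = {p_other_wrong:.3f}")
```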
Today, most AI programs have little or no understanding of human values. As AI agents become more autonomous, it's increasingly important that they apply the norms of the communities in which they operate. This work seeks to develop such norm understanding by reproducing the normative judgments people make.
Reasonable people can disagree about normative decisions. Norms are inherently subjective. Predicting only the most probable judgment throws away critical information. Is the norm controversial or widely agreed upon? How likely is a person to view this action negatively?
Common deep learning approaches conflate the label's subjectivity with the model's uncertainty. If the model predicts the label AUTHOR with \(0.72\) probability, we don't know whether the model claims that \(72\%\) of people believe the author was wrong, or that the model is only \(72\%\) sure that everyone agrees the author was wrong.
To address this shortcoming, we augment the last layer to predict the parameters of a Dirichlet distribution. This requires training the model using a Dirichlet-multinomial likelihood rather than the more common softmax. The effect is that the model outputs a set of alphas, one for each class (\(\alpha_j\)), which are the parameters to a Dirichlet distribution. Classes with higher \(\alpha\) values are more likely, and the higher the sum of the \(\alpha\) values, the more certain the model is.
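To make this distinction concrete, here is an illustrative sketch (the alpha values are invented for the example): two Dirichlets can imply the same expected probability for a class while expressing very different certainty, which the alpha sum captures.

```python
def dirichlet_mean_and_sd(alphas):
    """Per-class expected probability and standard deviation of a Dirichlet."""
    a0 = sum(alphas)
    means = [a / a0 for a in alphas]
    variances = [a * (a0 - a) / (a0 ** 2 * (a0 + 1)) for a in alphas]
    return means, [v ** 0.5 for v in variances]

# Same expected probabilities (0.72 vs. 0.28), very different certainty:
uncertain = [0.72, 0.28]   # low alpha sum: people may simply disagree
confident = [72.0, 28.0]   # high alpha sum: the model is quite sure

for alphas in (uncertain, confident):
    means, sds = dirichlet_mean_and_sd(alphas)
    print(f"alphas={alphas}: mean={means[0]:.2f}, sd={sds[0]:.2f}")
```

In the softmax setting both cases would collapse to the same output of \(0.72\); the Dirichlet parameterization keeps them apart.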
If you build off this work or use this model, please cite the paper as follows:
@article{Lourie2020Scruples,
author = {Nicholas Lourie and Ronan Le Bras and Yejin Choi},
title = {Scruples: A Corpus of Community Ethical Judgments on 32,000 Real-Life Anecdotes},
journal = {arXiv e-prints},
year = {2020},
archivePrefix = {arXiv},
eprint = {2008.09094},
}