Prospects for Using LLMs in Content Moderation for Social Media

By Bbenzon @bbenzon

From the YouTube page:

Wherein I am joined by the estimable Dave Willner, who helped build the content moderation system at Meta, who talks me through how and why Facebook has squelched my open letter to former Russian Ambassador Anatoly Antonov, and why my appeal has fallen into a black hole--and how his scheme of using large language models to scale content moderation might do a better job.

The first part of the discussion concerns Ben Wittes' case at Facebook. Willner discusses his work on using LLMs in content moderation starting at about 40:30. You might want to start there if you already know something about the content moderation process or have been through it yourself. You can always loop back to the beginning if you're interested.

In the current regime, a bit of content may be flagged in response to a user complaint, but more likely it will be flagged by an automatic classification system. If an item of yours is flagged, it will be removed and you'll be notified of the removal with some general reason, such as violating community standards. As to just which community standard or standards your item violates, and how, you'll be told nothing. You'll also be given an opportunity to appeal. That appeal is likely to be reviewed by a human, but who knows if and when that will actually happen.
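To make the opacity concrete, here's a minimal sketch in Python of the flow just described. The classifier, the notice text, and the appeal queue are hypothetical stand-ins of my own, not Meta's actual code.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Decision:
    removed: bool
    notice: Optional[str] = None  # the generic reason shown to the user

appeal_queue: list[str] = []  # appeals sit here awaiting an eventual human review

def classify(content: str) -> bool:
    """Hypothetical automatic classifier: True means the content is flagged."""
    banned_phrases = ["example banned phrase"]  # stand-in for a real model
    return any(p in content.lower() for p in banned_phrases)

def moderate(content_id: str, content: str) -> Decision:
    if classify(content):
        # The user sees only a generic notice: no specific standard, no rationale.
        return Decision(removed=True, notice="This post violates our Community Standards.")
    return Decision(removed=False)

def appeal(content_id: str) -> None:
    # The appeal goes into a queue with no commitment about when,
    # or whether, a human will actually look at it.
    appeal_queue.append(content_id)
```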

In an LLM-enhanced content moderation regime, the LLM would be able to identify the standards being violated and give fairly specific reasons why a particular bit of content has been flagged. Moreover, the user could enter into a dialog with the system. If that doesn't resolve the issue, the case could be passed on to a human for review.
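Here's a minimal sketch, under the same hypothetical setup, of what that LLM-enhanced flow might look like. The call_llm function is a placeholder for whatever model API the platform uses, and the policy text, prompt format, and escalation rule are my own illustrative assumptions, not Willner's actual design.

```python
import json

POLICY = """1. No harassment or threats.
2. No spam or deceptive practices.
3. No incitement to violence."""  # stand-in for a platform's real standards

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call (e.g. a chat-completion endpoint)."""
    raise NotImplementedError("wire this to your model provider")

def explain_flag(content: str) -> dict:
    """Ask the model which standard is violated and why, as structured JSON."""
    prompt = (
        "You are a content-moderation assistant. Given the policy and a post, "
        "return JSON with keys 'violated_standard' and 'rationale'.\n\n"
        f"POLICY:\n{POLICY}\n\nPOST:\n{content}"
    )
    return json.loads(call_llm(prompt))

def moderation_dialog(content: str, user_reply: str) -> str:
    """One turn of the dialog described above: the user contests, the model responds."""
    verdict = explain_flag(content)
    prompt = (
        f"POLICY:\n{POLICY}\n\nPOST:\n{content}\n"
        f"VERDICT: {json.dumps(verdict)}\n"
        f"USER OBJECTION: {user_reply}\n"
        "Either explain the verdict in more detail or answer 'ESCALATE' "
        "if the objection raises something a human reviewer should decide."
    )
    answer = call_llm(prompt)
    if answer.strip() == "ESCALATE":
        return "Your case has been passed to a human reviewer."
    return answer
```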

This strikes me as a significant opportunity to improve social media. And it seems plausible. Two posts I did back in December of 2022 illustrate the kind of reasoning that would be involved:

- Abstract concepts and metalingual definition: Does ChatGPT understand justice and charity? (December 16, 2022), which is included in a working paper, Discursive Competence in ChatGPT, Part 1: Talking with Dragons, Version 2 (January 11, 2023).
- ChatGPT the legal beagle: Concepts, Citizens United, Constitutional Interpretation (December 27, 2022).