Using natural language processing to automate the Bechdel test

Westphal, Krista

doi:10.34726/hss.2018.26183

DC Field

Value

dc.contributor.advisor

Hanbury, Allan

dc.contributor.author

Westphal, Krista

dc.date.accessioned

2020-06-28T19:21:35Z

dc.date.issued

2018

dc.date.submitted

2018-02

dc.identifier.citation

Westphal, K. (2018). Using natural language processing to automate the Bechdel test [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2018.26183

dc.identifier.uri

https://doi.org/10.34726/hss.2018.26183

dc.identifier.uri

http://hdl.handle.net/20.500.12708/4344

dc.description

Alternative title translated by the author

dc.description.abstract

The Bechdel test asks three questions: Does a movie contain at least two named female characters? Do two female characters converse at some point during the movie? Is there at least one conversation between female characters that is not about a man? If all three questions can be answered affirmatively, the film passes the Bechdel test. This thesis defines and implements methods for automating the Bechdel test for screenplays and novels. Automating this task would enable large-scale analyses, for example of trends over long time periods, that would otherwise require time-consuming manual work. Previous research on automating the Bechdel test for screenplays provided the basis for the approach described in this thesis. Although the Bechdel test was originally formulated for movies, its questions apply equally to novels; however, to the best of our knowledge, no previous research has automated the Bechdel test for novels. For screenplays, we first parsed the text with a new rule-based approach that exploits the specialized formatting required for screenplays. We then identified all characters in speaking roles and assigned each a gender using a newly developed algorithm that combines census data on names with Internet Movie Database (IMDb) information about the specific film. We also used a machine learning approach to predict whether the identified female characters have at least one conversation about something other than a man. The results for screenplays are comparable to previously published work. Novels required a different approach because their structure differs from that of screenplays. To identify all the characters in a novel, we used a named-entity recognizer together with a rule-based algorithm that links the different names used for each character throughout the text. Using quote attribution, we then determined which character speaks each line of dialogue, and thus who converses with whom. The method developed for novels achieved perfect accuracy on a small dataset of five novels.
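Once the earlier stages have done their work, the final decision is a small piece of logic over their outputs. The Python sketch below is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes that character identification and gender assignment have already produced a mapping from character names to "F" or "M", and that dialogue has already been grouped into conversations with known participants (for example via quote attribution). For the third question it substitutes a hypothetical keyword heuristic (`mentions_man`) for the trained classifier described in the abstract. All names here (`Conversation`, `mentions_man`, `bechdel_answers`) are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Conversation:
    participants: Set[str]  # names of the characters speaking in this exchange
    lines: List[str]        # the dialogue lines of the exchange


def mentions_man(conv: Conversation, male_names: Set[str]) -> bool:
    """HYPOTHETICAL stand-in for a trained classifier: a conversation counts
    as 'about a man' if any line contains a male pronoun or the name of a
    male character."""
    male_words = {"he", "him", "his", "himself"} | {n.lower() for n in male_names}
    for line in conv.lines:
        tokens = {t.strip(".,!?;:\"'").lower() for t in line.split()}
        if tokens & male_words:
            return True
    return False


def bechdel_answers(
    genders: Dict[str, str],
    conversations: List[Conversation],
    about_man: Callable[[Conversation], bool],
) -> List[bool]:
    """Answer the three Bechdel questions; a work passes if all are True."""
    women = {name for name, g in genders.items() if g == "F"}
    # Q1: at least two named female characters
    q1 = len(women) >= 2
    # Q2: at least one conversation in which two or more women take part
    female_convs = [c for c in conversations if len(c.participants & women) >= 2]
    q2 = len(female_convs) > 0
    # Q3: at least one of those conversations is not about a man
    q3 = any(not about_man(c) for c in female_convs)
    return [q1, q2, q3]


# Usage with toy data (hypothetical characters and dialogue)
genders = {"Ann": "F", "Beth": "F", "Carl": "M"}
males = {n for n, g in genders.items() if g == "M"}
convs = [
    Conversation({"Ann", "Beth"}, ["Did you see Carl today?", "Yes, he was late."]),
    Conversation({"Ann", "Beth"}, ["The experiment worked!", "Let's write it up."]),
]
answers = bechdel_answers(genders, convs, lambda c: mentions_man(c, males))
print(answers, all(answers))  # [True, True, True] True
```

Passing the "about a man" judgment in as a callable keeps the three-question logic separate from the classifier, so a learned model could replace the keyword stand-in without changing `bechdel_answers`.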

dc.language

English

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.subject

Bechdel Test

dc.subject

Natürliche Sprachverarbeitung (German for natural language processing)

dc.subject

Maschinelles Lernen (German for machine learning)

dc.subject

Bechdel Test

dc.subject

Natural Language Processing

dc.subject

Machine Learning

dc.title

Using natural language processing to automate the Bechdel test

dc.title.alternative

Automatisierung des Bechdel Tests durch Verarbeitung natürlicher Sprache (German title; literally "Automating the Bechdel test through natural language processing")

dc.type

Thesis

dc.type

Hochschulschrift (German for university thesis)

dc.rights.license

In Copyright

dc.rights.license

Urheberrechtsschutz (German for copyright protection)

dc.identifier.doi

10.34726/hss.2018.26183

dc.contributor.affiliation

TU Wien, Austria

dc.rights.holder

Krista Westphal

dc.publisher.place

Vienna

tuw.version

vor (Version of Record)

tuw.thesisinformation

Technische Universität Wien

tuw.publication.orgunit

E188 - Institut für Softwaretechnik und Interaktive Systeme (Institute of Software Technology and Interactive Systems)

dc.type.qualificationlevel

Diploma

dc.identifier.libraryid

AC14552095

dc.description.numberOfPages

dc.identifier.urn

urn:nbn:at:at-ubtuw:1-108929

dc.thesistype

Diplomarbeit (German for diploma thesis)

dc.thesistype

Diploma Thesis

dc.rights.identifier

In Copyright

dc.rights.identifier

Urheberrechtsschutz (German for copyright protection)

tuw.advisor.staffStatus

staff

tuw.advisor.orcid

0000-0002-7149-5843

item.languageiso639-1

item.openairetype

master thesis

item.grantfulltext

open

item.fulltext

with Fulltext

item.cerifentitytype

Publications

item.mimetype

application/pdf

item.openairecristype

http://purl.org/coar/resource_type/c_bdcc

item.openaccessfulltext

Open Access

crisitem.author.dept

TU Wien

Appears in Collections:

Thesis

Full text (Version of Record, published version)

Adobe PDF

(727.71 kB)

In Copyright


Page view(s)

690

checked on Nov 19, 2023

Download(s)

545

checked on Nov 19, 2023
