Effective matching of patients to clinical trials using entity extraction and neural re-ranking

Kusa, Wojciech; Mendoza, Óscar E; Knoth, Petr; Pasi, Gabriella; Hanbury, Allan

doi:10.1016/j.jbi.2023.104444

DC Field

Value

Language

dc.contributor.author

Kusa, Wojciech

dc.contributor.author

Mendoza, Óscar E

dc.contributor.author

Knoth, Petr

dc.contributor.author

Pasi, Gabriella

dc.contributor.author

Hanbury, Allan

dc.date.accessioned

2023-09-12T09:31:17Z

dc.date.available

2023-09-12T09:31:17Z

dc.date.issued

2023-08

dc.identifier.citation

Kusa, W., Mendoza, Ó. E., Knoth, P., Pasi, G., & Hanbury, A. (2023). Effective matching of patients to clinical trials using entity extraction and neural re-ranking. Journal of Biomedical Informatics, 144, Article 104444. https://doi.org/10.1016/j.jbi.2023.104444

dc.identifier.issn

1532-0464

dc.identifier.uri

http://hdl.handle.net/20.500.12708/188224

dc.description.abstract

Introduction: Clinical trials (CTs) often fail due to inadequate patient recruitment. Finding eligible patients involves comparing the patient’s information with the CT eligibility criteria. Automated patient matching promises to improve this process, yet the main difficulties of CT retrieval lie in the semantic complexity of matching unstructured patient descriptions with semi-structured, multi-field CT documents and in capturing the meaning of negation in the eligibility criteria. Objectives: This paper tackles the challenges of CT retrieval by presenting an approach that addresses the patient-to-trials paradigm. Our approach involves two key components in a pipeline-based model: (i) a data enrichment technique for enhancing both queries and documents during the first retrieval stage, and (ii) a novel re-ranking schema that uses a Transformer network in a setup adapted to this task by leveraging the structure of the CT documents. Methods: We use named entity recognition and negation detection in both the patient description and the eligibility section of CTs. We further classify patient descriptions and CT eligibility criteria into current, past, and family medical conditions. This extracted information is used to boost the importance of disease and drug mentions in both query and index for lexical retrieval. Furthermore, we propose a two-step training schema for the Transformer network used to re-rank the results from the lexical retrieval. The first step focuses on matching patient information with the descriptive sections of trials, while the second step aims to determine eligibility by matching patient information with the criteria section. Results: Our findings indicate that the inclusion criteria section of the CT strongly influences the relevance score in lexical models, and that the enrichment techniques for queries and documents improve the retrieval of relevant trials. The re-ranking strategy, based on our training schema, consistently enhances CT retrieval and improves precision at retrieving eligible trials by 15%. Conclusion: The results of our experiments suggest the benefit of making use of extracted entities. Moreover, our proposed re-ranking schema shows promising effectiveness compared to larger neural models, even with limited training data. These findings offer valuable insights for improving methods for retrieval of clinical documents.

dc.description.sponsorship

European Commission

dc.language.iso

dc.publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE

dc.relation.ispartof

Journal of Biomedical Informatics

dc.subject

Humans

dc.subject

Clinical natural language processing

dc.subject

Clinical trials matching

dc.subject

Eligibility criteria

dc.subject

Information retrieval

dc.subject

Neural re-ranking

dc.subject

Query reformulation

dc.subject

TREC clinical trials

dc.subject

Information Storage and Retrieval

dc.subject

Semantics

dc.title

Effective matching of patients to clinical trials using entity extraction and neural re-ranking

dc.type

Article

dc.type

Artikel

dc.identifier.pmid

37451494

dc.identifier.scopus

2-s2.0-85166487784

dc.identifier.url

https://api.elsevier.com/content/abstract/scopus_id/85166487784

dc.contributor.affiliation

University of Milano-Bicocca, Italy

dc.contributor.affiliation

The Open University, United Kingdom of Great Britain and Northern Ireland

dc.contributor.affiliation

University of Milano-Bicocca, Italy

dc.relation.grantno

860721

dc.type.category

Original Research Article

tuw.container.volume

144

tuw.journal.peerreviewed

true

tuw.peerreviewed

true

wb.publication.intCoWork

International Co-publication

tuw.project.title

Domänen-spezifische Systeme für Informationsextraktion und -suche

tuw.researchTopic.id

tuw.researchTopic.name

Information Systems Engineering

tuw.researchTopic.value

100

dcterms.isPartOf.title

Journal of Biomedical Informatics

tuw.publication.orgunit

E194-04 - Forschungsbereich Data Science

tuw.publisher.doi

10.1016/j.jbi.2023.104444

dc.identifier.articleid

104444

dc.identifier.eissn

1532-0480

dc.description.numberOfPages

tuw.author.orcid

0000-0003-4420-4147

tuw.author.orcid

0000-0003-2725-2972

tuw.author.orcid

0000-0003-1161-7359

tuw.author.orcid

0000-0002-6080-8170

tuw.author.orcid

0000-0002-7149-5843

wb.sci

true

wb.sciencebranch

Computer Science

wb.sciencebranch

Economics

wb.sciencebranch.oefos

1020

wb.sciencebranch.oefos

5020

wb.sciencebranch.value

item.openairetype

research article

item.cerifentitytype

Publications

item.grantfulltext

none

item.languageiso639-1

item.openairecristype

http://purl.org/coar/resource_type/c_2df8fbb1

item.fulltext

no Fulltext

crisitem.project.funder

European Commission

crisitem.project.grantno

860721

crisitem.author.dept

E194-04 - Forschungsbereich Data Science

crisitem.author.dept

University of Milano-Bicocca

crisitem.author.dept

The Open University

crisitem.author.dept

University of Milano-Bicocca

crisitem.author.dept

E194-04 - Forschungsbereich Data Science

crisitem.author.orcid

0000-0003-4420-4147

crisitem.author.orcid

0000-0003-2725-2972

crisitem.author.orcid

0000-0003-1161-7359

crisitem.author.orcid

0000-0002-6080-8170

crisitem.author.orcid

0000-0002-7149-5843

crisitem.author.parentorg

E194 - Institut für Information Systems Engineering

crisitem.author.parentorg

E194 - Institut für Information Systems Engineering

Appears in Collections:

Article
