TEXT2TASTE: A Versatile Egocentric Vision System for Intelligent Reading Assistance Using Large Language Model

Mucha, Wiktor; Cuconasu, Florin; Etori, Naome A.; Kalokyri, Valia; Trappolini, Giovanni

doi:10.1007/978-3-031-62849-8_35

DC Field

Value

Language

dc.contributor.author

Mucha, Wiktor

dc.contributor.author

Cuconasu, Florin

dc.contributor.author

Etori, Naome A.

dc.contributor.author

Kalokyri, Valia

dc.contributor.author

Trappolini, Giovanni

dc.date.accessioned

2024-11-16T11:27:54Z

dc.date.available

2024-11-16T11:27:54Z

dc.date.issued

2024

dc.identifier.citation

Mucha, W., Cuconasu, F., Etori, N. A., Kalokyri, V., & Trappolini, G. (2024). TEXT2TASTE: A Versatile Egocentric Vision System for Intelligent Reading Assistance Using Large Language Model. In Computers Helping People with Special Needs (pp. 285–291). https://doi.org/10.1007/978-3-031-62849-8_35

dc.identifier.uri

http://hdl.handle.net/20.500.12708/204357

dc.description.abstract

The ability to read, understand and find important information from written text is a critical skill in our daily lives for our independence, comfort and safety. However, a significant part of our society is affected by partial vision impairment, which leads to discomfort and dependency in daily activities. To address the limitations of this part of society, we propose an intelligent reading assistant based on smart glasses with embedded RGB cameras and a Large Language Model (LLM), whose functionality goes beyond corrective lenses. The video recorded from the egocentric perspective of a person wearing the glasses is processed to localise text information using object detection and optical character recognition methods. The LLM processes the data and allows the user to interact with the text and responds to a given query, thus extending the functionality of corrective lenses with the ability to find and summarize knowledge from the text. To evaluate our method, we create a chat-based application that allows the user to interact with the system. The evaluation is conducted in a real-world setting, such as reading menus in a restaurant, and involves four participants. The results show robust accuracy in text retrieval. The system not only provides accurate meal suggestions but also achieves high user satisfaction, highlighting the potential of smart glasses and LLMs in assisting people with special needs.

dc.description.sponsorship

European Commission

dc.language.iso

dc.relation.ispartofseries

Lecture Notes in Computer Science

dc.subject

AAL

dc.subject

Assistive Technology (AT)

dc.subject

egocentric vision

dc.subject

LLM

dc.subject

reading assistance

dc.title

TEXT2TASTE: A Versatile Egocentric Vision System for Intelligent Reading Assistance Using Large Language Model

dc.type

Inproceedings

dc.type

Conference contribution

dc.relation.isbn

978-3-031-62849-8

dc.description.startpage

285

dc.description.endpage

291

dc.relation.grantno

861091

dc.type.category

Full-Paper Contribution

tuw.booktitle

Computers Helping People with Special Needs

tuw.container.volume

14751

tuw.peerreviewed

true

tuw.project.title

Privacy-Aware and Acceptable Video-Based Technologies and Services for Active and Assisted Living

tuw.researchTopic.id

tuw.researchTopic.name

Visual Computing and Human-Centered Technology

tuw.researchTopic.value

100

tuw.publication.orgunit

E193-01 - Forschungsbereich Computer Vision

tuw.publisher.doi

10.1007/978-3-031-62849-8_35

dc.description.numberOfPages

tuw.author.orcid

0000-0002-6048-3425

tuw.author.orcid

0009-0008-9768-1047

tuw.author.orcid

0000-0001-7772-1103

tuw.author.orcid

0000-0002-5245-8238

tuw.author.orcid

0000-0002-5515-634X

tuw.event.name

19th International Conference on Computers Helping People with Special Needs (ICCHP 2024)

tuw.event.startdate

08-07-2024

tuw.event.enddate

12-07-2024

tuw.event.online

Hybrid

tuw.event.type

Event for scientific audience

tuw.event.place

Linz

tuw.event.country

tuw.event.presenter

Mucha, Wiktor

tuw.presentation.online

Online

wb.sciencebranch

Computer Science

wb.sciencebranch

Mathematics

wb.sciencebranch.oefos

1020

wb.sciencebranch.oefos

1010

wb.sciencebranch.value

item.languageiso639-1

item.openairetype

conference paper

item.grantfulltext

restricted

item.fulltext

no Fulltext

item.cerifentitytype

Publications

item.openairecristype

http://purl.org/coar/resource_type/c_5794

crisitem.author.dept

E193-01 - Forschungsbereich Computer Vision

crisitem.author.orcid

0000-0002-6048-3425

crisitem.author.orcid

0009-0008-9768-1047

crisitem.author.orcid

0000-0001-7772-1103

crisitem.author.orcid

0000-0002-5245-8238

crisitem.author.orcid

0000-0002-5515-634X

crisitem.author.parentorg

E193 - Institut für Visual Computing and Human-Centered Technology

crisitem.project.funder

European Commission

crisitem.project.grantno

861091

Appears in Collections:

Conference Paper
