A meta-search engine is a search engine that forwards a user query to several other search engines and aggregates the results. In the flight search domain, meta-search engines have some inherent weaknesses: they cannot find all routes offered by low-cost airlines, and they do not support mixing flights between airlines in different alliances. This thesis proposes a mashup solution to this problem. A mashup application uses data from other resources (called content providers) to create a new application with features and functionality not offered by any of the content providers. In the flight search system that we build, the data originate from airline websites. Extracting data from the Web faces several obstacles, such as password-protected sites, cookies, JavaScript, session IDs, Web form iterations, deep Web navigation, and dynamic changes on websites.<br />We use wrapper generation technology from Lixto to overcome these obstacles and perform the data extraction. Data cleaning is also applied to the wrapper output to remove unnecessary annotations.<br />In the algorithm, the flight search problem is treated as a graph search problem, with airports as the nodes and pairs of airports connected by direct flights as the edges. We introduce a hub identification heuristic to ensure the system's scalability. Instead of analyzing and evaluating all possible routes to the destination, this heuristic evaluates only the fraction of all possible combinations that is likely to contain the best routes.
de
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
flugsuche
de
dc.subject
routensuche
de
dc.subject
skalierbarkeit
de
dc.subject
wrapper generation
de
dc.subject
wrapping
de
dc.subject
web datenextraktion
de
dc.subject
lixto
de
dc.subject
hub identification
de
dc.subject
semantic web
de
dc.subject
flight search
en
dc.subject
route search
en
dc.subject
scalability
en
dc.subject
wrapper generation
en
dc.subject
wrapping
en
dc.subject
web data extraction
en
dc.subject
lixto
en
dc.subject
hub identification
en
dc.subject
semantic web
en
dc.title
An application of heuristic route search techniques for a scalable flight search system
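The graph formulation in the abstract (airports as nodes, direct flights as edges) and the idea of evaluating only a fraction of the possible routes can be illustrated with a minimal Python sketch. The record does not say how hubs are identified, so the criterion used here (airports with the most outgoing direct flights) is an assumption, as are the function names `build_graph`, `identify_hubs` and `search_routes`, the restriction to one-stop routes, and the use of price as the cost. It is not the thesis's actual algorithm.

```python
from collections import defaultdict


def build_graph(flights):
    """Build an adjacency map: airport -> {neighbour: cheapest direct price}."""
    graph = defaultdict(dict)
    for origin, dest, price in flights:
        best = graph[origin].get(dest)
        if best is None or price < best:
            graph[origin][dest] = price
    return graph


def identify_hubs(graph, top_k):
    """Assumed hub criterion: the top_k airports by number of outgoing direct flights."""
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(graph, key=lambda airport: (-len(graph[airport]), airport))
    return set(ranked[:top_k])


def search_routes(graph, origin, dest, hubs):
    """Return (price, route) candidates sorted by price.

    Only the direct flight and one-stop routes through a hub are evaluated;
    connections through non-hub airports are deliberately skipped.
    """
    candidates = []
    direct = graph.get(origin, {}).get(dest)
    if direct is not None:
        candidates.append((direct, [origin, dest]))
    for hub in sorted(hubs):
        if hub in (origin, dest):
            continue
        first_leg = graph.get(origin, {}).get(hub)
        second_leg = graph.get(hub, {}).get(dest)
        if first_leg is not None and second_leg is not None:
            candidates.append((first_leg + second_leg, [origin, hub, dest]))
    return sorted(candidates)
```

Restricting transfers to hubs reduces the number of combinations that must be evaluated, at the cost of possibly missing a cheaper route through a non-hub airport. How well this trade-off works depends on the hub criterion, which is only assumed here.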