Darmaputra, Y. (2008). An application of heuristic route search techniques for a scalable flight search system [Master Thesis, Technische Universität Wien]. reposiTUm. https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-27335
flugsuche; routensuche; skalierbarkeit; wrapper generation; wrapping; web datenextraktion; lixto; hub identification; semantic web
de
flight search; route search; scalability; wrapper generation; wrapping; web data extraction; lixto; hub identification; semantic web
en
Abstract:
A meta-search engine is a search engine that forwards a user query to several other search engines and aggregates the results. In the flight search domain, meta-search engines have some inherent weaknesses: they cannot find all routes from low-cost airlines, and they do not support mixing flights between airlines in different alliances. This thesis proposes a mashup solution to this problem. A mashup application uses data from other resources (called content providers) to create a new application with features and functionality that are not offered by any of the content providers. In the flight search system that we build, the data originate from airline websites. Extracting data from the Web faces several obstacles, such as password-protected sites, cookies, JavaScript, session IDs, Web form iterations, deep Web navigation, and dynamic changes on websites.
We use wrapper generation technology from Lixto to overcome these obstacles and perform the data extraction. Data cleaning is also applied to the wrapper output to remove unnecessary annotations.
In the algorithm, the flight search problem is regarded as a graph search problem, with airports as the nodes and pairs of airports connected by a direct flight as the edges. We introduce a hub identification heuristic to ensure the system's scalability. Instead of analyzing and evaluating all possible routes to the destination, this heuristic evaluates only the fraction of all possible combinations that is likely to contain the best routes.
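The graph model and the idea of restricting the search to hub airports can be illustrated with a minimal Python sketch. The abstract does not say how hubs are identified or how many stops are considered, so everything here is an assumption made for illustration: hubs are taken to be the `top_k` airports with the most direct connections, only direct and one-stop routes are generated, and the function names and airport labels (`SRC`, `HUB1`, and so on) are hypothetical.

```python
from collections import defaultdict


def build_graph(direct_flights):
    """Build a directed graph: airport -> set of airports reachable by a direct flight."""
    graph = defaultdict(set)
    for origin, destination in direct_flights:
        graph[origin].add(destination)
    return dict(graph)


def identify_hubs(graph, top_k):
    """Assumed hub criterion: the top_k airports by number of direct connections
    (outgoing plus incoming). The thesis may use a different criterion."""
    degree = defaultdict(int)
    for origin, destinations in graph.items():
        degree[origin] += len(destinations)
        for destination in destinations:
            degree[destination] += 1
    # Sort by descending degree; break ties alphabetically for determinism.
    ranked = sorted(degree, key=lambda airport: (-degree[airport], airport))
    return set(ranked[:top_k])


def candidate_routes(graph, origin, destination, hubs):
    """Return the direct route (if any) plus one-stop routes that transfer at a hub.
    Transfers at non-hub airports are never evaluated, which is what keeps the
    number of combinations small."""
    routes = []
    if destination in graph.get(origin, ()):
        routes.append([origin, destination])
    for hub in sorted(hubs - {origin, destination}):
        if hub in graph.get(origin, ()) and destination in graph.get(hub, ()):
            routes.append([origin, hub, destination])
    return routes


if __name__ == "__main__":
    # Hypothetical network: two well-connected airports and one small airport.
    flights = [
        ("SRC", "HUB1"), ("HUB1", "SRC"), ("HUB1", "DST"), ("DST", "HUB1"),
        ("SRC", "HUB2"), ("HUB2", "SRC"), ("HUB2", "DST"), ("DST", "HUB2"),
        ("HUB1", "HUB2"), ("HUB2", "HUB1"),
        ("SRC", "SMALL"), ("SMALL", "DST"),
    ]
    graph = build_graph(flights)
    hubs = identify_hubs(graph, top_k=2)
    for route in candidate_routes(graph, "SRC", "DST", hubs):
        print(" -> ".join(route))
```

In this sketch the route through `SMALL` exists in the graph but is never generated, because `SMALL` is not among the identified hubs. That is the trade-off such a heuristic makes: fewer combinations to evaluate, at the risk of missing a route that transfers at a minor airport.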