Title: Sampling DNS traffic: a day in the life of the .at-zone
Other Titles: Stichprobenziehung von DNS Traffic Daten (Sampling of DNS traffic data)
Language: English
Authors: Blatt, Andreas 
Qualification level: Diploma
Advisor: Templ, Matthias 
Issue Date: 2020
Number of Pages: 73
This thesis investigates the added utility of statistical sampling for DNS network traffic analysis, specifically with regard to long-term storage and computation latency. Using DNS log data for a full "day in the life of the Austrian Internet" provided by the Austrian domain registry operator nic.at, three emblematic sampling methods (simple random sampling, systematic sampling and stratified random sampling) are applied to a selection of network traffic features to assess how well they preserve the "true" population parameters. Confirming theoretical considerations and previous research into Internet traffic, it was found that because the query arrival process is highly self-similar, and thus also autocorrelated, systematic sampling leads to very precise estimates, particularly for time-based traffic characteristics. For network traffic features independent of time, all sampling procedures perform essentially the same. Furthermore, it was shown that for tasks not involving very rare phenomena or the estimation of the number of distinct client IP addresses, sampling provides an easy way to explore data quickly: for the analysed features, estimates for frequent traffic patterns (those occurring at least at the level of the sampling fraction) are either practically identical to the true parameter or less than 10% away from it. Used in conjunction with current Big Data technology, these findings could lead to great gains in computation speed and reduced storage requirements. The method that consistently performed best, or virtually indistinguishably from the others, was systematic sampling, with the added benefit of also being the computationally cheapest.
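The three sampling schemes named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the record layout (a timestamp in seconds and a query type), the synthetic data, the choice of time buckets as strata, and the 1% sampling fraction are all assumptions made purely for illustration.

```python
import random
from collections import defaultdict


def simple_random_sample(records, n, rng):
    """Draw n records uniformly at random, without replacement."""
    return rng.sample(records, n)


def systematic_sample(records, step, rng):
    """Take every step-th record, starting at a random offset in [0, step)."""
    start = rng.randrange(step)
    return records[start::step]


def stratified_sample(records, key, fraction, rng):
    """Sample the same fraction independently within each stratum."""
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample


def share_of(records, qtype):
    """Fraction of records whose query type equals qtype."""
    return sum(1 for _, q in records if q == qtype) / len(records)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical log: one (timestamp, qtype) record per second, ordered by time.
records = [
    (t, rng.choices(["A", "AAAA", "MX"], weights=[6, 3, 1])[0])
    for t in range(10_000)
]

srs = simple_random_sample(records, 100, rng)
sys_sample = systematic_sample(records, 100, rng)
# Strata are assumed to be consecutive 1000-second time buckets.
strat = stratified_sample(records, lambda r: r[0] // 1000, 0.01, rng)

true_share = share_of(records, "A")
estimates = {
    "simple random": share_of(srs, "A"),
    "systematic": share_of(sys_sample, "A"),
    "stratified": share_of(strat, "A"),
}
```

Comparing each value in `estimates` with `true_share` mirrors, on a toy scale, the kind of comparison the abstract describes. Systematic sampling needs only one random draw and a single pass over the ordered log, which is consistent with the abstract's remark that it is the computationally cheapest of the three.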
Keywords: DNS data; Sampling
URI: https://doi.org/10.34726/hss.2021.80920
DOI: 10.34726/hss.2021.80920
Library ID: AC16172321
Organisation: E105 - Institut für Stochastik und Wirtschaftsmathematik 
Publication Type: Thesis
Appears in Collections: Thesis

Items in reposiTUm are protected by copyright, with all rights reserved, unless otherwise indicated.