Blatt, A. (2020). Sampling DNS traffic: a day in the life of the .at-zone [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2021.80920
E105 - Institut für Stochastik und Wirtschaftsmathematik
Date (published):
2020
Number of Pages:
73
Keywords:
DNS data; Sampling
Abstract:
This thesis investigates the added utility of statistical sampling for DNS network traffic analysis, specifically with regard to long-term storage and computation latency. Using DNS log data covering a full "day in the life of the Austrian Internet", provided by the Austrian domain registry operator nic.at, three emblematic sampling methods (simple random sampling, systematic sampling and stratified random sampling) are applied to a selection of network traffic features to assess how well each preserves the "true" population parameters. Consistent with theoretical considerations and previous research into Internet traffic, systematic sampling was found to yield very precise estimates, particularly for time-based traffic characteristics, because the query arrival process is highly self-similar and therefore autocorrelated. For network traffic features that do not depend on time, all three sampling procedures perform essentially the same. Furthermore, for tasks that do not involve very rare phenomena or the estimation of the number of distinct client IP addresses, sampling offers an easy way to explore the data quickly: for the analysed features, estimates for frequent traffic patterns (those occurring with a relative frequency at least at the level of the sampling fraction) were either practically identical to the true parameter or less than 10% away from it. Used in conjunction with current Big Data technology, these findings could yield substantial gains in computation speed and reductions in storage requirements. Systematic sampling consistently performed best or virtually indistinguishably from the other methods, and it has the added benefit of being the computationally cheapest.
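The three sampling designs compared above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the record format (timestamp, query type), the synthetic log generator, the hourly strata with proportional allocation, and the share of AAAA queries used as the target parameter are all hypothetical choices made here for demonstration. Systematic sampling assumes the records are already sorted by arrival time, which is what lets it exploit the temporal structure described in the abstract.

```python
import random
from collections import defaultdict


def simple_random_sample(records, fraction, rng):
    """Draw round(fraction * N) records uniformly at random, without replacement."""
    n = round(fraction * len(records))
    return rng.sample(records, n)


def systematic_sample(records, fraction, rng):
    """Take every k-th record (k = round(1 / fraction)) after a random start in [0, k).

    Assumes `records` is ordered by arrival time.
    """
    k = max(1, round(1 / fraction))
    start = rng.randrange(k)
    return records[start::k]


def stratified_sample(records, fraction, stratum_of, rng):
    """Stratified random sampling with proportional allocation:
    the same fraction is drawn at random within each stratum."""
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)
    sample = []
    for members in strata.values():
        n_h = round(fraction * len(members))
        sample.extend(rng.sample(members, n_h))
    return sample


def proportion(records, predicate):
    """Share of records satisfying `predicate` (the estimator applied to a sample)."""
    return sum(1 for r in records if predicate(r)) / len(records)


def relative_error(estimate, truth):
    """|estimate - truth| / truth, e.g. to check against a 10% tolerance."""
    return abs(estimate - truth) / truth


def synthetic_day(n, rng):
    """Hypothetical stand-in for one day of DNS logs: (seconds since midnight, qtype),
    sorted by arrival time, with an AAAA share that drifts over the day."""
    times = sorted(rng.uniform(0, 86_400) for _ in range(n))
    log = []
    for t in times:
        p_aaaa = 0.2 + 0.1 * (t / 86_400)
        log.append((t, "AAAA" if rng.random() < p_aaaa else "A"))
    return log


if __name__ == "__main__":
    rng = random.Random(42)
    log = synthetic_day(100_000, rng)
    is_aaaa = lambda rec: rec[1] == "AAAA"
    hour_of = lambda rec: int(rec[0] // 3600)
    truth = proportion(log, is_aaaa)
    f = 0.01  # sampling fraction
    for name, sample in [
        ("simple random", simple_random_sample(log, f, rng)),
        ("systematic", systematic_sample(log, f, rng)),
        ("stratified (hourly)", stratified_sample(log, f, hour_of, rng)),
    ]:
        est = proportion(sample, is_aaaa)
        print(f"{name:20s} n={len(sample):5d} est={est:.4f} "
              f"rel.err={relative_error(est, truth):.2%}")
```

Systematic sampling needs only one random number and a strided slice, which fits the abstract's observation that it is the computationally cheapest; stratified sampling first requires a grouping pass over the whole log.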
Additional information:
Variant title based on the author's translation