|Title:||Sampling DNS traffic: a day in the life of the .at-zone||Other Titles:||Stichprobenziehung von DNS Traffic Daten||Language:||English||Authors:||Blatt, Andreas||Qualification level:||Diploma||Advisor:||Templ, Matthias||Issue Date:||2020||Number of Pages:||73||Abstract:||
This thesis investigates the added utility of statistical sampling for DNS network traffic analysis, specifically with regard to long-term storage and computation latency. Using DNS log data for a full "day in the life of the Austrian Internet" provided by the Austrian domain registry operator nic.at, three emblematic sampling methods, namely simple random sampling, systematic sampling and stratified random sampling, are applied to a selection of network traffic features to assess how well they preserve the "true" population parameters. Confirming theoretical considerations and previous research into Internet traffic, it was found that because the query arrival process is highly self-similar, and thus also autocorrelated, systematic sampling leads to very precise estimates, particularly for time-based traffic characteristics. For network traffic features independent of time, all sampling procedures perform essentially the same. Furthermore, it was shown that for tasks not involving very rare phenomena or the estimation of the number of distinct client IP addresses, sampling provides an easy way to explore data quickly: for the analysed features, estimates for frequent traffic patterns (those occurring at least at the level of the sampling fraction) are either practically identical to or less than 10% away from the true parameter. Used in conjunction with current Big Data technology, these findings could lead to great gains in computation speed and reduced storage requirements. The method that consistently performed best, or was virtually indistinguishable from the others, was systematic sampling, with the added benefit of also being the computationally cheapest.
|Keywords:||DNS data; Sampling||URI:||https://doi.org/10.34726/hss.2021.80920
|DOI:||10.34726/hss.2021.80920||Library ID:||AC16172321||Organisation:||E105 - Institut für Stochastik und Wirtschaftsmathematik||Publication Type:||Thesis
|Appears in Collections:||Thesis|
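The three sampling schemes named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the toy `log` records and the choice of query type as the stratification variable are assumptions made purely for illustration.

```python
import random
from collections import defaultdict


def simple_random_sample(records, fraction, seed=0):
    """Draw round(fraction * N) records uniformly, without replacement."""
    rng = random.Random(seed)
    n = round(len(records) * fraction)
    return rng.sample(records, n)


def systematic_sample(records, fraction, seed=0):
    """Take every k-th record (k = 1 / fraction) after a random start."""
    rng = random.Random(seed)
    k = round(1 / fraction)
    start = rng.randrange(k)
    return records[start::k]


def stratified_sample(records, fraction, key, seed=0):
    """Apply the same sampling fraction independently within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for group in strata.values():
        n = round(len(group) * fraction)
        sample.extend(rng.sample(group, n))
    return sample


# Hypothetical toy log of (timestamp, query_type) pairs, ordered by time.
log = [(t, "MX" if t % 4 == 0 else "A") for t in range(10_000)]

srs = simple_random_sample(log, 0.01)
sys_sample = systematic_sample(log, 0.01)
strat = stratified_sample(log, 0.01, key=lambda r: r[1])
```

Systematic sampling needs only one random draw and a single strided pass over the time-ordered log, which is consistent with the abstract's remark that it is the computationally cheapest of the three.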
Files in this item:
|Blatt Andreas - 2021 - Sampling DNS Traffic A Day in the Life of the at-Zone.pdf||2.95 MB||Adobe PDF|
Items in reposiTUm are protected by copyright, with all rights reserved, unless otherwise indicated.