Long-term workload monitoring : workload management on distributed OS/2 server systems

Strasser, Günther

Record link:

https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-9198
http://hdl.handle.net/20.500.12708/13489

Title:

Long-term workload monitoring : workload management on distributed OS/2 server systems

Citation:

Strasser, G. (2000). Long-term workload monitoring : workload management on distributed OS/2 server systems [Dissertation, Technische Universität Wien]. reposiTUm. https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-9198

CatalogPlus:

AC03025743

Publication Type:

Thesis - Dissertation

Language:

English

Authors:

Strasser, Günther

Organisational Unit:

E184 - Institut für Verteilte Systeme

Date (published):

2000

Number of Pages:

303

Keywords:

Distributed system; OS/2; Workload; Long-term behavior; Monitoring

Abstract:

Over the last ten years the term 'systems management' has become increasingly popular in the area of client/server computing and distributed systems. As low-cost hardware and commercial software for the PC spread throughout the enterprise, IT managers saw the need to gain control of the technology as well as the costs associated with it. Meanwhile, powerful PC server machines have come to bear a good deal of the workload of the IT infrastructure. As with mainframe-based systems, therefore, capacity and workload management have become fundamental disciplines within systems management of client/server environments. While there are forty years of experience with mainframes, capacity management for distributed PC servers lacks both theory and software. Transactional and batch-oriented mainframe software (that is, the combination of operating system, middleware and application software) is backed by a profound theory for modeling the processes and by a rich set of proven tools for measuring and distributing a workload and for planning for the near future. Everything that exists in the client/server area is a direct extension of a similar method or tool from the traditional way of computing. Analysts find it hard to apply these to PC servers, however, mainly because PC operating systems originate from highly interactive personal (meaning individual) computing, which follows different rules. In addition, today's systems have become so complex and consist of so many components that it is practically impossible to apply the methods of traditional computing paradigms. This dissertation discusses the value and necessity of long-term monitoring and shows the many problems associated with that task. It explains how the problems can be solved and how one can benefit from the recorded information. The main focus is on SRVMONPM as the tool set implementing the method. The results of two case studies in real commercial environments are presented.
Based on the case studies and the experience of a number of people who applied the tool set in their work, we conclude that SRVMONPM improves the controllability of distributed systems, but that there are still a number of problems that cannot be solved by any single method or tool. The contribution of this dissertation is the introduction of a general method (implemented as a number of software tools) that enables an analyst to monitor and collect information about the dynamic nature of distributed PC-based server systems from an unlimited number of heterogeneous components, which are distributed over a (potentially large) number of server machines, over a long period of time. The information is compiled and recorded. In addition to common statistical methods, a special mechanism is supplied to detect correlation between any attributes that were recorded for the components. The method is based on the assumption that there is no knowledge about the details of interdependencies and relations between any of the systems involved. It can therefore be applied in different circumstances, and it is open to the inclusion of new information.

Over the last ten years the term 'systems management' has gained increasing importance in the area of distributed client/server systems. As cheap PC hardware and the commercial software available for it found their way into enterprise data processing, IT managers saw the need to bring this technology, and the problems and costs associated with it, under control. Since powerful PC server hardware now carries a not insignificant share of the IT workload, the essential systems management disciplines, such as capacity and workload management, need to be implemented in analogy to the methods used on the mainframe. But while there are more than forty years of experience with these disciplines on the mainframe, the area of distributed PC servers lacks both theory and available tools, for example in workload analysis and capacity management. For the transaction-oriented and batch-oriented software that mostly runs on mainframes, and which consists of the interplay of operating system, middleware components and application software, there is an extensive theoretical foundation, partly rooted in the modeling of communication channels, and a host of proven tools that make it possible to measure the load at particular points and to distribute it to other resources. Everything of this kind that exists in the PC area descends directly from the mainframe. In practice, however, applying these methods and tools has been shown not to yield usable results. One of the main reasons is that PC operating systems (we consider mainly OS/2 here), built on the basic idea of interactive and 'unordered' work with a single user, obey completely different laws than the very strict mainframes.
In addition, distributed systems have by now become extremely complicated and consist of so many components that it is hardly possible to apply 'manual' methods of measurement, modeling and calculation. This dissertation demonstrates the value and necessity of long-term workload monitoring and describes the main problems associated with it. It shows how these problems can be solved and what benefit can be drawn from the information obtained. The main focus is on the tool SRVMONPM, which implements these findings in a set of programs. The results of two case studies are presented in graphical form. Based on the findings from the two studies and on other experience gained while working with the tool, we conclude that such a tool improves the controllability and manageability of a complex distributed system, but that there are a number of problems that cannot simply be solved by a tool. The main contribution of this dissertation is the design and implementation of a methodology for collecting and analyzing extensive system parameters from distributed systems over a long period of time, which makes it possible to capture and examine such systems, consisting of very many heterogeneous parts, in their entirety. In addition to standard statistical evaluations, a special algorithm was developed that makes it possible to find correlations between arbitrary system parameters without knowledge of or assumptions about the observed system. This is important because, for lack of knowledge or information, it is practically impossible to satisfy the prerequisites of classical methods for a real system. The resulting method works free of assumptions and is also open to the inclusion of any further system information.

License:

In Copyright

Appears in Collections:

Thesis