| Content Provider | ACM Digital Library |
|---|---|
| Author | Ferrari, Domenico; Calzarossa, Maria |
| Abstract | In a paper published in 1984 [Ferr84], the validity of applying clustering techniques to the design of an executable model of an interactive workload was discussed. The following assumptions, intended not to be necessarily realistic but to provide sufficient conditions for the applicability of clustering techniques, were made: (1) the system whose workload is to be modeled is an interactive system, and its performance can be accurately evaluated by solving a product-form closed queueing network model; (2) the behavior of each interactive user can be adequately modeled by a probabilistic graph (called a user behavior graph), in which each node represents an interactive command type and the duration of a user's stay in a node probabilistically equals the time the user spends typing a command of that type, waiting for the system's response, and thinking about what command to input next; (3) the interactive workload to be modeled is stationary, and the workload model to be constructed is intended to reproduce its global characteristics (not those of some brief excerpt from it exhibiting peculiar dynamics), hence to be stationary as well.<br><br>It was shown in [Ferr84] that, under these assumptions, clustering command types having the same probabilistic resource demands does not affect the values of the performance indices evaluators are usually interested in, provided the visit ratio of each node in the reduced (i.e., post-clustering) user behavior graph equals the sum of the visit ratios of the cluster's components in the original graph. Since this reduction is equivalent to replacing each cluster with one or more representatives of its components, and since this is also the goal of applying clustering techniques to the construction of executable workload models substantially more compact than the original workload, this result shows that such techniques are valid (i.e., produce accurate models) when the assumptions and conditions above are satisfied.<br><br>One condition which in practice is never satisfied, however, is that the clustered commands be characterized by exactly the same resource demands. Clustering algorithms are non-trivial precisely because they must recognize "nearness" among commands with different characteristics and group those, and only those, commands whose resource demands are sufficiently similar (where similarity is defined via a notion of distance between two commands). Thus the question of the sensitivity of a workload model's accuracy to the inevitable dispersion of the characteristics of a cluster's components immediately arises. We know that, if an adequate product-form model of an interactive system can be built, if the users' behaviors can be accurately modeled by probabilistic graphs, and if the workload and its model are stationary, then a workload model in which all commands with identical characteristics are grouped together and modeled by a single representative is an accurate model of the given workload (i.e., the model produces the same values of the performance indices of interest as the modeled workload when processed by a given system), provided the visit ratios of the workload model's components equal the sums of those of the corresponding workload components.<br><br>If we now apply a clustering algorithm to the given workload, thereby obtaining clusters of similar but not identical commands, and build a workload model by assembling cluster representatives (usually one per cluster, for instance with demands corresponding to the cluster's center of mass), by how much will the performance indices produced by the workload model running on the given system differ from those produced by the workload to be modeled?<br><br>As with several other problems, this one could be attacked by a mathematical approach or an experimental one. A successful mathematical analysis of the sensitivity of the major indices to the dispersion of the resource demands of the commands being clustered would provide more general results, but it would also be likely to require simplifying assumptions (for example, about the distributions of the resource demands in a cluster around its center of mass) whose validity would be neither self-evident nor easy to verify experimentally. An experimental approach, on the other hand, yields results which, strictly speaking, apply only to the cases considered in the experiments; extrapolations to other systems, other workloads, and other environments usually require faith, along with experience, common sense, and familiarity with real systems and workloads. This inherent lack of generality is counterbalanced, however, by the higher degree of realism achievable with an experimental investigation. In particular, when the properties of workloads play a crucial role in a study (there are very few studies indeed in which they do not!), a mathematical approach is bound to raise questions about such properties that are either very difficult or impossible to answer.<br><br>Primarily for this reason, and well aware of the limitations in the applicability of the results we would obtain, we decided to adopt an experimental approach. Since the question we confronted had never been answered before (nor, to our knowledge, even asked), we felt our choice was justified by the exploratory nature of the study. If the resulting sensitivity turned out to be high, we could conclude that not even under the above assumptions can clustering techniques be trusted to provide reasonable accuracy in all cases, and hence that they should not be used, or should be used only with caution in those cases (if they exist) in which their accuracy might be acceptable. If, on the other hand, the sensitivity were low, we could say that, in at least one practical case, clustering techniques had been shown to work adequately (under all the other assumptions listed above).<br><br>The rationale of this investigation might be questioned by asking why it would not be more convenient to test the validity of clustering techniques directly, that is, by comparing the performance indices produced by a real workload to those produced by an executable model (artificial workload) built with a clustering technique. Our answer is that, in this study as in [Ferr84], we are more interested in understanding the limitations and implications of clustering and other workload model design methods than in evaluating the accuracy of clustering in a particular case. In other words, we are not so much keen on finding out whether the errors due to clustering are of the order of 10% or of 80%; rather, we want to understand why they are only 10% or as large as 80%, respectively. Thus, we need to decompose the total error into the contributions of the various discrepancies that any real situation exhibits with respect to the ideal one.<br><br>This paper describes a study performed primarily to assess the magnitude of one such contribution: that of the dispersion of the resource demands of clustered commands. An experimental approach, in the case considered here, first requires that a workload for the experiment be selected. That workload is then measured, in order to obtain the values of the parameters defined by the desired characterization. Next, an executable workload model is built by applying a clustering technique to the selected real workload. The workload and its model are then run on the same system, so that the model's accuracy can be evaluated by comparing the performance indices they produce. Since our study tries to isolate the sensitivity of that accuracy to the differences in demands among the commands grouped into the same cluster, these differences must be made the only source of inaccuracy in the performance produced by the model; all other sources of error must be eliminated. Finally, the experiment is carried out and its results interpreted. The results show that, on the whole, the clustering method for workload model design is reasonably accurate in the context of the case examined in our study; the sensitivities we found were reasonably low. Thus, we can state that, in at least one practical case and under the assumptions discussed in this paper, clustering techniques for executable workload model design have been shown to work well. |
| Starting Page | 38 |
| Ending Page | 39 |
| Page Count | 2 |
| ISSN | 01635999 |
| DOI | 10.1145/317786.317808 |
| Journal | ACM SIGMETRICS Performance Evaluation Review (PERV) |
| Volume Number | 13 |
| Issue Number | 2 |
| Language | English |
| Publisher | Association for Computing Machinery (ACM) |
| Publisher Date | 2014-01-10 |
| Publisher Place | New York |
| Access Restriction | One Nation One Subscription (ONOS) |
| Content Type | Text |
| Resource Type | Article |
| Subject | Computer Networks and Communications; Hardware and Architecture; Software |
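The reduction the abstract describes — grouping command types with similar resource demands, replacing each cluster with a representative at its center of mass, and giving that representative the summed visit ratios of the cluster's members — can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the command names, the two-dimensional demand vectors, and the choice of plain k-means as the clustering algorithm are all assumptions made for the example.

```python
# Sketch: reduce an interactive workload by clustering command types on
# their resource demands and replacing each cluster with its center of
# mass, carrying over the summed visit ratios (the condition under which
# [Ferr84] shows the reduced user behavior graph preserves the indices).
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means on small demand vectors (e.g. (cpu_s, io_ops))."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members)
                                   for x in zip(*members))
    return centers, labels


def reduce_workload(commands, k):
    """commands: {name: (visit_ratio, (cpu_s, io_ops))}.
    Returns one (summed_visit_ratio, centroid_demands) per non-empty
    cluster -- the components of the reduced workload model."""
    names = list(commands)
    points = [commands[n][1] for n in names]
    centers, labels = kmeans(points, k)
    reps = []
    for c in range(k):
        vr = sum(commands[n][0]
                 for n, lab in zip(names, labels) if lab == c)
        if vr > 0:
            reps.append((vr, centers[c]))
    return reps


# Hypothetical measured command types (visit ratio, demand vector).
workload = {
    "edit":    (0.30, (0.05, 2.0)),
    "compile": (0.20, (1.10, 9.0)),
    "link":    (0.10, (0.90, 8.0)),
    "ls":      (0.25, (0.04, 1.5)),
    "mail":    (0.15, (0.06, 2.5)),
}
model = reduce_workload(workload, k=2)
# The representatives' visit ratios sum to the original total by
# construction, which is the visit-ratio condition stated in the abstract.
assert abs(sum(v for v, _ in model) - 1.0) < 1e-9
```

Note that this preserves only the visit-ratio condition; the abstract's point is precisely that the members of each cluster do *not* share identical demands, so the centroid representative introduces the dispersion error whose magnitude the study measures.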