| Content Provider | ACM Digital Library |
|---|---|
| Author | Ferrari, Domenico; Calzarossa, Maria |
| Abstract | In a paper published in 1984 [Ferr84], the validity of applying clustering techniques to the design of an executable model of an interactive workload was discussed. The following assumptions, intended not to be necessarily realistic but to provide sufficient conditions for the applicability of clustering techniques, were made: (1) the system whose workload is to be modeled is an interactive system, and its performance can be accurately evaluated by solving a product-form closed queueing network model; (2) the behavior of each interactive user can be adequately modeled by a probabilistic graph (called a user behavior graph), in which each node represents an interactive command type and the duration of a user's stay in a node probabilistically equals the time the user spends typing a command of that type, waiting for the system's response, and thinking about what command to input next; (3) the interactive workload to be modeled is stationary, and the workload model to be constructed is intended to reproduce its global characteristics (not those of some brief excerpt from it exhibiting peculiar dynamics), hence to be stationary as well.<br><br>It was shown in [Ferr84] that, under these assumptions, clustering command types having the same probabilistic resource demands does not affect the values of the performance indices evaluators are usually interested in, provided the visit ratio of each node in the reduced (i.e., post-clustering) user behavior graph equals the sum of the visit ratios of the cluster's components in the original graph. Since this reduction is equivalent to replacing each cluster with one or more representatives of its components, and since this is also the goal of applying clustering techniques to the construction of executable workload models substantially more compact than the original workload, this result shows that such techniques are valid (i.e., produce accurate models) when the assumptions and conditions above are satisfied.<br><br>One condition which in practice is never satisfied, however, is that the clustered commands be characterized by exactly the same resource demands. Clustering algorithms are non-trivial precisely because they must recognize "nearness" among commands with different characteristics and group those, and only those, commands whose resource demands are sufficiently similar (where similarity is defined via a notion of distance between two commands). Thus the question of the sensitivity of a workload model's accuracy to the inevitable dispersion of the characteristics of a cluster's components immediately arises. We know that, if an adequate product-form model of an interactive system can be built, if the users' behaviors can be accurately modeled by probabilistic graphs, and if the workload and its model are stationary, then a workload model in which all commands with identical characteristics are grouped together and modeled by a single representative is an accurate model of the given workload (i.e., the model produces the same values of the performance indices of interest as the modeled workload when processed by a given system), provided the visit ratios of the workload model's components equal the sums of those of the corresponding workload components.<br><br>If we now apply a clustering algorithm to the given workload, thereby obtaining clusters of similar but not identical commands, and build a workload model by assembling cluster representatives (usually one per cluster, for instance with demands corresponding to the cluster's center of mass), by how much will the performance indices produced by the workload model running on the given system differ from those produced by the workload to be modeled?<br><br>As with several other problems, this one could be attacked by a mathematical approach or an experimental one. A successful mathematical analysis of the sensitivity of the major indices to the dispersion of the resource demands of the commands being clustered would provide more general results, but it would also be likely to require simplifying assumptions (for example, about the distributions of the resource demands in a cluster around its center of mass) whose validity would be neither self-evident nor easy to verify experimentally. An experimental approach, on the other hand, yields results which, strictly speaking, apply only to the cases considered in the experiments; extrapolations to other systems, other workloads, and other environments usually require faith, along with experience, common sense, and familiarity with real systems and workloads. This inherent lack of generality is counterbalanced, however, by the higher degree of realism achievable with an experimental investigation. In particular, when the properties of workloads play a crucial role in a study (there are very few studies indeed in which they do not!), a mathematical approach is bound to raise questions about such properties that are either very difficult or impossible to answer.<br><br>Primarily for this reason, and well aware of the limitations in the applicability of the results we would obtain, we decided to adopt an experimental approach. Since the question we confronted had never been answered before (nor, to our knowledge, even asked), we felt our choice was justified by the exploratory nature of the study. If the resulting sensitivity turned out to be high, we could conclude that not even under the above assumptions can clustering techniques be trusted to provide reasonable accuracy in all cases, and hence that they should not be used, or should be used only with caution in those cases (if they exist) in which their accuracy might be acceptable. If, on the other hand, the sensitivity were low, we could say that, in at least one practical case, clustering techniques had been shown to work adequately (under all the other assumptions listed above).<br><br>The rationale of this investigation might be questioned by asking why it would not be more convenient to test the validity of clustering techniques directly, that is, by comparing the performance indices produced by a real workload to those produced by an executable model (artificial workload) built with a clustering technique. Our answer is that, in this study as in [Ferr84], we are more interested in understanding the limitations and implications of clustering and other workload model design methods than in evaluating the accuracy of clustering in a particular case. In other words, we are not so much keen on finding out whether the errors due to clustering are of the order of 10% or of 80%; rather, we want to understand why they are only 10% or as large as 80%, respectively. Thus, we need to decompose the total error into the contributions of the various discrepancies that any real situation exhibits with respect to the ideal one.<br><br>This paper describes a study performed primarily to assess the magnitude of one such contribution: that of the dispersion of the resource demands of clustered commands. An experimental approach, in the case considered here, first requires that a workload for the experiment be selected. That workload is then measured, in order to obtain the values of the parameters defined by the desired characterization. Next, an executable workload model is built by applying a clustering technique to the selected real workload. The workload and its model are then run on the same system, so that the model's accuracy can be evaluated by comparing the performance indices they produce. Since our study tries to isolate the sensitivity of that accuracy to the differences in demands among the commands grouped into the same cluster, these differences must be made the only source of inaccuracy in the performance produced by the model; all other sources of error must be eliminated. Finally, the experiment is carried out and its results interpreted. The results show that, on the whole, the clustering method for workload model design is reasonably accurate in the context of the case examined in our study; the sensitivities we found were reasonably low. Thus, we can state that, in at least one practical case and under the assumptions discussed in this paper, clustering techniques for executable workload model design have been shown to work well. |
| Starting Page | 38 |
| Ending Page | 39 |
| Page Count | 2 |
| ISSN | 01635999 |
| DOI | 10.1145/317786.317808 |
| Journal | ACM SIGMETRICS Performance Evaluation Review (PERV) |
| Volume Number | 13 |
| Issue Number | 2 |
| Language | English |
| Publisher | Association for Computing Machinery (ACM) |
| Publisher Date | 2014-01-10 |
| Publisher Place | New York |
| Access Restriction | One Nation One Subscription (ONOS) |
| Content Type | Text |
| Resource Type | Article |
| Subject | Computer Networks and Communications; Hardware and Architecture; Software |
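The reduction the abstract describes — grouping command types with similar resource demands, replacing each cluster with a representative at its center of mass, and giving that representative the summed visit ratios of the cluster's members — can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the command names, the two-dimensional demand vectors, and the choice of plain k-means as the clustering algorithm are all assumptions made for the example.

```python
# Sketch: reduce an interactive workload by clustering command types on
# their resource demands and replacing each cluster with its center of
# mass, carrying over the summed visit ratios (the condition under which
# [Ferr84] shows the reduced user behavior graph preserves the indices).
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means on small demand vectors (e.g. (cpu_s, io_ops))."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members)
                                   for x in zip(*members))
    return centers, labels


def reduce_workload(commands, k):
    """commands: {name: (visit_ratio, (cpu_s, io_ops))}.
    Returns one (summed_visit_ratio, centroid_demands) per non-empty
    cluster -- the components of the reduced workload model."""
    names = list(commands)
    points = [commands[n][1] for n in names]
    centers, labels = kmeans(points, k)
    reps = []
    for c in range(k):
        vr = sum(commands[n][0]
                 for n, lab in zip(names, labels) if lab == c)
        if vr > 0:
            reps.append((vr, centers[c]))
    return reps


# Hypothetical measured command types (visit ratio, demand vector).
workload = {
    "edit":    (0.30, (0.05, 2.0)),
    "compile": (0.20, (1.10, 9.0)),
    "link":    (0.10, (0.90, 8.0)),
    "ls":      (0.25, (0.04, 1.5)),
    "mail":    (0.15, (0.06, 2.5)),
}
model = reduce_workload(workload, k=2)
# The representatives' visit ratios sum to the original total by
# construction, which is the visit-ratio condition stated in the abstract.
assert abs(sum(v for v, _ in model) - 1.0) < 1e-9
```

Note that this preserves only the visit-ratio condition; the abstract's point is precisely that the members of each cluster do *not* share identical demands, so the centroid representative introduces the dispersion error whose magnitude the study measures.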