Evaluating software agents using human benchmarks.
| Content Provider | CiteSeerX |
|---|---|
| Author | Grant, Robert D.; Perry, Dewayne E. |
| Abstract | common in the engineering of software systems. In this work, we explore using human subjects to create benchmarks for evaluating these agents. In our case studies, we address the domain of instructable software agents as proposed by the Bootstrapped Learning project [1]. Aim: Our aim is to define and refine requirements, problem-solving strategies, and evaluation methodologies for software agents, paving the way for rigorous experiments comparing their performance with human benchmarks. Method: Because little was known about which factors would be critical, our empirical approach is exploratory case studies. In two studies covering three distinct groups, we use human subjects to develop an evaluation curriculum for instructable software agents, collecting quantitative data through online quizzes and tests and qualitative data through observation. Results: Though we provide some analysis of the quantitative data, our most important results are qualitative. We uncover and address several intrinsic challenges in comparing software agents with humans, including the greater semantic understanding of humans, the eidetic memory of software agents, and the importance of various study parameters (including timing issues and lesson complexity) to human performance. Conclusions: This work provides valuable insight into evaluating software agents with human benchmarks. We hope future researchers will be able to perform controlled experiments in various domains using a methodology based on the results of our case studies. |
| Access Restriction | Open |
| Subject Keyword | Software Agent, Software Agent Using Human Benchmark, Human Benchmark, Quantitative Data, Case Study, Human Subject, Instructable Software Agent, Future Researcher, Online Quiz, Qualitative Data, Eidetic Memory, Lesson Complexity, Human Performance, Valuable Insight, Address Several Intrinsic Challenge, Evaluation Curriculum, Various Domain, Semantic Understanding, Important Result, Exploratory Case Study, Empirical Approach, Software System, Various Study Parameter, Rigorous Experiment, Evaluation Methodology, Bootstrapped Learning Project, Controlled Experiment, Distinct Group |
| Content Type | Text |