Scalable reinforcement learning on Cray XC
| Content Provider | Semantic Scholar |
|---|---|
| Author | Kommaraju, Ananda Varadhan; Maschhoff, Kristyn J.; Ringenburg, Michael F.; Robbins, Benjamin |
| Copyright Year | 2019 |
| Abstract | Recent advancements in deep learning have made reinforcement learning (RL) applicable to a much broader range of decision-making problems. However, the emergence of reinforcement learning workloads brings multiple challenges to system resource management. RL applications continuously train a deep learning or machine learning model while interacting with uncertain simulation models. This new generation of AI applications imposes significant demands on system resources such as memory, storage, network, and compute. In this paper, we describe a typical RL application workflow and introduce the Ray distributed execution framework developed at the UC Berkeley RISELab. Ray includes the RLlib library for executing distributed reinforcement learning applications. We describe a recipe for deploying the Ray execution framework on Cray XC systems and demonstrate scaling of RLlib algorithms across multiple nodes of the system. We also explore performance characteristics across multiple CPU and GPU node types. |
| File Format | PDF, HTM / HTML |
| DOI | 10.1002/cpe.5636 |
| Alternate Webpage(s) | https://cug.org/proceedings/cug2019_proceedings/includes/files/pap108s2-file1.pdf |
| Alternate Webpage(s) | https://doi.org/10.1002/cpe.5636 |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |