Multi-Gear Bandits, Partial Conservation Laws, and Indexability
| Field | Value |
|---|---|
| Content Provider | MDPI |
| Author | José Niño-Mora |
| Copyright Year | 2022 |
| Description | This paper considers what we propose to call multi-gear bandits, which are Markov decision processes modeling a generic dynamic and stochastic project fueled by a single resource and which admit multiple actions representing gears of operation naturally ordered by their increasing resource consumption. The optimal operation of a multi-gear bandit aims to strike a balance between project performance costs or rewards and resource usage costs, which depend on the resource price. A computationally convenient and intuitive optimal solution is available when such a model is indexable, meaning that its optimal policies are characterized by a dynamic allocation index (DAI), a function of state–action pairs representing critical resource prices. Motivated by the lack of general indexability conditions and efficient index-computing schemes, and focusing on the infinite-horizon finite-state and -action discounted case, we present a verification theorem ensuring that, if a model satisfies two proposed PCL-indexability conditions with respect to a postulated family of structured policies, then it is indexable and such policies are optimal, with its DAI being given by a marginal productivity index computed by a downshift adaptive-greedy algorithm in |
| Starting Page | 2497 |
| e-ISSN | 2227-7390 |
| DOI | 10.3390/math10142497 |
| Journal | Mathematics |
| Issue Number | 14 |
| Volume Number | 10 |
| Language | English |
| Publisher | MDPI |
| Publisher Date | 2022-07-18 |
| Access Restriction | Open |
| Subject Keyword | Mathematics; Operations Research and Management Science; Markov Decision Process; Multi-gear Bandits; Index Policies; Indexability; Index Algorithm |
| Content Type | Text |
| Resource Type | Article |
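The description above characterizes indexability through a dynamic allocation index (DAI): a function of state–action pairs giving the critical resource prices at which the optimal gear of operation changes. As a rough illustration of that idea only, and not of the paper's PCL-based downshift adaptive-greedy algorithm, the sketch below brute-forces the price-parametrized problem for a tiny made-up discounted MDP by value iteration over a grid of resource prices and reports where each state's optimal gear switches. All model data (`reward`, `usage`, `P`, the discount factor) are invented for the example.

```python
# Illustrative brute-force sketch (NOT the paper's downshift adaptive-greedy
# algorithm): for a tiny discounted MDP with actions ("gears") ordered by
# resource consumption, sweep the resource price lam and record, per state,
# the prices at which the optimal gear changes. Under indexability, these
# switching prices correspond to the critical prices encoded by the DAI.
# All model data below are made-up illustrative numbers.
import numpy as np

N, A = 3, 3                 # number of states and gears (actions 0..A-1)
gamma = 0.9                 # discount factor
rng = np.random.default_rng(0)

# Hypothetical model: reward[s, a], resource usage[a], transitions P[a, s, s']
reward = rng.uniform(0.0, 1.0, size=(N, A)) * (1 + np.arange(A))  # rewards loosely increasing with gear (illustrative)
usage = np.arange(A, dtype=float)                                  # gear a consumes a units of resource
P = rng.dirichlet(np.ones(N), size=(A, N))                         # each P[a, s, :] sums to 1

def optimal_policy(lam, iters=500):
    """Value iteration for the lam-priced problem: maximize reward - lam * usage."""
    V = np.zeros(N)
    for _ in range(iters):
        Q = reward - lam * usage[None, :] + gamma * np.einsum("asn,n->sa", P, V)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

# Sweep resource prices and report where each state's optimal gear switches.
prices = np.linspace(0.0, 5.0, 501)
prev = optimal_policy(prices[0])
for lam in prices[1:]:
    cur = optimal_policy(lam)
    for s in range(N):
        if cur[s] != prev[s]:
            print(f"state {s}: optimal gear {prev[s]} -> {cur[s]} near price {lam:.2f}")
    prev = cur
```

For an indexable model, the switching prices recovered this way would agree (up to the grid resolution) with the marginal productivity index that the paper's algorithm computes directly and far more efficiently.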