Optimal planning under uncertainty is a challenging problem for practitioners dealing with real-world domains. Markov decision processes (MDPs) provide a mathematical framework for solving these problems, but for realistic multi-agent planning the size of the state space is exponential in the number of agents. Researchers quickly recognized the limitations of tabular representations and introduced approximations to cope with this complexity and to boost learning speed through generalization. Finding the right representation is a critical milestone in scaling existing MDP solvers to larger domains.
Due to their ease of use and strong theoretical and empirical results, the linear family of function approximators has been favored in the literature. In this setting, the target function is approximated as a linear combination of a set of feature vectors. Many approaches try to find the right set of features offline, but online methods enjoy the advantages of adaptability to dynamic environments and, often, lower computational complexity. These methods have improved the learning speed of existing reinforcement learning (RL) algorithms in low-dimensional domains, yet existing online expansion methods do not scale well to high-dimensional problems.
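As a concrete illustration of this setting, the following minimal sketch (all names and values are hypothetical, not from the paper) shows a linear value estimate over binary features: since each feature is either on or off, the value of a state reduces to the sum of the weights of its active features.

```python
import numpy as np

def linear_value(weights, active_features):
    """Value estimate = sum of the weights of the features active in a state.

    With binary features, the dot product theta^T * phi(s) collapses to a
    sum over the indices of the active (nonzero) features.
    """
    return sum(weights[f] for f in active_features)

# Illustrative weight vector and a state whose features 0 and 2 are "on".
weights = np.array([0.5, -1.0, 2.0, 0.0])
active = [0, 2]
print(linear_value(weights, active))  # -> 2.5
```

Sparse binary features make both evaluation and gradient updates cheap, since only the active indices are touched per step.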
Our research has explored the conjecture that one of the main difficulties limiting this scaling is that features defined over the full-dimensional state space often generalize
poorly. Hence, we introduced incremental Feature Dependency
Discovery (iFDD) as a computationally-inexpensive method for
representational expansion that can be combined with any online,
value-based RL method that uses binary features. Unlike other online
expansion techniques, iFDD creates new features in low dimensional
subspaces of the full state space where the approximation error persists. We provided convergence and
computational complexity guarantees for iFDD.
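The core expansion idea can be sketched as follows. This is a simplified, hypothetical illustration (class name, threshold, and pairwise-only candidates are assumptions, not the paper's exact algorithm): the absolute TD error is credited to each pair of simultaneously active features, and a pair is promoted to a new conjunction feature once its accumulated error crosses a threshold.

```python
from itertools import combinations

class FeatureExpander:
    """Sketch of iFDD-style expansion over pairs of binary features."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.error_sum = {}    # candidate pair -> accumulated |TD error|
        self.features = set()  # discovered conjunction features

    def observe(self, active_features, td_error):
        """Credit |td_error| to every active pair; add a conjunction
        feature once a pair's accumulated error exceeds the threshold."""
        added = []
        for pair in combinations(sorted(active_features), 2):
            if pair in self.features:
                continue
            self.error_sum[pair] = self.error_sum.get(pair, 0.0) + abs(td_error)
            if self.error_sum[pair] > self.threshold:
                self.features.add(pair)
                added.append(pair)
        return added

expander = FeatureExpander(threshold=1.0)
expander.observe([0, 1], td_error=0.6)          # error accumulates, no expansion
print(expander.observe([0, 1], td_error=0.7))   # -> [(0, 1)]
```

Because candidates are conjunctions of already-active features, each new feature lives in a low-dimensional subspace of the full state space, which is what keeps the bookkeeping inexpensive per time step.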
The usability of our approach in large state spaces such as UAV mission planning was confirmed by comparing the effectiveness of iFDD with Sarsa (green) against fixed representations as well as the ATC and SDM expansion methods.
The figure above shows the first empirical domain, in which the goal is to provide surveillance of two targets using three fuel-limited UAVs. Moreover, the task requires at least one UAV to remain at the communication node to relay information back to the base. Notice how iFDD obtained much more cumulative reward (y-axis) as it observed more interaction steps (x-axis).
The figure above depicts a rescue mission in which a heterogeneous team of UAVs plans to rescue as many people as possible, shown as positive numbers next to each node. To carry out a rescue at a particular node, the medic UAV must visit the node while the communication UAV is no more than one edge away from that node in order to provide satellite information. For nodes with stochastic rescue outcomes, the numbers inside clouds represent the success rates. While the initial representation performed well initially, applying iFDD roughly doubled the number of survivors. Both the ATC and SDM methods led the agent to believe that sending out UAVs is dangerous and should be avoided, hence their zero performance.
The planning space for both of these domains exceeds a hundred million possibilities. By adding useful features, iFDD allowed the agent to learn the task substantially better than the other methods.
A. Geramifard, F. Doshi, J. Redding, N. Roy, J. P. How, "Incremental Feature Dependency Discovery", Proceedings of the 28th International Conference on Machine Learning (ICML), 2011 [BibTeX][PDF]