reinforcement learning; power system control; damping of electrical power oscillations; TCSC control; approximate value iteration
Abstract:
In this paper we explain how to design intelligent agents that process information acquired from interaction with a system in order to learn a good control policy, and we show how the methodology can be applied to control devices aimed at damping electrical power oscillations. The control problem is formalized as a discrete-time optimal control problem, and the information acquired from interaction with the system is a set of samples, each composed of four elements: a state, the action taken in that state, the instantaneous reward observed, and the successor state of the system. To process this information we consider reinforcement learning algorithms that determine an approximation of the so-called Q-function by mimicking the behavior of the value iteration algorithm. Simulations are first carried out on a benchmark power system modeled with two state variables. We then present a more complex case study on a four-machine power system in which the reinforcement learning algorithm controls a Thyristor Controlled Series Capacitor (TCSC) aimed at damping power system oscillations.
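
As a concrete illustration of this batch-mode scheme, the sketch below implements a generic fitted Q iteration loop in Python, using extremely randomized trees as the supervised learner to iteratively approximate the Q-function from a batch of four-tuples. The discount factor, the two-level action set, the number of iterations, and the regressor settings are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of fitted Q iteration on a batch of four-tuples
# (state, action, reward, next_state). All hyper-parameters below
# (GAMMA, ACTIONS, N_ITERATIONS, n_estimators) are assumed values
# chosen for illustration only.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

GAMMA = 0.95           # discount factor (assumed)
ACTIONS = [-1.0, 1.0]  # finite action set, e.g. two control levels (assumed)
N_ITERATIONS = 50      # number of Q-iterations (assumed)

def fitted_q_iteration(samples):
    """samples: list of (x, u, r, x_next) tuples; x, x_next are 1-D arrays."""
    # Supervised-learning inputs are (state, action) pairs.
    X = np.array([np.append(x, u) for x, u, _, _ in samples])
    r = np.array([r for _, _, r, _ in samples])
    next_states = np.array([x_next for _, _, _, x_next in samples])

    q_model = None
    for _ in range(N_ITERATIONS):
        if q_model is None:
            targets = r  # first iteration: Q_1 equals the immediate reward
        else:
            # Value-iteration update: Q_N(x,u) = r + gamma * max_u' Q_{N-1}(x',u')
            q_next = np.column_stack([
                q_model.predict(np.column_stack(
                    [next_states, np.full(len(next_states), u)]))
                for u in ACTIONS
            ])
            targets = r + GAMMA * q_next.max(axis=1)
        # Refit the tree ensemble on the updated regression targets.
        q_model = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
    return q_model

def greedy_action(q_model, x):
    """Control policy induced by the learned Q-function approximation."""
    q_values = [q_model.predict(np.append(x, u).reshape(1, -1))[0]
                for u in ACTIONS]
    return ACTIONS[int(np.argmax(q_values))]
```

In a damping-control setting of the kind studied in the paper, the state would gather the measured system variables, the action would set the controllable device (e.g. the TCSC reactance level), and the reward would penalize deviations associated with power oscillations; `greedy_action` then plays the role of the learned closed-loop controller.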