The Convergence of TD(λ) for General λ

Dayan, P

doi:10.1023/A:1022632907294

Datensatz

DATENSATZ AKTIONENEXPORT

Zur Ablage hinzufügen

Lokale TagsFreigabegeschichteDetailsÜbersicht

Freigegeben

Zeitschriftenartikel

The Convergence of TD(λ) for General λ

MPG-Autoren

Es sind keine MPG-Autoren in der Publikation vorhanden

Externe Ressourcen

https://link.springer.com/content/pdf/10.1023%2FA%3A1022632907294.pdf
(Verlagsversion)

Volltexte (beschränkter Zugriff)

Für Ihren IP-Bereich sind aktuell keine Volltexte freigegeben.

Volltexte (frei zugänglich)

Es sind keine frei zugänglichen Volltexte in PuRe verfügbar

Ergänzendes Material (frei zugänglich)

Es sind keine frei zugänglichen Ergänzenden Materialien verfügbar

Zitation

Dayan, P. (1992). The Convergence of TD(λ) for General λ. Machine Learning, 8(3-4), 341-362. doi:10.1023/A:1022632907294.

Zitierlink: https://hdl.handle.net/21.11116/0000-0002-D743-0

Zusammenfassung

The method of temporal differences (TD) is one way of making consistent predictions about the future. This paper uses some analysis of Watkins (1989) to extend a convergence theorem due to Sutton (1988) from the case which only uses information from adjacent time steps to that involving information from arbitrary ones.

It also considers how this version of TD behaves in the face of linearly dependent representations for states—demonstrating that it still converges, but to a different answer from the least mean squares algorithm. Finally it adapts Watkins' theorem that Q
-learning, his closely related prediction and action learning method, converges with probability one, to demonstrate this strong form of convergence for a slightly modified version of TD.