Csaba Szepesvari
DeepMind & Department of Computing Science, University of Alberta
Verified email at cs.ualberta.ca
Title
Cited by
Year
Bandit based Monte-Carlo planning
L Kocsis, C Szepesvári
European conference on machine learning, 282-293, 2006
Cited by 2739, year 2006
Algorithms for Reinforcement Learning
C Szepesvari
Morgan and Claypool, 2010
Cited by 956, year 2010
Convergence results for single-step on-policy reinforcement-learning algorithms
S Singh, T Jaakkola, ML Littman, C Szepesvári
Machine learning 38 (3), 287-308, 2000
Cited by 700, year 2000
Improved algorithms for linear stochastic bandits
Y Abbasi-Yadkori, C Szepesvári, D Pál
Advances in Neural Information Processing Systems, 2312-2320, 2011
Cited by 604, year 2011
Online optimization in X-armed bandits
S Bubeck, G Stoltz, C Szepesvári, R Munos
Advances in Neural Information Processing Systems, 201-208, 2009
Cited by 490, year 2009
Exploration–exploitation tradeoff using variance estimates in multi-armed bandits
JY Audibert, R Munos, C Szepesvári
Theoretical Computer Science 410 (19), 1876-1902, 2009
Cited by 480, year 2009
Fast gradient-descent methods for temporal-difference learning with linear function approximation
RS Sutton, HR Maei, D Precup, S Bhatnagar, D Silver, C Szepesvári, ...
Proceedings of the 26th Annual International Conference on Machine Learning …, 2009
Cited by 459, year 2009
Bandit algorithms
T Lattimore, C Szepesvári
Cambridge University Press, 2020
Cited by 287, year 2020
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
A Antos, C Szepesvári, R Munos
Machine Learning 71 (1), 89-129, 2008
Cited by 285, year 2008
Finite-time bounds for fitted value iteration
R Munos, C Szepesvári
Journal of Machine Learning Research 9 (May), 815-857, 2008
Cited by 274, year 2008
A generalized reinforcement-learning model: Convergence and applications
ML Littman, C Szepesvári
ICML 96, 310-318, 1996
Cited by 245, year 1996
Parametric bandits: The generalized linear case
S Filippi, O Cappe, A Garivier, C Szepesvári
Advances in Neural Information Processing Systems, 586-594, 2010
Cited by 233, year 2010
The grand challenge of computer Go: Monte Carlo tree search and extensions
S Gelly, L Kocsis, M Schoenauer, M Sebag, D Silver, C Szepesvári, ...
Communications of the ACM 55 (3), 106-113, 2012
Cited by 231, year 2012
Toward off-policy learning control with function approximation
HR Maei, C Szepesvári, S Bhatnagar, RS Sutton
ICML, 2010
Cited by 223, year 2010
Multi-criteria reinforcement learning.
Z Gábor, Z Kalmár, C Szepesvári
ICML 98, 197-205, 1998
Cited by 222, year 1998
Apprenticeship learning using inverse reinforcement learning and gradient methods
G Neu, C Szepesvári
arXiv preprint arXiv:1206.5264, 2012
Cited by 221, year 2012
Convergent temporal-difference learning with arbitrary smooth function approximation
S Bhatnagar, D Precup, D Silver, RS Sutton, HR Maei, C Szepesvári
Advances in Neural Information Processing Systems, 1204-1212, 2009
Cited by 204, year 2009
Empirical Bernstein stopping
V Mnih, C Szepesvári, JY Audibert
Proceedings of the 25th international conference on Machine learning, 672-679, 2008
Cited by 199, year 2008
Improved rates for the stochastic continuum-armed bandit problem
P Auer, R Ortner, C Szepesvári
International Conference on Computational Learning Theory, 454-468, 2007
Cited by 180, year 2007
Fitted Q-iteration in continuous action-space MDPs
A Antos, C Szepesvári, R Munos
Advances in neural information processing systems, 9-16, 2008
Cited by 179, year 2008