Concrete problems in AI safety D Amodei, C Olah, J Steinhardt, P Christiano, J Schulman, D Mané arXiv preprint arXiv:1606.06565, 2016 | 1885 | 2016 |
Theano: A Python framework for fast computation of mathematical expressions R Al-Rfou, G Alain, A Almahairi, C Angermueller, D Bahdanau, N Ballas, ... arXiv e-prints, arXiv:1605.02688, 2016 | 1049* | 2016 |
Deep reinforcement learning from human preferences PF Christiano, J Leike, T Brown, M Martic, S Legg, D Amodei Advances in neural information processing systems 30, 2017 | 845 | 2017 |
Electrical flows, Laplacian systems, and faster approximation of maximum flow in undirected graphs P Christiano, JA Kelner, A Madry, DA Spielman, SH Teng Proceedings of the forty-third annual ACM symposium on Theory of computing …, 2011 | 369 | 2011 |
Training language models to follow instructions with human feedback L Ouyang, J Wu, X Jiang, D Almeida, CL Wainwright, P Mishkin, C Zhang, ... arXiv preprint arXiv:2203.02155, 2022 | 360 | 2022 |
A connection between generative adversarial networks, inverse reinforcement learning, and energy-based models C Finn, P Christiano, P Abbeel, S Levine arXiv preprint arXiv:1611.03852, 2016 | 331 | 2016 |
Fine-tuning language models from human preferences DM Ziegler, N Stiennon, J Wu, TB Brown, A Radford, D Amodei, ... arXiv preprint arXiv:1909.08593, 2019 | 217 | 2019 |
Transfer from simulation to real world through learning deep inverse dynamics model P Christiano, Z Shah, I Mordatch, J Schneider, T Blackwell, J Tobin, ... arXiv preprint arXiv:1610.03518, 2016 | 213 | 2016 |
Learning to summarize with human feedback N Stiennon, L Ouyang, J Wu, D Ziegler, R Lowe, C Voss, A Radford, ... Advances in Neural Information Processing Systems 33, 3008-3021, 2020 | 210 | 2020 |
Quantum money from hidden subspaces S Aaronson, P Christiano Proceedings of the forty-fourth annual ACM symposium on Theory of computing …, 2012 | 155 | 2012 |
A cryptographic test of quantumness and certifiable randomness from a single quantum device Z Brakerski, P Christiano, U Mahadev, U Vazirani, T Vidick Journal of the ACM (JACM) 68 (5), 1-47, 2021 | 113 | 2021 |
Unrestricted adversarial examples TB Brown, N Carlini, C Zhang, C Olsson, P Christiano, I Goodfellow arXiv preprint arXiv:1809.08352, 2018 | 72 | 2018 |
AI safety via debate G Irving, P Christiano, D Amodei arXiv preprint arXiv:1805.00899, 2018 | 72 | 2018 |
Recursively summarizing books with human feedback J Wu, L Ouyang, DM Ziegler, N Stiennon, R Lowe, J Leike, P Christiano arXiv preprint arXiv:2109.10862, 2021 | 59 | 2021 |
Supervising strong learners by amplifying weak experts P Christiano, B Shlegeris, D Amodei arXiv preprint arXiv:1810.08575, 2018 | 43 | 2018 |
Robust Cooperation in the Prisoner's Dilemma: Program Equilibrium via Provability Logic M Barasz, P Christiano, B Fallenstein, M Herreshoff, P LaVictoire, ... arXiv preprint arXiv:1401.5577, 2014 | 41* | 2014 |
Online local learning via semidefinite programming P Christiano Proceedings of the forty-sixth annual ACM symposium on Theory of computing …, 2014 | 16 | 2014 |
Non-omniscience, probabilistic inference, and metamathematics P Christiano Machine Intelligence Research Institute, Berkeley, CA, June, 2014 | 14* | 2014 |
Reflective oracles: A foundation for game theory in artificial intelligence B Fallenstein, J Taylor, PF Christiano Logic, Rationality, and Interaction: 5th International Workshop, LORI 2015 …, 2015 | 11 | 2015 |
Lossless Fault-Tolerant Data Structures with Additive Overhead PF Christiano, ED Demaine, S Kishore WADS, 243-254, 2011 | 8 | 2011 |