GPT-4 technical report J Achiam, S Adler, S Agarwal, L Ahmad, I Akkaya, FL Aleman, D Almeida, ... arXiv preprint arXiv:2303.08774, 2023 | 7921 | 2023 |
The alignment problem from a deep learning perspective R Ngo, L Chan, S Mindermann arXiv preprint arXiv:2209.00626, 2022 | 192 | 2022 |
Avoiding side effects by considering future tasks V Krakovna, L Orseau, R Ngo, M Martic, S Legg Advances in Neural Information Processing Systems 33, 19064-19074, 2020 | 50 | 2020 |
Computing Power and the Governance of Artificial Intelligence G Sastry, L Heim, H Belfield, M Anderljung, M Brundage, J Hazell, ... arXiv preprint arXiv:2402.08797, 2024 | 41 | 2024 |
AGI safety from first principles R Ngo AI Alignment Forum 28, 2020 | 19 | 2020 |
REALab: An embedded perspective on tampering R Kumar, J Uesato, R Ngo, T Everitt, V Krakovna, S Legg arXiv preprint arXiv:2011.08820, 2020 | 14 | 2020 |
Avoiding tampering incentives in deep RL via decoupled approval J Uesato, R Kumar, V Krakovna, T Everitt, R Ngo, S Legg arXiv preprint arXiv:2011.08827, 2020 | 10 | 2020 |
Automating Supervision of AI Delegates R Ngo, J Tallinn Cambridge Handbook of Responsible Artificial Intelligence, 2022 | | 2022 |