Aengus Lynch
Verified email at ucl.ac.uk
Title
Cited by
Year
Towards automated circuit discovery for mechanistic interpretability
A Conmy, A Mavor-Parker, A Lynch, S Heimersheim, A Garriga-Alonso
Advances in Neural Information Processing Systems 36, 16318-16352, 2023
Cited by: 207, Year: 2023
Causal machine learning: A survey and open problems
J Kaddour, A Lynch, Q Liu, MJ Kusner, R Silva
arXiv preprint arXiv:2206.15475, 2022
Cited by: 182, Year: 2022
Eight methods to evaluate robust unlearning in LLMs
A Lynch, P Guo, A Ewart, S Casper, D Hadfield-Menell
arXiv preprint arXiv:2402.16835, 2024
Cited by: 41, Year: 2024
Spawrious: A benchmark for fine control of spurious correlation biases
A Lynch, GJS Dovonon, J Kaddour, R Silva
arXiv preprint arXiv:2303.05470, 2023
Cited by: 30, Year: 2023
Targeted latent adversarial training improves robustness to persistent harmful behaviors in LLMs
A Sheshadri, A Ewart, P Guo, A Lynch, C Wu, V Hebbar, H Sleight, ...
arXiv e-prints, arXiv:2407.15549, 2024
Cited by: 17*, Year: 2024
Analyzing the generalization and reliability of steering vectors
D Tan, D Chanin, A Lynch, D Kanoulas, B Paige, A Garriga-Alonso, R Kirk
arXiv preprint arXiv:2407.12404, 2024
Cited by: 5, Year: 2024
Evaluating the impact of geometric and statistical skews on out-of-distribution generalization performance
A Lynch, J Kaddour, R Silva
NeurIPS 2022 Workshop on Distribution Shifts: Connecting Methods and …, 2022
Cited by: 5, Year: 2022
Best-of-N Jailbreaking
J Hughes, S Price, A Lynch, R Schaeffer, F Barez, S Koyejo, H Sleight, ...
arXiv preprint arXiv:2412.03556, 2024
Cited by: 1, Year: 2024
Plan B: Training LLMs to fail less severely
J Stastny, N Warncke, D Xu, A Lynch, F Barez, H Sleight, E Perez
Year: 2024
H-Space Sparse Autoencoders
A Ijishakin, ML Ang, L Baljer, DCH Tan, HL Fry, A Abdulaal, A Lynch, ...
NeurIPS Safe Generative AI Workshop 2024, 2024
Year: 2024