Peter Hase

Cited by

	All	Since 2019
Citations	1077	1076
h-index	13	13
i10-index	14	14

Citations per year
Year	2019	2020	2021	2022	2023	2024
Citations	3	22	107	237	428	275

Public access

3 articles available
0 articles not available

Based on funding mandates

Co-authors

Mohit Bansal; Parker Distinguished Professor, Computer Science, UNC Chapel Hill; verified email at cs.unc.edu
Cynthia Rudin; Professor of Computer Science, ECE, Statistics, and Biostatistics & Bioinformatics, Duke University; verified email at cs.duke.edu
Swarnadeep Saha; PhD Student, University of North Carolina at Chapel Hill; verified email at cs.unc.edu
Shiyue Zhang; UNC Chapel Hill; verified email at cs.unc.edu
Srini Iyer; FAIR; verified email at fb.com
Asma Ghandeharioun; Research Scientist, Google Research; verified email at google.com
Been Kim; Google DeepMind; verified email at csail.mit.edu
Zhuofan Ying; Columbia University; verified email at columbia.edu
Peter Clark; Allen Institute for Artificial Intelligence (AI2); verified email at allenai.org
Sarah Wiegreffe; Allen Institute for AI & University of Washington; verified email at allenai.org

Peter Hase

PhD Student, University of North Carolina at Chapel Hill

Verified email at cs.unc.edu

Interpretable Machine Learning, Natural Language Processing


Title	Cited by	Year
Evaluating explainable AI: Which algorithmic explanations help users predict model behavior? P Hase, M Bansal. arXiv preprint arXiv:2005.01831, 2020	256	2020
Open problems and fundamental limitations of reinforcement learning from human feedback. S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ... arXiv preprint arXiv:2307.15217, 2023	144	2023
Interpretable image recognition with hierarchical prototypes. P Hase, C Chen, O Li, C Rudin. Proceedings of the AAAI Conference on Human Computation and Crowdsourcing 7 …, 2019	103	2019
GrIPS: Gradient-free, edit-based instruction search for prompting large language models. A Prasad, P Hase, X Zhou, M Bansal. arXiv preprint arXiv:2203.07281, 2022	95	2022
Do language models have beliefs? Methods for detecting, updating, and visualizing model beliefs. P Hase, M Diab, A Celikyilmaz, X Li, Z Kozareva, V Stoyanov, M Bansal, ... arXiv preprint arXiv:2111.13654, 2021	78*	2021
Leakage-adjusted simulatability: Can models generate non-trivial explanations of their behavior in natural language? P Hase, S Zhang, H Xie, M Bansal. arXiv preprint arXiv:2010.04119, 2020	76	2020
FastIF: Scalable influence functions for efficient model interpretation and debugging. H Guo, NF Rajani, P Hase, M Bansal, C Xiong. arXiv preprint arXiv:2012.15781, 2020	72	2020
The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations. P Hase, H Xie, M Bansal. Advances in Neural Information Processing Systems 34, 2021	63	2021
When can models learn from explanations? A formal framework for understanding the roles of explanation data. P Hase, M Bansal. arXiv preprint arXiv:2102.02201, 2021	61	2021
Does localization inform editing? Surprising differences in causality-based localization vs. knowledge editing in language models. P Hase, M Bansal, B Kim, A Ghandeharioun. Advances in Neural Information Processing Systems 36, 2024	48	2024
Summarization programs: Interpretable abstractive summarization with neural modular trees. S Saha, S Zhang, P Hase, M Bansal. arXiv preprint arXiv:2209.10492, 2022	15	2022
Can Language Models Teach? Teacher Explanations Improve Student Performance via Personalization. S Saha, P Hase, M Bansal. Advances in Neural Information Processing Systems 36, 2024	13*	2024
Low-cost algorithmic recourse for users with uncertain cost functions. P Yadav, P Hase, M Bansal. arXiv preprint arXiv:2111.01235, 2021	13	2021
Can sensitive information be deleted from LLMs? Objectives for defending against extraction attacks. V Patil, P Hase, M Bansal. arXiv preprint arXiv:2309.17410, 2023	11	2023
VisFIS: Visual feature importance supervision with right-for-the-right-reason objectives. Z Ying, P Hase, M Bansal. Advances in Neural Information Processing Systems 35, 17057-17072, 2022	9	2022
Rethinking Machine Unlearning for Large Language Models. S Liu, Y Yao, J Jia, S Casper, N Baracaldo, P Hase, X Xu, Y Yao, H Li, ... arXiv preprint arXiv:2402.08787, 2024	7	2024
Are hard examples also harder to explain? A study with human and model-generated explanations. S Saha, P Hase, N Rajani, M Bansal. arXiv preprint arXiv:2211.07517, 2022	7	2022
Shall I compare thee to a machine-written sonnet? An approach to algorithmic sonnet generation. J Benhardt, P Hase, L Zhu, C Rudin. arXiv preprint arXiv:1811.05067, 2018	5	2018
The unreasonable effectiveness of easy training data for hard tasks. P Hase, M Bansal, P Clark, S Wiegreffe. arXiv preprint arXiv:2401.06751, 2024	1	2024
Foundational Challenges in Assuring Alignment and Safety of Large Language Models. U Anwar, A Saparov, J Rando, D Paleka, M Turpin, P Hase, ES Lubana, ... arXiv preprint arXiv:2404.09932, 2024		2024
