ViennaCL---linear algebra library for multi- and many-core architectures K Rupp, P Tillet, F Rudolf, J Weinbub, A Morhammer, T Grasser, A Jüngel, ... SIAM Journal on Scientific Computing 38 (5), S412-S439, 2016 | 73 | 2016 |

Towards performance-portable, scalable, and convenient linear algebra P Tillet, K Rupp, S Selberherr, CT Lin 5th USENIX Workshop on Hot Topics in Parallelism (HotPar 13), 2013 | 29 | 2013 |

An automatic OpenCL compute kernel generator for basic linear algebra operations P Tillet, K Rupp, S Selberherr Proceedings of the Symposium on High Performance Computing, HPC 12, 4, 2012 | 24 | 2012 |

Input-aware auto-tuning of compute-bound HPC kernels P Tillet, D Cox Proceedings of the International Conference for High Performance Computing …, 2017 | 20 | 2017 |

Triton: an intermediate language and compiler for tiled neural network computations P Tillet, HT Kung, D Cox Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine …, 2019 | 10 | 2019 |

Performance portability study of linear algebra kernels in OpenCL K Rupp, P Tillet, F Rudolf, J Weinbub, T Grasser, A Jüngel Proceedings of the International Workshop on OpenCL 2013 & 2014, 1-11, 2014 | 6 | 2014 |

Infomax-ICA using Hessian-free optimization P Tillet, HT Kung, D Cox 2017 IEEE International Conference on Acoustics, Speech and Signal …, 2017 | 4 | 2017 |

Blocked Algorithms for Neural Networks: Design and Implementation on GPUs P Tillet Harvard University, 2020 | | 2020 |

Achieving Portable High Performance for Iterative Solvers on Accelerators K Rupp, P Tillet, A Jüngel, T Grasser PAMM 14 (1), 963-964, 2014 | | 2014 |