ViennaCL---linear algebra library for multi- and many-core architectures K Rupp, P Tillet, F Rudolf, J Weinbub, A Morhammer, T Grasser, A Jüngel, ... SIAM Journal on Scientific Computing 38 (5), S412-S439, 2016 | 113 | 2016 |

Triton: an intermediate language and compiler for tiled neural network computations P Tillet, HT Kung, D Cox Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine …, 2019 | 67 | 2019 |

Towards Performance-Portable, Scalable, and Convenient Linear Algebra P Tillet, K Rupp, S Selberherr, CT Lin 5th USENIX Workshop on Hot Topics in Parallelism (HotPar 13), 2013 | 32 | 2013 |

Input-aware auto-tuning of compute-bound HPC kernels P Tillet, D Cox Proceedings of the International Conference for High Performance Computing …, 2017 | 30 | 2017 |

An automatic OpenCL compute kernel generator for basic linear algebra operations P Tillet, K Rupp, S Selberherr Proc. 2012 Symposium on High Performance Computing (HPC'12). Orlando …, 2012 | 28 | 2012 |

Infomax-ICA using Hessian-free optimization P Tillet, HT Kung, D Cox 2017 IEEE International Conference on Acoustics, Speech and Signal …, 2017 | 7 | 2017 |

Performance portability study of linear algebra kernels in OpenCL K Rupp, P Tillet, F Rudolf, J Weinbub, T Grasser, A Jüngel Proceedings of the International Workshop on OpenCL 2013 & 2014, 1-11, 2014 | 7 | 2014 |

Blocked Algorithms for Neural Networks: Design and Implementation on GPUs P Tillet Harvard University, 2020 | | 2020 |

ViennaCL-Fast Linear Algebra for Multi and Many-Core Architectures K Rupp, P Tillet, T St Clere Smithe, N Karovic, J Weinbub, F Rudolf | | 2015 |

Achieving Portable High Performance for Iterative Solvers on Accelerators K Rupp, P Tillet, A Jüngel, T Grasser PAMM 14 (1), 963-964, 2014 | | 2014 |

Performance-portable kernels in OpenCL: Lessons learned K Rupp, P Tillet | | 2013 |

ViennaCL-Portable High Performance at High Convenience K Rupp, P Tillet, F Rudolf, J Weinbub ENUMATH 2013 Proceedings, 1-2, 2013 | | 2013 |