| # | Preview | Authors / Editors | Title | Type | Issue Date |
| --- | --- | --- | --- | --- | --- |
| 1 | | Laso Rodriguez, Ruben ; Salimi Beni, Majid ; Vardas, Ioannis ; Benkner, Siegfried ; Hunold, Sascha | To ncclsee, or Not to ncclsee: That is the Profiling Question | Inproceedings Konferenzbeitrag | 10-Apr-2026 |
| 2 | | Träff, Jesper Larsson | Lectures on Parallel Computing | Book Buch | 2026 |
| 3 | | Salimi Beni, Majid ; Laso, Ruben ; Cosenza, Biagio ; Benkner, Siegfried ; Hunold, Sascha | Exploring NCCL Tuning Strategies for Distributed Deep Learning | Inproceedings Konferenzbeitrag  | 13-Aug-2025 |
| 4 |  | Vardas, Ioannis ; Träff, Jesper Larsson ; Laso, Ruben ; Hunold, Sascha | Mpisee: communicator-centric profiling of MPI applications | Article Artikel  | 25-Jul-2025 |
| 5 | | Träff, Jesper Larsson | Communication Round and Computation Efficient Exclusive Prefix-Sums Algorithms (for MPI_Exscan) | Preprint Preprint  | 7-Jul-2025 |
| 6 |  | Salimi Beni, Majid ; Laso, Ruben ; Cosenza, Biagio ; Benkner, Siegfried ; Hunold, Sascha | Optimizing Distributed Deep Learning Training by Tuning NCCL | Inproceedings Konferenzbeitrag  | 22-May-2025 |
| 7 |  | Vardas, Ioannis ; Laso Rodriguez, Ruben ; Salimi Beni, Majid | ncclsee: A Lightweight Profiling Tool for NCCL | Inproceedings Konferenzbeitrag  | 22-May-2025 |
| 8 |  | Träff, Jesper Larsson | Optimal, Non-pipelined Reduce-scatter and Allreduce Algorithms | Preprint Preprint  | 13-Feb-2025 |
| 9 | | Carpentieri, Lorenzo ; De Caro, Antonio ; Salimibeni, Majid ; Fan, Kaijie ; Cosenza, Biagio | Phase-Based Frequency Scaling for Energy-Efficient Heterogeneous Computing | Inproceedings Konferenzbeitrag  | 2025 |
| 10 | | Salimibeni, Majid ; Cosenza, Biagio ; Hunold, Sascha | MPI Collective Algorithm Selection in the Presence of Process Arrival Patterns | Inproceedings Konferenzbeitrag  | 7-Nov-2024 |
| 11 | | Vardas, Ioannis ; Hunold, Sascha ; Swartvagher, Philippe ; Träff, Jesper Larsson | Exploring Mapping Strategies for Co-allocated HPC Applications | Inproceedings Konferenzbeitrag  | 4-Nov-2024 |
| 12 |  | Träff, Jesper Larsson | Lectures on Parallel Computing | Preprint Preprint  | 26-Jul-2024 |
| 13 | | Träff, Jesper Larsson | Optimal Broadcast Schedules in Logarithmic Time with Applications to Broadcast, All-Broadcast, Reduction and All-Reduction | Preprint Preprint  | 26-Jul-2024 |
| 14 |  | Laso Rodriguez, Ruben ; Krupitza, Diego ; Hunold, Sascha | pSTL-Bench: A Micro-Benchmark Suite for Assessing Scalability of C++ Parallel STL Implementations | Preprint Preprint  | 9-Feb-2024 |
| 15 |  | Salimi Beni, Majid ; Hunold, Sascha ; Cosenza, Biagio | Analysis and prediction of performance variability in large-scale computing systems | Article Artikel  | 2024 |
| 16 | | Vardas, Ioannis ; Hunold, Sascha ; Swartvagher, Philippe ; Träff, Jesper Larsson | Improved Parallel Application Performance and Makespan by Colocation and Topology-aware Process Mapping | Inproceedings Konferenzbeitrag  | 2024 |
| 17 | | Hunold, Sascha ; Xie, Biwei ; Shu, Kai | Benchmarking, Measuring, and Optimizing : 15th BenchCouncil International Symposium, Bench 2023, Revised Selected Papers | Proceedings Tagungsband | 2024 |
| 18 | | Laso Rodriguez, Ruben ; Krupitza, Diego ; Hunold, Sascha | Exploring Scalability in C++ Parallel STL Implementations | Inproceedings Konferenzbeitrag  | 2024 |
| 19 |  | Träff, Jesper Larsson | Round-optimal n-Block Broadcast Schedules in Logarithmic Time | Preprint Preprint  | 18-Dec-2023 |
| 20 | | Hunold, Sascha | Unveiling the Complexities of Performance Analysis and Optimization in HPC Systems | Presentation Vortrag | 8-Dec-2023 |
| 21 |  | Swartvagher, Philippe ; Hunold, Sascha ; Träff, Jesper Larsson ; Vardas, Ioannis | Using Mixed-Radix Decomposition to Enumerate Computational Resources of Deeply Hierarchical Architectures | Inproceedings Konferenzbeitrag  | 12-Nov-2023 |
| 22 |  | Hunold, Sascha | Verifying Performance Guidelines for MPI Collectives at Scale | Inproceedings Konferenzbeitrag  | 12-Nov-2023 |
| 23 | | Laso Rodriguez, Ruben ; Casado, Fernando E. | The research career after the PhD | Presentation Vortrag | 3-Nov-2023 |
| 24 |  | Träff, Jesper Larsson ; Vardas, Ioannis | Library Development with MPI: Attributes, Request Objects, Group Communicator Creation, Local Reductions, and Datatypes | Inproceedings Konferenzbeitrag  | 21-Sep-2023 |
| 25 | | Schuchart, Joseph ; Hunold, Sascha ; Bosilca, George | Synchronizing MPI Processes in Space and Time | Inproceedings Konferenzbeitrag | Sep-2023 |
| 26 | | Forsell, Martti ; Roivainen, Jussi ; Leppänen, Ville ; Träff, Jesper Larsson | Realizing multioperations and multiprefixes in Thick Control Flow processors | Article Artikel  | Apr-2023 |
| 27 | | Träff, Jesper Larsson ; Hunold, Sascha ; Vardas, Ioannis ; Funk, Nikolaus Manes | Uniform Algorithms for Reduce-scatter and (most) other Collectives for MPI | Inproceedings Konferenzbeitrag  | 2023 |
| 28 | | Hunold, Sascha ; Steiner, Sebastian | OMPICollTune: Autotuning MPI Collectives by Incremental Online Learning | Inproceedings Konferenzbeitrag  | 2023 |
| 29 |  | Hunold, Sascha ; Vardas, Ioannis ; Ibis, Gökhan ; Langer, Thierry | Massively Scaling Molecular Screening Workloads on EuroHPC Supercomputers | Inproceedings Konferenzbeitrag  | 2023 |
| 30 |  | Vardas, Ioannis ; Hunold, Sascha ; Swartvagher, Philippe ; Träff, Jesper Larsson | Effects of Mapping Strategies on Average Duration and Throughput of Colocated HPC Applications | Inproceedings Konferenzbeitrag  | 2023 |
| 31 | | Forsell, Martti ; Roivainen, Jussi ; Leppänen, Ville ; Träff, Jesper Larsson | Preliminary Performance and Memory Access Scalability Study of Thick Control Flow Processors | Inproceedings Konferenzbeitrag | 2023 |
| 32 |  | Hunold, Sascha ; Hagn, Maximilian | MPI is Good, Control is Better: Checking Performance Guidelines of Collectives | Inproceedings Konferenzbeitrag  | 2023 |
| 33 | | Hunold, Sascha ; Kraßnitzer, Klaus Dieter Vincenz | A Quantitative Analysis of OpenMP Task Runtime Systems | Inproceedings Konferenzbeitrag | 2023 |
| 34 |  | Swartvagher, Philippe ; Vardas, Ioannis ; Hunold, Sascha ; Träff, Jesper Larsson | Rank Reordering within MPI Communicators to Exploit Deep Hierarchical Architectures of Supercomputers | Inproceedings Konferenzbeitrag  | 2023 |
| 35 | | Träff, Jesper Larsson | Brief Announcement: Fast(er) Construction of Round-optimal n-Block Broadcast Schedules | Inproceedings Konferenzbeitrag | 11-Jul-2022 |
| 36 |  | Hunold, Sascha ; Ajanohoun, Jordy Innocentius ; Vardas, Ioannis ; Träff, Jesper Larsson | An Overhead Analysis of MPI Profiling and Tracing Tools | Inproceedings Konferenzbeitrag  | 27-Jun-2022 |
| 37 | | Hunold, Sascha ; Przybylski, Bartłomiej | Scheduling.jl - Collaborative and Reproducible Scheduling Research with Julia | Presentation Vortrag | 18-May-2022 |
| 38 | | Vardas, Ioannis ; Hunold, Sascha ; Ajanohoun, Jordy I. ; Träff, Jesper Larsson | mpisee: MPI Profiling for Communication and Communicator Structure | Inproceedings Konferenzbeitrag  | 2022 |
| 39 | | Hunold, Sascha | Performance Tuning of MPI Collectives - Status Quo and Open Problems | Presentation Vortrag | 2022 |
| 40 | | Vardas, Ioannis ; Hunold, Sascha ; Ajanohoun, Jordy I. ; Träff, Jesper Larsson | mpisee: MPI Profiling for Communication and Communicator Structure | Konferenzbeitrag Inproceedings  | 2022 |
| 41 | | Ajanohoun, Jordy I. ; Vardas, Ioannis ; Träff, Jesper Larsson ; Hunold, Sascha | MPI Performance Tools under the Microscope: A Thorough Overhead Analysis | Konferenzbeitrag Inproceedings  | 2022 |
| 42 | | Forsell, Martti ; Nikula, Sara ; Roivainen, Jussi ; Leppänen, Ville ; Träff, Jesper Larsson | Performance and programmability comparison of the thick control flow architecture and current multicore processors | Artikel Article  | 2022 |
| 43 | | Träff, Jesper Larsson | Fast(er) Construction of Round-optimal n-Block Broadcast Schedules | Inproceedings Konferenzbeitrag  | 2022 |
| 44 | | Träff, Jesper Larsson | (Poly)Logarithmic Time Construction of Round-optimal n-Block Broadcast Schedules for Broadcast and irregular Allgather in MPI | Preprint Preprint | 2022 |
| 45 | | Hunold, Sascha ; Przybylski, Bartlomiej | Teaching Complex Scheduling Algorithms | Konferenzbeitrag Inproceedings  | 2021 |
| 46 | | Träff, Jesper Larsson ; Pöter, Manuel | A more pragmatic implementation of the lock-free, ordered, linked list | Konferenzbeitrag Inproceedings  | 2021 |
| 47 | | Träff, Jesper Larsson ; Hunold, Sascha ; Mercier, Guillaume ; Holmes, Daniel J. | MPI collective communication through a single set of interfaces: A case for orthogonality | Artikel Article  | 2021 |
| 48 | | Hunold, Sascha ; Ajanohoun, Jordy I. ; Carpen-Amarie, Alexandra | MicroBench Maker: Reproduce, Reuse, Improve | Konferenzbeitrag Inproceedings  | 2021 |
| 49 | | Träff, Jesper Larsson | A Doubly-pipelined, Dual-root Reduction-to-all Algorithm and Implementation | Preprint Preprint | 2021 |
| 50 | | Träff, Jesper Larsson | Decomposing MPI Collectives for Exploiting Multi-lane Communication | Präsentation Presentation | 2020 |
| 51 | | Kirchbach, Konrad Von ; Schulz, Christian ; Träff, Jesper Larsson | Better Process Mapping and Sparse Quadratic Assignment | Artikel Article  | 2020 |
| 52 | | Hunold, Sascha ; Bhatele, Abhinav ; Bosilca, George ; Knees, Peter | Predicting MPI Collective Communication Performance Using Machine Learning | Konferenzbeitrag Inproceedings  | 2020 |
| 53 | | Träff, Jesper Larsson ; Hunold, Sascha | Decomposing MPI Collectives for Exploiting Multi-lane Communication | Konferenzbeitrag Inproceedings  | 2020 |
| 54 | | Träff, Jesper Larsson | Signature Datatypes for Type Correct Collective Operations, Revisited | Konferenzbeitrag Inproceedings  | 2020 |
| 55 | | Faraj, Marcelo Fonseca ; van der Grinten, Alexander ; Meyerhenke, Henning ; Träff, Jesper Larsson ; Schulz, Christian | High-Quality Hierarchical Process Mapping | Konferenzbeitrag Inproceedings  | 2020 |
| 56 | | Hunold, Sascha ; Przybylski, Bartlomiej | Scheduling.jl - Collaborative and Reproducible Scheduling Research with Julia | Preprint Preprint | 2020 |
| 57 | | Forsell, Martti ; Roivainen, Jussi ; Träff, Jesper Larsson | Optimizing Memory Access in TCF Processors with Compute-Update Operations | Konferenzbeitrag Inproceedings  | 2020 |
| 58 | | von Kirchbach, Konrad ; Lehr, Markus ; Hunold, Sascha ; Schulz, Christian ; Träff, Jesper Larsson | Efficient Process-to-Node Mapping Algorithms for Stencil Computations | Konferenzbeitrag Inproceedings  | 2020 |
| 59 | | Träff, Jesper Larsson ; Pöter, Manuel | A more Pragmatic Implementation of the Lock-free, Ordered, Linked List | Preprint Preprint | 2020 |
| 60 | | Faraj, Marcelo Fonseca ; van der Grinten, Alexander ; Meyerhenke, Henning ; Träff, Jesper Larsson ; Schulz, Christian | High-Quality Hierarchical Process Mapping | Preprint Preprint | 2020 |
| 61 | | Hunold, Sascha ; von Kirchbach, Konrad ; Lehr, Markus ; Schulz, Christian ; Träff, Jesper Larsson | Efficient Process-to-Node Mapping Algorithms for Stencil Computations | Preprint Preprint | 2020 |
| 62 | | Träff, Jesper Larsson | k-ported vs. k-lane Broadcast, Scatter, and Alltoall Algorithms | Preprint Preprint | 2020 |
| 63 | | Träff, Jesper Larsson ; Hoefler, Torsten | Special issue: Selected papers from EuroMPI 2019 | Article Artikel | 2020 |
| 64 | | Träff, Jesper Larsson | Exploiting Multi-lane Communication in MPI Collectives | Konferenzbeitrag Inproceedings  | 2020 |
| 65 | | Lehr, Markus ; von Kirchbach, Konrad | Improved Cartesian Topology Mapping in MPI | Konferenzbeitrag Inproceedings  | 2020 |
| 66 | | Pachajoa, Carlos ; Levonyak, Markus ; Pacher, Christina ; Träff, Jesper Larsson ; Gansterer, Wilfried | Classical and pipelined preconditioned conjugate gradient methods with node-failure resilience | Konferenzbeitrag Inproceedings  | 2020 |
| 67 | | Träff, Jesper Larsson ; Hunold, Sascha ; Mercier, Guillaume ; Holmes, Daniel J. | Collectives and Communicators: A Case for Orthogonality: (Or: How to get rid of MPI neighbor and enhance Cartesian collectives) | Konferenzbeitrag Inproceedings  | 2020 |
| 68 | | Träff, Jesper Larsson | On Optimal Trees for Irregular Gather and Scatter Collectives | Artikel Article  | 1-Sep-2019 |
| 69 | | Kang, Qiao ; Träff, Jesper Larsson ; Al-Bahrani, Reda ; Agrawal, Ankit ; Choudhary, Alok ; Liao, Wei-keng | Scalable Algorithms for MPI Intergroup Allgather and Allgatherv | Artikel Article  | 2019 |
| 70 | | Träff, Jesper Larsson | On Optimal Trees for Irregular Gather and Scatter Collectives | Präsentation Presentation | 2019 |
| 71 | | Träff, Jesper Larsson | On optimal Trees for irregular gather and scatter collectives? | Präsentation Presentation | 2019 |
| 72 | | Träff, Jesper Larsson | Cartesian Collective Communication: "Advice to users", "Advice to implementers", and "Advice to Standardizers" | Präsentation Presentation | 2019 |
| 73 | | Träff, Jesper Larsson | On optimal Trees for irregular gather and scatter collectives? | Präsentation Presentation | 2019 |
| 74 | | Kainrad, Thomas ; Hunold, Sascha ; Seidel, Thomas ; Langer, Thierry | LigandScout Remote: A New User-Friendly Interface for HPC and Cloud Resources | Artikel Article  | 2019 |
| 75 | | Hoefler, Torsten ; Träff, Jesper Larsson | Proceedings of the 26th European MPI Users' Group Meeting, EuroMPI 2019 | Konferenzband Proceedings  | 2019 |
| 76 | | Träff, Jesper Larsson ; Hunold, Sascha | Cartesian Collective Communication | Konferenzbeitrag Inproceedings  | 2019 |
| 77 | | Hunold, Sascha ; Carpen-Amarie, Alexandra | On the Importance of Data Quality when Tuning MPI Libraries | Konferenzbeitrag Inproceedings  | 2019 |
| 78 | | Kainer, Michael ; Träff, Jesper Larsson | More Parallelism in Dijkstra's Single-Source Shortest Path Algorithm | Preprint Preprint | 2019 |
| 79 | | Träff, Jesper Larsson | Decomposing Collectives for Exploiting Multi-lane Communication | Preprint Preprint | 2019 |
| 80 | | Pachajoa, Carlos ; Levonyak, Markus ; Gansterer, Wilfried N. ; Träff, Jesper Larsson | How to Make the Preconditioned Conjugate Gradient Method Resilient Against Multiple Node Failures | Konferenzbeitrag Inproceedings  | 2019 |
| 81 | | Pachajoa, Carlos ; Levonyak, Markus ; Gansterer, Wilfried ; Träff, Jesper Larsson | How to Make the Preconditioned Conjugate Gradient Method Resilient Against Multiple Node Failures | Preprint Preprint | 2019 |
| 82 | | Träff, Jesper Larsson ; Hoefler, Torsten | Foreword EuroMPI 2019 | Konferenzbeitrag Inproceedings | 2019 |
| 83 | | Forsell, Martti ; Roivainen, Jussi ; Leppänen, Ville ; Träff, Jesper Larsson | Supporting concurrent memory access in TCF processor architectures | Artikel Article  | 2018 |
| 84 | | Träff, Jesper Larsson | Practical, distributed, low overhead algorithms for irregular gather and scatter collectives | Artikel Article  | 2018 |
| 85 | | Pöter, Manuel ; Träff, Jesper Larsson | Stamp-it, amortized constant-time memory reclamation in comparison to five other schemes | Konferenzbeitrag Inproceedings  | 2018 |
| 86 | | Pöter, Manuel ; Träff, Jesper Larsson | Stamp-it: A more Thread-efficient, Concurrent Memory Reclamation Scheme in the C++ Memory Model | Preprint Preprint | 2018 |
| 87 | | Träff, Jesper Larsson | Parallel Quicksort without Pairwise Element Exchange | Preprint Preprint | 2018 |
| 88 | | Pöter, Manuel ; Träff, Jesper Larsson | Memory Models for C/C++ Programmers | Preprint Preprint | 2018 |
| 89 | | Hunold, Sascha ; Carpen-Amarie, Alexandra | Hierarchical Clock Synchronization in MPI | Konferenzbeitrag Inproceedings  | 2018 |
| 90 | | Kang, Qiao ; Träff, Jesper Larsson ; Al-Bahrani, Reda ; Agrawal, Ankit ; Choudhary, Alok ; Liao, Wei-keng | Full-Duplex Inter-Group All-to-All Broadcast Algorithms with Optimal Bandwidth | Konferenzbeitrag Inproceedings  | 2018 |
| 91 | | Hunold, Sascha ; Carpen-Amarie, Alexandra | Algorithm Selection of MPI Collectives Using Machine Learning Techniques | Konferenzbeitrag Inproceedings  | 2018 |
| 92 | | Hunold, Sascha ; Carpen-Amarie, Alexandra | Autotuning MPI Collectives using Performance Guidelines | Konferenzbeitrag Inproceedings  | 2018 |
| 93 | | Forsell, Martti ; Roivainen, Jussi ; Leppänen, Ville ; Träff, Jesper Larsson | Implementation of Multioperations in Thick Control Flow Processors | Konferenzbeitrag Inproceedings  | 2018 |
| 94 | | Träff, Jesper Larsson | On Optimal trees for Irregular Gather and Scatter Collectives | Präsentation Presentation | 2018 |
| 95 | | Pöter, Manuel ; Träff, Jesper Larsson | Brief Announcement: Stamp-it, a more Thread-efficient, Concurrent Memory Reclamation Scheme in the C++ Memory Model | Konferenzbeitrag Inproceedings  | 2018 |
| 96 | | Carpen-Amarie, Alexandra ; Hunold, Sascha ; Träff, Jesper Larsson | On expected and observed communication performance with MPI derived datatypes | Artikel Article  | 2017 |
| 97 | | Träff, Jesper Larsson | Fast Processing of MPI Derived Datatypes? | Präsentation Presentation | 2017 |
| 98 | | Bleuse, Raphael ; Hunold, Sascha ; Kedad-Sidhoum, Safia ; Monna, Florence ; Mounie, Gregory ; Trystram, Denis | Scheduling Independent Moldable Tasks on Multi-Cores with GPUs | Artikel Article  | 2017 |
| 99 | | Lusk, Ewing ; Träff, Jesper Larsson | MPI Is 25 Years Old! | Artikel Article | 2017 |
| 100 | | Träff, Jesper Larsson | High Performance Expectations for MPI | Konferenzbeitrag Inproceedings | 2017 |