path: root/libomptarget/deviceRTLs/nvptx/src/sync.cu
Age | Commit message | Author
2019-10-04 | [libomptarget][nfc] Update remaining uint32 to use lanemask_t | Jon Chesterfield
Summary: [libomptarget][nfc] Update remaining uint32 to use lanemask_t
Update a few functions in the API to use lanemask_t instead of i32. NFC for nvptx. Also update the ActiveThreads type in DataSharingStateTy. This removes a lot of #ifdef from the downstream amdgcn implementation.
Reviewers: ABataev, jdoerfert, grokos, ronlieb, RaviNarayanaswamy
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D68513
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@373806 91177308-0d34-0410-b5e6-96231b3b80d8
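A minimal sketch of the idea behind this change, not the runtime's actual code: a per-target mask type lets shared code stay free of #ifdef. The spelling `lanemask_t` follows the commit title; `count_lanes`, `count_kernel`, and `run_count_lanes` are hypothetical names, and the 64-bit alternative mentioned in the comment is an assumption about how a 64-lane target might define the alias.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Assumed per-target alias. NVPTX warps have 32 lanes, so 32 bits suffice;
// a target with 64-lane wavefronts would presumably use uint64_t instead.
typedef uint32_t lanemask_t;

// Target-neutral helper: number of lanes set in a mask. Widening to 64 bits
// keeps the body identical whichever width lanemask_t has.
__device__ inline int count_lanes(lanemask_t mask) {
  return __popcll(static_cast<unsigned long long>(mask));
}

__global__ void count_kernel(lanemask_t mask, int *out) {
  *out = count_lanes(mask);
}

// Host launcher (hypothetical helper for this example only).
int run_count_lanes(lanemask_t mask) {
  int *d_out = nullptr;
  int h_out = -1;
  cudaMalloc(&d_out, sizeof(int));
  count_kernel<<<1, 1>>>(mask, d_out);
  cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_out);
  return h_out;
}
```

Only the typedef would differ between targets; callers such as `count_lanes` stay unchanged.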
2019-09-03 | [libomptarget] Refactor activemask macro to inline function | Jon Chesterfield
Summary: [libomptarget] Refactor activemask macro to inline function
See also abandoned D66846, split into this diff and others.
Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers
Reviewed By: jdoerfert, ABataev
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D66851
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@370781 91177308-0d34-0410-b5e6-96231b3b80d8
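The kind of refactoring described here can be sketched as follows. This is an illustration assuming CUDA 9.0 or later, where the `__activemask()` intrinsic is available; `lanemask_t`, `sketch_activemask`, `mask_kernel`, and `run_activemask_single_thread` are hypothetical names, not the runtime's own.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

typedef uint32_t lanemask_t; // assumed alias for a 32-lane warp mask

// An inline function gives the query a typed return value and a real scope,
// which a function-like macro wrapping the intrinsic would not.
__device__ inline lanemask_t sketch_activemask() {
  return __activemask(); // CUDA 9.0+ intrinsic
}

__global__ void mask_kernel(lanemask_t *out) {
  *out = sketch_activemask();
}

// Host launcher: one block with a single thread, so only lane 0 exists.
lanemask_t run_activemask_single_thread() {
  lanemask_t *d_out = nullptr;
  lanemask_t h_out = 0;
  cudaMalloc(&d_out, sizeof(lanemask_t));
  mask_kernel<<<1, 1>>>(d_out);
  cudaMemcpy(&h_out, d_out, sizeof(lanemask_t), cudaMemcpyDeviceToHost);
  cudaFree(d_out);
  return h_out;
}
```

Because the wrapper is a function, a port to another target can swap the body without touching call sites.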
2019-08-28 | [libomptarget] Refactor syncthreads macro to inline function | Jon Chesterfield
Summary: [libomptarget] Refactor syncthreads macro to inline function
See also abandoned D66846, split into this diff and others. Rev 2 of D66855.
Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D66861
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@370210 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-26 | [OPENMP][NVPTX] Add __kmpc_syncwarp(int32_t) function. | Alexey Bataev
Summary: Added the function void __kmpc_syncwarp(int32_t) to expose it to the compiler. It is required to fix the problem with critical regions in CUDA 9.0+. We cannot use a barrier in the critical region, but we still need to reconverge the threads in the warp afterwards. This function makes that possible.
Reviewers: grokos, jdoerfert
Subscribers: guansong, openmp-commits, kkwli0, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D66672
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@369933 91177308-0d34-0410-b5e6-96231b3b80d8
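A rough sketch of the pattern this commit describes, under stated assumptions: `sketch_syncwarp` is a hypothetical stand-in that forwards its mask to the CUDA 9.0+ `__syncwarp()` intrinsic, and the per-lane loop below only imitates a critical region (one lane at a time) without any real lock. `warp_counter_kernel` and `run_warp_counter` are likewise illustrative names.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical wrapper: reconverge the lanes named in `mask`.
__device__ void sketch_syncwarp(int32_t mask) {
  __syncwarp(static_cast<unsigned>(mask));
}

// Launch with exactly 32 threads (one full warp).
__global__ void warp_counter_kernel(int *out) {
  __shared__ int counter;
  const int32_t full_mask = -1; // all 32 lanes
  if (threadIdx.x == 0) counter = 0;
  sketch_syncwarp(full_mask);
  for (int lane = 0; lane < 32; ++lane) {
    // Stand-in for a critical region: only one lane runs this body.
    if (threadIdx.x == lane) counter += 1;
    // No block-wide barrier here; just reconverge the warp and make the
    // update visible to the other lanes before the next lane takes its turn.
    sketch_syncwarp(full_mask);
  }
  if (threadIdx.x == 0) *out = counter;
}

// Host launcher (hypothetical helper for this example only).
int run_warp_counter() {
  int *d_out = nullptr;
  int h_out = -1;
  cudaMalloc(&d_out, sizeof(int));
  warp_counter_kernel<<<1, 32>>>(d_out);
  cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_out);
  return h_out;
}
```

The warp-level sync involves only the lanes in the mask, which is why it can be used where a block-wide barrier cannot.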
2019-07-25 | [OPENMP][NVPTX] Perform memory flush if number of threads to sync is 1 or less. | Alexey Bataev
Summary: According to the OpenMP standard, a barrier must perform an implicit flush. Currently, if there is only one thread in the team, the barrier does not flush memory. This patch fixes the problem.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D62398
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@367024 91177308-0d34-0410-b5e6-96231b3b80d8
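The behaviour described above can be sketched roughly as below. This is a simplified illustration, not the runtime's barrier: `sketch_barrier`, `single_thread_kernel`, and `run_single_thread_barrier` are hypothetical names, the team is assumed to be the whole thread block, and `__threadfence()` is assumed as the flush primitive.

```cuda
#include <cuda_runtime.h>

// Simplified barrier: synchronize when there is someone to wait for,
// otherwise still perform the flush that a barrier implies.
__device__ void sketch_barrier(int num_threads) {
  if (num_threads > 1) {
    __syncthreads(); // block-wide barrier (also orders memory within the block)
  } else {
    __threadfence(); // single thread: nothing to wait for, but still flush
  }
}

__global__ void single_thread_kernel(int *out) {
  *out = 42;
  sketch_barrier(blockDim.x); // uniform argument, so the branch never diverges
}

// Host launcher: a team of one thread takes the flush-only path.
int run_single_thread_barrier() {
  int *d_out = nullptr;
  int h_out = -1;
  cudaMalloc(&d_out, sizeof(int));
  single_thread_kernel<<<1, 1>>>(d_out);
  cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_out);
  return h_out;
}
```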
2019-06-27 | [OPENMP][NVPTX] Relax flush directive. | Alexey Bataev
Summary: According to the OpenMP standard, flush makes a thread’s temporary view of memory consistent with memory and enforces an order on the memory operations of the variables explicitly specified or implied. According to the CUDA toolkit documentation (https://docs.nvidia.com/cuda/archive/8.0/cuda-c-programming-guide/index.html#memory-fence-functions), the __threadfence() function provides the required functionality. __threadfence_system() also provides it, but includes some extra functionality, such as synchronization of page-locked host memory and synchronization for the host. That is not required by the standard, so we can use the more relaxed memory fence operation.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D62397
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@364572 91177308-0d34-0410-b5e6-96231b3b80d8
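As an illustration of the relaxed flush, here is a minimal sketch assuming the flush maps directly onto `__threadfence()`. The names `sketch_flush`, `publish_kernel`, and `run_publish` are hypothetical; the example only shows the write-fence-write ordering pattern and does not spin on the flag.

```cuda
#include <cuda_runtime.h>

// Assumed mapping of a flush: a device-wide fence. __threadfence_system()
// would additionally cover the host and peer devices.
__device__ inline void sketch_flush() {
  __threadfence();
}

__global__ void publish_kernel(volatile int *payload, volatile int *flag) {
  if (blockIdx.x == 0 && threadIdx.x == 0) {
    *payload = 7;   // data another thread on the device may read
    sketch_flush(); // order the payload write before the flag write
    *flag = 1;      // an observer that sees flag == 1 also sees payload == 7
  }
}

// Host launcher: returns true if both writes are visible after the kernel.
bool run_publish() {
  int *d_buf = nullptr;
  int h_buf[2] = {0, 0};
  cudaMalloc(&d_buf, sizeof(h_buf));
  cudaMemset(d_buf, 0, sizeof(h_buf));
  publish_kernel<<<1, 32>>>(d_buf, d_buf + 1);
  cudaMemcpy(h_buf, d_buf, sizeof(h_buf), cudaMemcpyDeviceToHost);
  cudaFree(d_buf);
  return h_buf[0] == 7 && h_buf[1] == 1;
}
```

The host check passes regardless of the fence, since kernel completion already makes the writes visible; the fence matters to threads on the device that observe the flag while the kernel is still running.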
2019-05-24 | Revert "[OPENMP][NVPTX]Fix barriers and parallel level counters, NFC." | Alexey Bataev
This reverts commit r361421 to split the patch into 3 parts.
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@361638 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-22 | [OPENMP][NVPTX] Fix barriers and parallel level counters, NFC. | Alexey Bataev
Summary: The parallel level counter should be volatile to prevent some dangerous optimizations by ptxas. Otherwise, ptxas optimizations lead to undefined behaviour in some cases. Also, use __threadfence() for #pragma omp flush, and if the barrier should not be used (there is only one thread in the team), still perform the flush operation, since the standard requires an implicit flush when executing barriers.
Reviewers: gtbercea, kkwli0, grokos
Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D62199
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@361421 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-10 | [OPENMP][NVPTX] Improve number of threads counter, NFC. | Alexey Bataev
Summary: This patch improves the performance of the full runtime mode by moving the number-of-threads counter to shared memory. It also saves global memory.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D61785
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@360457 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-02 | [OPENMP][NVPTX] Improve code by using parallel level counter. | Alexey Bataev
Summary: Previously, whenever we needed the active/common parallel level in full runtime mode, we iterated over all the records to calculate it. Instead, we can use the warp-based parallel level counters already used in no-runtime mode.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jfb, jdoerfert, caomhin, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D61395
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@359822 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-19 | Update more file headers across all of the LLVM projects in the monorepo to reflect the new license. | Chandler Carruth
These used slightly different spellings that defeated my regular expressions.
We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@351648 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-04 | [OPENMP][NVPTX] Improve performance + reduce number of used registers. | Alexey Bataev
Summary: Reduced the number of registers used and improved performance by propagating the information about the current execution/data sharing mode directly from the compiler, where possible. In some cases this requires new or reworked interfaces for the runtime's external functions. The old functions are marked as deprecated.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jfb, openmp-commits, caomhin
Differential Revision: https://reviews.llvm.org/D56278
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@350405 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-03 | [OPENMP][NVPTX] Fix incompatibility of __syncthreads with LLVM, NFC. | Alexey Bataev
Summary: One of the LLVM optimizations, split critical edges, also clones tail instructions. This is a dangerous operation for __syncthreads() calls, and the transformation leads to undefined behavior or incorrect results. The patch fixes this problem by replacing the __syncthreads() function with an assembler instruction, whose cost is too high and which cannot be copied.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, openmp-commits, caomhin
Differential Revision: https://reviews.llvm.org/D56274
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@350333 91177308-0d34-0410-b5e6-96231b3b80d8
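The replacement described above might look roughly like the sketch below, which issues the PTX `bar.sync` instruction on barrier 0 through inline assembly. This is an illustration under assumptions rather than the patch itself: `sketch_syncthreads`, `reverse_kernel`, and `run_reverse_first` are hypothetical names, and the kernel expects a block of exactly 64 threads.

```cuda
#include <cuda_runtime.h>

// Block-wide barrier written as inline PTX. The "memory" clobber and the
// volatile qualifier keep the compiler from reordering or removing it.
__device__ inline void sketch_syncthreads() {
  asm volatile("bar.sync %0;" : : "r"(0) : "memory");
}

// Reverse 64 ints in place through shared memory; launch with 64 threads.
__global__ void reverse_kernel(int *data) {
  __shared__ int tile[64];
  int t = threadIdx.x;
  tile[t] = data[t];
  sketch_syncthreads(); // every write to tile lands before any read below
  data[t] = tile[63 - t];
}

// Host launcher: fills 0..63, reverses on the device, returns element 0.
int run_reverse_first() {
  int h[64];
  for (int i = 0; i < 64; ++i) h[i] = i;
  int *d = nullptr;
  cudaMalloc(&d, sizeof(h));
  cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
  reverse_kernel<<<1, 64>>>(d);
  cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
  cudaFree(d);
  return h[0];
}
```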
2018-12-06 | [OPENMP][NVPTX] Correct type casting for printf args + simplified shfl64 function. | Alexey Bataev
Summary: Explicitly cast printf's args to the required types + simplified the shfl64 function.
Reviewers: gtbercea, kkwli0
Subscribers: guansong, jfb, caomhin, openmp-commits
Differential Revision: https://reviews.llvm.org/D55379
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@348521 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-06 | [OPENMP][NVPTX] Fix __kmpc_flush to flush the memory per system, not per block. | Alexey Bataev
Summary: According to the standard, after a memory flush the changes in memory must be visible to all the threads in all teams. This patch fixes that.
Reviewers: gtbercea, kkwli0
Subscribers: guansong, jfb, caomhin, openmp-commits
Differential Revision: https://reviews.llvm.org/D55370
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@348491 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-30 | [OPENMP][NVPTX] Make runtime compatible with the original runtime. | Alexey Bataev
Summary: Reworked the runtime to make it compatible with the requirements of the original runtime library. Also simplified some code to reduce the number of function calls.
Reviewers: gtbercea, kkwli0
Subscribers: guansong, jfb, caomhin, openmp-commits
Differential Revision: https://reviews.llvm.org/D55130
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@348003 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-27 | [OpenMP][libomptarget] Refactor SPMD and runtime requirement checking | Gheorghe-Teodor Bercea
Summary: Refactor the checking for SPMD mode and for whether the runtime is initialized. This uses constant flags, which enable the runtime to optimize out unused sections of code that depend on these flags.
Reviewers: ABataev, caomhin
Reviewed By: ABataev
Subscribers: guansong, jfb, openmp-commits
Differential Revision: https://reviews.llvm.org/D54960
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@347698 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-29 | [OPENMP][NVPTX] Replace assert() by ASSERT0() macro, NFC. | Alexey Bataev
Required to fix the buildbots.
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@340956 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-29 | [OPENMP][NVPTX] Lightweight runtime support for SPMD mode. | Alexey Bataev
Summary: Implemented simple and lightweight runtime support for SPMD mode-based constructs. It adds support for L2 sequential parallelism without full runtime support. The patch also fixes some use cases for the uninitialized|lightweight runtime.
Reviewers: grokos, kkwli0, Hahnfeld, gtbercea
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D51222
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@340944 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-23 | [OPENMP, NVPTX] Fixed synchronization construct + code cleanup. | Alexey Bataev
Summary:
1. Fixed an internal problem in the `__kmpc_barrier` function: the SPMD mode synchronization function should be called only at the L1 parallel level.
2. Removed some extra synchronization code and used `__kmpc_barrier` instead.
3. Some code cleanup.
Reviewers: gtbercea, grokos
Subscribers: openmp-commits
Differential Revision: https://reviews.llvm.org/D49564
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@337691 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-25 | [OPENMP, NVPTX] Fixes for NVPTX RTL | Alexey Bataev
Summary: Patch fixes several problems in the implementation of NVPTX RTL.
1. Detection of the last iteration for loops with static scheduling, no chunks.
2. Fixes reductions for the serialized parallel constructs.
3. Fixes handling of the barriers.
Reviewers: grokos
Reviewed By: grokos
Subscribers: Hahnfeld, guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D48480
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@335469 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-29 | [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs. | George Rokos
This patch implements the device runtime library whose interface is used in the code generation for OpenMP offloading devices. Currently there is a single device RTL, written in CUDA and meant for CUDA-enabled GPUs. The interface is a variation of the kmpc interface that includes some extra calls for thread and storage management that only make sense for a GPU target.
Differential revision: https://reviews.llvm.org/D14254
git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@323649 91177308-0d34-0410-b5e6-96231b3b80d8