summaryrefslogtreecommitdiff
path: root/openmp/libomptarget
AgeCommit message (Collapse)Author
2019-01-09[OpenMP][libomptarget] Use shared memory variable for tracking parallel levelGheorghe-Teodor Bercea
Summary: Replace existing infrastructure for tracking parallel level using global memory with a per-team shared memory variable. This minimizes the impact of the overhead of tracking the parallel level for non-nested cases. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D55773
2019-01-07[OPENMP][NVPTX]Fix dynamic scheduling.Alexey Bataev
Summary: Previous implementation may cause the runtime crash when the number of teams is > 1024. Patch fixes this problem + reduces number of the atomic operations by 32 times. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jfb, openmp-commits, caomhin Differential Revision: https://reviews.llvm.org/D56332
2019-01-04[OPENMP][NVPTX]General formatting/code improvement, NFC.Alexey Bataev
Summary: Formatting. Reviewers: gtbercea, grokos, kkwli0 Subscribers: guansong, openmp-commits, caomhin Differential Revision: https://reviews.llvm.org/D56290
2019-01-04[OPENMP][NVPTX]Improve performance + reduce number of used registers.Alexey Bataev
Summary: Reduced number of the used register + improved performance propagating the information about current execution/data sharing mode directly from the compiler, where it is possible. In some cases, it requires new/reworked interfaces of the runtime external functions. Old functions are marked as deprecated. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jfb, openmp-commits, caomhin Differential Revision: https://reviews.llvm.org/D56278
2019-01-04[OpenMP] Fix nvidia-cuda-toolkit detection on Debian/UbuntuJoel E. Denny
The OpenMP runtime's cmake scripts do not correctly locate the libdevice that the Debian/Ubuntu package nvidia-cuda-toolkit currently includes, at least on my Ubuntu 18.04.1 installation. This patch fixes that for me. This problem was discussed at length in D55269. D40453 added a similar adjustment in clang, but reviewers of D55269 concluded that, for the OpenMP runtime, the right place to address this problem is in cmake's CUDA support. However, it was also suggested we could add a workaround to OpenMP's cmake scripts now. This patch contains such a workaround, which I've tried to design so that it will have no harmful effect if cmake improves in the future. nvidia-cuda-toolkit also needs improvements because its intended monolithic CUDA tree shim, /usr/lib/cuda, has many empty directories, such as bin. I reported that at: <https://bugs.launchpad.net/ubuntu/+source/nvidia-cuda-toolkit/+bug/1808999> Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D55588
2019-01-03[OpenMP] Add omp_get_device_num() and update several other device API functionsJonathan Peyton
Add omp_get_device_num() function for 5.0 which returns the number of the device the current thread is running on. Currently, we are leaving it to the compiler to handle this properly if it is called inside target. Also, did some cleanup and updating of duplicate device API functions (in both libomp and libomptarget) to make them into weak functions that check for the symbol from libomptarget, and will call the version in libomptarget if it is present. If any additional device API functions are implemented also in libomptarget in the future, we should add the dlsym calls to the host functions. Also, if the omp_target_* functions are to be implemented for the host (this has been requested), they should attempt to call the libomptarget versions as well. Patch by Terry Wilmarth Differential Revision: https://reviews.llvm.org/D55578
2019-01-03[OPENMP][NVPTX]Fix incompatibility of __syncthreads with LLVM, NFC.Alexey Bataev
Summary: One of the LLVM optimizations, split critical edges, also clones tail instructions. This is a dangerous operation for __syncthreads() functions and this transformation leads to undefined behavior or incorrect results. Patch fixes this problem by replacing __syncthreads() function with the assembler instruction, which cost is too high and wich cannot be copied. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, openmp-commits, caomhin Differential Revision: https://reviews.llvm.org/D56274
2019-01-02[libomptarget] Added install component for libomptargetVyacheslav Zakharin
Differential Revision: https://reviews.llvm.org/D56108
2018-12-28[OPENMP][NVPTX]Added/fixed debugging messages, NFC.Alexey Bataev
Summary: Added or fixed new/old debugging messages for the better diagnostics. Reviewers: gtbercea, kkwli0, grokos Reviewed By: grokos Subscribers: caomhin, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D56102
2018-12-28[OPENMP][NVPTX]Fixed initialization of the data-sharing interface.Alexey Bataev
Summary: Avoid using of the atomic loop to wait for the completion of the data-sharing interface initialization, use __shfl_sync instead for the communication within the warp to signal other threads in the warp about completion of the initialization. Reviewers: gtbercea, kkwli0, grokos Subscribers: guansong, jfb, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D56100
2018-12-28[OPENMP][NVPTX]Outline assert into noinline function, NFC.Alexey Bataev
Summary: At high optimization level asserts lead to some unexpected results because of auto-inserted unreachable instructions. This outlining prevents some of such dangerous optimizations and leads to better stability. Reviewers: gtbercea, kkwli0, grokos Subscribers: guansong, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D56101
2018-12-10[OPENMP][NVPTX]Revert __kmpc_shuffle_int64 to its original form.Alexey Bataev
Summary: Use the original shuffle implementation for __kmpc_shuffle_int64 since default implementation uses the same implementation. Reviewers: gtbercea Subscribers: guansong, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D55514
2018-12-10[OPENMP][NVPTX]Enable fast shuffles on 64bit values only if CUDA >= 9.Alexey Bataev
Summary: Shuffle on 64bit data is allowed only for CUDA >= 9.0. Also, fixed the constant for the mask, need one extra L in the end. Reviewers: gtbercea, kkwli0 Subscribers: guansong, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D55440
2018-12-07[OPENMP][NVPTX]Save registers for optimized builds with enabled logging.Alexey Bataev
Summary: Introduced special noinline function log that allows to save some registers for optimized builds but with enabled logging. Also, it increases the stability of the optimized builds with inlined runtime. Reviewers: gtbercea, kkwli0 Reviewed By: gtbercea Subscribers: caomhin, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D55436
2018-12-06[OPENMP][NVPTX]Correct type casting for printf args + simplified shfl64 ↵Alexey Bataev
function. Summary: Explicitly casted printf's args to the required types + simplified shfl64 function. Reviewers: gtbercea, kkwli0 Subscribers: guansong, jfb, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D55379
2018-12-06[OPENMP][NVPTX]Fix __kmpc_flush to flush the memory per system, not per block.Alexey Bataev
Summary: According to the standard, after memory flushing the changes in the memory must be visible to all the threads in all teams. Patch fixes this. Reviewers: gtbercea, kkwli0 Subscribers: guansong, jfb, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D55370
2018-12-03[OpenMP][libomptarget] Flush intermediate values during team reductionGheorghe-Teodor Bercea
Summary: Ensure intermediate values of a team reduction are flushed to memory. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: guansong, jfb, openmp-commits Differential Revision: https://reviews.llvm.org/D55219
2018-11-30[OPENMP][NVPTX]Make runtime compatible with the original runtime.Alexey Bataev
Summary: Reworked runtime to make it compatible with the requirements of the original runtime library. Also, simplified some code to reduce number of function calls. Reviewers: gtbercea, kkwli0 Subscribers: guansong, jfb, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D55130
2018-11-27[OpenMP][libomptarget] Add new version of SPMD deinit kernel function with ↵Gheorghe-Teodor Bercea
argument Summary: To enable the compiler to optimize parts of the function that are not needed when runtime can be omitted, a new version of the SPMD deinit kernel function is needed. This function takes the runtime required flag as an argument. Reviewers: ABataev, kkwli0, caomhin Reviewed By: ABataev Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D54969
2018-11-27[OPENMP][NVPTX]Basic support for reductions across the teams.Alexey Bataev
Summary: Added functions __kmpc_nvptx_teams_reduce_nowait_simple and __kmpc_nvptx_teams_end_reduce_nowait_simple to implement basic support for reductions across the teams. Reviewers: gtbercea, kkwli0 Subscribers: guansong, jfb, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D54967
2018-11-27[OpenMP][libomptarget] Refactor SPMD and runtime requirement checkingGheorghe-Teodor Bercea
Summary: Refactor the checking for SPMD mode and whether the runtime is initialized or not. This uses constant flags which enables the runtime to optimize out unused sections of code that depend on these flags. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: guansong, jfb, openmp-commits Differential Revision: https://reviews.llvm.org/D54960
2018-11-20[OPENMP][NVPTX]Improved lock/critical constructs.Alexey Bataev
Summary: Improved support for critical constructs + omp_..._lock... constructs. Reviewers: gtbercea, kkwli0, caomhin Subscribers: guansong, jfb, openmp-commits Differential Revision: https://reviews.llvm.org/D54766
2018-11-08[OPENMP]Make lambda mapping follow reqs for PTR_AND_OBJ mapping.Alexey Bataev
Summary: The base pointer for the lambda mapping must point to the lambda capture placement and pointer must point to the captured variable itself. Patch fixes this problem. Reviewers: gtbercea Subscribers: guansong, openmp-commits, kkwli0, caomhin Differential Revision: https://reviews.llvm.org/D54260
2018-11-02[OPENMP][OFFLOADING]Change the lambda capturing flags.Alexey Bataev
Summary: The previously used combination `PTR_AND_OBJ | PRIVATE` could be used for mapping of some data in Fortran. Changed it to `PTR_AND_OBJ | LITERAL`. Reviewers: gtbercea Subscribers: guansong, caomhin, openmp-commits Differential Revision: https://reviews.llvm.org/D54035
2018-11-02[OPENMP][NVPTX]Fixed/improved support for globalization in team contexts.Alexey Bataev
Summary: Current globalization scheme works correctly only for SPMD+lightweight runtime mode and does not work for full runtime. Patch improves support for the globalization scheme + reduces global memory consumption in lightweight runtime mode. Patch adds runtime functions to work with the statically allocated global memory. It allows to improve performance and memory consumption. This global memory must be allocated by the compiler. Reviewers: grokos, kkwli0, gtbercea, caomhin Subscribers: guansong, jfb, openmp-commits Differential Revision: https://reviews.llvm.org/D53943
2018-11-01[OpenMP][libomptarget] Add runtime function for pushing coalesced global recordsGheorghe-Teodor Bercea
Summary: In the case of coalesced global records, we need to push the exact data size passed in. This patch fixes this by outlining the common functionality of the previous push function and by adding a separate entry point for coalesced pushes. The pop function remains unchanged. Reviewers: ABataev, grokos, caomhin Reviewed By: ABataev, grokos Subscribers: jholewinski, cfe-commits, Hahnfeld, guansong, jfb, openmp-commits Differential Revision: https://reviews.llvm.org/D53141
2018-10-30[LIBOMPTARGET] Add support for mapping of lambda captures.Alexey Bataev
Summary: Added support for correct mapping of variables captured by reference in lambdas. That kind of mapping may appear only in target-executable regions and must follow the original lambda or another lambda capture for the same lambda. The expected data: base address - the address of the lambda, begin pointer - pointer to the address of the lambda capture, size - size of the captured variable. When OMP_TGT_MAPTYPE_PTR_AND_OBJ mapping type is seen in target-executable region, the target address of the last processed item is taken as the address of the original lambda `tgt_lambda_ptr`. Then, the pointer to capture on the device is calculated like `tgt_lambda_ptr + (host_begin_pointer - host_begin_base)` and the target-based address of the original variable (which host address is `*(void**)begin_pointer`) is written to that pointer. Reviewers: kkwli0, gtbercea, grokos Subscribers: openmp-commits Differential Revision: https://reviews.llvm.org/D51107
2018-10-01[libomptarget-nvptx] Enable asserts in bclibJonas Hahnfeld
If the user requested LIBOMPTARGET_NVPTX_DEBUG, include asserts in the bitcode library. Everything else will have very unpleasent effects because asserts will appear when falling back to the static library libomptarget-nvptx.a. Differential Revision: https://reviews.llvm.org/D52701
2018-10-01[libomptarget-nvptx] reduction: Determine if runtime uninitializedJonas Hahnfeld
Pass in the correct value of isRuntimeUninitialized() which solves parallel reductions as reported on the mailing list. For reference: r333285 did the same for loop scheduling. Differential Revision: https://reviews.llvm.org/D52725
2018-09-30[libomptarget-nvptx] Align data sharing stackJonas Hahnfeld
NVPTX requires addresses of pointer locations to be 8-byte aligned or there will be an exception during runtime. This could happen without this patch as shown in the added test: getId() requires 4 byte of stack and putValueInParallel() uses 16 bytes to store the addresses of the captured variables. Differential Revision: https://reviews.llvm.org/D52655
2018-09-30[libomptarget-nvptx] Fix ancestor_thread_num and team_size (non-SPMD)Jonas Hahnfeld
According to OpenMP 4.5, p250:12-14: If the requested nest level is outside the range of 0 and the nest level of the current thread, as returned by the omp_get_level routine, the routine returns -1. The SPMD code path will need a similar fix. Differential Revision: https://reviews.llvm.org/D51787
2018-09-29[libomptarget-nvptx] Add tests for nested parallelismJonas Hahnfeld
Clang trunk will serialize nested parallel regions. Check that this is correctly reflected in various API methods. Differential Revision: https://reviews.llvm.org/D51786
2018-09-29[libomptarget-nvptx] Ignore calls to dynamic APIJonas Hahnfeld
There is no support and according to the OpenMP 4.5, p238:7-9: For implementations that do not support dynamic adjustment of the number of threads this routine has no effect: the value of dyn-var remains false. Add a test that cancellation and nested parallelism aren't supported either. Differential Revision: https://reviews.llvm.org/D51785
2018-09-29[libomptarget-nvptx] Fix number of threads in parallelJonas Hahnfeld
If there is no num_threads() clause we must consider the nthreads-var ICV. Its value is set by omp_set_num_threads() and can be queried using omp_get_max_num_threads(). The rewritten code now closely resembles the algorithm given in the OpenMP standard. Differential Revision: https://reviews.llvm.org/D51783
2018-09-28[OPENMP] Add the test to check that the libomptarget does not causeAlexey Bataev
infinite loop on removing non-mapped pointer-with-object. Added test to check that libomptarget does not cause infinite loop when trying to unmap the pointer-with-object data that was not previously mapped.
2018-09-28[libomptarget-nvptx] Add testing infrastructureJonas Hahnfeld
This patch also introduces testing for libomptarget-nvptx which has been missing until now. I propose to add tests for all bugs that are fixed in the future. The target check-libomptarget-nvptx is not run by default because - we can't determine if there is a GPU plugged into the system. - it will require the latest Clang compiler. Keeping compatibility with older releases would prevent testing newer code generation developed in trunk. Differential Revision: https://reviews.llvm.org/D51687
2018-09-25[OpenMP][libomptarget] Set the frame pointer then test empty slot conditionGheorghe-Teodor Bercea
Summary: NFC - just fixing a bug: the empty slot test was before the re-setting of the Stack pointer. Reviewers: ABataev, caomhin, Hahnfeld Reviewed By: ABataev Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D52122
2018-09-25[OpenMP][libomptarget] Simplify warp master selection for data sharingGheorghe-Teodor Bercea
Summary: There is currently no supported situation where the warp master is not the first thread in the warp. This also avoids the device execution from hanging on Volta GPUs when ballot_sync is called by a number of threads that is less that the size of a warp. Reviewers: ABataev, caomhin, grokos Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D50188
2018-09-21[OPENMP][NVPTX] Add support for lastprivates/reductions handling in SPMD ↵Alexey Bataev
constructs with lightweight runtime. Summary: We need the support for per-team shared variables to support codegen for lastprivates/reductions. Patch adds this support by using shared memory if the total size of the reductions/lastprivates is <= 128 bytes, then pre-allocated buffer in global memory if size is <= 4K bytes,or uses malloc/free, otherwise. Reviewers: gtbercea, kkwli0, grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D51875
2018-09-11[OPENMP]Increment iterator when the loop is continued.Alexey Bataev
Summary: Missed operation of the incrementing iterator when required just to continue execution. Reviewers: kkwli0, gtbercea, grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D51937
2018-09-08[libomptarget-nvptx] Remove last mentions of __kmpc_print_*Jonas Hahnfeld
Their implementation was removed during review, delete their prototype declarations.
2018-09-06[libomptarget] Remove two unneeded includes, NFCI.Jonas Hahnfeld
Follow-up to r340542 and r340767.
2018-09-05[libomptaret][test] Announce compiler featuresJonas Hahnfeld
This is a follow-up to r341371: The new test for PR38704 doesn't work with Clang 6.0. It uses an UNSUPPORTED: clang-6, but that hasn't worked because the compiler features weren't known to lit.
2018-09-04[libomptarget] Remove `Devices` from `RTLInfoTy`Sergey Dmitriev
This patch removes unused field `Devices` from `RTLInfoTy`. Differential Revision: https://reviews.llvm.org/D51653
2018-09-04[libomptarget][CUDA] Use cuDeviceGetAttribute, NFCI.Jonas Hahnfeld
cuDeviceGetProperties has apparently been deprecated since CUDA 5.0. Nvidia started using annotations only in CUDA 9.2, so nobody noticed nor cared before. The new function returns the same values, tested with a P100. Differential Revision: https://reviews.llvm.org/D51624
2018-09-04[libomptarget] PR38704: Fix erase of ShadowPtrMapJonas Hahnfeld
erase() invalidates the iterator and returns a new one pointing to the following element. The code now follows the example at https://en.cppreference.com/w/cpp/container/map/erase. (The added testcase crashes without this patch.) Reported by David Binderman (https://llvm.org/PR38704)! Differential Revision: https://reviews.llvm.org/D51623
2018-09-04[libomptarget][NVPTX] Drop dead code and data structures, NFCI.Jonas Hahnfeld
* cg and HasCancel in WorkDescr were never read and can be removed. * This eliminates the last use of priv in ThreadPrivateContext. * CounterGroup is unused afterwards. * Remove duplicate external declares in omptarget-nvptx.cu that are already in the header omptarget-nvptx.h. Differential Revision: https://reviews.llvm.org/D51622
2018-09-03[libomptarget][NVPTX] Fix __kmpc_spmd_kernel_deinitJonas Hahnfeld
If the runtime is uninitialized the master thread must Enqueue the state object, and ALL threads must return immediately. Found post-commit of https://reviews.llvm.org/D51222.
2018-08-29[OPENMP][NVPTX] Replace assert() by ASSERT0() macro, NFC.Alexey Bataev
Required to fix the buildbots.
2018-08-29[OPENMP][NVPTX] Lightweight runtime support for SPMD mode.Alexey Bataev
Summary: Implemented simple and lightweight runtime support for SPMD mode-based constructs. It adds support for L2 sequential parallelism wihtout full runtime support. Also, patch fixes some use cases for uninitialized|lightweight runtime. Reviewers: grokos, kkwli0, Hahnfeld, gtbercea Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D51222