path: root/parallel-libs
Age / Commit message / Author
2019-01-15Update year in license filesHans Wennborg
In last year's update (D48219) it was suggested that the release manager might want to do this, so here we go.
2018-06-18Update copyright year to 2018.Paul Robinson
2016-12-19[Acxxel] Remove -Wno-missing-braces in buildJason Henline
Summary: I originally added the -Wno-missing-braces flag because I thought it was erroneously flagging std::array initializations. Now I realize the extra braces really are desired for these initializations, so I'm turning the warning flag back on. Reviewers: jlebar Subscribers: mgorny, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D27941
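The two initialization styles this message refers to can be sketched as follows. This is an illustration only: the function names are hypothetical, and whether the brace-elided form triggers -Wmissing-braces depends on the compiler and its version.

```cpp
#include <array>
#include <cassert>

// std::array is an aggregate that wraps a built-in array, so the fully braced
// form uses one pair of braces for the std::array and one for the inner array.
inline std::array<int, 3> makeFullyBraced() {
  std::array<int, 3> Values = {{1, 2, 3}};
  return Values;
}

// Brace elision makes this form legal too, but it is the form that some
// compilers report under -Wmissing-braces.
inline std::array<int, 3> makeBraceElided() {
  std::array<int, 3> Values = {1, 2, 3};
  return Values;
}

// Both forms produce the same contents; only the diagnostics differ.
inline void checkBothFormsAgree() {
  assert(makeFullyBraced() == makeBraceElided());
}
```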
2016-10-28[Acxxel] Remove setActiveDeviceForThreadJason Henline
Summary: After experimenting with CUDA, I realized that we really only need to set the active context right before creating an object such as a stream or a device memory allocation. When we go on to use these objects later, it is fine if the context that created them is no longer active; operations with those objects will succeed anyway. Since it turns out that we don't have to check the active context for every operation, it makes sense to hide this active context from users (by removing the "ActiveDeviceForThread" setter and getter) and to change the Acxxel API to explicitly pass in the device ID to create objects. This change improves the Acxxel API and greatly simplifies the CUDA and OpenCL implementations because they no longer require thread_local data. Reviewers: jlebar, jprice Subscribers: mgorny, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D26050
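A minimal sketch of the API shape described above, in which the device is named at object-creation time and no per-thread "active device" exists. `Platform`, `Stream`, and `createStream` here are hypothetical stand-ins, not the actual Acxxel declarations.

```cpp
#include <cassert>

// Hypothetical stand-in for a stream object; it only remembers its device.
struct Stream {
  int DeviceIndex;
};

// Hypothetical stand-in for a platform with a fixed number of devices.
class Platform {
public:
  explicit Platform(int DeviceCount) : DeviceCount(DeviceCount) {}

  // The device is chosen per call, so no thread_local state is needed to
  // remember which device is "active" for the calling thread.
  Stream createStream(int DeviceIndex) const {
    assert(DeviceIndex >= 0 && DeviceIndex < DeviceCount);
    return Stream{DeviceIndex};
  }

private:
  int DeviceCount;
};
```

With this shape, two threads can create objects on different devices without coordinating through shared or thread-local state.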
2016-10-25[SE] Remove StreamExecutorJason Henline
Summary: The project has been renamed to Acxxel, so this old directory needs to be deleted. Reviewers: jlebar, jprice Subscribers: beanz, mgorny, parallel_libs-commits, modocache Differential Revision: https://reviews.llvm.org/D25964
2016-10-25Initial check-in of Acxxel (StreamExecutor renamed)Jason Henline
Summary: Acxxel is basically a simplified redesign of StreamExecutor. Here are the major points where Acxxel differs from the current StreamExecutor design:
* Acxxel doesn't support the kernel and kernel loader types designed for emission by the compiler to support type-safe kernel launches. For CUDA, kernels in Acxxel can be seamlessly launched using the standard CUDA triple-chevron kernel launch syntax that is available with clang and nvcc. For CUDA and OpenCL, kernel arguments can be passed in the old-fashioned way, as one array of pointers to arguments and another array of argument sizes. Although OpenCL doesn't get a type-safe kernel launch method, it does still get the benefit of all the memory management wrappers. In the future, clang may add support for triple-chevron OpenCL kernel launches, or some other type-safe OpenCL kernel launch method.
* Acxxel does not depend on any other code in LLVM, so it builds completely independently from LLVM.
The goal will be to check in Acxxel and remove StreamExecutor, or perhaps to remove the old StreamExecutor and rename Acxxel to StreamExecutor, so I think Acxxel should be thought of as a new version of StreamExecutor, not as a separate project.
Reviewers: jlebar, jprice Subscribers: beanz, mgorny, modocache, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D25701
2016-09-27[SE] Change CoreTests target nameJason Henline
Summary: Call it StreamExecutorCoreTests in order to prevent collision with targets from other modules. Reviewers: jlebar, jprice Subscribers: beanz, mgorny, jlebar, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24949
2016-09-15[SE] Fix config bug with CUDA testsJason Henline
Summary: It turns out CMake errors out if a processed directory contains source files that are not used. This was causing an error with the CUDATest.cpp file when configuring StreamExecutor with the CUDA platform disabled. Moving CUDATest.cpp to its own directory fixes this problem. Reviewers: jlebar, jprice Subscribers: beanz, mgorny, jlebar, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24618
2016-09-15[SE] Support CUDA dynamic shared memoryJason Henline
Summary: Add proper handling for shared memory arguments in the CUDA platform. Also add in unit tests for CUDA. Reviewers: jlebar Subscribers: beanz, mgorny, jprice, jlebar, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24596
2016-09-15[SE] Let users specify CUDA pathJason Henline
Summary: Add logic to allow users to specify the CUDA path at configuration time. Reviewers: jlebar Subscribers: beanz, mgorny, jlebar, jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24580
2016-09-14[SE] Add CUDA platformJason Henline
Summary: Basic CUDA platform implementation and cmake infrastructure to control whether it's used. A few important TODOs will be handled in later patches:
* Log some error messages that can't easily be returned as Errors.
* Cache modules and kernels to prevent reloading them if someone tries to reload a kernel that's already loaded.
* Tolerate shared memory arguments for kernel launches.
Reviewers: jlebar Subscribers: beanz, mgorny, jprice, jlebar, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24538
2016-09-13[SE] Pack global dev handle addressesJason Henline
Summary: We were packing global device memory handles in `PackedKernelArgumentArray`, but as I was implementing the CUDA platform, I realized that CUDA wants the address of the handle, not the handle itself. So this patch switches to packing the address of the handle. Reviewers: jlebar Subscribers: jprice, jlebar, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24528
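The difference between packing a handle and packing the address of a handle can be sketched as below. `DeviceHandle`, `PackedArgs`, and `packArgs` are hypothetical names for illustration and are not the `PackedKernelArgumentArray` interface. The layout (one array of argument addresses, one array of argument sizes) follows the description in these commit messages.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opaque device memory handle.
using DeviceHandle = void *;

// One array of pointers to the arguments and a parallel array of their sizes.
struct PackedArgs {
  std::vector<const void *> Addresses;
  std::vector<std::size_t> Sizes;
};

// Packs a device handle and an element count. The caller must keep Handle and
// Count alive for as long as the returned PackedArgs is in use, because only
// their addresses are stored.
inline PackedArgs packArgs(const DeviceHandle &Handle, const int &Count) {
  PackedArgs Args;
  Args.Addresses.push_back(&Handle); // Address of the handle, not the handle.
  Args.Sizes.push_back(sizeof(Handle));
  Args.Addresses.push_back(&Count);
  Args.Sizes.push_back(sizeof(Count));
  assert(Args.Addresses.size() == Args.Sizes.size());
  return Args;
}
```

Storing `&Handle` means the launch mechanism reads the pointer value out of the argument slot, in the same way it reads the integer out of the slot for `Count`.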
2016-09-13Device doc says device is smallJason Henline
2016-09-13[SE] Platforms return Device valuesJason Henline
Summary: Platforms were returning Device pointers, but a Device is now basically just a pointer to an underlying PlatformDevice, so we will now just pass it around as a value. Reviewers: jlebar Subscribers: jprice, jlebar, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24537
2016-09-13[SE] KernelSpec return best PTXJason Henline
Summary: Before, the kernel spec would only return PTX for exactly the requested compute capability. With this patch it will now return the PTX with the largest compute capability that does not exceed the requested compute capability. Reviewers: jlebar Subscribers: jprice, jlebar, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24531
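The selection rule described here (largest available compute capability that does not exceed the requested one) can be sketched with an ordered map. The `ComputeCapability` alias and the `findBestPTX` function are assumptions for illustration and are not the KernelSpec interface.

```cpp
#include <iterator>
#include <map>
#include <string>
#include <utility>

// (major, minor) pairs compare lexicographically, which matches the usual
// ordering of compute capabilities.
using ComputeCapability = std::pair<int, int>;

// Returns the PTX registered for the largest compute capability that does not
// exceed Requested, or nullptr if every registered capability is larger.
inline const std::string *
findBestPTX(const std::map<ComputeCapability, std::string> &PTXByCapability,
            ComputeCapability Requested) {
  // upper_bound yields the first entry strictly greater than Requested, so
  // the entry just before it (if any) is the best match.
  auto It = PTXByCapability.upper_bound(Requested);
  if (It == PTXByCapability.begin())
    return nullptr;
  return &std::prev(It)->second;
}
```

An exact match is still preferred under this rule, because an entry equal to the request is the largest one that does not exceed it.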
2016-09-13[SE] Use real HostPlatformDevice for testingJason Henline
Summary: Replace uses of SimpleHostPlatformDevice in tests with HostPlatformDevice. Reviewers: jlebar Subscribers: jlebar, jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24519
2016-09-13[SE] Host platform implementationJason Henline
Summary: This implementation does not currently support multiple concurrent streams, and it won't allow kernels to be launched with grids larger than one block or blocks larger than one thread. These limitations could be removed in the future by launching new threads on the host, but that is not done in this implementation. Reviewers: jlebar Subscribers: beanz, mgorny, jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24473
2016-09-13[SE] Add .clang-tidyJason Henline
Summary: The .clang-tidy file is copied from the top-level LLVM source directory. Also fix warnings generated by clang-tidy:
* Moved SimpleHostPlatformDevice.h so its header include guard could have the right format.
* Changed signatures of methods taking llvm::Twine by value to take it by const ref instead.
* Add "noexcept" to some move constructors and assignment operators.
* Removed a bunch of places where single-statement loops and conditionals were surrounded with braces. (This was not found by the current clang-tidy, but with a local patch that I hope to upstream soon.)
Reviewers: jlebar, jprice Subscribers: parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24468
2016-09-13[SE] Stop using llvm-config --cxxflagsJason Henline
Summary: Build configuration was adding $(llvm-config --cxxflags) to the StreamExecutor CXXFLAGS, but this was causing "-O3" to be passed even for debug builds, and was making debugging difficult. The llvm-config call was originally introduced to handle the -fno-rtti flag because an RTTI StreamExecutor could not link with a no-RTTI LLVM. This patch converts to using LLVM_ENABLE_RTTI and only adding the `-fno-rtti` flag if needed, not all the rest of the LLVM CXXFLAGS. I have tested this with clang-4.0 and gcc-4.8 on Ubuntu. Some work will probably have to be done to support MSVC. Reviewers: jlebar Subscribers: beanz, jprice, parallel_libs-commits, mgorny Differential Revision: https://reviews.llvm.org/D24474
2016-09-12[SE] Clean up device and host memory slicesJason Henline
Summary:
* Add LLVM_ATTRIBUTE_UNUSED_RESULT to slicing methods in order to emphasize that the slicing is not done in place.
* Change device memory slice function name from `drop_front` to `slice` in order to match the naming convention of `llvm::ArrayRef` and host memory slice.
* Change the parameter names of host memory slice functions to `DropCount` and `TakeCount` to match device memory slice declarations.
Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24464
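A rough sketch of a non-mutating slice with `DropCount` and `TakeCount` parameters follows. `HostSlice` is a hypothetical class, and the C++17 `[[nodiscard]]` attribute stands in for `LLVM_ATTRIBUTE_UNUSED_RESULT` so that the example does not depend on LLVM headers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning view over Count elements starting at Data.
template <typename T> class HostSlice {
public:
  HostSlice(T *Data, std::size_t Count) : Data(Data), Count(Count) {}

  // Returns a new view that skips DropCount elements and keeps TakeCount.
  // The original view is left unchanged, so ignoring the result is almost
  // certainly a mistake; [[nodiscard]] asks the compiler to say so.
  [[nodiscard]] HostSlice slice(std::size_t DropCount,
                                std::size_t TakeCount) const {
    assert(DropCount <= Count && TakeCount <= Count - DropCount);
    return HostSlice(Data + DropCount, TakeCount);
  }

  T *data() const { return Data; }
  std::size_t size() const { return Count; }

private:
  T *Data;
  std::size_t Count;
};
```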
2016-09-12[SE] RegisteredHostMemory for async device copiesJason Henline
Summary: Improve the error-prone interface that allows users to pass host pointers that haven't been registered to asynchronous copy methods. In CUDA, this is an extremely easy error to make, and instead of failing at runtime, it succeeds and gives the right answers by turning the async copy into a sync copy. So, you silently get a huge performance degradation if you misuse the old interface. This new interface should prevent that. Reviewers: jlebar Subscribers: jprice, beanz, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24353
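One way to make the misuse described above a compile-time error is to accept only a wrapper type that can be obtained solely by registering the memory. The sketch below is host-only and uses hypothetical names (`RegisteredHostSpan`, `registerHostMemory`, `asyncCopyToDevice`). It performs no real registration, and it models the copy with `std::memcpy`.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

template <typename T> class RegisteredHostSpan;

// The only way to obtain a RegisteredHostSpan. A real platform would register
// (pin) the pages here; this sketch only records the pointer and the count.
template <typename T>
RegisteredHostSpan<T> registerHostMemory(T *Data, std::size_t Count);

template <typename T> class RegisteredHostSpan {
public:
  T *data() const { return Data; }
  std::size_t size() const { return Count; }

private:
  // Private constructor: a raw pointer cannot be converted into this type.
  RegisteredHostSpan(T *Data, std::size_t Count) : Data(Data), Count(Count) {}
  friend RegisteredHostSpan registerHostMemory<T>(T *Data, std::size_t Count);

  T *Data;
  std::size_t Count;
};

template <typename T>
RegisteredHostSpan<T> registerHostMemory(T *Data, std::size_t Count) {
  return RegisteredHostSpan<T>(Data, Count);
}

// Stand-in for an asynchronous copy entry point. It accepts only registered
// memory, so passing an unregistered T* fails to compile.
template <typename T>
void asyncCopyToDevice(const RegisteredHostSpan<T> &Src, T *Dst) {
  static_assert(std::is_trivially_copyable<T>::value,
                "memcpy requires a trivially copyable element type");
  std::memcpy(Dst, Src.data(), Src.size() * sizeof(T));
}
```

The design choice is to move the check from run time, where the failure is silent, to the type system, where it cannot be skipped.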
2016-09-09[SE] Remove Utils directoryJason Henline
Summary: There is no purpose in splitting out the Error class from the rest of the StreamExecutor code. This organization was just a vestige of an old failed design. Plus, this change fixes a bug in the build where the utilities library was not being statically linked in with libstreamexecutor. Reviewers: jlebar, jprice Subscribers: beanz, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24434
2016-09-09[StreamExecutor] Make SE work with an in-tree LLVM build.Justin Lebar
Summary: With these changes, we can put parallel-libs within llvm/projects and build as normal. This is kind of the minimal change I could figure out how to make while still making us compatible with llvm's build system. Some things I'm not thrilled about include:
* The creation of a CoreTests directory (the macros really seemed to want this)
* Pulling SimpleHostPlatformDevice.h into CoreTests. It seems to me this should live inside unittests/include, or maybe tests/include, but I didn't want to make that change in this patch.
One important piece of work that remains to be done is to make
  $ ninja check-streamexecutor
run all the tests. Right now the only way I've figured out to run the tests is
  $ ninja projects/parallel-libs/streamexecutor/unittests/StreamExecutorUnitTests
  $ projects/parallel-libs/streamexecutor/unittests/CoreTests/CoreTests
Reviewers: jhen Subscribers: beanz, parallel_libs-commits, jprice Differential Revision: https://reviews.llvm.org/D24368
2016-09-08Add streamexecutor-configJason Henline
Summary: Similar to llvm-config, gets command-line flags that are needed to build applications linking against StreamExecutor. Reviewers: jprice, jlebar Subscribers: beanz, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24302
2016-09-07[SE] Add getName method to Device classJason Henline
Reviewers: jhen Subscribers: parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24240
2016-09-06[SE] Rename PlatformInterfaces to PlatformDeviceJason Henline
Summary: The only interface that we ever plan to have in this file is PlatformDevice, so it makes sense to rename the file to reflect that. Reviewers: jprice Subscribers: parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24269
2016-09-06[SE] Remove Platform*Handle classesJason Henline
Summary: As pointed out by jprice, these classes don't serve a purpose. Instead, we stay consistent with the way memory is managed and let the Stream and Kernel classes directly hold opaque handles to device Stream and Kernel instances, respectively. Reviewers: jprice, jlebar Subscribers: parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24213
2016-09-03[SE] Add getByteCount methods for device memoryJason Henline
Summary: Simple utility methods will prevent users from making mistakes when converting element counts to byte counts. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24197
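The conversion these methods encapsulate is a multiplication by the element size. A minimal sketch, using a hypothetical `DeviceArray` class, not the actual device memory types:

```cpp
#include <cstddef>

// Hypothetical typed description of a device allocation.
template <typename T> class DeviceArray {
public:
  explicit DeviceArray(std::size_t ElementCount)
      : ElementCount(ElementCount) {}

  std::size_t getElementCount() const { return ElementCount; }

  // Byte count derived from the element type, so callers never have to
  // multiply by sizeof(T) themselves (and never forget to).
  std::size_t getByteCount() const { return ElementCount * sizeof(T); }

private:
  std::size_t ElementCount;
};
```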
2016-09-02[SE] Remove broken doc refJason Henline
2016-09-02[SE] Doc tweaksJason Henline
Summary:
* Sections on main page.
* Use std algorithm for equality check in example.
* Add tree view on left side.
* Add extra CSS sheet to restrict content width.
* Add mild background color.
* Restrict alphabetic indexes to 1 column.
* Round corners of content boxes.
* Rename example to CUDASaxpy.cpp.
* Add CUDASaxpy.cpp to "Examples" section.
Reviewers: jprice Subscribers: parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24198
2016-09-02[SE] GlobalDeviceMemory owns its handleJason Henline
Summary: Final step in getting GlobalDeviceMemory to own its handle.
* Make GlobalDeviceMemory movable, but no longer copyable.
* Make Device::freeDeviceMemory function private and make GlobalDeviceMemoryBase a friend of Device so GlobalDeviceMemoryBase can free its memory in its destructor.
* Make GlobalDeviceMemory constructor private and make Device a friend so it can construct GlobalDeviceMemory.
* Remove SharedDeviceMemoryBase class because it is never used.
* Remove explicit memory freeing from example code.
This change just consumes any errors generated during device memory freeing. The real error handling will be added in a future patch.
Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24195
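The ownership pattern described above (movable, not copyable, memory released in the destructor) can be sketched as follows. `OwnedBuffer` is a hypothetical class that uses `std::malloc` and `std::free` in place of a real device allocator, and it has a public constructor for brevity, where the commit makes the constructor private and befriends Device.

```cpp
#include <cstddef>
#include <cstdlib>
#include <type_traits>
#include <utility>

// Hypothetical owner of an opaque allocation handle.
class OwnedBuffer {
public:
  explicit OwnedBuffer(std::size_t ByteCount)
      : Handle(std::malloc(ByteCount)) {}

  // The owner frees its memory, so users never free explicitly.
  ~OwnedBuffer() { std::free(Handle); }

  // Copying would lead to a double free, so it is disabled.
  OwnedBuffer(const OwnedBuffer &) = delete;
  OwnedBuffer &operator=(const OwnedBuffer &) = delete;

  // Moving transfers the handle and leaves the source empty.
  OwnedBuffer(OwnedBuffer &&Other) noexcept
      : Handle(std::exchange(Other.Handle, nullptr)) {}
  OwnedBuffer &operator=(OwnedBuffer &&Other) noexcept {
    if (this != &Other) {
      std::free(Handle);
      Handle = std::exchange(Other.Handle, nullptr);
    }
    return *this;
  }

  void *getHandle() const { return Handle; }

private:
  void *Handle;
};

static_assert(!std::is_copy_constructible<OwnedBuffer>::value,
              "OwnedBuffer must not be copyable");
static_assert(std::is_nothrow_move_constructible<OwnedBuffer>::value,
              "OwnedBuffer must be movable without throwing");
```

Because instances cannot be copied, functions that take such an object need to accept it by reference or by move.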
2016-09-02[SE] Add "install" actions to cmake buildJason Henline
The "install" build target will now copy the StreamExecutor library and headers to the appropriate subdirectories of CMAKE_INSTALL_PREFIX.
2016-09-02[SE] Don't pack raw device mem argsJason Henline
Summary: Step 4 of getting GlobalDeviceMemory to own its handle. Take out code to pack untyped device memory types as kernel arguments. When GlobalDeviceMemory owns its handle, users will never touch untyped device memory types, so they will never pass them as kernel args. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24177
2016-09-02[StreamExecutor] Pass device memory by refJason Henline
Summary: Step 3 of getting GlobalDeviceMemory to own its handle. Since GlobalDeviceMemory will no longer be copy-constructible, we must pass instances by reference rather than by value. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24172
2016-09-02[SE] Make Kernel movableJason Henline
Summary: Kernel is basically just a smart pointer to the underlying implementation, so making it movable prevents having to store a std::unique_ptr to it. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24150
2016-09-01[StreamExecutor] Read dev array directly in testJason Henline
Summary: Step 2 of getting GlobalDeviceMemory to own its handle. Use the SimpleHostPlatformDevice allocate methods to create device arrays for tests, and check for successful copies by dereferencing the device array handle directly because we know it is really a host pointer. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24148
2016-09-01[StreamExecutor] Dev handles in platform interfaceJason Henline
Summary: This is the first in a series of patches that will convert GlobalDeviceMemory to own its device memory handle. The first step is to remove GlobalDeviceMemoryBase from the PlatformInterface interfaces and use raw handles there instead. This is useful because GlobalDeviceMemoryBase is going to lose its importance in this process. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24114
2016-09-01[SE] Make Stream movableJason Henline
Summary: The example code makes it clear that this is a much better design decision. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24142
2016-09-01[SE] Docs use JAVADOC_AUTOBRIEFJason Henline
That way we don't have to explicitly annotate each brief description as \brief.
2016-08-31[StreamExecutor] getOrDie and dieIfError utilsJason Henline
Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24107
2016-08-31Exclude examples, unittests from doc genJason Henline
Public documentation shouldn't be generated for unit test code and code that is only meant to be used as snippets in other documentation.
2016-08-31[StreamExecutor] Add Doxygen main pageJason Henline
Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24066
2016-08-31[StreamExecutor] Add Stream::blockHostUntilDoneJason Henline
Summary: Add the type-safe wrapper to the platform-specific implementation. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24063
2016-08-30[StreamExecutor] Simplify Kernel classesJason Henline
Summary: Make the Kernel class follow the pattern of the other classes. It now has a type-safe user wrapper and a typeless, platform-specific handle. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D24043
2016-08-26[StreamExecutor] Fix KernelSpec DoxygenJason Henline
Summary: There was a typo where \endcode was spelled as \encode and it was keeping the whole file document from rendering. I also added in some \c annotations for inline code stuff to make it look nicer. Reviewers: jprice Subscribers: parallel_libs-commits Differential Revision: https://reviews.llvm.org/D23941
2016-08-25[StreamExecutor] Add Platform and PlatformManagerJason Henline
Summary: Abstractions for a StreamExecutor platform Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D23857
2016-08-24[StreamExecutor] Rename Executor to DeviceJason Henline
Summary: This more clearly describes what the class is. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D23851
2016-08-24[StreamExecutor] Fix allocateDeviceMemoryJason Henline
Summary: The return value from PlatformExecutor::allocateDeviceMemory needs to be converted from Expected<GlobalDeviceMemoryBase> to Expected<GlobalDeviceMemory<T>> in Executor::allocateDeviceMemory. A similar bug is also fixed for Executor::allocateHostMemory. Thanks to jprice for identifying this bug. Reviewers: jprice, jlebar Subscribers: parallel_libs-commits Differential Revision: https://reviews.llvm.org/D23849
2016-08-24[StreamExecutor] Clean up device copy commentsJason Henline
Summary: Consolidate Executor::synchronousCopy* and Stream::thenCopy* methods into Doxygen method groups and combine all their comments into one section. Also add a "doc" target to the build files to use Doxygen to build the documentation. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D23845
2016-08-24[StreamExecutor] Executor add synchronous methodsJason Henline
Summary: Add Executor methods that block the host until completion. Since these methods are host-synchronous, they don't require Stream arguments. Reviewers: jlebar Subscribers: jprice, parallel_libs-commits Differential Revision: https://reviews.llvm.org/D23577