[NVPTX] Turn on Loop/SLP vectorization

Since PTX has grown a <2 x half> datatype, vectorization has become more important. The late LoadStoreVectorizer intentionally handles only loads and stores, but arithmetic now has to be vectorized too for optimal throughput.

This is still very limited: SLP vectorization happily creates <2 x half> if it's a legal type, but a lot of register moves are still needed to feed the result into a vectorized store. Overall it's a small performance win from reducing the number of arithmetic instructions.

I haven't really checked what the loop vectorizer does to PTX code; the cost model there might need some more tweaks. I didn't see it causing harm, though.

Differential Revision: https://reviews.llvm.org/D46130
author: Benjamin Kramer <benny.kra@googlemail.com> 2018-04-27 13:36:05 +0000
committer: Benjamin Kramer <benny.kra@googlemail.com> 2018-04-27 13:36:05 +0000
commit: 323ba4e5f89de760193d4258952619c007541865 (patch)
tree: 606f869eaee86bf6b19082501c731cdfe13cb96d /llvm/lib/Target/NVPTX
parent: 66a1be9a3a4cb5b3f3b623591e84d4c4e70fda2d (diff)
1 files changed, 12 insertions, 0 deletions
diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h
index d2414b72a00..812d305da18 100644
--- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h
+++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h
@@ -49,6 +49,18 @@ public:
     return AddressSpace::ADDRESS_SPACE_GENERIC;
   }
 
+  // NVPTX has infinite registers of all kinds, but the actual machine doesn't.
+  // We conservatively return 1 here which is just enough to enable the
+  // vectorizers but disables heuristics based on the number of registers.
+  // FIXME: Return a more reasonable number, while keeping an eye on
+  // LoopVectorizer's unrolling heuristics.
+  unsigned getNumberOfRegisters(bool Vector) const { return 1; }
+
+  // Only <2 x half> should be vectorized, so always return 32 for the vector
+  // register size.
+  unsigned getRegisterBitWidth(bool Vector) const { return 32; }
+  unsigned getMinVectorRegisterBitWidth() const { return 32; }
+
   // Increase the inlining cost threshold by a factor of 5, reflecting that
   // calls are particularly expensive in NVPTX.
   unsigned getInliningThresholdMultiplier() { return 5; }
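
To illustrate why a 32-bit vector register width limits vectorization to <2 x half>, here is a minimal standalone sketch. It does not use LLVM: `MockNVPTXTTI` and `maxLanes` are hypothetical names, and the rule "lanes = register width / element width" is a simplifying assumption, not the vectorizers' actual cost model.

```cpp
#include <cassert>

// Hypothetical stand-in that mirrors the three hooks added in this patch.
// This is NOT the real NVPTX TTI class; it only copies the return values.
struct MockNVPTXTTI {
  unsigned getNumberOfRegisters(bool /*Vector*/) const { return 1; }
  unsigned getRegisterBitWidth(bool /*Vector*/) const { return 32; }
  unsigned getMinVectorRegisterBitWidth() const { return 32; }
};

// Simplified assumption: the widest vector a vectorizer would form has
// (vector register width / element width) lanes, and a target reporting
// zero vector registers is never vectorized.
unsigned maxLanes(const MockNVPTXTTI &TTI, unsigned ElementBits) {
  assert(ElementBits > 0 && "element width must be non-zero");
  if (TTI.getNumberOfRegisters(/*Vector=*/true) == 0)
    return 1; // no vector registers: stay scalar
  unsigned Lanes = TTI.getRegisterBitWidth(/*Vector=*/true) / ElementBits;
  return Lanes < 1 ? 1 : Lanes; // elements wider than the register stay scalar
}
```

Under these assumptions, `maxLanes(MockNVPTXTTI{}, 16)` yields 2 lanes for `half`, while 32-bit and 64-bit element types stay at 1 lane, which matches the intent stated in the comment that only <2 x half> should be vectorized.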