aboutsummaryrefslogtreecommitdiff
path: root/Documentation/lto-build
diff options
context:
space:
mode:
authorAndi Kleen <ak@linux.intel.com>2014-04-02 21:49:27 +0200
committerMichal Marek <mmarek@suse.cz>2014-04-07 21:51:13 +0200
commit19a3cc83353e3bb4bc28769f8606139a3d350d2d (patch)
treeb58a21b5ec4880b56dd4005cb9cfcf8c2f8a2821 /Documentation/lto-build
parent810361b9f65daa6144922ac88087a8426eeae817 (diff)
Kbuild, lto: Add Link Time Optimization support v3
With LTO gcc will do whole program optimizations for the whole kernel and each module. This increases compile time, but can generate faster and smaller code and allows the compiler to do global checking. For example the compiler can complain now about type mismatches for symbols between different files. LTO allows gcc to inline functions between different files and do various other optimization across the whole binary. It might also trigger bugs due to more aggressive optimizations. It allows gcc to drop unused code. It also allows it to check types over the whole program. The compile time is definitely slower. For gcc 4.8 on a typical monolithic config it is about 58% slower. 4.9 drastically improved performance, with slowdown being 38% or so. Also incremenential rebuilds are somewhat slower, as the whole kernel always needs to be reoptimized. Very modular kernels have less build time slow down, as the LTO will run for each module individually. This adds the basic Kbuild plumbing for LTO: - In Kbuild add a new scripts/Makefile.lto that checks the tool chain (note the checks may not be fully bulletproof) and when the tests pass sets the LTO options Currently LTO is very finicky about the tool chain. - Add a new LDFINAL variable that controls the final link for vmlinux or module. In this case we call gcc-ld instead of ld, to run the LTO step. - For slim LTO builds (object files containing no backup executable) force AR to gcc-ar - Theoretically LTO should pass through compiler options from the compiler to the link step, but this doesn't work for all options. So the Makefile sets most of these options manually. - Kconfigs: Since LTO with allyesconfig needs more than 4G of memory (~8G) and has the potential to makes people's system swap to death. I used a nested config that ensures that a simple allyesconfig disables LTO. It has to be explicitely enabled. - Some depencies on other Kconfigs: MODVERSIONS, GCOV, FUNCTION_TRACER, KALLSYMS_ALL, single chain WCHAN are incompatible with LTO currently, mostly because they they require setting special compiler options for specific files, which LTO currently doesn't support. MODVERSIONS should in principle work with gcc 4.9, but still disabled. FUNCTION_TRACER/GCOV can be fixed with a unmerged gcc patch. - Also disable strict copy user checks because they trigger errors with LTO. - modpost symbol checking is downgraded to a warning, as in some cases modpost runs before the final link and it cannot resolve LTO symbols at this point. For more information see Documentation/lto-build Thanks to HJ Lu, Joe Mario, Honza Hubicka, Richard Guenther, Don Zickus, Changlong Xie who helped with this project (and probably some more who I forgot, sorry) v2: Merge documentation file into this patch Improve documentation and Kconfig, fix a lot of obsolete comments. Exclude READABLE_ASM Some random fixes v3: Remove CONFIG_LTO_SLIM, is on by default. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Michal Marek <mmarek@suse.cz>
Diffstat (limited to 'Documentation/lto-build')
-rw-r--r--Documentation/lto-build173
1 files changed, 173 insertions, 0 deletions
diff --git a/Documentation/lto-build b/Documentation/lto-build
new file mode 100644
index 000000000000..5dcce1e9cc25
--- /dev/null
+++ b/Documentation/lto-build
@@ -0,0 +1,173 @@
+Link time optimization (LTO) for the Linux kernel
+
+This is an experimental feature.
+
+Link Time Optimization allows the compiler to optimize the complete program
+instead of just each file. LTO requires at least gcc 4.8 (but
+works more efficiently with 4.9+) LTO requires Linux binutils (the normal FSF
+releases used in many distributions do not work at the moment)
+
+The compiler can inline functions between files and do various other global
+optimizations, like specializing functions for common parameters,
+determing when global variables are clobbered, making functions pure/const,
+propagating constants globally, removing unneeded data and others.
+
+It will also drop unused functions which can make the kernel
+image smaller in some circumstances, in particular for small kernel
+configurations.
+
+For small monolithic kernels it can throw away unused code very effectively
+(especially when modules are disabled) and usually shrinks
+the code size.
+
+Build time and memory consumption at build time will increase, depending
+on the size of the largest binary. Modular kernels are less affected.
+With LTO incremental builds are less incremental, as always the whole
+binary needs to be re-optimized (but not re-parsed)
+
+Oops can be somewhat more difficult to read, due to the more aggressive
+inlining.
+
+Normal "reasonable" builds work with less than 4GB of RAM, but very large
+configurations like allyesconfig may need more memory. The actual
+memory needed depends on the available memory (gcc sizes its garbage
+collector pools based on that or on the ulimit -m limits) and
+the compiler version.
+
+gcc 4.9+ has much better build performance and less memory consumption
+
+- A few kernel features are currently incompatible with LTO, in particular
+function tracing, because they require special compiler flags for
+specific files, which is not supported in LTO right now.
+- Jobserver control for -j does not work correctly for the final
+LTO phase due to some problems with the kernel's pipe code.
+The makefiles hard codes -j<number of online cpus> for the final
+LTO phase to work around for this
+
+Configuration:
+- Enable CONFIG_LTO_MENU and then disable CONFIG_LTO_DISABLE.
+This is mainly to not have allyesconfig default to LTO.
+- FUNCTION_TRACER, STACK_TRACER, FUNCTION_GRAPH_TRACER, KALLSYMS_ALL, GCOV
+have to disabled because they are currently incompatible with LTO.
+- MODVERSIONS have to be disabled (may work with 4.9+)
+
+Requirements:
+- Enough memory: 4GB for a standard build, more for allyesconfig
+The peak memory usage happens single threaded (when lto-wpa merges types),
+so dialing back -j options will not help much.
+
+A 32bit compiler is unlikely to work due to the memory requirements.
+You can however build a kernel targeted at 32bit on a 64bit host.
+
+Example build procedure:
+
+Simplified procedure for distributions that have gcc 4.8, but not
+the Linux binutils (for example openSUSE 13.1 or FC20):
+
+The LTO builds requires gcc-nm/gcc-ar. Some distributions ship
+those in separate packages, which may need to be explicitely installed.
+
+- Get the latest Linux binutils from
+http://www.kernel.org/pub/linux/devel/binutils/
+and unpack it.
+
+We install it in a separate directory to not overwrite the system binutils.
+
+# replace VERSION with respective version numbers
+
+cd binutils*
+# don't forget the --enable-plugins!
+./configure --prefix=/opt/binutils-VERSION --enable-plugins
+make -j $(getconf _NPROCESSORS_ONLN) && sudo make install
+
+Fix up the kernel configuration to allow LTO:
+
+<start with a suitable kernel configuration>
+./source/scripts/config --disable function_tracer \
+ --disable function_graph_tracer \
+ --disable stack_tracer --enable lto_menu \
+ --disable lto_disable \
+ --disable gcov \
+ --disable kallsyms_all \
+ --disable modversions
+make oldconfig
+
+Then you can build with
+
+# The COMPILER_PATH is needed to let gcc use the new binutils
+# as the LTO plugin linker
+# if you installed gcc in a separate directory like below also
+# add it to the PATH line below before the regular $PATH
+# The COMPILER_PATH setting is only needed if the gcc was not built
+# with --with-plugin-ld pointing to the Linux binutils ld
+# The AR/NM setting works around a Makefile bug
+COMPILER_PATH=/opt/binutils-VERSION/bin PATH=$COMPILER_PATH:$PATH \
+make -j$(getconf _NPROCESSORS_ONLN) AR=gcc-ar NM=gcc-nm
+
+If you don't have gcc 4.8+ as system compiler you would also need
+to install that compiler. In this case I recommend getting
+a gcc 4.9+ snapshot from http://gcc.gnu.org (or release when available),
+as it builds much faster for LTO than 4.8.
+
+Here's an example build procedure:
+
+Assuming gcc is unpacked in gcc-VERSION
+
+cd gcc-VERSION
+./contrib/download_preqrequisites
+cd ..
+
+mkdir obj-gcc
+# please don't skip this cd. the build will not work correctly in the
+# source dir, you have to use the separate object dir
+cd obj-gcc
+../gcc-VERSION/configure --prefix=/opt/gcc-VERSION --enable-lto \
+--with-plugin-ld=/opt/binutils-VERSION/bin/ld
+--disable-nls --enable-languages=c,c++ \
+--disable-libstdcxx-pch
+make -j$(getconf _NPROCESSORS_ONLN)
+sudo make install-no-fixedincludes
+
+FAQs:
+
+Q: I get a section type attribute conflict
+A: Usually because of someone doing
+const __initdata (should be const __initconst) or const __read_mostly
+(should be just const). Check both symbols reported by gcc.
+
+Q: I see lots of undefined symbols for memcmp etc.
+A: Usually because NM=gcc-nm AR=gcc-ar are missing.
+The Makefile tries to set those automatically, but it doesn't always
+work. Better to set it manually on the make command line.
+
+Q: It's quite slow / uses too much memory.
+A: Consider a gcc 4.9 snapshot/release (not released yet)
+The main problem in 4.8 is the type merging in the single threaded WPA pass,
+which has been improved considerably in 4.9 by running it distributed.
+
+Q: It's still slow
+A: It'll always be somewhat slower than non LTO sorry.
+
+Q: What's up with .XXXXX numeric post fixes
+A: This is due LTO turning (near) all symbols to static
+Use gcc 4.9, it avoids them in most cases. They are also filtered out
+in kallsyms.
+
+References:
+
+Presentation on Kernel LTO
+(note, performance numbers/details outdated. In particular gcc 4.9 fixed
+most of the build time problems):
+http://halobates.de/kernel-lto.pdf
+
+Generic gcc LTO:
+http://www.ucw.cz/~hubicka/slides/labs2013.pdf
+http://www.hipeac.net/system/files/barcelona.pdf
+
+Somewhat outdated too:
+http://gcc.gnu.org/projects/lto/lto.pdf
+http://gcc.gnu.org/projects/lto/whopr.pdf
+
+Happy Link-Time-Optimizing!
+
+Andi Kleen