diff options
author | Andi Kleen <ak@linux.intel.com> | 2014-04-02 21:49:27 +0200 |
---|---|---|
committer | Michal Marek <mmarek@suse.cz> | 2014-04-07 21:51:13 +0200 |
commit | 19a3cc83353e3bb4bc28769f8606139a3d350d2d (patch) | |
tree | b58a21b5ec4880b56dd4005cb9cfcf8c2f8a2821 /Documentation/lto-build | |
parent | 810361b9f65daa6144922ac88087a8426eeae817 (diff) |
Kbuild, lto: Add Link Time Optimization support v3
With LTO gcc will do whole program optimizations for
the whole kernel and each module. This increases compile time,
but can generate faster and smaller code and allows
the compiler to do global checking. For example the compiler
can complain now about type mismatches for symbols between
different files.
LTO allows gcc to inline functions between different files and
do various other optimization across the whole binary.
It might also trigger bugs due to more aggressive optimizations.
It allows gcc to drop unused code. It also allows it to check
types over the whole program.
The compile time is definitely slower. For gcc 4.8 on a
typical monolithic config it is about 58% slower. 4.9
drastically improved performance, with slowdown being
38% or so. Also incremenential rebuilds are somewhat
slower, as the whole kernel always needs to be reoptimized.
Very modular kernels have less build time slow down, as
the LTO will run for each module individually.
This adds the basic Kbuild plumbing for LTO:
- In Kbuild add a new scripts/Makefile.lto that checks
the tool chain (note the checks may not be fully bulletproof)
and when the tests pass sets the LTO options
Currently LTO is very finicky about the tool chain.
- Add a new LDFINAL variable that controls the final link
for vmlinux or module. In this case we call gcc-ld instead
of ld, to run the LTO step.
- For slim LTO builds (object files containing no backup
executable) force AR to gcc-ar
- Theoretically LTO should pass through compiler options from
the compiler to the link step, but this doesn't work for all options.
So the Makefile sets most of these options manually.
- Kconfigs:
Since LTO with allyesconfig needs more than 4G of memory (~8G)
and has the potential to makes people's system swap to death.
I used a nested config that ensures that a simple
allyesconfig disables LTO. It has to be explicitely
enabled.
- Some depencies on other Kconfigs:
MODVERSIONS, GCOV, FUNCTION_TRACER, KALLSYMS_ALL, single chain WCHAN are
incompatible with LTO currently, mostly because they
they require setting special compiler options
for specific files, which LTO currently doesn't support.
MODVERSIONS should in principle work with gcc 4.9, but still disabled.
FUNCTION_TRACER/GCOV can be fixed with a unmerged gcc patch.
- Also disable strict copy user checks because they trigger
errors with LTO.
- modpost symbol checking is downgraded to a warning,
as in some cases modpost runs before the final link
and it cannot resolve LTO symbols at this point.
For more information see Documentation/lto-build
Thanks to HJ Lu, Joe Mario, Honza Hubicka, Richard Guenther,
Don Zickus, Changlong Xie who helped with this project
(and probably some more who I forgot, sorry)
v2:
Merge documentation file into this patch
Improve documentation and Kconfig, fix a lot of obsolete comments.
Exclude READABLE_ASM
Some random fixes
v3:
Remove CONFIG_LTO_SLIM, is on by default.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Diffstat (limited to 'Documentation/lto-build')
-rw-r--r-- | Documentation/lto-build | 173 |
1 files changed, 173 insertions, 0 deletions
diff --git a/Documentation/lto-build b/Documentation/lto-build new file mode 100644 index 000000000000..5dcce1e9cc25 --- /dev/null +++ b/Documentation/lto-build @@ -0,0 +1,173 @@ +Link time optimization (LTO) for the Linux kernel + +This is an experimental feature. + +Link Time Optimization allows the compiler to optimize the complete program +instead of just each file. LTO requires at least gcc 4.8 (but +works more efficiently with 4.9+) LTO requires Linux binutils (the normal FSF +releases used in many distributions do not work at the moment) + +The compiler can inline functions between files and do various other global +optimizations, like specializing functions for common parameters, +determing when global variables are clobbered, making functions pure/const, +propagating constants globally, removing unneeded data and others. + +It will also drop unused functions which can make the kernel +image smaller in some circumstances, in particular for small kernel +configurations. + +For small monolithic kernels it can throw away unused code very effectively +(especially when modules are disabled) and usually shrinks +the code size. + +Build time and memory consumption at build time will increase, depending +on the size of the largest binary. Modular kernels are less affected. +With LTO incremental builds are less incremental, as always the whole +binary needs to be re-optimized (but not re-parsed) + +Oops can be somewhat more difficult to read, due to the more aggressive +inlining. + +Normal "reasonable" builds work with less than 4GB of RAM, but very large +configurations like allyesconfig may need more memory. The actual +memory needed depends on the available memory (gcc sizes its garbage +collector pools based on that or on the ulimit -m limits) and +the compiler version. + +gcc 4.9+ has much better build performance and less memory consumption + +- A few kernel features are currently incompatible with LTO, in particular +function tracing, because they require special compiler flags for +specific files, which is not supported in LTO right now. +- Jobserver control for -j does not work correctly for the final +LTO phase due to some problems with the kernel's pipe code. +The makefiles hard codes -j<number of online cpus> for the final +LTO phase to work around for this + +Configuration: +- Enable CONFIG_LTO_MENU and then disable CONFIG_LTO_DISABLE. +This is mainly to not have allyesconfig default to LTO. +- FUNCTION_TRACER, STACK_TRACER, FUNCTION_GRAPH_TRACER, KALLSYMS_ALL, GCOV +have to disabled because they are currently incompatible with LTO. +- MODVERSIONS have to be disabled (may work with 4.9+) + +Requirements: +- Enough memory: 4GB for a standard build, more for allyesconfig +The peak memory usage happens single threaded (when lto-wpa merges types), +so dialing back -j options will not help much. + +A 32bit compiler is unlikely to work due to the memory requirements. +You can however build a kernel targeted at 32bit on a 64bit host. + +Example build procedure: + +Simplified procedure for distributions that have gcc 4.8, but not +the Linux binutils (for example openSUSE 13.1 or FC20): + +The LTO builds requires gcc-nm/gcc-ar. Some distributions ship +those in separate packages, which may need to be explicitely installed. + +- Get the latest Linux binutils from +http://www.kernel.org/pub/linux/devel/binutils/ +and unpack it. + +We install it in a separate directory to not overwrite the system binutils. + +# replace VERSION with respective version numbers + +cd binutils* +# don't forget the --enable-plugins! +./configure --prefix=/opt/binutils-VERSION --enable-plugins +make -j $(getconf _NPROCESSORS_ONLN) && sudo make install + +Fix up the kernel configuration to allow LTO: + +<start with a suitable kernel configuration> +./source/scripts/config --disable function_tracer \ + --disable function_graph_tracer \ + --disable stack_tracer --enable lto_menu \ + --disable lto_disable \ + --disable gcov \ + --disable kallsyms_all \ + --disable modversions +make oldconfig + +Then you can build with + +# The COMPILER_PATH is needed to let gcc use the new binutils +# as the LTO plugin linker +# if you installed gcc in a separate directory like below also +# add it to the PATH line below before the regular $PATH +# The COMPILER_PATH setting is only needed if the gcc was not built +# with --with-plugin-ld pointing to the Linux binutils ld +# The AR/NM setting works around a Makefile bug +COMPILER_PATH=/opt/binutils-VERSION/bin PATH=$COMPILER_PATH:$PATH \ +make -j$(getconf _NPROCESSORS_ONLN) AR=gcc-ar NM=gcc-nm + +If you don't have gcc 4.8+ as system compiler you would also need +to install that compiler. In this case I recommend getting +a gcc 4.9+ snapshot from http://gcc.gnu.org (or release when available), +as it builds much faster for LTO than 4.8. + +Here's an example build procedure: + +Assuming gcc is unpacked in gcc-VERSION + +cd gcc-VERSION +./contrib/download_preqrequisites +cd .. + +mkdir obj-gcc +# please don't skip this cd. the build will not work correctly in the +# source dir, you have to use the separate object dir +cd obj-gcc +../gcc-VERSION/configure --prefix=/opt/gcc-VERSION --enable-lto \ +--with-plugin-ld=/opt/binutils-VERSION/bin/ld +--disable-nls --enable-languages=c,c++ \ +--disable-libstdcxx-pch +make -j$(getconf _NPROCESSORS_ONLN) +sudo make install-no-fixedincludes + +FAQs: + +Q: I get a section type attribute conflict +A: Usually because of someone doing +const __initdata (should be const __initconst) or const __read_mostly +(should be just const). Check both symbols reported by gcc. + +Q: I see lots of undefined symbols for memcmp etc. +A: Usually because NM=gcc-nm AR=gcc-ar are missing. +The Makefile tries to set those automatically, but it doesn't always +work. Better to set it manually on the make command line. + +Q: It's quite slow / uses too much memory. +A: Consider a gcc 4.9 snapshot/release (not released yet) +The main problem in 4.8 is the type merging in the single threaded WPA pass, +which has been improved considerably in 4.9 by running it distributed. + +Q: It's still slow +A: It'll always be somewhat slower than non LTO sorry. + +Q: What's up with .XXXXX numeric post fixes +A: This is due LTO turning (near) all symbols to static +Use gcc 4.9, it avoids them in most cases. They are also filtered out +in kallsyms. + +References: + +Presentation on Kernel LTO +(note, performance numbers/details outdated. In particular gcc 4.9 fixed +most of the build time problems): +http://halobates.de/kernel-lto.pdf + +Generic gcc LTO: +http://www.ucw.cz/~hubicka/slides/labs2013.pdf +http://www.hipeac.net/system/files/barcelona.pdf + +Somewhat outdated too: +http://gcc.gnu.org/projects/lto/lto.pdf +http://gcc.gnu.org/projects/lto/whopr.pdf + +Happy Link-Time-Optimizing! + +Andi Kleen |