Add file comments for ScriptParser.cpp.

author: Rui Ueyama <ruiu@google.com> 2017-02-14 04:47:24 +0000
committer: Rui Ueyama <ruiu@google.com> 2017-02-14 04:47:24 +0000
commit: a66b0f4f021df4f88e17ef5c49e88059596d53e9 (patch)
tree: 727f9df3bc1607176241deed78d12cebb2d4d751 /lld/ELF/ScriptLexer.cpp
parent: 6758a6942eae016a8694bbe8d4f58f6990e1a42e (diff)
1 files changed, 31 insertions, 2 deletions
diff --git a/lld/ELF/ScriptLexer.cpp b/lld/ELF/ScriptLexer.cpp
index 6398a52a026..418ec93695f 100644
--- a/lld/ELF/ScriptLexer.cpp
+++ b/lld/ELF/ScriptLexer.cpp
@@ -7,8 +7,37 @@
 //
 //===----------------------------------------------------------------------===//
 //
-// This file contains the base parser class for linker script and dynamic
-// list.
+// This file defines a lexer for the linker script.
+//
+// The linker script's grammar is not complex but ambiguous due to the
+// lack of a formal specification of the language. What we are trying to
+// do in this and other files in LLD is to make a "reasonable" linker
+// script processor.
+//
+// Among simplicity, compatibility and efficiency, we put the most
+// emphasis on simplicity when we wrote this lexer. Compatibility with the
+// GNU linkers is important, but we did not try to clone every tiny corner
+// case of their lexers, as even ld.bfd and ld.gold are subtly different
+// in various corner cases. We do not care much about efficiency because
+// the time spent in parsing linker scripts is usually negligible.
+//
+// Our grammar of the linker script is LL(2), meaning that it needs at
+// most two-token lookahead to parse. The only place we need two-token
+// lookahead is labels in version scripts, where we need to parse "local :"
+// as if "local:".
+//
+// Overall, this lexer works fine for most linker scripts. There's room
+// for improving compatibility, but that's probably not at the top of our
+// todo list.
+//
+// A caveat: This lexer splits an input string into tokens ahead of time,
+// so the lexer is not context aware. There's one known corner case. Let's
+// say the next string is "val*3" (without quotes). In the context where
+// the parser is expecting an expression, that should be tokenized to
+// "val", "*" and "3". In other contexts, it should be just a single
+// token. (If it is in a filename context, it'll be interpreted as a glob
+// pattern, for example.)  We want to fix this, but it probably needs a
+// redesign of this lexer.
 //
 //===----------------------------------------------------------------------===//
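
The caveat in the new file comment (the "val*3" case) can be illustrated with a small standalone sketch. This is not LLD's lexer: `tokenize` and its `exprContext` flag are hypothetical names introduced here only to show how the same input would split differently if the lexer knew which context the parser was in.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper for illustration only (not part of LLD).
// Splits `s` on whitespace. If `exprContext` is true, each operator
// character also becomes a token of its own; otherwise a run of
// non-space characters stays together as a single token.
std::vector<std::string> tokenize(const std::string &s, bool exprContext) {
  const std::string ops = "*+-/()"; // assumed operator set for this sketch
  std::vector<std::string> tokens;
  std::size_t i = 0;
  while (i < s.size()) {
    if (std::isspace(static_cast<unsigned char>(s[i]))) {
      ++i;
      continue;
    }
    if (exprContext && ops.find(s[i]) != std::string::npos) {
      tokens.push_back(std::string(1, s[i]));
      ++i;
      continue;
    }
    // Consume an ordinary token up to the next space (or operator,
    // when operators are significant).
    std::size_t start = i;
    while (i < s.size() && !std::isspace(static_cast<unsigned char>(s[i])) &&
           !(exprContext && ops.find(s[i]) != std::string::npos))
      ++i;
    tokens.push_back(s.substr(start, i - start));
  }
  return tokens;
}
```

With this sketch, `tokenize("val*3", true)` yields three tokens and `tokenize("val*3", false)` yields one. A lexer that tokenizes the whole input ahead of time, as the comment describes, has to commit to one of these behaviors before the parser's context is known.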