Lucene's current implementation uses a Java int to refer to
term numbers, which means a single index segment can hold at
most 2,147,483,648 (2^31) unique terms [8]. This is technically
not a limitation of the index file format, just of Lucene's
current implementation.
Similarly, Lucene uses a Java int to refer to document
numbers, and the index file format uses an Int32 on disk to
store them. This is a limitation of both the index file format
and the current implementation. Eventually both could be
replaced with either UInt64 values or, better yet, VInt
values, which have no limit.
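
To illustrate why a VInt has no upper bound, here is a minimal sketch of
the variable-length encoding: each byte carries seven bits of the value,
low-order bits first, and the high bit of a byte is set when more bytes
follow. The class name VIntSketch and the use of BigInteger are
assumptions made for this illustration only; this is not Lucene's own
code, which reads and writes VInts through Java int values.

```java
import java.io.ByteArrayOutputStream;
import java.math.BigInteger;

// Hypothetical helper, not part of Lucene. BigInteger is used only so the
// sketch can show values larger than a Java int or long.
final class VIntSketch {

    // Encode a non-negative value: 7 data bits per byte, low-order group
    // first, high bit set on every byte except the last.
    public static byte[] encode(BigInteger value) {
        if (value.signum() < 0) {
            throw new IllegalArgumentException("VInt values must be non-negative");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BigInteger lowSevenBits = BigInteger.valueOf(0x7F);
        while (value.bitLength() > 7) {
            // More bytes follow: emit 7 bits with the continuation bit set.
            out.write(value.and(lowSevenBits).intValue() | 0x80);
            value = value.shiftRight(7);
        }
        // Final byte: continuation bit clear.
        out.write(value.intValue());
        return out.toByteArray();
    }

    // Decode by appending each 7-bit group as increasingly significant bits.
    public static BigInteger decode(byte[] bytes) {
        BigInteger result = BigInteger.ZERO;
        int shift = 0;
        for (byte b : bytes) {
            result = result.or(BigInteger.valueOf(b & 0x7F).shiftLeft(shift));
            shift += 7;
            if ((b & 0x80) == 0) {
                break; // high bit clear marks the last byte
            }
        }
        return result;
    }
}
```

Because the length of the encoding grows with the value, nothing in the
byte layout caps the number that can be stored; any limit would come
from the type the reader decodes into.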