The CDT indexer does not scale well to code bases consisting of tens or even hundreds of thousands of files. The resources of a single machine are not sufficient to index such a huge code base in a reasonable time. The issue is partially alleviated by the fact that users working with very large code bases usually have a much narrower focus area (a few thousand files), for which CDT indexing is adequate. The problem arises when users need to browse or navigate code outside of their focus area. Today they have to resort to simple text search and lose F3 navigation, search for references, call hierarchy, etc.

To facilitate working with very large code bases, some companies build code indexing systems capable of handling code bases of that size. It would be beneficial to be able to extend the CDT code index with contributions based on such external code indexing systems.

A few basic assumptions about external index contributions:

1. Information about the code in an external index may not be as detailed as that in the PDOM and may not be suitable for binding resolution. The information should be detailed enough for F3 navigation, search for references, and call and type hierarchies.
2. Retrieval of information from an external index may be slow compared to the PDOM, but not to a degree that would make code browsing unacceptably slow.
3. When the same source or header file is indexed by both the PDOM and an external index, the more accurate PDOM version should be used.
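To make assumption 1 concrete, here is a hypothetical sketch of the minimal data an external contribution could serve: name occurrences keyed by qualified name, each with a file location, an offset, and a role, but no binding information. That is enough to answer an F3-style "go to definition" and a search for references. `ExternalIndex`, `Occurrence`, and `Role` are illustrative names only, not CDT types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical role of a name occurrence in the external index.
enum Role { DECLARATION, DEFINITION, REFERENCE }

// One occurrence of a name: where it is and what role it plays. No binding info.
record Occurrence(String qualifiedName, String fileLocation, int fileOffset, Role role) {}

final class ExternalIndex {
    private final Map<String, List<Occurrence>> byName = new HashMap<>();

    void add(Occurrence o) {
        byName.computeIfAbsent(o.qualifiedName(), k -> new ArrayList<>()).add(o);
    }

    // F3-style lookup: definitions of a name, matched purely by qualified name.
    List<Occurrence> findDefinitions(String qualifiedName) {
        return find(qualifiedName, Role.DEFINITION);
    }

    // Search-for-references lookup.
    List<Occurrence> findReferences(String qualifiedName) {
        return find(qualifiedName, Role.REFERENCE);
    }

    private List<Occurrence> find(String qualifiedName, Role role) {
        List<Occurrence> result = new ArrayList<>();
        for (Occurrence o : byName.getOrDefault(qualifiedName, List.of())) {
            if (o.role() == role) {
                result.add(o);
            }
        }
        return result;
    }
}
```

Because this sketch matches by qualified name only, overloads cannot be told apart; that is the kind of precision loss relative to the PDOM that assumption 1 accepts.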
Created attachment 202261: Proposed implementation
I don't like the part about switching fragments on or off. Would it work for you if the clients that need the additional fragments construct a CIndex that contains these fragments, while other clients construct a CIndex without them? We could introduce an additional flag for IndexFactory.getIndex(...).
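A minimal sketch of the suggested approach, assuming a bit-flag parameter on the factory. The names `IndexBuilder`, `Fragment`, `CompositeIndex`, and `ADD_EXTENSION_FRAGMENTS` are hypothetical and are not CDT's actual API: clients that need the external data ask for it explicitly, and every other client gets an index built from PDOM fragments only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fragment descriptor: either a PDOM fragment or an extension fragment.
record Fragment(String id, boolean isExtension) {}

// Hypothetical stand-in for the composite index (CIndex) built from fragments.
record CompositeIndex(List<Fragment> fragments) {}

final class IndexBuilder {
    // Hypothetical flag; callers OR it into whatever flags they already pass.
    static final int ADD_EXTENSION_FRAGMENTS = 0x1;

    private final List<Fragment> available;

    IndexBuilder(List<Fragment> available) {
        this.available = List.copyOf(available);
    }

    // Mirrors the idea of an extra flag on IndexFactory.getIndex(...):
    // extension fragments are included only when the caller opts in.
    CompositeIndex getIndex(int flags) {
        boolean withExtensions = (flags & ADD_EXTENSION_FRAGMENTS) != 0;
        List<Fragment> selected = new ArrayList<>();
        for (Fragment f : available) {
            if (!f.isExtension() || withExtensions) {
                selected.add(f);
            }
        }
        return new CompositeIndex(List.copyOf(selected));
    }
}
```

With this shape, navigation features such as F3 or call hierarchy would pass the flag, while code that needs binding resolution would not, so there is no global on/off switch for fragments.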
Created attachment 203535: Proposed implementation v2
(In reply to comment #2) Thanks for your suggestion. It's much cleaner this way.
Just noticed that the implementation of CIndex.findNames(...) causes problems. It has to remove duplicates, which can occur because a file is part of multiple fragments. However, the new implementation does not check whether the file in a fragment actually has content. Furthermore, it is desirable to allow names from the same file in a different variant. My suggestion: let's track {fileLocation, linkage, fileOffset} for each name added and not add duplicates.
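A minimal sketch of the proposed duplicate detection, assuming a simplified name model. `IndexName`, `NameKey`, and `NameMerger.mergeNames` are hypothetical stand-ins for CDT's index types. Keying on {fileLocation, linkage, fileOffset} means a file that appears in several fragments does not double the results, while names that exist only in another variant of the file (for example, inside a different `#ifdef` branch) are still reported. Names from a file without content in a given fragment are skipped.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified view of a name found in one index fragment.
record IndexName(String fileLocation, int linkage, int fileOffset, boolean fileHasContent) {}

// The identity used for duplicate detection: {fileLocation, linkage, fileOffset}.
record NameKey(String fileLocation, int linkage, int fileOffset) {
    static NameKey of(IndexName n) {
        return new NameKey(n.fileLocation(), n.linkage(), n.fileOffset());
    }
}

final class NameMerger {
    // Merges per-fragment results, dropping names whose tuple was already seen.
    static List<IndexName> mergeNames(List<List<IndexName>> perFragment) {
        Set<NameKey> seen = new HashSet<>();
        List<IndexName> result = new ArrayList<>();
        for (List<IndexName> names : perFragment) {
            for (IndexName n : names) {
                // A file without content counts as absent from this fragment.
                if (!n.fileHasContent()) {
                    continue;
                }
                // Set.add returns false if an equal key is already present.
                if (seen.add(NameKey.of(n))) {
                    result.add(n);
                }
            }
        }
        return result;
    }
}
```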
(In reply to comment #5)
> Just noticed that the implementation of CIndex.findNames(...) causes problems.
> It has to remove duplicates, which can occur because a file is part of multiple
> fragments. However, the new implementation does not check whether the file in a
> fragment actually has content. Furthermore, it is desirable to allow names from
> the same file in a different variant. My suggestion: let's track
> {fileLocation, linkage, fileOffset} for each name added and not add duplicates.

The assumption is that extension fragments may contain slightly out-of-date versions of the same files present in the PDOM fragments. This means that, at least in the most common single-variant case, their content should be hidden completely, not on a name-by-name basis. Extending this to the multi-variant case is a challenge, but, at least in my case, extension fragments will not contain multiple variants of includes. It seems a reasonable approximation to suppress a file from extension fragments if any variant of that file is present in any of the PDOM fragments. What is your opinion?
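A minimal sketch of the per-file approximation proposed here: a file from an extension fragment is hidden if any variant of the same file location is present, with content, in any PDOM fragment. `FileVariant` and `ExtensionFileFilter` are hypothetical names, and the variant key is just an opaque string in this sketch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical record of one indexed variant of a file in some fragment.
record FileVariant(String location, String variantKey, boolean hasContent) {}

final class ExtensionFileFilter {
    private final Set<String> pdomLocations = new HashSet<>();

    // Collects the locations of all files that some PDOM fragment actually indexed.
    ExtensionFileFilter(List<List<FileVariant>> pdomFragments) {
        for (List<FileVariant> fragment : pdomFragments) {
            for (FileVariant f : fragment) {
                if (f.hasContent()) {
                    pdomLocations.add(f.location());
                }
            }
        }
    }

    // An extension-fragment file is visible only if no variant of the same
    // location exists in the PDOM; the variant key is deliberately ignored.
    boolean isVisible(FileVariant extensionFile) {
        return !pdomLocations.contains(extensionFile.location());
    }
}
```

Ignoring the variant key is what makes this an approximation: it is exact only when extension fragments hold a single variant per file, as assumed above.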
(In reply to comment #6)
> ...
> The assumption is that extension fragments may contain slightly out-of-date
> versions of the same files present in the PDOM fragments. This means that, at
> least in the most common single-variant case, their content should be hidden
> completely, not on a name-by-name basis. Extending this to the multi-variant
> case is a challenge, but, at least in my case, extension fragments will not
> contain multiple variants of includes. It seems a reasonable approximation to
> suppress a file from extension fragments if any variant of that file is
> present in any of the PDOM fragments. What is your opinion?

The per-file comparison seems reasonable for your extension fragments. When a fragment is the PDOM for a project (and you can have multiple of these in one index), the comparison should be on a per-name basis. I also think that PDOM-based SDKs should be treated on a per-name basis.
==> I think we should distinguish the two kinds of fragments.
==> In any case, a file without content must be treated as not present in a fragment.
(In reply to comment #7)
> The per-file comparison seems reasonable for your extension fragments.
> When a fragment is the PDOM for a project (and you can have multiple of these
> in one index), the comparison should be on a per-name basis. I also think that
> PDOM-based SDKs should be treated on a per-name basis.
> ==> I think we should distinguish the two kinds of fragments.
> ==> In any case, a file without content must be treated as not present in a
> fragment.

Agreed. I'll fix that.
(In reply to comment #7)
> ==> I think we should distinguish the two kinds of fragments.

It turns out I don't need this distinction, since the file shadowing can be completely encapsulated inside the implementation of the non-PDOM fragment.
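A hypothetical sketch of what "encapsulated inside the non-PDOM fragment" can look like: a wrapper around the external fragment that filters out hits in files the PDOM already covers, so the composite index queries all fragments the same way. `NameSource`, `NameHit`, and `ShadowingFragment` are illustrative names, not CDT classes, and how the PDOM coverage predicate is computed is left as an assumption.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical location of one name occurrence returned by a fragment.
record NameHit(String fileLocation, int fileOffset) {}

// Hypothetical minimal fragment interface; the composite index queries every
// fragment through it without knowing what kind of fragment it is.
interface NameSource {
    List<NameHit> findNames(String name);
}

// Wraps an external fragment and hides every file the PDOM already covers.
final class ShadowingFragment implements NameSource {
    private final NameSource external;
    private final Predicate<String> indexedByPdom;

    ShadowingFragment(NameSource external, Predicate<String> indexedByPdom) {
        this.external = external;
        this.indexedByPdom = indexedByPdom;
    }

    @Override
    public List<NameHit> findNames(String name) {
        return external.findNames(name).stream()
                .filter(hit -> !indexedByPdom.test(hit.fileLocation()))
                .toList();
    }
}
```

Because the filtering lives inside the wrapper, the composite index can keep applying its generic per-name duplicate detection to every fragment and never needs to know which fragments are extension fragments.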
*** cdt git genie on behalf of Sergey Prigogin ***

Bug 355991. Duplicate name detection based on {fileLocation, linkage, fileOffset} tuple.

[*] http://git.eclipse.org/c/cdt/org.eclipse.cdt.git/commit/?id=5a1f9d47a9f98df3ef0b47663a3130485a8a976d
(In reply to comment #9)
> (In reply to comment #7)
> > ==> I think we should distinguish the two kinds of fragments.
> It turns out I don't need this distinction, since the file shadowing can be
> completely encapsulated inside the implementation of the non-PDOM fragment.

Thanks for taking care of it.
Was fixed in November.