| Summary: | Linkage uses a hard-coded enum constant instead of an extension | ||
|---|---|---|---|
| Product: | [Tools] CDT | Reporter: | Alex Blewitt <alex.blewitt> |
| Component: | cdt-core | Assignee: | Project Inbox <cdt-core-inbox> |
| Status: | REOPENED --- | QA Contact: | Jonah Graham <jonah> |
| Severity: | major | ||
| Priority: | P3 | CC: | mikekucera, nyssen, yevshif |
| Version: | 6.0 | ||
| Target Milestone: | --- | ||
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | |||
| Bug Depends on: | |||
| Bug Blocks: | 315539 | ||
|
Description
Alex Blewitt
The tools that work on top of an AST have a right to understand what language they are looking at. An enumeration is a clear piece of information compared to some id or a label. Several clients would have to try to guess the language of an IASTTranslationUnit via parsing an id/label. It is not hard to add a constant to the enumeration, see bug 265748.

I have no objection to the tools understanding what language they are working with. However, an enumeration is (by definition) a closed set and therefore can't be extended without updates to the source code of the base platform. The fact that it's possible to extend the set at a later stage is the key part here: having this represented as a String identifier would permit other languages to build upon it without having to have changes made upstream. Nothing would prevent an AST processor from knowing which language it is dealing with when given a string identifier, so the concern about guessing makes no sense. You either know what it is, or you don't; there is no need to guess.

This is just one of many things which makes extending CDT more difficult than it needs to be. If the goal of CDT is to be actively hostile to such extensions, then please close this as WONTFIX again. On the other hand, if you want to make CDT more extensible and open to other extensions, leave this open.

(In reply to comment #2)
> ...
> This is just one of many things which makes extending CDT more difficult than
> it needs to be. If the goal of CDT is to be actively hostile to such
> extensions, then please close this as WONTFIX again.
> ...

I don't see how the enumeration prevents you from writing an ObjectiveC parser; you already have an enumerator for ObjectiveC. I certainly did not want to express any sort of hostility.

(In reply to comment #3)
> (In reply to comment #2)
> > ...
> > This is just one of many things which makes extending CDT more difficult than
> > it needs to be. If the goal of CDT is to be actively hostile to such
> > extensions, then please close this as WONTFIX again.
> > ...
>
> I don't see how the enumeration prevents you from writing an ObjectiveC parser,
> you already have an enumerator for ObjectiveC. I certainly did not want to
> express any sort of hostility.

Right, but by definition, an enum is a closed set that requires changes upstream before it can be used. So for Objective-C this may no longer be an issue, but for the Next Great C-based Language (Go?) it will be. Furthermore, it essentially locks out the ability to use Eclipse until Eclipse+1 comes out with the new linkage included. In the initial stages of ObjectivEClipse I found that I had to ship patched versions of CDT just so that the constant was defined.

The reasons for raising this (and other bugs) were real problems that I ran into whilst (trying to) develop Objective-C support. It will bite anyone who tries to build another language on the CDT infrastructure. I also don't see a need for such a restriction, when using a generic String instead of an int would permit an open set of values. One could use "C", "CPP" and "ObjC", for example, and then Go could be added later by using "Go" as a string, without having to file a patch and then wait for the next stable (or release) version to come out. And given that string literals are interned in Java, it doesn't take up significantly more memory than an int would (a field holding either takes 32 bits, assuming compressed OOPs on a 64-bit JVM), but it is extensible without having to change the CDT core.

My point is that Objective-C could work because you can tweak the existing parsers to add it in. But if you have a new language that requires a new parser, then it's too much work, if not impossible, to add it in. The CDT AST infrastructure is very tied to the parsing style we used, and I don't expect other people to write parsers that way.
So while I had hoped that we created the CDT DOM (AST, Binding, and Index) so that it could be used by other languages, the optimizations we've done over the years broke that, if it was ever properly built to begin with. So for other new languages, we need to create a new multi-language framework to support them. Xtext could be that, but you need the full power of ANTLR to pull off common programming languages. Maybe they'll get there.

(In reply to comment #5)

I think we need to be a lot more clear about what it means for CDT to be extensible for a new language. Firstly, I don't think parser extensibility is even close to being the hardest part. I was able to make the LR parser extensible by simply providing reusable grammar files and action classes. Parsing produces an AST from some text; fine, now what do you do with that AST? That's the important question. I'm talking about stuff like the binding resolution algorithms, which are incredibly complex. They basically encode most of the semantic rules of C/C++. How do you make something like that extensible? I see two levels...

One way would be to plug in at a fine-grained level. This might be useful for small language extensions like UPC, which tend to provide only a few new things on top of C. It would be nice if I could just extend a parser with a few grammar rules and add a few semantic rules, a handful of AST nodes, etc. Unfortunately I don't think this would work without extensive architectural and algorithmic changes to CDT, and introducing API at that level makes evolving the CDT core much harder. UPC does work, though only barely. The UPC parser represents UPC constructs in a way that is digestible to CDT by "reducing" the new language features to ones CDT already supports. For example, the UPC forall loop extends a regular C for loop, which isn't actually correct but makes it work. That's about as far as I got; anything else was big trouble. I even gave up trying to get "shared int" to show up in the outline view.
Another example is that the editor help system doesn't even support using a content type other than C/C++. My point is that sweeping changes across the core and UI would be needed to make this work, including introducing tons of API. Is there even demand for such a framework? I think it's a non-starter. I could have just added UPC support directly to the core and probably would have gotten a lot farther.

The other level would be very high level. Basically you provide an almost complete standalone solution that plugs in only at specific points like ILanguage. You provide a complete parser with binding resolution, AST, index linkage, etc. Many parts of CDT could be reusable, specifically the preprocessor/lexer, which already buys you a lot. But there's still a ton left to do. And then how do you extend the UI? For the sake of argument, say there's some editor trick you could do in Obj-C that doesn't apply to C/C++: how do you extend the existing editor with that? How do you make the call hierarchy extendable to support all kinds of fancy stuff that can't be predicted, like multi-methods? Do you have to provide your own editor and type/call hierarchy? All this stuff needs to be thought through.

My opinion is that the tooling needs to know a lot about the language in order to provide powerful and useful features. So you're writing at least half an IDE from scratch to add a new language to CDT, if not more. That's why I think Markus is on point. The best approach would be to add Objective-C directly to the CDT core as a third officially supported language. CDT is open source; anyone can provide patches, participate in the community, and work towards becoming a committer. I don't see why doing it this way makes CDT closed or hostile. And having ParserLanguage be an enum forces anyone truly serious about supporting a new language to work closely with the CDT community to do it.
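The disagreement above is about a closed set of language identifiers versus an open one. A minimal plain-Java sketch of the difference, assuming nothing about CDT's real types: `ClosedLanguage`, `LanguageIds`, `register` and `isKnown` are all hypothetical names used only for illustration, and the identifiers "C", "CPP", "ObjC" and "Go" are the examples from the String-identifier proposal made in reply to comment #3.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LanguageIdSketch {
    // Closed set (hypothetical): every constant is declared here, so adding
    // a language means changing and recompiling this source.
    enum ClosedLanguage { C, CPP, OBJC }

    // Open set (hypothetical): a registry keyed by String identifiers that
    // any contributor can add to at runtime.
    static final class LanguageIds {
        private final Map<String, String> labels = new ConcurrentHashMap<>();

        // Rejects duplicate ids so two contributors cannot silently collide.
        void register(String id, String label) {
            if (labels.putIfAbsent(id, label) != null) {
                throw new IllegalArgumentException("Duplicate language id: " + id);
            }
        }

        // Exact lookup: a client either knows the id or it does not; nothing is guessed.
        boolean isKnown(String id) {
            return labels.containsKey(id);
        }
    }

    public static void main(String[] args) {
        LanguageIds ids = new LanguageIds();
        // The core contributes the built-in languages...
        ids.register("C", "C");
        ids.register("CPP", "C++");
        ids.register("ObjC", "Objective-C");
        // ...and a third-party plug-in adds one without touching the core.
        ids.register("Go", "Go");

        System.out.println(ClosedLanguage.values().length); // 3
        System.out.println(ids.isKnown("Go"));              // true
        System.out.println(ids.isKnown("Rust"));            // false
    }
}
```

The trade-off raised in reply to comment #5 still applies: an open identifier set only removes the upstream-patch step; it does not by itself make the binding resolution or UI layers extensible.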
I suggest that we move the generic discussion of what-makes-it-easier to the parent bug 315539 rather than this one (which is just one instance of the problem).

I have run into this problem as well. I am working on adding support for IAR language extensions to Standard-C and ANSI-C. I was already able to add support for the @-operator and IAR-defined keywords by defining my own language, where I configured the scanner and adjusted the parser accordingly. Now I am facing the problem that IAR also supports members in nested anonymous unions/structs, as I have outlined here: https://www.eclipse.org/forums/index.php/t/1070213/. I think I need to adjust the linking to deal with that, and I finally located the handling of anonymous members (of top-level structs/unions) inside PDOMCLinkage. I could thus (quite) easily add the respective support if I could replace PDOMCLinkage with my own linkage. The extension points CDT provides assume that this could be done in the context of a custom language definition, i.e. my custom language can return a linkage ID, and I can register a PDOMLinkageFactory with the language extension. However, it seems that the relationship between the linkage and my language is never established by the CDT core, as the list of known linkages (used by the indexer infrastructure) is limited to those that are "hard-coded" in Linkage.

The restriction that Linkage registration is based on a static predefined list seems to be a blocker for my language extension use case (see comment #8). While I can contribute my own language, linkage and indexer (to ensure that the linkage is considered during indexing), I am unable to "bind" the linkage to the language, because the implementation of PDOM#createLinkage(int linkageID) falls back to the static list of linkages contained in Linkage.getLinkageName(linkageID), so the linkage factory will never be used. Without having Linkage evaluate the registered linkages dynamically, the pdomLinkageFactory entry in the language extension point is pretty much useless. |
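The fallback described in the last comment can be contrasted with a lookup that consults contributed factories before the hard-coded table. This is a minimal sketch under stated assumptions, not CDT's actual PDOM or Linkage code: `LinkageLookupSketch`, `registerFactory`, the `Linkage` record and the integer IDs are all hypothetical, and in Eclipse the factory would presumably be supplied through the pdomLinkageFactory entry of the language extension point rather than by a direct call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

public class LinkageLookupSketch {
    // Illustrative IDs only; these are NOT CDT's real linkage constants.
    static final int C_LINKAGE = 1;
    static final int CPP_LINKAGE = 2;
    static final int CUSTOM_LINKAGE = 100; // e.g. a hypothetical IAR-specific linkage

    // Hypothetical stand-in for a linkage object.
    record Linkage(int id, String name) {}

    // Static, hard-coded table: the only source consulted in the behavior described above.
    private static final Map<Integer, String> BUILT_IN = Map.of(
            C_LINKAGE, "C",
            CPP_LINKAGE, "C++");

    // Factories contributed at runtime (hypothetical registration path).
    private final Map<Integer, Supplier<Linkage>> contributed = new HashMap<>();

    void registerFactory(int id, Supplier<Linkage> factory) {
        contributed.put(id, factory);
    }

    // Consult contributed factories first, then fall back to the built-in table.
    Optional<Linkage> createLinkage(int id) {
        Supplier<Linkage> factory = contributed.get(id);
        if (factory != null) {
            return Optional.of(factory.get());
        }
        String name = BUILT_IN.get(id);
        return name == null ? Optional.empty() : Optional.of(new Linkage(id, name));
    }

    public static void main(String[] args) {
        LinkageLookupSketch pdom = new LinkageLookupSketch();
        pdom.registerFactory(CUSTOM_LINKAGE, () -> new Linkage(CUSTOM_LINKAGE, "IAR C"));
        System.out.println(pdom.createLinkage(CUSTOM_LINKAGE).map(Linkage::name).orElse("none")); // IAR C
        System.out.println(pdom.createLinkage(CPP_LINKAGE).map(Linkage::name).orElse("none"));    // C++
        System.out.println(pdom.createLinkage(42).map(Linkage::name).orElse("none"));             // none
    }
}
```

Checking contributed factories first also lets a plug-in shadow a built-in ID; a real fix might prefer to reject such collisions instead.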