I see that there is already some talk about content assist in Orion in Bug 344203 and Bug 343774. I thought I'd add some things to this discussion.

Disclaimer: This is a brain dump of some ideas I have had regarding JavaScript content assist (and type inferencing) in the editor. I have not looked at the current Orion content assist implementation yet and am not an expert in JavaScript, but I have done similar work for Groovy in Eclipse.

In order to get high-quality content assist in JavaScript, we need to know something about the object/identifier we are completing on and what sort of things it responds to. Hence, the foundation for content assist is some sort of type inferencing. As I see it, this type inferencing could/should be more generally useful than just for content assist. It could be used to power semantically aware search, refactoring, and error detection, for example. So, from here on, I'll focus on what type inferencing could look like.

There are two pieces to type inferencing: local inferencing, which does some control flow analysis in the current file so that we have some idea of what the current identifier has been assigned to and what it can respond to; and a kind of global index that stores type information about objects and variables available from all libraries in the project.

First, local inferencing. This is relatively straightforward (glossing over many details, like a parser with good error recovery). Start with a parse tree from somewhere. Then walk it, keeping track of assignments and declarations. We can be smart and only walk areas that may affect the current scope. Every time there is an identifier that is not otherwise recognized, query the global index to see if we can resolve it. (A rough sketch of such a walk follows at the end of this comment.)

Second, indexing. This is a bit more open ended in my mind. Essentially, indexing will need to build up a store of global variables, functions, and objects. It will need to keep track of what sort of parameters these functions take and what properties the objects have. I see what happens here as a mix of actual parsing of files to extract information and the use of a plugin system that defines domain-specific extensions to the index (e.g. a browser-specific DOM plugin, a jQuery plugin, a Dojo plugin, etc.). Presumably, the indexing work can happen rarely (at project creation or whenever a file is added or changed) and the information is stored in a database somewhere that can be easily queried and updated.

I could also imagine other ways of extracting type information, which can all be managed through various plugins:

* Parsing some well-defined jsDoc that precisely describes the type information of an object or variable. (Of course, we'd need to work on a format that makes sense, and it would also be good to make sure that it is IDE agnostic.)
* Allowing the user to make tweaks to the type information, e.g. providing some UI that allows a user to say "the 'response' variable responds to the 'send' method; it takes a string as its argument".
* Using runtime analysis to inspect the running application and keep track of what objects and variables really respond to at runtime.
* Accepting some form of JSON, created by hand or programmatically, to augment the type information. I think this is pretty cool because it allows library developers to ship this JSON file next to their library. Orion could be smart enough to recognize it and immediately provide good domain-specific content assist for that library.
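To make the local-inferencing idea above concrete, here is a minimal sketch of an assignment-tracking walk over a parse tree. Everything here is illustrative, not anything Orion implements: the AST shapes assume a SpiderMonkey/Esprima-style parser, and the index.lookup hook stands in for the hypothetical global index.

// Minimal sketch: walk a parse tree, recording the type each variable
// was last assigned. "index" is the hypothetical global index above.
function inferLocalTypes(ast, index) {
  var types = {}; // variable name -> inferred type name

  // Guess a type for the right-hand side of an assignment.
  function typeOf(node) {
    if (!node) { return "Object"; }
    switch (node.type) {
      case "Literal":
        if (typeof node.value === "string") { return "String"; }
        if (typeof node.value === "number") { return "Number"; }
        if (typeof node.value === "boolean") { return "Boolean"; }
        return "Object";
      case "Identifier":
        // Another variable: reuse what we know, or ask the global index.
        return types[node.name] || index.lookup(node.name) || "Object";
      case "FunctionExpression":
        return "Function";
      default:
        return "Object";
    }
  }

  // Track declarations and assignments; recurse into everything else.
  function walk(node) {
    if (!node || typeof node.type !== "string") { return; }
    if (node.type === "VariableDeclaration") {
      node.declarations.forEach(function (decl) {
        types[decl.id.name] = typeOf(decl.init);
      });
    } else if (node.type === "ExpressionStatement" &&
               node.expression.type === "AssignmentExpression" &&
               node.expression.left.type === "Identifier") {
      types[node.expression.left.name] = typeOf(node.expression.right);
    }
    // Recurse into child nodes (bodies, blocks, etc.).
    Object.keys(node).forEach(function (key) {
      var child = node[key];
      if (Array.isArray(child)) {
        child.forEach(walk);
      } else if (child && typeof child.type === "string") {
        walk(child);
      }
    });
  }

  walk(ast);
  return types;
}

This ignores scoping and control flow entirely; a real implementation would track scopes and only walk the regions that can affect the completion site, as described above.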
A few caveats and open questions:

* Processing must not damage UI responsiveness.
* Are the indexes stored locally or on the server? Where does the parsing work get done?
* What should be stored in the index? JavaScript makes heavy use of callbacks. Anonymous functions passed as callbacks take parameters that respond to some well-defined methods and properties. We need to keep track of that somehow.
* Can this same kind of infrastructure be used for content assist/inferencing in other languages, specifically CSS and HTML5? (Maybe these languages do not need something so fancy, since they are much more static.)
* Since JavaScript is so dynamic, we will never be able to get 100% correctness. The best we can do is minimize the false positives and the false negatives. But how good is good enough?

This is a big piece of work, so breaking the task into smaller pieces that incrementally add functionality would be great. As a first step, something like this would be great to see:

var x = "foo";
var y = x;
y.<**> // get some interesting String proposals
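As a sanity check on that first step, here is how the hypothetical walker sketched above would handle exactly this snippet. The parser choice (Esprima) and the no-op index are illustrative assumptions:

// Assuming a parser such as Esprima and the inferLocalTypes sketch above:
var ast = esprima.parse('var x = "foo"; var y = x;');
var noopIndex = { lookup: function () { return null; } };
var types = inferLocalTypes(ast, noopIndex);
// types.x === "String"  (literal assignment)
// types.y === "String"  (chained through x)
// so "y." can offer String proposals such as charAt, indexOf, ...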
Hi Andrew,

This is a great summary. If you are looking at the type inferencing side, I think that dovetails nicely with the very dumb content assist I have done so far. Essentially I have broken content assist down into "member proposals" versus other forms of completion (variables, arguments, keywords, templates, etc). By member proposals I mean completion after a dot, such as "a.<**>". This kind of completion requires inferencing and I have barely touched on it. The other kind of completion, such as " a<**>", just requires building a list of available objects and variables in the current scope. This is the simple stuff I have been playing around with. The current content assist implementation is entirely in this script:

http://git.eclipse.org/c/orion/org.eclipse.orion.client.git/tree/bundles/org.eclipse.orion.client.editor/web/orion/editor/jsContentAssist.js

You'll see around line 165 I have sketched out a space for inference-based completions:

var type = inferType(prefix, buffer, selection);
if (type === "String") {
  addPropertyProposals(stringProps, "String", prefix, selection, proposals);
}
... etc for other types ...
//properties common to all objects
addPropertyProposals(objectProps, "Object", prefix, selection, proposals);

I have defined a simple notation here for type information, but I am certain it is not rich enough in its current form. In particular, it should capture function argument and return types to enable chained completions (a.foo().boo()); a sketch of one possible shape follows this comment. I know there is an existing JSON notation for JavaScript type information but I can't find it offhand... I'll dig around for it.

Some other random comments:

- I completely agree with your separation of indexing from the local inferencing that happens during completion. I was thinking that indexing doesn't require a recoverable parser because it can run in the background and only process syntactically correct files. For local inferencing during completion you of course have to recover from syntax errors. I have taken the approach of starting at the completion index and walking backwards. A handy property of JavaScript is that nothing after the cursor really matters for content completion, so we can just walk up through the function scopes from our starting point and ignore the likely possibility that any of these functions might not have a closing brace yet. (You can technically reference local variables declared *below* you in the current function scope, but I don't think such proposals are useful in content assist.)
- Pluggable inferencing engines make complete sense, and this is an approach that has worked well in JSDT and VJET (eBay's fork of JSDT). It's possible we could even use some of their work if we did indexing on the server.
- I completely agree with the incremental approach of starting with the simple stuff and working our way up to more complex cases.
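For the richer notation, one possibility (purely illustrative, not the notation actually in jsContentAssist.js) is to record each member's argument and return types so that completions can chain:

// Hypothetical sketch of a richer type-information notation.
// Each member records its kind, argument types, and return type,
// so after "a.foo()" we can look up the return type and keep completing.
var typeTable = {
  "String": {
    "charAt":  { kind: "function", args: ["Number"], returns: "String" },
    "indexOf": { kind: "function", args: ["String"], returns: "Number" },
    "length":  { kind: "property", type: "Number" }
  },
  "Number": {
    "toFixed": { kind: "function", args: ["Number"], returns: "String" }
  }
};

// Resolving a chained completion such as x.charAt(0).<**>:
// start from the inferred type of x and follow each member's return type.
function typeOfChain(startType, memberNames) {
  return memberNames.reduce(function (type, name) {
    var member = typeTable[type] && typeTable[type][name];
    if (!member) { return "Object"; } // unknown: fall back to Object
    return member.kind === "function" ? member.returns : member.type;
  }, startType);
}

// typeOfChain("String", ["charAt"])  === "String"
// typeOfChain("String", ["indexOf"]) === "Number"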
Ok, rain predicted...

If your goal is content assist, then type inference is a ton of complex work to get an imperfect answer. Content assist asks: "given the current state of the program, what are the valid completions at the current cursor position?" Type inference answers by guessing the possible states given a pile of source. It seems so much simpler to me to just run the program and query its current state (a toy illustration follows this comment). Then you can focus on the user experience of completion, which most of the time is less than perfect.

Moreover, once the editor is integrated with the runtime, many new opportunities emerge, especially around test-driven development and rapid incremental development. The total engineering investment in runtime integration is much less than that required for type inference; the risks in type inference involve complex research issues, while those in runtime integration involve UX and software engineering. On the other hand, new research in runtime integration features (e.g. querypoint debugging) provides promising upsides to that path.
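The "just query the current state" idea, in miniature: when the program is paused at the completion point, the valid member completions for an object are simply the properties it actually has, including those on its prototype chain. This is only a sketch of the principle; a real implementation would live inside a debugger, not in the page:

// Enumerate every member a live value actually responds to,
// walking the prototype chain, as a debugger-side completer might.
function runtimeProposals(value) {
  var names = {};
  // Box primitives so "foo" yields String.prototype members too.
  for (var o = Object(value); o !== null; o = Object.getPrototypeOf(o)) {
    Object.getOwnPropertyNames(o).forEach(function (name) {
      names[name] = true;
    });
  }
  return Object.keys(names).sort();
}

// With a real value in hand there is no guessing:
// runtimeProposals("foo") includes charAt, indexOf, length, ...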
@John A: I had a brief look at what you did and I will play around with it over the next few days. I see the structure you are trying to achieve.

@John B: This is another interesting approach. I agree that there are quite a few difficulties in what I outlined. The biggest one IMO is: can we calculate all this information fast enough that the user isn't twiddling thumbs each time '.' is pressed? The kind of type inferencing I am describing is perhaps less formal than the one you are talking about. I am not suggesting inter-procedural analysis (while editing); control flow analysis based on assignment statements is what I was thinking of. So, I don't think there is any real research required here. The tricky part would be figuring out what the index would look like, how to store it, and how to query it quickly.

But, with your approach there are some things that I don't understand:

1. How would you deal with broken code? Most likely the code you are developing wouldn't compile, much less run without errors.
2. Even if you could execute the code up to the content assist invocation location, how can you be sure that the state of the system approaches what it would be like during a real run? I could see that there would be access to global variables, but what if you are invoking content assist inside of a method with parameters? How would you know what they respond to? (I'm not saying that the solution above solves this any better.)
(In reply to comment #3)

> But, with your approach there are some things that I don't understand:
>
> 1. How would you deal with broken code? Most likely the code you are
> developing wouldn't compile, much less run without errors.

The runtime is always stopped at the first error.

> 2. Even if you could execute the code up to the content assist invocation
> location, how can you be sure that the state of the system approaches what
> it would be like during a real run?

Because it is the real runtime. A related question could be: how do you know that the test driver running the current execution resembles the final production driver? Well, that is right there a pretty important thing to work out, something that a runtime-integration focus will get to way sooner than one based on type inference.

> I could see that there would be access to global variables, but what if
> you are invoking content assist inside of a method with parameters? How
> would you know what they respond to? (I'm not saying that the solution
> above solves this any better.)

If you invoke content assist inside of a method, you get the values that the arguments are bound to. Of course you don't get the values from all other invocations. So a second-generation solution would build a database of invocations to help you (a toy sketch follows this comment). This is similar to some of the things in the original post, but rather than speculative inferred values these are actual values.
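A minimal sketch of the "database of invocations" idea, with all names hypothetical: wrap a function so that every real call records what its arguments actually looked like, and let content assist draw proposals from those recordings.

// Hypothetical sketch: record the actual arguments of every invocation
// so content assist can later propose the members those values had.
var invocationDB = {};

function recordInvocations(name, fn) {
  invocationDB[name] = [];
  return function () {
    // Remember each argument's own property names from this real call.
    var shapes = Array.prototype.map.call(arguments, function (arg) {
      return arg === null || arg === undefined
        ? []
        : Object.getOwnPropertyNames(Object(arg));
    });
    invocationDB[name].push(shapes);
    return fn.apply(this, arguments);
  };
}

// Example: after wrapping and exercising a handler once...
//   handler = recordInvocations("handler", handler);
//   handler(req, res);
// ...content assist inside the handler can union the recorded shapes
// in invocationDB["handler"] to propose members of req and res.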
I see what you're getting at, and this could be a nice way of getting good support with perhaps less effort than my original proposal, if we can use something like querypoint debugging in this area.

I found http://code.google.com/p/querypoint-debugging/ but it doesn't seem to be active right now and doesn't have any released code. Do you have any suggestions on what could be used in Orion?

At this point, I am hitting a wall with my JavaScript knowledge and I just need to read up on this. Perhaps this is obvious once I do, but let's say we have this code snippet, a very loose adaptation of using the Twitter API in a Node app:

var cookie = ...

app.call('/auth/login', function(req, res) {
  var login = auth.login('/auth/twitter', '/app.html');
  login(req, res,
    function() {
      var res = postATweet();
      <**>
    }
  );
});

1. How do we grab the anonymous function inside of the login call?
2. Will the cookie variable be available? What about req, res, and login?
3. Will content assist make a tweet?
4. What if postATweet() throws an exception? Does that prevent content assist from working?

Perhaps the answer would be to stub out all external calls, but once you start doing that, you start losing "real" runtime information. I like this idea, but I am just trying to wrap my head around how it could work.
(In reply to comment #5)

> I found http://code.google.com/p/querypoint-debugging/ but it doesn't seem
> to be active right now and doesn't have any released code. Do you have any
> suggestions on what could be used in Orion?

Yes, that was Salman Mirghasemi's Firebug-based prototype for JS. I'm working on a Chrome-based solution.

> At this point, I am hitting a wall with my JavaScript knowledge and I just
> need to read up on this. Perhaps this is obvious once I do, but let's say
> we have this code snippet, a very loose adaptation of using the Twitter
> API in a Node app:

I think you can learn a lot from Firebug or Web Inspector:

> var cookie = ...
>
> app.call('/auth/login', function(req, res) {
>   var login = auth.login('/auth/twitter', '/app.html');
\/---set a breakpoint on the next line
>   login(req, res,
>     function() {
\/---and here
>       var res = postATweet();
>       <**>
>     }
>   );
> });
>
> 1. How do we grab the anonymous function inside of the login call?

At the second breakpoint you will see the anonymous function on the call stack, as well as the closure state on the scope chain. Also play around with the command line, aka REPL. Both debuggers have rude completion tools and you can explore some of the issues.

> 2. Will the cookie variable be available? What about req, res, and login?

Yes, in the closure state.

> 3. Will content assist make a tweet?

I guess you are asking: how can we exercise APIs with real-world consequences in a debug world? A colleague is working on a "debugging proxy" that makes the front-end code think it is talking to the real Internet. Lots of things to work out, but the basic idea is to run the test case once, then replay from the proxy (sketched after this comment).

> 4. What if postATweet() throws an exception? Does that prevent content
> assist from working?

How you move from one execution point to the next (step forward, throw) does not affect how the system works when stopped at a point.

> Perhaps the answer would be to stub out all external calls, but once you
> start doing that, you start losing "real" runtime information.

Yes, though you can stub out the Internet and retain runtime information based on the fake Internet.

> I like this idea, but I am just trying to wrap my head around how it could
> work.

Hard problems remain: e.g. if you need to change several places in the source, then the completion system needs to store info about the state of the system for each. That is closely related to the querypoint idea, which is ultimately about getting breakpoint-like snapshots at execution points determined algorithmically.
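A toy record/replay wrapper makes the "run the test case once, then replay from the proxy" idea concrete. Everything here is hypothetical; the real debugging proxy described above would sit between the app and the network, not in application code:

// Toy record/replay sketch for the "debugging proxy" idea.
// First run: pass calls through and record responses.
// Replay runs: answer from the recording, so exercising code under
// the debugger has no real-world consequences (no tweets sent).
function makeReplayProxy(realCall) {
  var recording = {};   // url -> recorded response
  var replaying = false;

  return {
    startReplay: function () { replaying = true; },
    call: function (url, callback) {
      if (replaying) {
        callback(recording[url]);            // fake Internet
      } else {
        realCall(url, function (response) {  // real Internet, once
          recording[url] = response;
          callback(response);
        });
      }
    }
  };
}

// Usage sketch: record a live run of postATweet's network call,
// then switch to replay while poking around with content assist.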
Resolving this bug as fixed since this feature is now implemented except for the cross-file component.
Marking as resolved.