| Summary: | [hotbug_request] Critical NPE in certain jeetools projects which easy and safe servertools patch fixes | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [WebTools] WTP ServerTools | Reporter: | Rob Stryker <stryker> | ||||
| Component: | wst.server | Assignee: | Elson Yuen <eyuen7> | ||||
| Status: | CLOSED INVALID | QA Contact: | Angel Vera <arvera> | ||||
| Severity: | blocker | ||||||
| Priority: | P2 | CC: | ccc, david_williams, larryisaacs | ||||
| Version: | 3.2 | ||||||
| Target Milestone: | 3.3 | ||||||
| Hardware: | PC | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Attachments: |
|
||||||
|
Description
Rob Stryker
Rob, is this a regression from a previous WTP version? The JBoss / Redhat teams have found this to be severely blocking for some of our product's project types. We would very much like to see this fixed for 3.2.2, even though it is now in lockdown, we consider this critical to be pushed through. As was mentioned before, when the module factory is unable to correctly provide the module (due to heirarchy concerns), and the NPE arrives, the entire publish operation is blown away and the publish does not and cannot occur at all. We believe this is a regression from 3.1.x. In 3.1.x, the child jar may not be called a child module. but the file was published and the publish operation definitely finished properly. I'm having some cvs problems tonight, but, the patch for this is as follows; In file ModulePublishInfo.java, replace 3 instances of: IModule m = ServerUtil.getModule(m2.getId()); ModuleDelegate pm = (ModuleDelegate) m.loadAdapter(ModuleDelegate.class, null); With: IModule m = ServerUtil.getModule(m2.getId()); m = m == null ? m2 : m; ModuleDelegate pm = (ModuleDelegate) m.loadAdapter(ModuleDelegate.class, null); I believe this fix is safe. If the module factory has a new instance of the module, the new instance will be used. If the factory has misplaced its last version, the NPE disappears and the stack is not blown away. Thanks for the report, Rob. Off hand, this doesn't sound like a respin candidate ... but I'm just saying offhand ... I'll look forward to the server teams's assessment, in particular if its a regression. But ... it's not just a matter of us re-spining, as you know ... there's a whole release train chugging along and we are past our +2 deadline there (so will take some extra effort and justification to get all that reworked ... ) And, don't forget, it'd be best if they (we) had some reproducible steps to recreate the problem. Offhand, again, if this is something that effects only certain servers and only certain (non standard? or non mainstream?) types of projects, then I suspect best approach is just to quickly agree on what the right fix is for 3.2.3, and then someone can provide a patch for those with the problem. But, again, I'm not saying that's the right or only answer ... just presenting as one option ... I'll look forward to learning more details. Thanks again, I am looking at the history in CVS and the only changes was something related to debugging. I do not see how this could be a regression by just looking at the the history. Elson, can you take a look at this defect. Since this is a 3.2.2 this needs immediate attention to understand if we need to respin or not. One final word is that we need to consider if this bug is bad enough that it requires us to respin the driver given where we are. Given that the problem is in the ModulePublishInfo, I would be very nervous to make any changes in that area since this is a part of the code that is very critical and it gets called very frequently. The suggested code change is safe. Having said that, I agree with Angel's assessment that this is not a regression based on the code history. The same logic had been there for a while. The question is more on whether it is something that warrant a re-spin on the candidate driver. Based on the defect information, the problem only shows under a specific situation where the getModule() returns null. This should not be the case under a normal situation. I agree that this defect should be fixed but this defect does not look like to be common enough that we'll respin the driver for it. Given where we are at on the release cycle, I'll recommend to defer this one and we'll fix it on 3.3 instead. This is a regression in behaviour for WTP as a whole, not specifically due to any servertools changes. Specifically, more items in the jee tools projects are being exposed as child modules (as, arguably, they should be) but the parent class of these module factories still caches only one module per project. So a binary module inside a web project may not be found during a re-lookup.
In the past, my publishers would never have to worry about an NPE being thrown by the getPublishedResourceDelta(etc) method in the server behaviour delegate. Now, I do, in a properly formed web project with jars inside the lib folder.
This bug is replicatable by adding one line of code to GenericServerBehaviour.publishModule(int, int, IModule[], IProgressMonitor) method. Specifically, making the first line of this method
getPublishedResourceDelta(module);
Because of this easy replication by any server adapter calling getPublishedResourceDelta(etc), I would strongly argue that it is not isolated to specific server adapters. Calling this method from any server behaviour delegate during a publish can, in the case of binary child modules, throw an NPE and destroy the stack while providing no delta to the caller. I would consider this a regression, but perhaps I am wrong.
An example project will be attached shortly.
I recognize that publish delta is a critical piece of functionality, however, the current behaviour is unacceptable. In the patch / explanation I provided above, if the current behaviour finds a module, that module will be used. The only way my added 3 lines have any effect is in the case of a null result. In the case of a null result, the current behaviour is an NPE which renders the product unusable.
Created attachment 179059 [details]
Example project
In order to witness the effect of this bug, you must add one line to any ServerBehaviourDelegate.publishModule(etc) method:
getPublishedResourceDelta(module);
Then import the attached project into a workspace and attempt to deploy to any server who's behaviour delegate has this method called.
This should be targeted to 3.2.3 but I haven't created the target yet. Will do when I get back to the office. > Based on the defect information, the problem only shows under a specific
situation where the getModule() returns null.
Elson:
This "specific situation" happens whenever a binary child module is inside a project. ANY time a binary child module is inside a project, this is replicated. Your quote "This should not be the case under a normal situation." implies that having a binary jar file physically inside your web-inf/lib folder inside the project is not a "normal" situation, and I roundly reject this analysis.
For our users, having jar files living physically inside a project is an extremely common scenario, and we have been arguing this position for over 2 years now.
The example project attached seems like a very very normal situation to me, and I would very much like to hear your logic as to how this is not a normal usecase.
I may be missing something, but if I modify GenericServerBehaviour.publishModule() as requested on Comment #8, I don't see an NPE with the attached project. This is likely the result of my ModulePublishInfo.fillCache() containing: IModule m = module[module.length - 1]; (Line 280) ModuleDelegate pm = (ModuleDelegate) m.loadAdapter(ModuleDelegate.class, null); The "m" is set non-null in all calls. I'm not sure why you have: IModule m = ServerUtil.getModule(m2.getId()); I don't see this or even any "m2" field in the ModulePublishInfo.java file. CVS says I have the most recent version, 1.29 dated 10/1/2008. Could it be that you have local changes to ModulePublishInfo, possibly to address some other problem, that are not yet committed? Interesting, thanks Larry. I suspect my CVS issues caused me not to see the workspace change. Closing issue, although actually I should open one for jee tools regarding this but at a lower priority / severity. (In reply to comment #13) > ... although actually I should open one for jee > tools regarding this but at a lower priority / severity. Are you saying there was still some sort of regression or change that broke you? If so, we would definitely like to know about it, so we (and others) can learn from our mistakes. But greatest thanks to everyone involved here for the focused testing and debugging. Glad it turned out not to be blocking anything after all. David: There definitely was a regression, but it only manifested itself in certain (uncommon) adopter circumstances rather than the error in the wtp codebase as I had originally assumed. I've created a new bug over at https://bugs.eclipse.org/bugs/show_bug.cgi?id=325716 to track this issue. |