Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 325482 - [hotbug_request] Critical NPE in certain jeetools projects which easy and safe servertools patch fixes
Summary: [hotbug_request] Critical NPE in certain jeetools projects which easy and saf...
Status: CLOSED INVALID
Alias: None
Product: WTP ServerTools
Classification: WebTools
Component: wst.server (show other bugs)
Version: 3.2   Edit
Hardware: PC Linux
: P2 blocker (vote)
Target Milestone: 3.3   Edit
Assignee: Elson Yuen CLA
QA Contact: Angel Vera CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-09-16 11:33 EDT by Rob Stryker CLA
Modified: 2010-09-20 03:56 EDT (History)
3 users (show)

See Also:


Attachments
Example project (1.10 MB, application/octetstream)
2010-09-16 14:43 EDT, Rob Stryker CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Rob Stryker CLA 2010-09-16 11:33:24 EDT
Currently, jeetools module factories extend ProjectModuleFactory. ProjectModuleFactory assumes many things, specifically that each project only has one module. JEE module factories could override this, but at this stage in the maintenance stream, that'd be more dangerous.

The stacktrace is as follows:

java.lang.NullPointerException
	at org.eclipse.wst.server.core.internal.ModulePublishInfo.fillCache(ModulePublishInfo.java:283)
	at org.eclipse.wst.server.core.internal.ModulePublishInfo.getDelta(ModulePublishInfo.java:357)
	at org.eclipse.wst.server.core.internal.ServerPublishInfo.getDelta(ServerPublishInfo.java:354)
	at org.eclipse.wst.server.core.internal.Server.getPublishedResourceDelta(Server.java:1455)
	at org.eclipse.wst.server.core.model.ServerBehaviourDelegate.getPublishedResourceDelta(ServerBehaviourDelegate.java:698)

The fillCache(etc) code tries to find the newest version of this module. In this specific case, the j(2?)ee factories extending ProjectModuleFactory shows its problem and do not expose some binary modules.  The code that's causing the NPE is as follows:

	IModule m = ServerUtil.getModule(m2.getId());
	ModuleDelegate pm = (ModuleDelegate)m.loadAdapter(ModuleDelegate.class, null);

In this case, m2 is the module passed in, and the module factory is asked to find the newest version of it. The module factory returns null, and the loadAdapter(etc) call fails with an NPE, blows away the stack, and prevents the publishes from occurring.
Comment 1 Carl Anderson CLA 2010-09-16 11:49:05 EDT
Rob, is this a regression from a previous WTP version?
Comment 2 Rob Stryker CLA 2010-09-16 11:55:42 EDT
The JBoss / Redhat teams have found this to be severely blocking for some of our product's project types. 

We would very much like to see this fixed for 3.2.2, even though it is now in lockdown, we consider this critical to be pushed through. 

As was mentioned before, when the module factory is unable to correctly provide the module (due to heirarchy concerns), and the NPE arrives, the entire publish operation is blown away and the publish does not and cannot occur at all. 

We believe this is a regression from 3.1.x. In 3.1.x, the child jar may not be called a child module. but the file was published and the publish operation definitely finished properly.
Comment 3 Rob Stryker CLA 2010-09-16 12:56:11 EDT
I'm having some cvs problems tonight, but, the patch for this is as follows;

In file ModulePublishInfo.java, replace 3 instances of:

	IModule m = ServerUtil.getModule(m2.getId());
	ModuleDelegate pm = (ModuleDelegate) m.loadAdapter(ModuleDelegate.class, null);

With:

	IModule m = ServerUtil.getModule(m2.getId());
	m = m == null ? m2 : m;
	ModuleDelegate pm = (ModuleDelegate) m.loadAdapter(ModuleDelegate.class, null);

I believe this fix is safe. If the module factory has a new instance of the module, the new instance will be used. If the factory has misplaced its last version, the NPE disappears and the stack is not blown away.
Comment 4 David Williams CLA 2010-09-16 13:34:20 EDT
Thanks for the report, Rob. 

Off hand, this doesn't sound like a respin candidate ... but I'm just saying offhand ... I'll look forward to the server teams's assessment, in particular if its a regression. But ... it's not just a matter of us re-spining, as you know ... there's a whole release train chugging along and we are past our +2 deadline there (so will take some extra effort and justification to get all that reworked ... )

And, don't forget, it'd be best if they (we) had some reproducible steps to recreate the problem. 

Offhand, again, if this is something that effects only certain servers and only certain (non standard? or non mainstream?) types of projects, then I suspect best approach is just to quickly agree on what the right fix is for 3.2.3, and then someone can provide a patch for those with the problem. 

But, again, I'm not saying that's the right or only answer ... just presenting as one option ... I'll look forward to learning more details. 

Thanks again,
Comment 5 Angel Vera CLA 2010-09-16 14:22:59 EDT
I am looking at the history in CVS and the only changes was something related to debugging. I do not see how this could be a regression by just looking at the the history. 

Elson, can you take a look at this defect. Since this is a 3.2.2 this needs immediate attention to understand if we need to respin or not.
Comment 6 Angel Vera CLA 2010-09-16 14:27:52 EDT
One final word is that we need to consider if this bug is bad enough that it requires us to respin the driver given where we are. 

Given that the problem is in the ModulePublishInfo, I would be very nervous to make any changes in that area since this is a part of the code that is very critical and it gets called very frequently.
Comment 7 Elson Yuen CLA 2010-09-16 14:34:17 EDT
The suggested code change is safe. Having said that, I agree with Angel's
assessment that this is not a regression based on the code history. The same
logic had been there for a while.  The question is more on whether it is
something that warrant a re-spin on the candidate driver.  

Based on the defect information, the problem only shows under a specific
situation where the getModule() returns null.  This should not be the case
under a normal situation.  I agree that this defect should be fixed but this
defect does not look like to be common enough that we'll respin the driver for
it.  Given where we are at on the release cycle, I'll recommend to defer this
one and we'll fix it on 3.3 instead.
Comment 8 Rob Stryker CLA 2010-09-16 14:39:01 EDT
This is a regression in behaviour for WTP as a whole, not specifically due to any servertools changes. Specifically, more items in the jee tools projects are being exposed as child modules (as, arguably, they should be) but the parent class of these module factories still caches only one module per project. So a binary module inside a web project may not be found during a re-lookup.

In the past, my publishers would never have to worry about an NPE being thrown by the getPublishedResourceDelta(etc) method in the server behaviour delegate. Now, I do, in a properly formed web project with jars inside the lib folder. 

This bug is replicatable by adding one line of code to GenericServerBehaviour.publishModule(int, int, IModule[], IProgressMonitor) method. Specifically, making the first line of this method 

       	getPublishedResourceDelta(module);

Because of this easy replication by any server adapter calling getPublishedResourceDelta(etc), I would strongly argue that it is not isolated to specific server adapters. Calling this method from any server behaviour delegate during a publish can, in the case of binary child modules, throw an NPE and destroy the stack while providing no delta to the caller. I would consider this a regression, but perhaps I am wrong. 

An example project will be attached shortly. 

I recognize that publish delta is a critical piece of functionality, however, the current behaviour is unacceptable. In the patch / explanation I provided above, if the current behaviour finds a module, that module will be used. The only way my added 3 lines have any effect is in the case of a null result. In the case of a null result, the current behaviour is an NPE which renders the product unusable.
Comment 9 Rob Stryker CLA 2010-09-16 14:43:24 EDT
Created attachment 179059 [details]
Example project

In order to witness the effect of this bug, you must add one line to any ServerBehaviourDelegate.publishModule(etc) method:

    	getPublishedResourceDelta(module);

Then import the attached project into a workspace and attempt to deploy to any server who's behaviour delegate has this method called.
Comment 10 Angel Vera CLA 2010-09-16 14:53:17 EDT
This should be targeted to 3.2.3 but I haven't created the target yet. Will do when I get back to the office.
Comment 11 Rob Stryker CLA 2010-09-16 15:36:14 EDT
> Based on the defect information, the problem only shows under a specific
situation where the getModule() returns null. 

Elson:

This "specific situation" happens whenever a binary child module is inside a project. ANY time a binary child module is inside a project, this is replicated.  Your quote "This should not be the case under a normal situation." implies that having a binary jar file physically inside your web-inf/lib folder inside the project is not a "normal" situation, and I roundly reject this analysis. 

For our users, having jar files living physically inside a project is an extremely common scenario, and we have been arguing this position for over 2 years now.

The example project attached seems like a very very normal situation to me, and I would very much like to hear your logic as to how this is not a normal usecase.
Comment 12 Larry Isaacs CLA 2010-09-16 18:51:34 EDT
I may be missing something, but if I modify GenericServerBehaviour.publishModule() as requested on Comment #8, I don't see an NPE with the attached project.  This is likely the result of my ModulePublishInfo.fillCache() containing:

IModule m = module[module.length - 1];  (Line 280)
ModuleDelegate pm = (ModuleDelegate) m.loadAdapter(ModuleDelegate.class, null);

The "m" is set non-null in all calls.  I'm not sure why you have:

IModule m = ServerUtil.getModule(m2.getId());

I don't see this or even any "m2" field in the ModulePublishInfo.java file.  CVS says I have the most recent version, 1.29 dated 10/1/2008.  Could it be that you have local changes to ModulePublishInfo, possibly to address some other problem, that are not yet committed?
Comment 13 Rob Stryker CLA 2010-09-17 01:23:19 EDT
Interesting, thanks Larry.  I suspect my CVS issues caused me not to see the workspace change. Closing issue, although actually I should open one for jee tools regarding this but at a lower priority / severity.
Comment 14 David Williams CLA 2010-09-17 13:18:31 EDT
(In reply to comment #13)
> ... although actually I should open one for jee
> tools regarding this but at a lower priority / severity.

Are you saying there was still some sort of regression or change that broke you? If so, we would definitely like to know about it, so we (and others) can learn from our mistakes. 

But greatest thanks to everyone involved here for the focused testing and debugging. Glad it turned out not to be blocking anything after all.
Comment 15 Rob Stryker CLA 2010-09-20 03:56:33 EDT
David:

There definitely was a regression, but it only manifested itself in certain (uncommon) adopter circumstances rather than the error in the wtp codebase as I had originally assumed. 

I've created a new bug over at https://bugs.eclipse.org/bugs/show_bug.cgi?id=325716  to track this issue.