|
Description
Larry Isaacs
(In reply to comment #0) > 1. What OS, Java, and version of WTP you are using. Various combinations: - Windows 7 64-bit, Java 1.6.0_26 64-bit, WTP 3.3.1 - Windows XP 32-bit, Java 1.6.x, WTP 3.3.1 - Linux Ubuntu 32-bit, Java 1.6.x, WTP 3.3.1 In the past the same problems were present with WTP 3.2.x on Windows systems in particular (no tests on Linux). So, it's not a 3.3.x regression. > 2. In simple terms what is the structure of the project(s) you are working on > that experience the publishing problem (e.g. one main dynamic web project which > depends on two utility projects). Also feel free to add additional detail if > is complements the answer to #3 below. These kind of problems appeared at least with two project setups. - single-project setup: a huge Dynamic Web Project, with no dependencies; at least 7 'WebContent' folders deployed to the web app (configured through the deployment assembly); external dependencies (JAR files) put in the WEB-INF/lib of one of the 7 'WebContent' folders, being WEB-INF/lib a linked folder (the actual JAR are either on an external folder or in another project in the workspace) - multiple project setup: a Dynamic Web Project with almost no classes (only a src folder with configuration) which depends on 23 other projects with the Utility Module facet; some of these projects also have additional natures (Spring, Groovy); all of them (including the DWP) have the Gradle nature; the webapp resources are deployed from 11 different 'WebContent' folders, 7 of which are linked folders (the contents residing on 7 of the other projects); external dependencies (JAR files) are deployed to WEB-INF/lib contributed by a library ("Gradle Dependencies") which is mapped in the Deployment Assembly > 3. In simple terms what do you recall as the work flow that occurred leading up > to the faulty publishing you experienced. After you do some changes, you try to publish them (either automatically, thanks to the "automatically publish when resources change" setting on the Tomcat server, or manually by hitting "publish" or by restarting the server). No error is given but the publishing silently fails: if you go to the deploy folder you find that: - most of the contents is not published OR - some of the contents is not published The error is subtle because it seems like some resources that were not changed are deleted from the publish folder without apparent reason and are not copied back. In fact, the feeling is that "something is disappearing" rather than "something doesn't get copied/updated", but I cannot say for sure because I do not have exact steps to reproduce. When the problem happens, the server however thinks that it's synchronized. So, trying to "publish" again does not help. To solve the inconsistent state, you have to: - clean the server OR sometimes even: - remove the DWP from the server, publish, re-add the DWP to the server, publish again The problem seems to appear more frequently when "automatically publish when resources change" is enabled and the server is running (so that the publishing would trigger a context reload on the server). No error is present in the error log. Sometimes "publishing errors" appear saying that some files could not be deleted (JAR files, for instance), because they may be locked by other processes. However, I do not have any clue that these "can't delete files" problems are actually related to the main problem described here, because: 1) often the problem happens without any such "can't delete files" error 2) trying on a simple test project, I could reproduce the "can't delete files" errors, but they never led to an inconsistent server publishing state > 4. What are you aware of that the publishing did wrong, or failed to do, when > you experienced faulty publishing. ClassNotFoundExceptions at runtime or 404 errors for JSPs or static web contents. Created attachment 208636 [details]
tomcat.core jar for WTP 3.3.1 with extra delta publish tracing
Sorry to take so long, but I had trouble finding the cycles with some December deadlines.
This jar uses "tracing", which means the output goes to System.out. This approach will make it easy to enable other existing tracing if it appears it would be useful. To enable the tracing, you will need to do the following:
1. Create an ".options" file in the directory with the eclipse executable resides containing the following line:
org.eclipse.jst.server.tomcat.core/debug=true
2. Add "-debug" to the eclipse.ini or include it in the startup command.
3. Start eclipse from a console window and optionally redirect stdout to a file.
New tracing output starts with "###". There will be some other output that doesn't start with "###". This jar primarily dumps delta publishing info for the content portion of the webapp. You will see a line like:
### Tomcat Module Delta Publish #<number>, Delta length: <number>, to <path>
This will be followed with indented lines of the resources that the publishing framework thinks needs to be published. The main question to be answered by the next publish failure is was the problem resource(s) included in the delta publishing info. This will help determine if the problem is in the delta calculation or in the actual publishing.
This tracing doesn't yet cover building jars from dependent utility projects. It wasn't clear if I need to cover that too. Let me know if I do, or have any questions.
Thanks,
Larry
Hi Larry! Thank you very much, I'll try it as soon as possible and install on my co-workers machines too, in order to speed up the reproduction of the problem. A couple of questions: - I simply need to add the attached JAR to the dropins folder to let it work, don't I? - is there any way to check whether Eclipse is actually using it (apart from seeing the tracing output, of course)? I mean, is the modified JAR marked with a special plugin version I can check in the Installation Details Eclipse dialog (or anywhere else)? Regarding the creation of the required utility projects... well, I can't bet on it, but I would say that it also happened some utility project JARs were not deployed sometimes. We may do this way: for now, don't do anything else. I will install the modified JAR into my Eclipse instance and to my co-workers' ones. If we reproduce the problem AND we see that there are missing utility project JARs, I will let you know. Thankds again by now. Sorry for not explaining how to "install" the jar. You can save a copy of the jar with the exact same name currently in the "plugins" folder, then drop this new one on top of it, replacing the exiting jar. That's the simplest way to ensure the new jar gets used without going through "update". If the jar name doesn't match, you aren't using WTP 3.3.1. If you upgraded from WTP 3.3.0 and there are older "...server.tomcat.core" jars present, they won't matter since they have older version numbers than the one you are replacing. Hi Larry, I spread your modified JAR like a virus in my development team :-D, together with a little questionnaire to fill in when anyone gets to reproduce the problem. First of all, a co-worker of mine found out that he had to put the .options file in his home directory rather than in the Eclipse directory. I don't know why, in my environment (Windows) it worked by putting it in the Eclipse directory, while he is using Linux. Maybe that file has just to be put in the Eclipse working dir, so it depends on where you are when you launch the eclipse executable. Anyway, this is not important. What's more interesting is that today the same co-worker saw the problem. Here is the situation: - just re-installed Eclipse from scratch because of other problems - opened Eclipse on the same workspace he was using until this morning; this workspace contains our projects (see the "multiple project setup" description in my comment #2); also, this workspace already contained data from a previous publish of our main DWP project in the default WTP publish directory (WORKSPACE_LOC/.metadata/.plugins/.org.eclipse.wst.server.core/tmp0/wtpwebapps) - started Eclipse - refreshed the Gradle Dependencies on all the projects: what's this? It's simply a feature of Spring Tools Suite that parses a build.gradle file within an Eclipse project; this project contains definitions like: "this project depends on this JAR"; the JAR is then downloaded from a Maven repository and added to a library called "Gradle Dependencies"; this library is on the project build path and is also listed in the Deployment Assembly of our main Dynamic Web Project (the one that is published), so that those JARs are published together with the webapp to WEB-INF/lib folder - changed some Groovy source files and created some others in one of the utility project the main DWP depends on - hit "Publish" on the server - hit "Start" on the server Tomcat started, but when it tried to deploy our webapp it complained about a missing class: java.lang.ClassNotFoundException: org.springframework.web.util.Log4jConfigListener This class is actually contained in one of the JARs in the Gradle Dependencies library of our main DWP project, particularly: org.springframework.web-3.0.5.RELEASE.jar. What's really interesting is the output produced by your modified WTP plugin (see the attached file). In this trace you can see that: - the files in WEB-INF/classes/conf are marked as changed; they are coming from the only source folder of our main DWP project, which is <PROJECT_LOC>/etc/conf source folder and were NOT modified actually, although the trace show them as changed; please note that the listed files in the trace are all the files in <PROJECT_LOC>/etc/ (in other words: there are not any other file there that wasn't seen as modified by WTP) - the JAR we need (org.springframework.web-3.0.5.RELEASE.jar) was REMOVED from the publish folder!!! <= HERE IS THE MAIN PROBLEM - many other JARs were removed just like that, although those JARs were still part of the Gradle Dependencies library and hence should have not been touched (or, at least, should have been published again)! - however, not all JARs in the Gradle Dependencies library were removed from the publish folder: in other words, some other JARs of the same library were correctly kept there! Other information that may be useful: - my co-worker is using SpringSource Tools Suite 2.8.1 Eclipse distribution, 32-bit version, on a Linux Debian Squeeze 32-bit system - while changing the Groovy files in one of the utility projects, the Tomcat server was not running - "Build automatically" was always on, but "Never publish automatically" was checked; so changes were published only when my co-worker hit "Publish" on the server - to fix the broken situation, a "clean" on the server was sufficient I may provide you the complete log, if you wish, but I checked it and it seems like there is no previous error that could be bound to this problem. Created attachment 208777 [details]
This is an extract from my co-worker log
I was going through the trace once again and I noticed: WEB-INF/lib/http-builder-0.5.1.jar, Kind: Added This made my co-worker recall that, besides changing some Groovy source files, he also added a new dependency for the same utility project to that new JAR. So, this information (the addition of http-builder-0.5.1.jar in WEB-INF/lib) seems to be the only one correct. All the removals you see in the trace, instead, are wrong, as i said in my previous comment. (In reply to comment #7) > This made my co-worker recall that, besides changing some Groovy source files, > he also added a new dependency for the same utility project to that new JAR. Hmmm... this may need a further explanation. In our setup, when I add a dependency to an external JAR for a utility project P1, that dependency is also "pushed" to our main DWP. So, in our case, all you need to know is this: a new JAR was added to the Gradle Dependencies library of the main DWP (the one that is published), so it's correct to see such a delta publish. However, no JARs were removed from that library, so it's not correct so see those removals in the delta publish. And of course... I know my looong and annoying messages above are not the best way to wish you a merry Christmas... :-D However, please Larry accept my sincere season's greetings! Hi Mauro, hope you are having/had a happy holiday season as well. I'm on vacation this week, so I'll see what I can get figured out before the distractions resume next week. The only reason I can think of that would cause the jars to be listed for removal is that the Tomcat plug-in is aware they were previously published with the DWP no longer claiming them as included. Since a Clean fixes it, that would suggest that what the wst.server.core plug-in believes the DWP contains has become stale or corrupted somehow. I have noted a number of places where the "wst.server.core" swallows exceptions, some related to publishing. For now, it might be best to to add the following line to the .options file: org.eclipse.wst.server.core/warning=true The output would contain "/warning" for this Trace output, which also only goes to System.out. If an exception is involved, a stacktrace will be included. I'll see if I can figure out something useful to trace related to this. However, if one of these "/warnings" shows up, it will likely pinpoint more clearly where this logging needs to occur. Thanks, Larry When a change occurs in Eclipse, "builders" run, "jobs" get created, "worker threads" get spawned. These in turn beget more "jobs" spawning more threads, etc. In theory, when this process "settles out", hopefully the workspace has achieved the desired state. However, the exact sequence in which all this occurs is about as deterministic as a pig race at the state fair. :) We may have a problem with publishing occurring before everything has "settled out". It may be able to manifest in different ways. For this one incident, the only way I can see it occurring is if the "Gradle Dependencies" classpath container only reported the new jar as its contents during the "publish", then reported the full set of jars during the "clean", after things has "settled out". The server publishing depends on other plug-ins (not sure exactly which) to resolve what resources the project's module contains. The best I can tell, the call to get this list of resources is the same for "publish" and "clean", so what the "Gradle Dependencies" reported had to have changed between the "publish" and the "clean". For anyone encountering a failure, it would be desirable to try a second "publish" just to see what might show up in the log output prior to resorting to a "clean". It has been a while since I last examined the "workspace locks" that the server publishing uses. I would not be surprised if it's just not practical to acquire "locks" on workspace resources sufficient to ensure that the workspace has "settled out" before publishing without causing performance issues or potential deadlocks. I'll see if I can gather some info about what Spring Tool Suite and the Gradle support might contributing to the "settling out" process. I fear I may get to learn more about Eclipse than I ever wanted to know. Hi Larry, happy new year! I hope your holidays went well. (In reply to comment #11) > We may have a problem with publishing occurring before everything has "settled > out". It may be able to manifest in different ways. For this one incident, > the only way I can see it occurring is if the "Gradle Dependencies" classpath > container only reported the new jar as its contents during the "publish", then > reported the full set of jars during the "clean", after things has "settled > out". The server publishing depends on other plug-ins (not sure exactly which) > to resolve what resources the project's module contains. The best I can tell, > the call to get this list of resources is the same for "publish" and "clean", > so what the "Gradle Dependencies" reported had to have changed between the > "publish" and the "clean". There are some strange things, however: 1) first of all, as I wrote in my comment #1, in other circumstances this problem happened even when we didn't use the STS Gradle plugin at all. So, there must be some common root cause 2) secondly, in my co-worker's latest case, things had not changed in the Gradle Dependencies library container from one publish operation to the other; he simply restarted Eclipse I know the STS Gradle plugin persists the contents of the Gradle Dependencies library at some time, in order to avoid recalculation between different Eclipse sessions... I don't know if this may have interfered in some way with the publish operation in that use case... > For anyone encountering a failure, it would be desirable to try a second > "publish" just to see what might show up in the log output prior to resorting > to a "clean". Well, I can state for sure that such a second publish is totally useless to solve the corrupted state. But if it may be useful to see what is written in the log, I will instruct my co-workers to do that. > It has been a while since I last examined the "workspace locks" that the server > publishing uses. I would not be surprised if it's just not practical to > acquire "locks" on workspace resources sufficient to ensure that the workspace > has "settled out" before publishing without causing performance issues or > potential deadlocks. However, it sounds reasonable that no modifications should be performed on a project while its being published... doesn't it? > I'll see if I can gather some info about what Spring Tool Suite and the Gradle > support might contributing to the "settling out" process. I fear I may get to > learn more about Eclipse than I ever wanted to know. Spring Tools Suite Gradle support is mainly maintained by Kris De Volder, a very kind guy at SpringSource. If you leave a message in the STS support forums at http://forum.springsource.org/forumdisplay.php?32-SpringSource-Tool-Suite I'm sure he will help you if you need further information. From comment 7, it sounds like one of the changes was the addition of the http-builder-0.5.1.jar. It's my guess that this is what triggered the "pig race" and the resulting publishing behavior for this particular failure. There could be other forms of failure that are also possible. It's possible that the common cause is vulnerabilities to the "pig race". The presence of a classpath container/library could be one way a vulnerability could be introduced. WTP doesn't have one that's dynamic like what Gradle and Maven use, so there is no WTP testing that would cover this use case. Once upon a time, server publishing locked the the whole workspace. However, that meant you effectively couldn't use Eclipse until the publish finished and released the lock. For long publishes, this way deemed as having too big an impact on performance. I'm not sure exactly what happens now. I suspect for the duration of the publish, projects couldn't change. But that doesn't mean that we aren't in the middle of the "pig race" and that additional "settling out" changes aren't pending. With manual publishes, you would miss out on the additional changes unless you published again. I don't know of a reliable way to know when things have "settled out". On occasion I've watched the "Heap Status" indicator to see if it's bouncing as an indication of on going "settling". In theory, if automatic publishing was enabled, you could get multiple publishes during the "settling out". However, I'm not sure the publishing detection is any less vulnerable to the "pig race". In this one failure, I suspect that a second publish would have worked as well as the clean. It will be interesting to see what was published incorrectly in a case with the second publish continues to fail. (In reply to comment #13) > It's possible that the common cause is vulnerabilities to the "pig race". The > presence of a classpath container/library could be one way a vulnerability > could be introduced. WTP doesn't have one that's dynamic like what Gradle and > Maven use, so there is no WTP testing that would cover this use case. But isnt' the WTP "Web App Libraries" library container also dynamic? I mean, if you add a JAR to the WebContent/WEB-INF/lib, it will appear in the "Web App Libraries" library and so added to the classpath and even published... The path is different, but the purpose is similar... > Once upon a time, server publishing locked the the whole workspace. However, > that meant you effectively couldn't use Eclipse until the publish finished and > released the lock. For long publishes, this way deemed as having too big an > impact on performance. I'm not sure exactly what happens now. I suspect for > the duration of the publish, projects couldn't change. But that doesn't mean If so, I would say that locking the single project should be enough to avoid losing changes that may occur in the middle of a publish operation... > In this one failure, I suspect that a second publish would have worked as well > as the clean. It will be interesting to see what was published incorrectly in > a case with the second publish continues to fail. The main problem is that when this problem happens, the server is still seen as "synchronized" by WTP. That is, a second publish (even if requested manually) will be a no-op! If the server were in the "Republish" state, it wouldn't (almost) be a problem at all. However, the server state is actually corrupted :-( > But isnt' the WTP "Web App Libraries" library container also dynamic? I mean, > if you add a JAR to the WebContent/WEB-INF/lib, it will appear in the "Web App > Libraries" library and so added to the classpath and even published... > The path is different, but the purpose is similar... The "Web App Libraries" doesn't have to "refresh" or "resovlve" like ones that are more dynamic, i.e. driven by some sort of dynamic configuration. I know because the Eclipse work I do where I work involves such a "dynamic" classpath container. See Bug 270287 for some details about some "fun" I had in this area. > If so, I would say that locking the single project should be enough to avoid > losing changes that may occur in the middle of a publish operation... I don't think they are lost. It's just that they are going to occur due to a job that runs after the publish, possibly spawned from a job that ran before the publish. Thus, the changes aren't present until after the publish. > > The main problem is that when this problem happens, the server is still seen as > "synchronized" by WTP. That is, a second publish (even if requested manually) > will be a no-op! If the server were in the "Republish" state, it wouldn't > (almost) be a problem at all. However, the server state is actually corrupted > :-( Regrettably, I haven't had the time to track all of the changes in the underlying server plug-ins. I thought at one time that if the server was marked "synchronized" that manually publishing wouldn't even spawn the "publish" job. That's not true with WTP 3.3.x. You should always see something in the log output for every publish. I'm hoping to see if the "delta" for the second publish is in fact empty. And if so, what exactly was wrong in the first publish. If jars were removed in the first publish and didn't reappear in the second publish, then the classpath container would have to be stuck in an invalid state. This should be visible in the Project Exporer or Package Explorer. It may turn out that for a publish that continuously fails, there is more that goes wrong than in the case of the one failure so far. There could be caching going on in plug-ins below the server plug-ins that that I'm not aware of that might somehow contribute to "permanent" publish failures. (In reply to comment #15) > The "Web App Libraries" doesn't have to "refresh" or "resovlve" like ones that > are more dynamic, i.e. driven by some sort of dynamic configuration. I know > because the Eclipse work I do where I work involves such a "dynamic" classpath > container. See Bug 270287 for some details about some "fun" I had in this > area. Yes, but please consider that the "Gradle Dependencies" library is not updated automatically when a resouece changes. I have to issue a manual "Refresh Dependencies" to make it happen. When you issue that command, a Gradle process is started (you can think of it like an Ant process), this parses a build.gradle file to gather dependencies information, pass it to STS, which then updates the "Gradle Dependnecies" library accordingly. So it is dynamic, but the user decides when to update its contents. > I don't think they are lost. It's just that they are going to occur due to a > job that runs after the publish, possibly spawned from a job that ran before > the publish. Thus, the changes aren't present until after the publish. I see. However my experiments lead to think they are lost, because when the problem happens, I find no way to fix the situation apart from cleaning the server or, sometimes, even remove and re-add the DWP from the server. Repeatedly trying the publish operation does not help. However, the next time the problem occurs, I will certainly try a second publish to gather traces in the log file. Hi Larry, I'm not disappeared. The fact is that lately we've been very busy and couldn't dedicate ourselves to the tracing of this problem. Anyway, today a co-worker of mine experienced another problem, that may (or may not) be related to this one. This is what happened: - he started Eclipse - soon enough, he started his server (Tomcat); please note that the server contents should have been already there, published on last Friday - however, shortly after that, a workspace build started - this must have caused some files of (at least) one of the utility projects to be cleaned (in order to recompile) and removed from the JARs that WTP deploys to WEB-INF/lib - this caused a messed: my feeling is that some of these JARs with partial contents were deployed before Tomcat finished its startup process (so NoClassDefFoundErrors were printed in the Tomcat console) some other were deployed after Tomcat finished its startup process (so a context reload was issued and this failed as well - this is another long story and it's mainly Tomcat's fault) So, once again we experienced that a timing issue is surely present: IMHO one of the following two situations should have happened: - either Tomcat startup should have been blocked by the workspace build, since it was in progress - or, if the Run command was issued before the workspace build started, the "automatically publish" feature should have postponed the publishing of the new utility module JAR contents to the end of the build process However I feel this is not the original issue we were studying here. It might be related (the famouse pig race...) but not necessarily. Please note that when the workspace build finished, restarting the server caused it to start correctly, because the publishing process was performed successfully. Created attachment 210100 [details]
My co-worker log of 2012-01-26
Hi Larry,
today the original problem happened again. It seems like it happens quite frequently when you try to start the server soon after Eclipse starts.
This is what my co-worker (always the same guy, working on Linux) did:
- start Eclipse
- change a Groovy file
- start Tomcat from Eclipse
- oops, the port is in use (another instance of Tomcat, launched from outside Eclipse was using that port)
- stop the other instance of Tomcat
- start again Tomcat from Eclipse
Result: many missing JARs in the publish directory. As you can see from the log I attached, once again the delta publish thinks that some JARs have been deleted, while they are not!!! Well, actually the files in WEB-INF/classes/conf were not changed either, however that is not a big trouble.
Please ignore the leading errors produced by the Groovy plugin, they are related to another problem (we missed a dependency towards jsr305 that was apparently needed by another JAR).
Please note that the .options file now contains:
org.eclipse.jst.server.tomcat.core/debug=true
org.eclipse.wst.server.core/warning=true
as you suggested.
As you requested, after the problem occured, my co-worker did the following: - stop the Tomcat instance (the web application of course didn't start, because of missing classes) - hit "Publish" to see if the situation could be fixed by WTP This is the output produced after pushing "Publish": org.eclipse.jst.server.tomcat.core FINEST 26/01/12 08:31.33.843 ### Tomcat Module Delta Publish #2, Delta length: 0, to /home/davide/devel/sts/SuiteCardinis_sempla_ws/.metadata/.plugins/org.eclipse.wst.server.core/tmp0/wtpwebapps/SuiteCardinis org.eclipse.jst.server.tomcat.core FINER 26/01/12 08:31.33.874 Creating runtime directory at /home/davide/devel/sts/SuiteCardinis_sempla_ws/.metadata/.plugins/org.eclipse.wst.server.core/tmp0 org.eclipse.jst.server.tomcat.core FINER 26/01/12 08:31.33.874 Creating deployment directory at /home/davide/devel/sts/SuiteCardinis_sempla_ws/.metadata/.plugins/org.eclipse.wst.server.core/tmp0/wtpwebapps org.eclipse.jst.server.tomcat.core FINER 26/01/12 08:31.33.875 Server cleaned org.eclipse.jst.server.tomcat.core FINER 26/01/12 08:31.33.875 Backup and publish org.eclipse.jst.server.tomcat.core FINEST 26/01/12 08:31.33.875 Publishing catalina.policy org.eclipse.jst.server.tomcat.core FINEST 26/01/12 08:31.33.876 Publishing catalina.properties org.eclipse.jst.server.tomcat.core FINEST 26/01/12 08:31.33.876 Publishing context.xml org.eclipse.jst.server.tomcat.core FINEST 26/01/12 08:31.33.876 Publishing server.xml org.eclipse.jst.server.tomcat.core FINEST 26/01/12 08:31.33.876 Publishing tomcat-users.xml org.eclipse.jst.server.tomcat.core FINEST 26/01/12 08:31.33.876 Publishing web.xml org.eclipse.jst.server.tomcat.core FINER 26/01/12 08:31.33.877 Localizing configuration at /home/davide/devel/sts/SuiteCardinis_sempla_ws/.metadata/.plugins/org.eclipse.wst.server.core/tmp0 org.eclipse.jst.server.tomcat.core FINER 26/01/12 08:31.33.880 Context docBase settings updated in server.xml. org.eclipse.jst.server.tomcat.core FINER 26/01/12 08:31.33.881 Apply context configurations org.eclipse.jst.server.tomcat.core FINER 26/01/12 08:31.33.882 Server.xml updated with context.xml configurations As you can see, the delta publish seemed to update just the server configuration. No changes regarding our project were detected. So, the missing JARs were not copied to the publish directory. To fix the problem, my co-worker had again to clean the server. Thanks for the information. I think the next step is to log some info that confirms at the point in the "publishing" code where it asks for the project's content that the jars are not included in the the list. It would confirm if the problem is in underlying plug-ins or a bug in the delta calculation the Tomcat plug-ins are doing. I'll see about attaching a plug-in that will dump the list of jars appear in that list of project contents. Just experienced this today, in situation somewhat like Mauro's first project: a single project, with numerous dependency jar files in WEB-INF/lib. Only the first four (in alphabetical order) were retained in the published folder, the rest were removed. Stopping the server, unpublishing the app, manually cleaning out the folder, and republishing did not fix it first time: I had to restart Eclipse before it would work. As an aside, I've experienced very similar behaviour from the Jetty WTP plugin. You may want to look into what may be in common between the two of them... Created attachment 210418 [details]
wst.server.core jar for WTP 3.3.1 with extra logging
Make a backup copy of org.eclipse.wst.server.core_1.1.303.v20110816_1717.jar in the "plugins" folder and drop this in as its replacement. Also, add the following two lines to the ".options" file:
org.eclipse.wst.server.core/debug=true
org.eclipse.wst.server.core/webinflib=true
Whenever a publish occurs, output like the following should appear in the console output prior to the tomcat.core output (sorry about the line wrap):
org.eclipse.wst.server.core /webinflib 01/02/12 21:32.26.477 ### Computing currentResources for module: TestWebApp using class org.eclipse.jst.jee.internal.deployables.JEEFlexProjDeployable
org.eclipse.wst.server.core /webinflib 01/02/12 21:32.26.477 ### WEB-INF/lib: <some>.jar
org.eclipse.wst.server.core /webinflib 01/02/12 21:32.26.477 ### WEB-INF/lib: <some other>.jar
There will be a "### WEB-INF/lib: <some>.jar" line for each jar that, according to the Java EE plugins, should be published to the WEB-INF/lib of the webapp. It is this list of jars that is used to compute the delta so the webapp can be updated to match this list.
Following the tomcat.core output, there should appear a line like the following:
org.eclipse.wst.server.core /webinflib 01/02/12 21:32.26.553 ### Clearing ModulePublishInfo cache.
This indicates that the prior list of jars has been discarded. The list should be rebuilt for each publish.
Since the problems have appeared with starting Tomcat soon after starting Eclipse, an experiment would be to try a sequence of publishes starting soon after Eclipse startup. Check the lists of jars in the output to see if it changes over time and see if it possibly gets stuck in a state where jars are missing and won't reappear with the next publish.
Hi Jules,
Thanks for the input. You are welcome to try these modified jars, assuming you are using WTP 3.3.1. My guess is that the problem is in the Java EE plug-ins where the project gets stuck in a state where some of the jars that should be included go missing. If this is the case, then not just Tomcat would be affected, which is consistent with your mention of seeing the issue with Jetty.
It would be helpful to know how your jars were being "included" in the project. Were they physically in the project's "WebContent/WEB-INF/lib", included via a library, or some other means.
Thanks,
Larry
The jars were placed in WebContent/WEB-INF/lib, then right-clicked and 'add to build path' selected. I am on 3.3.1, so will give your replacement jars a try. Thanks for the info Jules. FYI, you don't need to do the "add to build path" part. Putting the jar in the "WebContent/WEB-INF/lib" should automatically include it in the "Web App Libraries" container that is already in the build path. In theory, it shouldn't hurt to have the jar twice in the build path, but given this mysterious publishing behavior, who knows. Thank you very much Larry for your modified wst.server.core plugin. I'm going to spread it to my co-workers like I did for the modified jst.server.tomcat.core plugin. Just a question: in the .options file, does the line: org.eclipse.wst.server.core/debug=true supersede this: org.eclipse.wst.server.core/warning=true or should I leave them both? Sorry if the question is dumb, I'm not used to this kind of syntax to specify log verbosity. You will need both. It appears up to the plug-in to define if and how their ".options" settings work. So it's hard to know without examining source to see how it is implemented. Hi Larry, here I am with a slightly different scenario than the one described so far. But the result is the same: server state inconsistent. This is what happened yesterday and today to a co-worker of mine. YESTERDAY: - server running in normal mode without problems - need to restart in debug mode; to do that, "Start the server in debug mode" button pressed - Tomcat starts the stopping process but because of a webapp code problem, it can't finish (stays hang while trying to stop) - Tomcat killed in order to force its stop - hit "Start the server in debug mode" again to retry - Tomcat starts too much quickly... - looking at the publish directory and trying to connect to the webapp with the browser reveals that the webapp is not deployed and that the publish folder just contains the (empty) WEB-INF/lib directory In other words: ALL the files of the webapp were removed from the publish folder and not re-deployed. Hitting again "publish" did not help. My co-worker had to clean the server to restore the situation. TODAY: - server running in debug mode without problems - need to restart, to do that "Start the server in debug mode" button pressed - Tomcat starts the stopping process and completes it normally; then it starts the startup process again - the startup process is too quick... - once again, all the webapp files were removed, only the empty WEB-INF/lib directory maintained. I'm going to attach the logs of the two cases. In the first log, go to line with timestamp 07/02/12 17:51.50.943: you can see that Tomcat stop has been requested. Then, some /webinflib entries are shown, with the webapp full contents (external JARs, utility projects). Then a Delta Publish with length 0, followed by a list of "Clearing ModulePublishInfo cache" entries. Then a "ping success" that seems to be performed before Tomcat stop has completed. Then a new Delta Publish, length 24, followed by the list of ALL the webapp files marked as "removed"!!! After this a /warning by wst.server.core saying "Error during publishing", then a list of "/webinflib" with just the utility projects (no external JARs here) and again a bunch of "Clearing ModulePublishInfo cache" entries. Then, again, a full log message entry: !ENTRY org.eclipse.jst.server.tomcat.core 4 0 2012-02-07 17:52:15.154 !MESSAGE Publishing failed with multiple errors !SUBENTRY 1 org.eclipse.wst.server.core 4 0 2012-02-07 17:52:15.154 !MESSAGE Could not delete /home/davide/devel/sts/SuiteCardinis_trunk_ws/.metadata/.plugins/org.eclipse.wst.server.core/tmp0/wtpwebapps/SuiteCardinis/WEB-INF/lib. May be locked by another process. !SUBENTRY 1 org.eclipse.wst.server.core 4 0 2012-02-07 17:52:15.155 !MESSAGE Could not delete /home/davide/devel/sts/SuiteCardinis_trunk_ws/.metadata/.plugins/org.eclipse.wst.server.core/tmp0/wtpwebapps/SuiteCardinis/WEB-INF. May be locked by another process. Please note that: - this error wasn't shown as a dialog box to the user - no other process could lock those folders, except for the Tomcat process itself; did WTP try to remove the published files before the Tomcat process was completely stopped? - and, in any case, why WTP decided to remove all the published files!? What follows (starting from 17:58.54.879) I think refers to the full server cleaning performed by my co-worker after the disaster. The second log is similar and should be read starting from 08/02/12 08:59.18.482. I will let you know if the original problem (i.e.: only some JARs missing) happens again. Meanwhile let me know if you need more info regarding this specific problem. Created attachment 210717 [details]
Log of 07-02-2012, see from 07/02/12 17:51.50.943
Created attachment 210718 [details]
Log of 08-02-2012, see from 08/02/12 17:58.54.879
The presence of: ... ### Computing currentResources for module: SuiteCardinis using class ... ... ### Tomcat Module Delta Publish #6, Delta length: 24,... with no interventing: ... ### WEB-INF/lib: ... indicates the Tomcat publishing is doing what it should, but the J2EEFlexProjDeployable is reporting incorrect information about what should be published. In this occurrance, it appears that it reported that there was nothing in the SuiteCardinis project. I'll look for or add more logging where needed to see if we can determine why J2EEFlexProjDeployable is "losing its mind"?! I would be curious, the next time this happens, to see if closing and re-opening the project could straighten out the J2EEFlexProjDeployable. Or perhaps adding a dummy jar dependency and then removing it. Or maybe even adding a JSP file and then deleting it. (In reply to comment #30) > I would be curious, the next time this happens, to see if closing and > re-opening the project could straighten out the J2EEFlexProjDeployable. Or > perhaps adding a dummy jar dependency and then removing it. Or maybe even > adding a JSP file and then deleting it. Next time we'll do this test too. Created attachment 211165 [details]
My own output with my comments as "§§§§§"
Hi Larry, today I reproduced the problem myself and I tried to do what you suggested.
No one of the actions you suggested did fix the structure of the published project:
- add/remove a JSP
- add a JAR to the build path of one of the utility projects the DWP depends on, but not on its deployment assembly (so a JAR which SHOULD NOT be published)
- add a JAR to the deployment assembly of the DWP (so a JAR which SHOULD be published)
- close and reopen the DWP project (but not the utility projects it depends on)
This is what I did:
- start Eclipse
- hit "start" on the server: the web application should have been published by a previous Eclipse session
- webapp startup failed because of some missing JARs in the published folders
- looked at the log and saw that some JARs were marked as removed (take postgresql-9.0-801.jdbc4.jar as an example)
- did the tests you asked for
In the attached log you can find my comments preceded by "§§§§§": they described what I've done immediately before the following log entries were produced.
Anyway, an interesting thing can be noticed: after I closed and reopened my DWP project, the server status was marked as "dirty", to suggest a republish was needed. Ok, this is reasonable. However, after I hit "publish" I some interesting info was produced in the log entries: once again, the JARs that were marked as "removed" in the first server start were again marked as "removed"!!
More precisely, you see:
org.eclipse.jst.server.tomcat.core FINEST 17/02/12 10:22.44.894 WEB-INF/lib/jnlp-6.0.10.jar, Kind: Removed
This is correct: this is the JAR I previously added to the DWP project deployment assembly in the previous test. I then removed it before closing and reopening the DWP project.
Then, you see again:
org.eclipse.jst.server.tomcat.core FINEST 17/02/12 10:22.44.894 WEB-INF/lib/h2-1.2.139.jar, Kind: Removed
org.eclipse.jst.server.tomcat.core FINEST 17/02/12 10:22.44.894 WEB-INF/lib/itext-2.1.7.jar, Kind: Removed
org.eclipse.jst.server.tomcat.core FINEST 17/02/12 10:22.44.894 WEB-INF/lib/miglayout-3.6.jar, Kind: Removed
org.eclipse.jst.server.tomcat.core FINEST 17/02/12 10:22.44.895 WEB-INF/lib/xstream-1.3.jar, Kind: Removed
[and so on, including the postgresql-9.0-801.jdbc4.jar we're considering as an example]
That is: the Tomcat core is told those JARs have been removed... but, from the J2EEFlexProjDeployable point of view, they have been actually already "missing" for a long time from the published structure!
It seems like a "saved" or "cached" information is used after project reopening and this "saved" or "cached" information is actually wrong. The same seems to happen often (but not always!) when Eclipse first starts.
As usual, I had to clean the server to get all the JARs published back correctly (including our postgresql-9.0-801.jdbc4.jar).
I hope this adds some bit of useful information to the investigation.
Created attachment 211464 [details]
wst.server.core jar #2 for WTP 3.3.1 with extra logging
Thanks for this latest report Mauro. I have been studying the code the past few days to see what I can make sense of. It is interesting that adding the JSP caused the missing jars to return. However, for some reason publishing apparently lost track of what was in the prior "incomplete" publish and computed an empty delta so the published webapp didn't get fixed. So, the JSP trick half worked. The add/remove jar and close/reopen project didn't appear to work at all. Now that I think about it, they are likely to spawn the "pig race" that is the assumed source of the undesired behavior, so this doesn't seem too surprising.
Attached is an updated wst.server.core jar to replace the prior one. It has a little more logging to try to see why the JSP trick didn't work. Next time, please try adding the JSP, then publish, remove the JSP, then publish again. The Tomcat plug-ins can't do much about resources going "missing", but it should be able to publish correctly once they come back.
(In reply to comment #33) > Created attachment 211464 [details] > wst.server.core jar #2 for WTP 3.3.1 with extra logging > > Thanks for this latest report Mauro. I have been studying the code the past > few days to see what I can make sense of. It is interesting that adding the > JSP caused the missing jars to return. However, for some reason publishing You're right, I didn't notice that the "WEB-INF/lib" log entries after the JSP addition+removal were reporting the missing JARs!! I will try your latest suggestions. Meanwhile, please consider the following. The investigations conducted so far were made using an SSD, so a very fast disk. Yesterday I was working with a traditional HDD and I observed many failures in publishing, also during the same Eclipse working session (while the main cases observed so far were related to the first publishing after startup). So, it seems like using a slow disk can easier lead to corruption. Maybe this is not too much of a surprise. Unfortunately, I coulnd't collect logs yesterday because I was in a hurry and overwhelmed with things to do. Created attachment 211519 [details]
My output of 23 February 2012
Reproduced again by myself, here is the new log.
Once again, I used "§§§§§" to comment what I did.
In short:
- started Eclipse
- made some changes
- hit "start on the server"
- the publish started, but JARs were missing in the publish folder, so the webapp failed to start
- stopped the server
- added a JSP and hit publish
- removed that JSP and hit publish
The addition and the removal of the JSP in the workspace simply caused the addition and the removal of the same JSP in the publish folder.
The missing JARs were always missing, although the produced log seems to suggest that they were "known" to the J2EEFlexProjDeployable, although they were not at the first publish attempt.
Another info that is not related to my last attempt. While working with the slow HDD I noticed that hitting "clean" on the server often leads to "cannot delete folder bla bla, it may be locked by another process". However, this time I'm 100% sure that: - Tomcat was not running while the error happened, so it couldn't be Tomcat that locked the files/folders - no process other than Eclipse itself could be locking them, because I had no external process running on those folders (no anti-virus or such) So, it must be Eclipse itself that is holding locks on those files... maybe there's one thread that is keeping open File references on them and another that is trying to delete the folders (because of my "clean" request) at the same time? Created attachment 211650 [details]
wst.server.core jar #3 for WTP 3.3.1 with extra logging
With the execution paths shown in your most recent log (i.e. nothing unexpected happening out of the blue), the only scenario that makes sense to me is that the J2EEFlexProjDeployable is returning the hierarchy of resources in an object that is still under construction. At some point after publishing this object in an unfinished state, it does eventually become complete. The latest wst.server.core jar should be able to confirm this. In addition to logging the WEB-INF/lib content for current resources object, it also logs the WEB-INF/lib content of the resources object used in the prior publish. I'm guessing that it won't be the same jars that were listed when the prior publish occurred. After a failed publish, continue to add the JSP, publish, remove the JSP and republish. Thanks.
On second thought, it might be good to do a publish before adding the JSP to see what the state of the resource objects are prior to forcing an update by adding the JSP. Hi Larry, is it possible that your latest changes actually modified the behaviour of the plugin in some way in addition to adding new log verbosity? Because we weren't able to reproduce the problem so far. Anwyay, we'll keep on testing... Hi Mauro, The extra logging could alter the timing of what's going on, but in theory, shouldn't alter the behavior. However, since I don't understand why the "mysterious" behavior is happening, then if it's due in part to timing issues, then behavior could be affected. I'll check is see if the change in reproducibility provides any clue as to what is going on. Thanks. Larry Have you considered the possibility that the extra memory usage of the logging is causing a finalizer to run that otherwise would have been left until after the operation? Created attachment 212480 [details]
My co-worker log of 2012-03-12
Hi Larry,
today my co-worker reproduced the problem at last! However, an error occured while doing the requested tests, I hope it doesn't invalidate the result.
Here is what happened:
- reproduced the problem: the file freemarker-unversioned.jar was missing from the deployed directory
- hit publish: OK!
- created a JSP and hit publish: KO! :-( The damned "Could not delete WEB-INF/lib" folder error message was shown! :-(
- deleted the JSP and hit publish: OK!
The final result was a web application with ONLY the Utility Projects JARs deployed in WEB-INF/lib (and the static web resources... I don't know if they were all or just a part). All the external JARs were missing from WEB-INF/lib, but only after the "could not delete..." error. Previously, when the problem firstly appeared, many JARs were there, but not the freemarker-unversioned.jar, as I mentioned.
I hope this log can reveal something nevertheless.
Created attachment 212627 [details]
My co-worker log of 2012-03-14
Here's another log obtained today, although once again it's not the "canonical" situation. This time things went in this way:
- stopped and restarted the server (or just restarted in debug mode, we're not sure about this)
- the published web application was empty (in the log there's the damned "Could not delete..." error)
- hit "Publish": the webapp came back in the published folder, except for the JARs of the Utility Projects
- added a JSP, hit "Publish"
- removed that JSP, hit "Publish"
- in the end, the published web application was still missing all the Utility project JARs, except for that of the project in which the JSP was added/removed
The strange thing is that in this log I can't find any clue about the fact that the JSP was added and then removed...
Created attachment 212773 [details]
My co-worker log of 2012-03-15
Here is another log of yesterday. This time the situation was the "canonical" one, that is the "Could not delete..." problem did not arise so it doesn't interfere with our diagnosis.
There's still no clue of the addition of the JSP in the log... I don't know if this is due your latest changes to the logging code or because of other reasons. Next time we'll better check this aspect too.
The situation depicted here is the "classical" one: started Eclipse, tried to start Tomcat. The web application was deployed during a previous Eclipse session, so no publishing should have been needed, but WTP removes just some JARs from the publish folder. As an example, consider the file freemarker-unversioned.jar, which we used as an example of a missing JAR.
(In reply to comment #44) > There's still no clue of the addition of the JSP in the log... I don't know if > this is due your latest changes to the logging code or because of other > reasons. Next time we'll better check this aspect too. Ok, we determined that it was our fault: since the "New JSP" wizard doesn't suggest the folder on which you issue the command as the folder in which the JSP has to be created (!!!), my co-worker inadvertently created the JSP outside the "WebContent" folder, so it was not published. Let me know if this vanishes the usefulness of the provided logs or not. If so, I'll try to provide you better logs the next time the issue arises. Created attachment 213160 [details]
My co-worker log of 2012-03-19
Hi Larry,
here is another log obtained on 2012-03-19. This is a "clean" case, i.e. no "Could not delete..." problem and this time we took care of creating the JSP in the right place, so that it has been published correctly.
In the log you'll see a couple of publishing attempts. The first one was made soon after the Eclipse IDE started. The second one is the most significant: you'll see many JARs were removed (the Spring JARs, for instance) and never put back, even after the usual steps ("Publish", JSP created+"Publish", JSP removed+"Publish").
I'm very sorry for the recent lack of response from me. I've been very swamped of late. Thanks for the additional logs. This most recent log illustrates the problem. Around the middle of the file the "### Tomcat Module Delta Publish #1" shows numerous jars being removed, among them numerous "org.springframework.*" jars. If you look above this in the log you will see "### WEB-INF/lib" and "### Prior WEB-INF/lib" entries. The "### WEB-INF/lib" entries show a subset of the "### Prior WEB-INF/lib" entries, e.g. "### WEB-INF/lib" entries don't include "org.springframework.*" jars and "### Prior WEB-INF/lib" entries do. The "### Prior WEB-INF/lib" show what WTP Server Tools thought it last published, so removing those jars is correct behaviour given this information. At the end of this publish, the object with provided the "### WEB-INF/lib" entries gets saved so it can become the object which provides the "### Prior WEB-INF/lib" entries in the next publish. In the manual publish that follows in the log, the "### WEB-INF/lib" entries now contain the "org.springframework.*" jars and I assume the other missing jars. The problem of republishing the removed jars now occurs because those missing jars also appeared in the "### Prior WEB-INF/lib" entries too. Since there is no difference between the two, WTP Server Tools thinks there is no need to publish any jars since they are listed as having been included in the last publish. The object that the WTP Server Tools saves to define what was previously published is getting updated, preventing the delta from being computed correctly. Since this is a change in behaviour from older versions of WTP, I'll need to redirect this bug to another component and between that component and Server Tools, figure out how to best fix this. I'll see about doing this and sending some e-mails tomorrow. Thank you very much Larry!! I'm glad to know some clue of the problem is now emerging. I have just a couple of additional questions: 1) if I understood it correctly, WTP is failing at updating the published JARs because of a mechanism that is using stale information saved in a previous publish operation; however, why is WTP detecting some JARs as removed on the very first publish operation, while they never left my workspace? Isn't this another problem? 2) what about the "Could not delete... May be locked by another process" problem that so often occurs to us? Do you have any idea on what may be causing it? Redirecting to Java EE since this problem involves the behavior of J2EEFlexProjDeployable which is being used via org.eclipse.wst.server.core.model.ModuleDelegate.members(). Something changed in Indigo that is causing problems for delta publishing. I'm not sure if the Server Tools delta publishing needs updating or this something that is going wrong in the Java EE side of publishing. Who ever is assigned to help look into this can e-mail me directly for information if desired. Sorry for not finding more cycles to invest in this sooner, but my day job has me very busy and going in other directions at the moment. Cheers, Larry (In reply to comment #48) > Thank you very much Larry!! I'm glad to know some clue of the problem is now > emerging. > I have just a couple of additional questions: > > 1) if I understood it correctly, WTP is failing at updating the published JARs > because of a mechanism that is using stale information saved in a previous > publish operation; however, why is WTP detecting some JARs as removed on the > very first publish operation, while they never left my workspace? Isn't this > another problem? Yes. However, there is the possibility that this may prove impractical to solve. WTP doesn't have a way of knowing about all the participants in the "pig race" to know when it is safe to publish. The risk of trying to force publishing to wait might be possible deadlocks or Eclipse being unusable until all initialization is complete. Fixing the delta publish so "synchronized" in the Servers view really means synchronized may be more practical to fix and may be more useful in the long run. > > 2) what about the "Could not delete... May be locked by another process" > problem that so often occurs to us? Do you have any idea on what may be causing > it? Tomcat on Windows is known for locking files while it is running. Actually it's the JVM and Windows that are the cause. This is why the Tomcat FAQ states that removing projects from the server while it is running is not supported. Normally only Tomcat would have jars down under "wtpwebapps" locked. However, if WTP attempted to delete them and failed, then it's possible that WTP may end up with locks on the jars. I can try to check and see what happens in the failed delete. I don't know if Java 7 offers any improved behavior in this area. Let me know if you are seeing this with an OS other than Windows. (In reply to comment #50) > > 2) what about the "Could not delete... May be locked by another process" > > problem that so often occurs to us? Do you have any idea on what may be causing > > it? > > Tomcat on Windows is known for locking files while it is running. Actually > it's the JVM and Windows that are the cause. This is why the Tomcat FAQ states > that removing projects from the server while it is running is not supported. > Normally only Tomcat would have jars down under "wtpwebapps" locked. However, > if WTP attempted to delete them and failed, then it's possible that WTP may end > up with locks on the jars. I can try to check and see what happens in the > failed delete. I don't know if Java 7 offers any improved behavior in this > area. Let me know if you are seeing this with an OS other than Windows. Actually, my co-worker is seeing this error often and he is working on Linux. Moreover, we are 100% sure that this problem arises also when Tomcat is NOT running, because we also thought about Tomcat being the problem and took care at observing this exact behaviour. Just a final note Larry. I would like to thank you very much for the effort invested here: I know it has been very expensive. I'd like just to ask you a favour, that is to put in a word for this bug to the other WTP team members assigned here, since we have been actually seeing this problem for long time now (well before the Indigo release, actually... we're sure the problem was there in Helios too and may be also in Galileo - I couldn't remember exactly) and this is getting really critical because it forces us to clean the server often: since we are working with a very large web application, any co-worker that does not have an SSD risks to lose dozen minutes waiting for full republishes to occurs, this even multiple times in a day. We experienced similiar issues. 1.) Windows 7 Prof. 64 Bit, Java 1.6.0_25 64 Bit, WTP Version: Version: 3.3.2.v201111030500-7O7IFj6EMjB7yO1Xs_G1kMtQeOye6HTXFWve95_R Build id: 20120210195245 2.) The Project is a large dynamic web project with 50.000 files in 8000 directories. Grown over 5 years from eclipse 3.1 to 3.7. 3.) We are not able to reproduce the error. Happened simply from time to time. 4.) We have a country/language dir structure in the WebContent- and Web-INF-directory. Sometimes the publishing removes files from the root of the webcontent folders and these files never come back until we do a clean republish. Issue appears independant from manual publish or automatic publishing. We had a lot of ideas and theories what triggers the issue. The last one is the idea that if a file is changed in the publishing directory from outside WTP during a publishing process, the process gets broken. We had a logfile which was located inside the webcontent dir. regards __horst, ec-logic.com Any feedback on this by the jst.j2ee devs? :-( Roberto - can you investigate? Not clear if problem on server side or within deployables. - If in deployables.. we may want Rob to help diagnose. There are a lot of comments in this Bug. Probably the one to start with is #47. To summarize, the object returned by org.eclipse.wst.server.core.model.ModuleDelegate.members() gets its data from J2EEFlexProjDeployable. Under some circumstances the object returned appears to be still under construction. The current state of this object is used to publish the server, which due to incomplete construction, ends up removing resources. In at least one of the logs, it appears that close to entire contents of the webapp was removed. This object, which is saved for delta calculation in the next publish, later finishes its constructionn (i.e. its state now differs from when it was used for publishing and caused deletion of resources). This behavior is confirmed in a number of the logs with respect to jars that go in the webapp's WEB-INF/lib. When the next server publish occurs, the delta calculation is incorrect (i.e. doesn't indicate deleted resources should be republished) because that object changed state after the prior publish. If this is known behavior for this object, then it isn't reliable for use in the delta calculation and the ServerTools needs to update the delta handling to use other means. However, it seems odd that the publishing can occur when this object isn't fully initialized. I'll be happy to answer any questions and provide more detail. Cheers, Larry Hello, I'm looking at the code, and I see this: 1. org.eclipse.jst.j2ee.internal.deployables.J2EEFlexProjDeployable extends org.eclipse.wst.web.internal.deployables.FlatComponentDeployable. 3. FlatComponentDeployable implements the members() method, which in turn delegates to FlatVirtualComponent the task of getting the resources. The strange part is that, by looking at the code, looks like all of this is done in the same thread. So, I do not see how calling J2EEFlexProjDeployable.members() would return an incomplete list. I guess this problem is hard to recreate outside the projects used by Mauro, right? If yes, then I will work in a jar with extra logging just as Larry did. Also, by looking at comment #52, if this is a regression, it is not new, it has existed for a while. (In reply to comment #57) > I guess this problem is hard to recreate outside the projects used by Mauro, > right? If yes, then I will work in a jar with extra logging just as Larry did. As with Larry, I'm ready to help you to gather extra logging information if you need. We can reproduce the problem quite easily. > Also, by looking at comment #52, if this is a regression, it is not new, it has > existed for a while. It's definitely not a regression. It was certainly there in Helios, too, and maybe also in Galileo. Juno release is approaching... is there any hope that some fix for this can go into it? :-( Created attachment 215110 [details]
org.eclipse.wst.web for WTP 3.3.1 with tracing
Created attachment 215111 [details]
source patch for previous plugin with tracing
My theory is that FlatComponentDeployable.members() is being called by one thread, and while em.fetchResources() is building the list of resources, another thread is calling FlatComponentDeployable.members() on the same instance, and retrieving the incomplete list. I attached a modified plugin org.eclipse.wst.web with some tracing, and the source patch of that modification. Please try to recreate, and search in the workspace log entries that begin with ###**** (In reply to comment #62) > I attached a modified plugin org.eclipse.wst.web with some tracing, and the > source patch of that modification. > Please try to recreate, and search in the workspace log entries that begin with > ###**** Thank you Roberto, I'll let you know as soon as possible. Do I need to add something to the .options file or not? (In reply to comment #63) > (In reply to comment #62) > > I attached a modified plugin org.eclipse.wst.web with some tracing, and the > > source patch of that modification. > > Please try to recreate, and search in the workspace log entries that begin with > > ###**** > > Thank you Roberto, I'll let you know as soon as possible. > Do I need to add something to the .options file or not? No,you don't. Just to add my report...I have the same issue everyone else does, where publishing mysteriously (and invisibly) halts midway through, then refuses to complete afterwards. Windows Vista 32-bit, Java 1.6, WTP 3.3.2. Just an update. After several days of tests, we were not able to reproduce yet after using the latest org.eclipse.wst_web_*.jar provided by Roberto. We haven't done intensive tests since we're busy on other things, but previously the error happened most of the times after Eclipse start. On the other hand, we're stuck at WTP 3.3.1 because of this issue and if Roberto has just applied the attached patch I can't explain why we couldn't reproduce until now. Maybe the additional log code is enough to provide a minimal synchronization that causes the problem not to show? We'll keep on testing. Created attachment 217066 [details]
Log produced on 2012-06-08
Hi Roberto,
at last today we reproduced the problem! It happened in the most frequent way, that is:
- WTP and Tomcat were working fine
- Eclipse was restarted because an Eclipse plugin update was installed
- after the restart, "start" was issued on the Tomcat server
- when Tomcat was starting up, the web app gave some "class not found" errors; looking at the publishing folder, many JARs were missing.
So, we went through the usual process requested by Larry:
- stopped Tomcat and saved the log file as it was
- hit "Publish" and saved the log file again
- added a new JSP, hit "Publish" and saved the log file
- removed the newly added JSP, hit "Publish" and saved the log file
The resulting log is the one I'm attaching one. Every phase is separated by a line starting with ######.
As you can see, there's no entry in the log starting with "###****".
Hi Roberto, did you have the chance to look at our latest log? I know the context is different, but may bug #369471 give us some ideas on what's going on here? Just wanted to comment that I'm experiencing this problem in Juno now. Is there an appropriate version of the debug build to run under Juno, or will the existing one work? (In reply to comment #70) > Just wanted to comment that I'm experiencing this problem in Juno now. Is > there an appropriate version of the debug build to run under Juno, or will the > existing one work? The verbose JARs provided here are for Indigo, they are not suitable for Juno. I would also need updated JARs for Juno if the investigation goes further, because we can't get stuck on Indigo any longer. I would appreciate some feedback from the WTP developers very much, since we have already invested quite some time on this debug already and this problem is hitting us since Galileo at least. It's VERY problematic when you work with huge web applications and you have a traditional mechanic hard drive to work with, since every clean+full republish requires several minutes to complete. I will try to add updated Juno jars by this weekend (July 29th). Created attachment 219292 [details]
Zip of Juno versions of modified plugins with tracing
Thank you very much Larry!! Now I can test things in Juno too. I'm sorry I have not updated this defect. I'm setting aside time this week for this defect, and I hope this Friday or next Monday to have a new patch for both Indigo and Juno Hi Mauro, how are you getting the log file? redirecting the console to a file? did you review the workspace log file? The patch I put originally should put an entry in the workspace log, and in the console. Hi Roberto, I'm running Eclipse as suggested by Larry, that is with "-debug" and "-consoleLog" command line parameters. These options are set in eclipse.ini. Then, I'm starting Eclipse redirecting the output to a file, that is: eclipse >g:\eclipse-output.txt My co-worker Davide is doing the same, but he's working on Linux, while I'm working on Windows. What I'm expecting is to see both the Eclipse log entries and any System.out.println() results into that file. Am I wrong? Created attachment 219598 [details]
Plugins with more tracing for Indigo
Hi,
I'm attaching two new plugins (generated for Indigo SR2) with more tracing. Again, you should see entries like this (starting with ###****):
!MESSAGE ###**** The thread Worker-9 is retrieving the list of members from org.eclipse.jst.jee.internal.deployables.JEEFlexProjDeployable@89e089e
If you do not see these entries in the console log, please check the .log file in the metadata folder of the workspace, or select Window -> Show view -> Error log in Eclipse, and select the option Export log.
Hi Roberto, unfortunately we can't use these packages for Indigo SR2. We have now switched to Juno, also thanks to the attached JARs provided by Larry. The alternative would be to use patched JARs against WTP 3.3.1, since we still have an old installation. But Juno packages would be really appreciated. Created attachment 220033 [details]
Plugins with more tracing for Juno
I added another zip with plugins with more tracing for Juno. This time, all the messages are logged using System.out, and only some are logged to both default output and the .log file. This patch is more verbose. If it was applied correctly, you will see entries like this: Sat Aug 18 18:16:20 CDT 2012###**** The thread Worker-2 is retrieving the list of members from org.eclipse.jst.jee.internal.deployables.JEEFlexProjDeployable@79757975 and org.eclipse.wst.common.componentcore.internal.flat.FlatVirtualComponent@755f755f Please backup the original plugins, replace with this ones and recreate the issue. I know it was said somewhere in this bug, but could you summarize what your environment looks like, and what seems to be the way to recreate the problem? I'd like to try to setup a similar env and recreate the problem. Thank you Hi Roberto, I returned today at work after the summer holidays. Thanks for the patched JARs, I put them in my Eclipse Juno installation: as soon as I can reproduce the problem I'll let you know. The environments in which the problem is reproduced are heterogeneous, and I'm starting to wonder how it is possible that no WTP developers can reproduce (maybe because in your daily work you don't work with huge webapps ;-) Eclipse versions with which the problem is reproduced: Helios, Indigo, Juno (event Galileo? I can't remember). O.S. under which the problem is reproduced: Windows XP, Windows 7, Debian Linux, Ubuntu Linux. Eclipse distribution used: plain Eclipse Platform Runtime + WTP and many other pluygins - OR - Eclipse IDE for Java EE Developers bundle - OR - Spring Source Tool Suite (STS) bundle. We're now using the STS Gradle Integration plugin that adds a classpath container with all the JARs needed by your application collected from a Gradle build file and hence dynamically added to the project build path whenever the Gradle build file is changed and a manual "Update dependencies" command is issued on the project. Anyway, I can state for sure that this is not the source of the problem, i.e.: the problem happens even when NOT using the STS Gradle Integration plugin. I'm personally working with two different projects and reproducing the same problems. Both projects are made of a huge Dynamic Web Project depending on other Utility Projects (for a total of 6 projects in the simpler case, 25 projects in the more complex one). The Utility Projects are deployed together with the webapp projects as JAR files in WEB-INF/lib, using the standard WTP mechanism provided by the Deployment Assembly page. The JAR files, instead, are deployed either by having put them in WebContent/WEB-INF/lib directory or by putting a classpath container containing them in the Deployment Assembly page. In the simpler project setup, there are around 70 JARs deployed in this way (+ those of the Utility Projects), while in the more complex one there are around 170 JARs (+ those of the Utility Projects). The size of the whole published project is about 50 MB (with around 30,415 files) in the simpler case, 263 MB (with around 2,860 published files) in the more complex one. The symptoms I'm experiencing are foundamentally two: 1) sometimes, when WTP tries to clean one or more files from the publish folder, I get a dialog box warning me something like: "Could not delete <publish directory>/WEB-INF/lib. May be locked by another process.". This happens under both Windows and Linux, so it must not be a Windows-specific problem (that I know happens when a Java process tries to delete a file which is locked by another running process). I'm also 100% sure this does not necessarily happen when the application server (we're using Tomcat) is running, so it must not be this the source of the problem. When this problem happens, the publishing state of the web application gets corrupted, because WTP thinks the webapp is "synchronized", while the publish folder is usually empty or almost empty (because many files could be deleted, but not them all) 2) sometimes, when I try to publish/start my web application, I get some NoClassDefFound or ClassNotFound exceptions in Tomcat console; if I go to the publish folder to have a look, I can see that some JARs are missing, i.e. they are not published as they should, even if they were not deleted from the workspace. This more frequently happens after you start Eclipse and then start a Tomcat server which already has a published webapp from a previous Eclipse running session: no changes to the published state of the webapp should happen (because no changes were actually made in the workspace... also because Eclipse was not running!!!), but nevertheless WTP deletes some JARs from the publish folder before starting the server. As I said, this is the use case that most frequently causes the problem, but it is not the only one: sometimes it manifests itself even from a regular publish to another, within the same Eclipse running session. Even for this second problem, the publishing state is corrupted, because WTP thinks the webapp is "synchronized", but some JAR files are actually missing from its WEB-INF/lib folder. The only way to recover from both cases 1) and 2) is to do a "Clean" the server, so that the server publishing is made again from scratch. In some even more bad cases, I even had to remove the webapp from the server, publish, re-add the webapp to the server and then publish again. I can't say when/why sometimes even a full "clean" on the server was not enough. Considering the size and the huge number of small files those webapps are made of, you can immediately imagine how the need of so frequent full re-publishings is painful, especially when working with an only traditional mechanic hard drive: a single full publishing operation takes several minutes, even 10 or more. With the help of Larry we concentrated on problem 2), which I think is harder to track down. With his help we could determine that when the webapp is missing some JARs it's because WTP is thinking these JARs were removed, so they are correctly deleted by the Tomcat plugin from WEB-INF/lib. The problem is that those JARs should NOT be considered deleted by WTP. Or, even if a temporary concurrency situation might lead WTP to think they are, another event of addition immediately after that of deletion should follow, because the JARs never left in the source folders/classpath containers in the workspace. Please see the whole discussion and particularly Larry's conclusions to get a better diagnosis. After problem 2) is fixed, maybe searching in WTP source code for the "may be locked by another process" error message could help to debug for problem 1). The description above (comment 82 from Mauro Molinari) is exactly what we experience here : Eclipse Indigo on Linux/Windows, Tomcat 7.0.8 and WTP 3.3.2. For dependencies (retrieved through Apache Ivy) we're not using STS Gradle Integration plugin though, but IvyDE's containers. The "clean" operation on the server nearly never works. We're thus "remove-publish-add-publish" the Web Application each time... The fact that Tomcat is running or not is not relevant in this case as the problem arises in both cases. Our webapp made up with a "Dynamic Web Module" and several "Utility Modules". We hope that this very annoying bug will be squashed very soon, but we understand fully how difficult it is to trace back to its roots. Thanks by advance for your work. Created attachment 220769 [details]
My co-worker log of 2012-09-04
Hi Roberto, in these days we reproduced the problem.
I'm attaching the logs.
The ZIP contains:
- the output produced from Eclipse start up to the arising of the problem
- the output produced after hitting "publish"
- the output produced after adding a JSP and hitting "publish"
- the output produced after removing that JSP and hitting "publish"
- the output produced after cleaning the server
We are using your latest patched JARs, so Larry's output is missing, while I see entries with "###****" marks.
Hi Roberto, did you have the time to look at my latest log? Does it contain any useful information? Guys, I was experiencing this issue with my complex project and was able to find workaround. My project is just mostly empty dynamic web project with only libraries in it, also few other projects were added throw Deployment Assembly tool. I've managed to find out following cases: 1. All projects have deploypath "WEB-INF/lib" (screenshot1) - in this case clean operation fails "File may be locked by another process" error 2. One project has deploypath "WEB-INF/lib/<projectname.jar>" (screenshot2) - in this case clean operation succeeds, but publish operation fails with the same error 3. All project have deploypath "WEB-INF/lib/<projectname.jar>" (screenshot3) - in this case everything is working as supposed. BTW, deployment assembly in screenshot1 was correct and worked fine few eclipse releases ago. Hope this helps. Created attachment 222467 [details]
screenshot1 (all deploypathed incorrect)
Created attachment 222468 [details]
screenshot2 (one deploypath correct)
Created attachment 222469 [details]
screenshot3 (all deploypathes correct)
This can be a clue on the actual cause of the "File may be locked by another process", which is one of the problems described here. So, there may be some problems related to how resources are deleted/published when more than one project is declared with a deploy path of WEB-INF/lib. Note that old WTP versions were using this notation by default for dependent projects: deploy path=WEB-INF/lib. With recent versions, I saw that adding a project to the deployment assembly using the UI uses the notation deploy path=WEB-INF/lib/<projectname>.jar. I don't know the reason of this change. However, changing the org.eclipse.wst.common.component file, you can still set deploy path=WEB-INF/lib for projects. Other tools (like Gradle) also use the "old" notation, which has the advantage that it's not "binding" the deployed resource file name to the project name. Moreover, the current behaviour has the problem I described in Bug #365529. In any case, as I said we're using always just WEB-INF/lib as the deploy path of our projects and this only seldom gives the "File may be locked by another process" error. Moreover, as described here, we're seeing publishing problems even in cases in which that error does not occur. Hi Mauro, Sorry for the late response, and thank you for the detailed explanation on comment 82. I took a look at the logs attached in comment 84. I did not find the entries I was expecting to find, but in file ggts-wtp_1_initial.log, I found something interesting. But first, I'd like to confirm: That file is the log for when your co-worker started eclipse, and then hit the problem, correct? What I found interesting is this: 1. The very first line of the log with my flag (###***) says that the virtual component is starting to build the list of resources: Tue Sep 04 14:23:40 CEST 2012###**** Start caching resources for org.eclipse.wst.common.componentcore.internal.flat.FlatVirtualComponent@1bccddb 2. The next two lines say that the virtual component for which the list is being built, is used to retrieve the members by thread Worker-28, which I suppose is the Tomcat thread. Tue Sep 04 14:23:42 CEST 2012###**** The thread Worker-28 is retrieving the list of members from org.eclipse.jst.j2ee.internal.deployables.J2EEFlexProjDeployable@136db31 Tue Sep 04 14:23:42 CEST 2012###**** The thread Worker-28 is retrieving the list of members from org.eclipse.jst.j2ee.internal.deployables.J2EEFlexProjDeployable@136db31 and org.eclipse.wst.common.componentcore.internal.flat.FlatVirtualComponent@1bccddb 3. Almost at the end of the log, 8 seconds after the first line with my flag, you can see the line saying that the virtual component finished building the list of resources: Tue Sep 04 14:23:48 CEST 2012###**** END caching resources for org.eclipse.wst.common.componentcore.internal.flat.FlatVirtualComponent@1bccddb I suspect that retrieving the members from a virtual component that is not fully initialized during eclipse startup might be the reason why the server adapter believes there are fewer jars in the application, causing the problem. I will work on the problem of using a component that is not fully initialized first. (In reply to comment #91) > I took a look at the logs attached in comment 84. I did not find the entries > I was expecting to find, but in file ggts-wtp_1_initial.log, I found > something interesting. But first, I'd like to confirm: That file is the log > for when your co-worker started eclipse, and then hit the problem, correct? Yes it is! > I suspect that retrieving the members from a virtual component that is not > fully initialized during eclipse startup might be the reason why the server > adapter believes there are fewer jars in the application, causing the > problem. I don't know if "Worker-28" is the "Tomcat thread" (maybe Larry could say something more precise on this subject), but your conclusion could explain why this problem mostly (although not exclusively) arises after Eclipse starts, i.e. on the first server start (which should rely on things that had got published during a previous Eclipse session). It could also explain why the problem seems to happen more frequently when using a huge web application and a slower hard disk. > I will work on the problem of using a component that is not fully > initialized first. If you need us to produce more debug output, could you please provide updated patched packages for WTP 3.4.1? With Webtools+PDT Juno versions we found a lot of key binding conflicts that are causing us some headaches, so we would appreciate the ability to upgrade those plugins to their Juno SR-1 versions... I managed to find a few cycles to look at this a little more. In my current development environment, I ran Eclipse and created a Tomcat server and a Dynamic Web Project. I then placed a breakpoint on the "if( members == null )" statement at the beginning of FlatVirtualComponent.getChildModules() and restarted Eclipse in debug mode. This breakpoint is hit on two separate worker threads. One is a job that initializes the Servers view. The other is a job from DecorationScheduler that is asking a for decorations for the Servers view via ServerDecorator. I believe along the lines of what Roberto is thinking, neither FlatVirtualComponent.getChildModules() nor FlatVirtualComponent.fetchResources() is thread safe. It is easy via stepping in debug mode to have an uninitialized array returned in one of the jobs while the other job performs the initialization. It has been my experience that trying to have one job block while the other finished may carry a risk of deadlock. Proving that such a deadlock can't happen is extremely difficult. I think the typical recourse in a situation like that described above is that if a second call to getChildModules() or fetchResources() occurs while an earlier call is still on going, then the second call should also perform the initialization so it can return valid results too, rather than return results early that haven't completed their initialization. Roberto, is this what you were thinking? I can take a crack at this if you are short on cycles. May the concurrency problem you describe be also the root cause of the "File may be locked by another process" error? I mean, could the two thread temporary lock files needed by each other, so that if this happens at the same time one of them then fails? I believe this code is simplely building a list of resources and not actually accessing any files, so it is unlikely to contribute to any file locking problems. I suspect the locked folders mentioned in other comments couldn't be deleted because they weren't empty and the error handling doesn't have the smarts to know why and assumes it's due to locking. The lack of thread safety might result in a corrupt list of resources that attempts to delete folders without first deleting their contents, which could lead to this symptom. Interesting... I realised now that the following was printed some minutes ago in my log file and it looks like some diagnostic output put by Roberto: !ENTRY org.eclipse.wst.common.modulecore 4 4 2012-10-31 12:51:59.448 !MESSAGE ###**** ERROR ERROR !!!! The thread main is caching while other is cachingorg.eclipse.wst.common.componentcore.internal.flat.FlatVirtualComponent@585fbaaf !STACK 0 java.lang.Exception: Thread caching while other is caching org.eclipse.wst.common.componentcore.internal.flat.FlatVirtualComponent@585fbaaf at org.eclipse.wst.common.componentcore.internal.flat.FlatVirtualComponent.cacheResources(FlatVirtualComponent.java:126) at org.eclipse.wst.common.componentcore.internal.flat.FlatVirtualComponent.getChildModules(FlatVirtualComponent.java:116) at org.eclipse.wst.web.internal.deployables.FlatComponentDeployable.getExportModelChildren(FlatComponentDeployable.java:276) at org.eclipse.wst.web.internal.deployables.FlatComponentDeployable.getModules(FlatComponentDeployable.java:287) at org.eclipse.jst.server.tomcat.core.internal.TomcatServer.getChildModules(TomcatServer.java:374) at org.eclipse.wst.server.core.internal.Server.getChildModules(Server.java:2576) at org.eclipse.wst.server.ui.internal.cnf.ServerContentProvider.hasChildren(ServerContentProvider.java:85) at org.eclipse.ui.internal.navigator.extensions.SafeDelegateTreeContentProvider.hasChildren(SafeDelegateTreeContentProvider.java:110) at org.eclipse.ui.internal.navigator.NavigatorContentServiceContentProvider.callNormalHasChildren(NavigatorContentServiceContentProvider.java:430) at org.eclipse.ui.internal.navigator.NavigatorContentServiceContentProvider.access$4(NavigatorContentServiceContentProvider.java:425) at org.eclipse.ui.internal.navigator.NavigatorContentServiceContentProvider$3.run(NavigatorContentServiceContentProvider.java:395) at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42) at org.eclipse.ui.internal.navigator.NavigatorContentServiceContentProvider.hasChildren(NavigatorContentServiceContentProvider.java:381) at org.eclipse.ui.internal.navigator.NavigatorContentServiceContentProvider.hasChildren(NavigatorContentServiceContentProvider.java:422) at org.eclipse.jface.viewers.AbstractTreeViewer.isExpandable(AbstractTreeViewer.java:2138) at org.eclipse.jface.viewers.TreeViewer.isExpandable(TreeViewer.java:588) at org.eclipse.jface.viewers.AbstractTreeViewer.isExpandable(AbstractTreeViewer.java:2176) at org.eclipse.jface.viewers.AbstractTreeViewer.updatePlus(AbstractTreeViewer.java:2858) at org.eclipse.jface.viewers.TreeViewer.updatePlus(TreeViewer.java:852) at org.eclipse.jface.viewers.AbstractTreeViewer.updateChildren(AbstractTreeViewer.java:2751) at org.eclipse.jface.viewers.AbstractTreeViewer.internalRefreshStruct(AbstractTreeViewer.java:1923) at org.eclipse.jface.viewers.TreeViewer.internalRefreshStruct(TreeViewer.java:721) at org.eclipse.jface.viewers.AbstractTreeViewer.internalRefresh(AbstractTreeViewer.java:1898) at org.eclipse.jface.viewers.AbstractTreeViewer.internalRefresh(AbstractTreeViewer.java:1855) at org.eclipse.ui.navigator.CommonViewer.internalRefresh(CommonViewer.java:561) at org.eclipse.jface.viewers.StructuredViewer$8.run(StructuredViewer.java:1535) at org.eclipse.jface.viewers.StructuredViewer.preservingSelection(StructuredViewer.java:1443) at org.eclipse.jface.viewers.TreeViewer.preservingSelection(TreeViewer.java:403) at org.eclipse.jface.viewers.StructuredViewer.preservingSelection(StructuredViewer.java:1404) at org.eclipse.jface.viewers.StructuredViewer.refresh(StructuredViewer.java:1533) at org.eclipse.jface.viewers.ColumnViewer.refresh(ColumnViewer.java:548) at org.eclipse.ui.navigator.CommonViewer.refresh(CommonViewer.java:350) at org.eclipse.wst.server.ui.internal.cnf.ServersView2$6.run(ServersView2.java:277) at org.eclipse.swt.widgets.RunnableLock.run(RunnableLock.java:35) at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:135) at org.eclipse.swt.widgets.Display.runAsyncMessages(Display.java:4144) at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3761) at org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine$9.run(PartRenderingEngine.java:1029) at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:332) at org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine.run(PartRenderingEngine.java:923) at org.eclipse.e4.ui.internal.workbench.E4Workbench.createAndRunUI(E4Workbench.java:86) at org.eclipse.ui.internal.Workbench$5.run(Workbench.java:588) at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:332) at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:543) at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149) at org.eclipse.ui.internal.ide.application.IDEApplication.start(IDEApplication.java:124) at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196) at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110) at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79) at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:353) at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:180) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:629) at org.eclipse.equinox.launcher.Main.basicRun(Main.java:584) at org.eclipse.equinox.launcher.Main.run(Main.java:1438) at org.eclipse.equinox.launcher.Main.main(Main.java:1414) The strange thing is that I have updated WTP to Juno SR-1 meanwhile, so I would have expected that code to be no operative any more. Hi Roberto, any news here? Thanks in advance! Hey guys, any news? This last year we gave you as much details as possible in order to have a fix, but till now we had no valuable results. Please commit some effort on this issue... every developer have to clean and restart Tomcat several times a day just to work around this issue: that's really a PITA. Is there a chance we can get this fixed for Juno SR-2 (to be released in February 2013)? Roberto, did you see Larry's offering to take a further look at this? Sorry for not providing more feedback. I'm on try #2 but still struggling to find cycles. There is a problem in that there is API involved that doesn't really support thread safety. That's what went wrong with try #1 which would have required changing an interface that might be used by plug-ins outside of WTP. Sorry for the late response. It was a combination of a long period of really busy days and a period of vacation. Regarding Larry's comment #93, my initial approach was to add more locking/synchronization. But I agree with you: this code could be deadlock prone. But I'd like to know more details on what you are proposing. Are you thinking on invalidating the results of the first call in case a second call is done? or to have local copies for each call, ans the latest call would overwrite the results of the first call? Regarding comment #95, you are correct, this method only builds the list of resources. Does Regarding Mauro's comment #96, if this plugin did not change in SR1 (and it probably didn't), then it was not replaced when you updated WTP, and that's the reason you still see the tracing. And this output is a confirmation that two different threads were calling either FlatVirtualComponent.getChildModules() or FlatVirtualComponent.fetchResources() concurrently on the same instance. Larry, could you share more details on what you are trying? Another approach we could take is to investigate why the two worker jobs are using the same FlatVirtualComponent instance. Created attachment 224085 [details]
An attempt to deal with the thread safety issues in FlatVirtualComponent.
The idea I had for try #2 proved easier to implement than I had anticipated.
It tries to ensure that the "members" and "children" fields are only set with the most recent fully initialized objects. Thus, if thread "1" calls fetchResources(), and while it is "calculating" the resources, thread "2" calls fetchResources(). Then thread "2" also "calculates" resources. If thread "3" calls fetchResources() after thread "1" finished, but before thread "2" finishes, then thread "3" will "calculate" the resources too since it might be incorrect to use what thread "1" calculated. Each call returns the resources it calculated, but only the "last" concurrent thread gets to store its results in the "members" and "children" fields for future use. I don't know the code well enough to know if the results calculated from multiple threads could differ. To be safe, I assumed they could.
The instance of the FlatVirtualComponent is being cached by the FlatComponentDeployable that is using it, which allows fetchResources() and/or getChildModules() to be called from multiple threads when FlatComponentDeployable is called from multiple threads (via the getModules method for example). I don't know the code well enough to say if there could be additional thread safety issues related to FlatComponentDeployable. For example, FlatComponentDeployable.getFlatComponent() appears to be able to return null if FlatComponentDeployable.clearCache() is called after the "if" test fails but before the return. This would take a very precise and inopportune thread switch, which would seem very unlikey to happen. In any event, I hope this patch will address the main issue that is causing the problems for the Tomcat server publishing.
This patch is against current "master" in the webtool.common repository.
Created attachment 224086 [details]
A Juno (WTP 3.41) version of org.eclipse.wst.common.modulecore with patch applied
Hi Larry, I'm waiting for Roberto's feedback. If you both think it's useful for us to test your latest patched JAR, we may try to deploy it in our Eclipse installation and see how it behaves. Otherwise, let me know. I would say you could go ahead and try the jar. Effectively, it implements what Reberto suggested, ensuring that no two Jobs/threads use the same FlatVirtualComponent. It just does this internal to the fetchResources() and getChildModules() methods rather than trying to accomplish this externally. Prior to installation of the jar, it would be best to force a valid publish in your current environment and save a copy of any ".../tmp?/wtpwebapps/<project>(s)" that are publshed. Then install the jar and force a valid publish again and compare the results with the saved copies to confirm they are the same. I can't imagine how they would be different, but there could be some other issue in play that I'm not yet aware of. I will say the patch has its pluses and minuses. On the plus side, only FlatVirtualComponent is changed. No other classes need to change. On the minus side, thread safety is being added after the fact to only the fetchResources() and getChildModules() methods. There could be other thread safety issues still lurking, but I believe it will address the main one affecting server publishing. In theory, for WTP 3.5, it would be nice to update the code involved to be thread safe in general. I don't know code well enough to say if the consumers of FlatVirtualComponent should handle the thread safety issues so that FlatVirtualComponent doesn't have to, kind of like consumers of List and HashMap implementations need to provide the thread safety when needed. Updating FlatVirtualComponent to be generally thread safe would mean not performing initialization using instance fields, as it currently does. Based on try #1, I don't think it is possible to do this without changing API and/or method signatures that would require outside code to change as well. (In reply to comment #105) > Prior to installation of the jar, it would be best to force a valid publish > in your current environment and save a copy of any > ".../tmp?/wtpwebapps/<project>(s)" that are publshed. Then install the jar > and force a valid publish again and compare the results with the saved > copies to confirm they are the same. I can't imagine how they would be > different, but there could be some other issue in play that I'm not yet > aware of. Hi Larrry, my co-worker Davide and I are trying your latest patched JAR. First of all, we confirm published contents are the same (before and after dropping the patched JAR in Eclipse). We're now going to test and see if the problem arises again. Larry, thank you for the patch. I'm reviewing it and running the JUnits against it. I will post and update soon. Hi Larry, Hi Roberto, just to let you know that until now Larry's patch is behaving perfectly! We had no more publishing issues until we started testing it. We deserve to keep on testing for some other days, anyway Roberto meanwhile how is unit testing going? The code looks fine. Unfortunately, it breaks more than 30 JUnits, mostly in the org.eclipse.jst.j2ee.tests.bvt.EnterpriseBVT suite, mostly in the JCA and EAR operations. I need to investigate why they are failing with the patch. I ran then 3 times, and failed each time. I ran the suite 2 times without the patch, and the tests pass. Key to the patch is that the internal proxyComponent be able to duplicate the behavior that the "parent" component would have done. If there are use cases where the behavior is being augmented in a way not "seen" by the proxyComponent, that would cause problems for this patch. I would be willing to also take a look at the failing JUnits. Are there setup instructions needed to get them running? (In reply to comment #110) > Key to the patch is that the internal proxyComponent be able to duplicate > the behavior that the "parent" component would have done. If there are use > cases where the behavior is being augmented in a way not "seen" by the > proxyComponent, that would cause problems for this patch. > > I would be willing to also take a look at the failing JUnits. Are there > setup instructions needed to get them running? I do not know if they are documented, but you can download latest WTP build, and use it to clone the repositories http://git.eclipse.org/c/webtools-common/webtools.common.tests.git/ and http://git.eclipse.org/c/jeetools/webtools.javaee.tests.git/, and create JUnit plugin test run configuration for org.eclipse.wtp.j2ee.headless.tests.plugin.AllTestsSuite (for all tests) or org.eclipse.jst.j2ee.tests.bvt.EnterpriseBVT (for running the suite with more failures) Created attachment 224868 [details]
Second attempt to deal with the thread safety issues in FlatVirtualComponent.
The problem is that subclasses of FlatVirtualComponent may override some of the behavior of FlatVirtualComponent, in which case the proxyComponent won't duplicate that behavior. The only solution I could see that doesn't require a breaking change is to use the proxyComponent only when a FlatVirtualComponent is being used. I believe this is sufficient to fix the Tomcat publishing. I haven't looked for what other use cases which benefit from this change. With subclasses, the original thread unsafe behavior is used. It's hard to know what use cases remain as risk here. One use case involved in the JUnit failures was org.eclipse.jst.j2ee.internal.common.exportmodel.JavaEEComponentExportCallback.ConnectorExportComponent. I assume it is used when exporting projects as WARs or EARs. I would guess that it would be unlikely that you could get two exports using the same ConnectorExportComponent object, so I doubt thread safety is an issue here.
Thanks for the info on the JUnit tests. Even without the patch, I have yet to get the AllTestsSuite to run without a few errors and/or failures, which vary from run to run. Re-running a specific failed test so far has always worked. I seem to be able to achieve the same results with the new patch.
I'll try to attach an updated jar with the new patch later today.
Thanks for all your good work here Larry and others on this thread - I've been watching this.... Hi Larry, I look forward to test your new patched JAR as soon as it's ready. Meanwhile I can confirm once again that your previous attempt, for our own use case, was behaving perfectly: we're testing it all the day on at least 3 different systems and from when we started to use it we have never had any further publishing problem again. So we can say this problem is not mysterious any more :-) Created attachment 224923 [details]
Updated Juno (WTP 3.41) version of org.eclipse.wst.common.modulecore jar with second patch applied
This jar should still fix server publishing while not providing incorrect behavior for other use cases.
Larry, So you have patch for Indigo? Our whole team is currently experiencing the issue and we're not ready to move to Juno yet, because of other incompatible plug-ins. Is the patch going to be part of a maintenance release? Thanks! Phil. On my side, I can say that our tests even on the second patch by Larry are going well, no more publishing problems yet. Does the latest patch make the unit tests pass? Happy new year to everyone reading here! :-) (In reply to comment #117) > On my side, I can say that our tests even on the second patch by Larry are > going well, no more publishing problems yet. > Does the latest patch make the unit tests pass? > > Happy new year to everyone reading here! :-) Mauro, I'm glad to hear your testing is going well. I'm running the JUnits today, and I'm also requesting review to Chuck. I'll update the bug when I have the results. The patch looks good and safe. It compensates for possible subclasses, and the locking scheme should not create a deadlock situation. Let's get this in today if possible The JUnits ran fine with the second patch. Code committed and released to master for 3.4.2 and 3.5. Released using tag v201301092200. The commit can be seen here: http://git.eclipse.org/c/webtools-common/webtools.common.git/commit/?id=f8c2a81f32a845f62bac400ec282b4e9333eb5a9 Resolving bug Created attachment 225434 [details]
Updated Indigo (WTP 3.3.2) version of org.eclipse.wst.common.modulecore jar with second patch applied
Hey Larry, I just installed your latest patch on top of Indigo but I'm still getting an error. This happens for all the web apps deployed to TOMCAT. I simply copied the new jar to my eclipse env and restarted with eclipse -clean. Publishing failed Could not delete C:/ibmdev/workspace/.metadata/.plugins/org.eclipse.wst.server.core/tmp0/wtpwebapps/com.ibm.sbt.sample.web/WEB-INF/lib. May be locked by another process. I'm getting this whenever TOMCAT is running or stopped. Let me know if there anything I can do to help. In theory, the jar should have overwritten the one already in your eclipse and using "-clean" is unnecessary. In fact, it is my understanding that "-clean" is no longer safe to use. I may delete configuration information that doesn't get rebuilt. It might be best to create a new Indigo SR2 installation, drop the jar into that one and use it with your current workspace. Anyway, as a first step, please confirm if the jar did or didn't overwrite one already in your eclipse. Hey Larry, Yes, the jar replaced the previous one. The new size is 371,462 while the old one was 380,099. If you're interested, I can share my env through in a remote conf. It would even be easier if you work with an IBMer, then we can use the internal tools. Given that you are using the patched jar, it would indicate you are experiencing a different problem. The fix addresses one specific issue and others may remain. It may be best to open a separate bug or post on the WebTools forum with details about what you are experiencing with the updated jar installed. It's unlikely I will find enough cycles at one time to try to do something remotely. I'll have to steal cycles here and there to see if I can figure out the source of your problem. I'm not sure it is a different problem, as it manifest exactly the same way. It might be a edge use case that your tests are not handling. Although, I don't believe we're doing anything fancy here. Can it be a side effect of another plug-in? We're using Indigo+IBM RTC+eGit. I do not consider myself this issue are being resolved/fixed, as actually the behavior remains the same after the patch. All I can say is that via modified jars with extra logging, Mauro was able to provide logs that pointed to what appears to be a specific thread safety issue which this patch fixes. Either that patch isn't getting used in your environment for some unknown reason, or you are experiencing different problem the same or similar symptom. The "back and forth" (like you find in this bug) needed to diagnose the cause of your symptom would be better handled in a separate bug rather than appended this bug. Just because this bug is marked fixed, it doesn't promise the end of all publishing problems. If you want to open a new bug, feel free open one against WTP Server Tools and the jst.server component. You will need to let me know what the Bug number is so I can assign it to me. Philippe, if you open a new bug, please put a reference to it here. On my side (with Juno + WTP 3.4.1 + patched JAR) I confirm no more problems. Maybe you can try with a fresh install of Indigo+WTP3.3.2+patched JAR + fresh new workspace, so that we are 100% sure that nothing else is interfering? Ok, I'll try that - but it might take a little while as we're focus on IBM Connect right now. Actually, I found the cause of the problem, and a workaround. Don't know if it deserves a new bug or not. My project has a set of dependencies (other java projects) in its deployment assembly list. But it is missing itself a WEB-INF/lib directory. So this lib directory is created during the deployment phase, and then locked, until a "clean" action is emitted to the server. Once I created a empty lib directory to the project, then the problem disappeared. After some months of usage of the latest released version of WTP, I would like to thank again all the people who worked on this, especially Larry. The publish problem on our case has disappeared and using Eclipse+WTP is now much more pleasant!! Philippe, did you open a new bug for your problem? If so, please post here a reference to it, thank you! |