Imagine a corporate setting where a lot of developers might all start Eclipse and nail the server at the same time. I know Mylyn syncs on startup, but does it start the "20 minutes" at the point that the first synchronization stops? If it does, that would at least build a little variation into everyone's clocks, so that the server does not get nailed again every 20 minutes. Likewise, have you considered adding a small amount of fudge to the interval, like some smallish random interval added to the 20 minutes? I'd suggest some number between 1 and 120 seconds to create some variation. It might be useful to also do this at startup, although personally I wish Mylyn would just wait the 20 minutes before doing the first synchronization. Finally, can a repository provider specify a minimum interval, so that users cannot set something like 1 minute as their interval and stress the server? Since Mylyn runs in the background, a user might not even realize how much they are hurting the server's performance.
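The jitter and minimum-interval suggestions above could be combined roughly as follows. This is a minimal sketch, not Mylyn code: the function name, the 20-minute base, and the 1 to 120 second jitter range are taken from this comment as illustrative assumptions. The idea is to compute the delay only after the previous synchronization has finished, so clients that started together drift apart over time.

```python
import random

# Illustrative values from the suggestion above (assumptions, not Mylyn defaults).
BASE_INTERVAL_S = 20 * 60        # 20-minute synchronization interval
JITTER_RANGE_S = (1, 120)        # random extra delay, in seconds


def next_sync_delay(user_interval_s, provider_min_interval_s=0, rng=random):
    """Return seconds to wait before the next background synchronization.

    The user's interval is first raised to the repository provider's
    minimum (if any); a random jitter is then added so that clients
    which synchronized at the same moment do not stay in lockstep.
    """
    base = max(user_interval_s, provider_min_interval_s)
    jitter = rng.uniform(*JITTER_RANGE_S)  # uniform in [1, 120] seconds
    return base + jitter
```

For example, a user who asks for a 1-minute interval on a repository whose provider enforces 20 minutes would get `next_sync_delay(60, provider_min_interval_s=BASE_INTERVAL_S)`, which always falls between 1201 and 1320 seconds. In Eclipse, such a delay could be converted to milliseconds and passed to `Job.schedule(long)` after each run completes.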
*** Bug 281894 has been marked as a duplicate of this bug. ***
Good idea, Mark! With regard to the minimum interval, a good way would be to mark every task with a last-updated timestamp and avoid refreshing the task if the timestamp is less than some minimum amount old. This would also cope nicely with overlapping query results. Example: Query A returns tasks 1, 2, 3, whose details are refreshed. Query B returns tasks 2, 4, 6. If both queries are run shortly one after the other, task 2 would be refreshed twice in a short cycle. A timestamp would prevent the second refresh. An exception could be made if the user explicitly selects task 2 and refreshes it.
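The timestamp proposal, including the exception for an explicit user refresh, might be sketched like this. `RefreshThrottle` and `refresh_query_results` are hypothetical names for illustration and are not part of Mylyn's API; the clock is injectable so the behavior can be checked without waiting.

```python
import time


class RefreshThrottle:
    """Skip refreshing a task that was refreshed less than min_age_s ago."""

    def __init__(self, min_age_s, clock=time.monotonic):
        self.min_age_s = min_age_s
        self.clock = clock
        self.last_refreshed = {}  # task id -> clock value at last refresh

    def should_refresh(self, task_id, user_requested=False):
        # An explicit refresh requested by the user always goes through.
        if user_requested:
            return True
        last = self.last_refreshed.get(task_id)
        return last is None or self.clock() - last >= self.min_age_s

    def mark_refreshed(self, task_id):
        self.last_refreshed[task_id] = self.clock()


def refresh_query_results(throttle, task_ids, refresh):
    """Refresh each task a query returned, unless it is still fresh."""
    refreshed = []
    for task_id in task_ids:
        if throttle.should_refresh(task_id):
            refresh(task_id)
            throttle.mark_refreshed(task_id)
            refreshed.append(task_id)
    return refreshed
```

With the example above, running Query A (tasks 1, 2, 3) and then Query B (tasks 2, 4, 6) shortly afterwards refreshes only tasks 4 and 6 in the second run.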
> With regard to the minimum interval, a good way would be to mark every task
> with a last updated timestamp and avoid refreshing the task if the timestamp is
> less than some minimum amount old. This would also cope nicely with overlapping
> query results.

Bugzilla keeps a timestamp for the last modification date of every task.

> Query A returns tasks 1, 2, 3, whose details are refreshed.
> Query B returns tasks 2, 4, 6.
>
> If both queries are run shortly one after each other, task 2 would be refreshed
> in a short cycle. A timestamp would prevent the second refresh.

That optimization is already in place. Only changed tasks are refreshed, in a second pass after all Bugzilla queries have been updated.
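The second-pass idea described in this reply can be illustrated as follows. This is a sketch of the concept only, not Mylyn's BugzillaConnector implementation: the input shapes (one dict per query mapping task id to the server's last-modified value, plus a dict of values stored with the local copies) are assumptions.

```python
def changed_tasks(query_results, local_modified):
    """Return ids of tasks whose server modification time differs locally.

    query_results: one dict per query, task id -> server last-modified value.
    local_modified: task id -> last-modified value stored with the local copy.
    """
    merged = {}
    for result in query_results:
        # A task that appears in several queries is considered only once.
        merged.update(result)
    return sorted(
        task_id
        for task_id, modified in merged.items()
        if local_modified.get(task_id) != modified  # new or changed on server
    )
```

A task returned by both queries (task 2 in the example) is fetched at most once, and only if its modification date has changed since the last synchronization.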
I would suggest changing the summary to: "Randomize task list synchronization to reduce load on servers". This is related to the specific suggestions in the description.
One thing I'd like to avoid is this:

  [18/Aug/2009:09:29:17 -0400] "POST /bugs/buglist.cgi HTTP/1.1" 200 4319 "-" "Mylyn/3.2.0 BugzillaConnector Eclipse/3.5.0 (org.eclipse.epp.package.rcp.product) HttpClient/3.1 Java/1.6.0_14 (Sun) Linux/2.6.28-14-generic (i386; en_US)"
  [18/Aug/2009:09:29:18 -0400] "POST /bugs/buglist.cgi HTTP/1.1" 200 2760 "-" "Mylyn/3.2.0 BugzillaConnector Eclipse/3.5.0 (org.eclipse.epp.package.rcp.product) HttpClient/3.1 Java/1.6.0_14 (Sun) Linux/2.6.28-14-generic (i386; en_US)"

The IP is removed to protect the innocent, but here the same Mylyn client submitted two different search queries (the response sizes differ) at the same time. For all we know, this person could be standing at the water cooler chatting with coworkers. It's unfair for this client to 'hog' the database with two 'low-priority' searches while the person using the Web UI is actually sitting at the computer waiting for their search results.
Denis, just to be clear: your complaint is that a single client submitted more than one query at essentially the exact same time? You would want to be sure Mylyn queues these up to run one at a time? I actually thought it did that already. I did not know it could make more than one hit to the server at the same time.
> Your complaint is that a single client submitted more than
> one query at essentially the exact same time? You would want to be sure Mylyn
> queues these up to run one at a time?

Well, the typical web user doesn't, in all practicality, issue multiple searches simultaneously, and since most Mylyn searches are 'wasted' (i.e., the client is not waiting for them, and the results are mostly not used), it would be fair for Mylyn to behave more like the Web UI in this respect.

Have a look at this behaviour. This is all from the same IP, and it has the exact same Mylyn/browser signature, using the Pulsar package. IMHO this is a fair amount of traffic for one client to be generating every 10 minutes, and eight searches from a single client is a large enough spike to affect performance for other users.

  09:34:11 POST /bugs/buglist.cgi HTTP/1.1" 200 1061
  09:34:12 POST /bugs/buglist.cgi HTTP/1.1" 200 3170
  09:34:15 POST /bugs/buglist.cgi HTTP/1.1" 200 934
  09:34:23 POST /bugs/buglist.cgi HTTP/1.1" 200 3075
  09:34:25 POST /bugs/buglist.cgi HTTP/1.1" 200 1029
  09:44:34 POST /bugs/buglist.cgi HTTP/1.1" 200 1137
  09:44:35 POST /bugs/buglist.cgi HTTP/1.1" 200 3128
  09:44:36 POST /bugs/buglist.cgi HTTP/1.1" 200 1260
  09:44:37 POST /bugs/buglist.cgi HTTP/1.1" 200 3044
  09:44:38 POST /bugs/buglist.cgi HTTP/1.1" 200 934
  09:44:39 POST /bugs/buglist.cgi HTTP/1.1" 200 1029
  09:44:40 POST /bugs/buglist.cgi HTTP/1.1" 200 1649
  09:44:41 POST /bugs/show_bug.cgi HTTP/1.1" 200 2178
  09:54:44 POST /bugs/buglist.cgi HTTP/1.1" 200 1070
  09:54:45 POST /bugs/buglist.cgi HTTP/1.1" 200 3128
  09:54:46 POST /bugs/buglist.cgi HTTP/1.1" 200 1260
  09:54:47 POST /bugs/buglist.cgi HTTP/1.1" 200 3044
  09:54:48 POST /bugs/buglist.cgi HTTP/1.1" 200 934
  09:54:49 POST /bugs/buglist.cgi HTTP/1.1" 200 1029
  09:54:50 POST /bugs/buglist.cgi HTTP/1.1" 200 1649
  09:58:53 POST /bugs/buglist.cgi HTTP/1.1" 200 1138
  09:58:54 POST /bugs/buglist.cgi HTTP/1.1" 200 3128
  09:59:02 POST /bugs/buglist.cgi HTTP/1.1" 200 934
  09:59:03 POST /bugs/buglist.cgi HTTP/1.1" 200 1029
  09:59:05 POST /bugs/show_bug.cgi HTTP/1.1" 200 2465
  09:59:15 POST /bugs/index.cgi HTTP/1.1" 200 3466 <-- not sure why posting to index?
  09:59:16 POST /bugs/show_bug.cgi HTTP/1.1" 200 2050
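One way to make background queries look more like a single web user is to submit them strictly one at a time, with a pause between requests. A minimal sketch, assuming a hypothetical blocking `run_query` callable; the 2-second default pause is an arbitrary illustration, not a value used by Mylyn or Bugzilla.

```python
import time


def run_queries_sequentially(queries, run_query, pause_s=2.0, sleep=time.sleep):
    """Submit background queries one at a time, pausing between them.

    Only one request is in flight at any moment, and the pause spreads a
    burst of queries over several seconds instead of sending them back
    to back. `sleep` is injectable so the pacing can be tested.
    """
    results = []
    for index, query in enumerate(queries):
        if index > 0:
            sleep(pause_s)  # no pause before the first query
        results.append(run_query(query))
    return results
```

The trade-off is latency: eight queries with a 2-second pause take at least 14 seconds longer to complete, which matters little for a background synchronization nobody is waiting on.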
At the annual Eclipse project release retrospective, our committers mentioned that Bugzilla performance is slowing down their daily work, so anything that makes Bugzilla faster will be highly appreciated.
Sorry for increasing this to major, but as people are returning from holidays, our database servers are getting slammed with Mylyn/Bugzilla search queries. Take a look at these sobering facts. Our logs rotate at midnight, and it's 10:30am.

  bugs-vm1:egrep "(GET|POST) /bugs/buglist.cgi" /var/log/apache2/bugs.eclipse.org/access_log | wc -l
  18241
  bugs-vm1:egrep "(GET|POST) /bugs/buglist.cgi" /var/log/apache2/bugs.eclipse.org/access_log | grep -c "Mylyn"
  16160
  bugs-vm2:egrep "(GET|POST) /bugs/buglist.cgi" /var/log/apache2/bugs.eclipse.org/access_log | wc -l
  12207
  bugs-vm2:egrep "(GET|POST) /bugs/buglist.cgi" /var/log/apache2/bugs.eclipse.org/access_log | grep -c "Mylyn"
  10304

26464/30448 (87%) searches were from Mylyn clients. Fortunately, since the upgrade to 3.4 I can send search queries to both our DB servers, but our CPU load is too high (372% of a possible 400% in use):

  PID   USER  PR NI VIRT  RES  SHR  S %CPU %MEM TIME+   COMMAND
  14504 mysql 15 0  3228m 2.7g 6148 S 372  18.0 9039:08 /usr/sbin/mysqld

In the meantime, I'll have no choice but to write some Apache rewrites to limit/throttle the number of buglist requests that Mylyn-signed clients can make. I hate to do this, but as you can see in comment 7, I think most of these queries are wasteful. Which error code would likely cause the fewest problems for Mylyn clients? 404 Not Found? 403 Forbidden? 410 Gone? 500 Server Error?
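The egrep/grep counts above can be reproduced, and combined across both servers, with a short script. This sketch mirrors the same pattern and the same "Mylyn" substring test (with the dot in `buglist.cgi` escaped, which is slightly stricter than the egrep pattern); it assumes Apache access-log lines like those quoted earlier in this thread.

```python
import re

# Same request pattern as the egrep command above, with the dot escaped.
BUGLIST = re.compile(r"(GET|POST) /bugs/buglist\.cgi")


def mylyn_share(log_lines):
    """Return (mylyn_buglist_requests, all_buglist_requests)."""
    total = mylyn = 0
    for line in log_lines:
        if BUGLIST.search(line):
            total += 1
            if "Mylyn" in line:  # user-agent contains "Mylyn/x.y.z"
                mylyn += 1
    return mylyn, total
```

Summing the per-server figures gives (16160 + 10304) / (18241 + 12207) = 26464 / 30448, or about 86.9%, which rounds to the 87% quoted above.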
Denis: We will review this on our call today and respond.
> In the meanwhile, I'll have no choice but to write some Apache rewrites to
> limit/throttle the amount of buglist requests that Mylyn-signed clients can
> request.

The September rush back to work seems to have tapered off, so I don't need to investigate this now. I do appreciate your making this a P1 though, because the problem will surface again at some point.
(In reply to comment #11)
> The September rush back to work seems to have tapered off, so I don't need to
> investigate this now. I do appreciate your making this a P1 though, because
> the problem will surface again at some point.

Yes, it's a concern, and we'll be eliminating those redundant synchronizations for 3.3 (bug 239182).
Something else interesting that we could consider is using the IUserAttentionListener from the ActivityContextManager to reduce the synchronizations when the user is away from their computer (i.e. they leave Eclipse open over the weekend). E.g. if 2 scheduled synchronizations go by while there has been no attention (attentionLost), we suspend the task list synchronization job until the user comes back (attentionGained). This will reduce load on the server when the user isn't working and this also means that the user is more likely to see the task change notifications.
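The suspend-and-resume rule described here can be modeled as a small state machine. This Python sketch only illustrates the logic: it does not use Mylyn's `IUserAttentionListener` or `ActivityContextManager`, and the class and method names are hypothetical, chosen to mirror the attentionLost/attentionGained events.

```python
class AttentionAwareScheduler:
    """Suspend background sync once N scheduled runs pass with the user away."""

    def __init__(self, idle_runs_before_suspend=2):
        self.idle_runs_before_suspend = idle_runs_before_suspend
        self.user_present = True
        self.idle_runs = 0
        self.suspended = False

    def attention_lost(self):
        self.user_present = False
        self.idle_runs = 0

    def attention_gained(self):
        """Resume; return True if sync was suspended (caller may sync now)."""
        self.user_present = True
        self.idle_runs = 0
        was_suspended = self.suspended
        self.suspended = False
        return was_suspended

    def on_scheduled_tick(self):
        """Return True if the scheduled synchronization should run."""
        if self.suspended:
            return False
        if not self.user_present:
            self.idle_runs += 1
            # Let the first N idle runs go by, then suspend.
            if self.idle_runs > self.idle_runs_before_suspend:
                self.suspended = True
                return False
        return True
```

Returning a flag from `attention_gained` lets the caller trigger an immediate synchronization when the user comes back, so the task change notifications are fresh at the moment they are most likely to be seen.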
> to reduce the synchronizations when the user is away

I love the idea. No sense in tossing 6 queries at us every 10 minutes if you're out having lunch. I think the UDC works this way too: I leave Eclipse open all week and sometimes don't use it for days. Then, as soon as I bring it up to do something, I get the prompt to upload.
I wonder if it is possible to implement this using a "push" method instead of "pull", with Mylyn using email notifications somehow.
(In reply to comment #15)
> I wonder if it is possible to implement using "push" method instead of "pull" -
> with Mylyn using email notifications somehow.

Andew: We definitely want to move towards this, but need the corresponding WS API to exist in Bugzilla or an extension that can push changes. Could you please create a bug against us on this? We've been exploring some ideas around that.

Since this is only an issue on very large, typically OSS, repositories, we were also considering making a white list of those repositories in order to alter their sync interval. However, if we can find a better solution, we'd prefer not to degrade the performance for OSS repositories. The direction of this thread, i.e., throttling the sync interval accordingly when the connection is idle, seems like the best approach for Mylyn 3.3 (Oct. 28th). I created bug 291237 for that.
In the user forum today, someone said that Mylyn queries the Bugzilla metadata configuration once per hour. If that is true, that seems way too often to me. Especially given that it can be manually refreshed. I do not know if it is true, but if so I would recommend changing that to something like once per Eclipse session.
(In reply to comment #16)
> (In reply to comment #15)
> > I wonder if it is possible to implement using "push" method instead of "pull"
> > with Mylyn using email notifications somehow.
> Andew: We definitely want to move towards this, but need the corresponding WS
> API to exist in Bugzilla or an extension that can push changes. Could you
> please create a bug against us on this? We've been exploring some ideas around
> that.

Alright, if you insist, I created bug 291252 and a bunch of other nasty ones. Thanks for listening.
(In reply to comment #17)
> In the user forum today, someone said that Mylyn queries the Bugzilla metadata
> configuration once per hour. If that is true, that seems way too often to me.
> Especially given that it can be manually refreshed. I do not know if it is
> true, but if so I would recommend changing that to something like once per
> Eclipse session.

For a short write-up on how frequently the configuration is retrieved, see http://wiki.eclipse.org/index.php/Mylyn/FAQ#Retrieval_of_repository_configuration
As I work with Mylyn, I notice an increasing set of tasks accumulating in the Mylyn Unassigned category. These are tasks that are not part of an existing query, so they are apparently out of scope. Maybe it is a good idea to disable automatic sync for these.
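The suggestion amounts to limiting background synchronization to tasks that at least one query still returns. A minimal sketch with a hypothetical helper name; it assumes the set of tasks and each query's results are available as plain lists of task ids, and leaves everything else to manual refresh.

```python
def tasks_for_automatic_sync(all_task_ids, query_results):
    """Keep only tasks that appear in at least one query's results.

    Tasks outside every query (such as those in the Unassigned category)
    are excluded from background sync and left for manual refresh.
    """
    in_some_query = set()
    for result in query_results:
        in_some_query.update(result)
    # Preserve the original task order for the ones we keep.
    return [task_id for task_id in all_task_ids if task_id in in_some_query]
```

A task that later reappears in a query result would automatically be picked up again on the next run.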
Happy New Year! :-D
bugs.eclipse.org has been getting lots of buglist requests from Mylyn clients since Monday, so, as per comment 9, I put in place a way of refusing buglist requests from Mylyn clients should our database servers be hosed. Do note that the thresholds are quite high, so the impact of this should be very low, or at least lower than the impact of having web users wait for the DB servers. If anyone is interested, this is the Apache configuration:

  # If servers are too busy, prevent Mylyn buglists
  RewriteCond /path/to/a/file/called/mylyndisabled -f
  RewriteCond %{HTTP_USER_AGENT} Mylyn
  RewriteRule buglist.cgi / [L,R=400]

I simply have a monitoring script that creates (and removes) the mylyndisabled file depending on the load on both DB servers.
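The monitoring script itself isn't shown, but its job can be sketched: create the flag file that the `RewriteCond ... -f` test checks when both DB servers are busy, and remove it once they recover. How load is measured, the `high`/`low` thresholds, and the hysteresis band (separate on and off thresholds, so the flag does not flip on every check) are all assumptions of this sketch.

```python
import os


def update_flag(flag_path, db_loads, high=0.9, low=0.7):
    """Create or remove the flag file checked by the RewriteCond -f test.

    db_loads: load of each DB server as a fraction of capacity (how it is
    measured is left to the caller). The flag is created when every server
    is at or above `high` and removed once every server is below `low`;
    in between, the current state is kept. Returns whether the flag exists.
    """
    exists = os.path.exists(flag_path)
    if not exists and min(db_loads) >= high:
        open(flag_path, "a").close()  # touch: Mylyn buglist requests now refused
        return True
    if exists and max(db_loads) < low:
        os.remove(flag_path)  # load is back down: allow Mylyn buglists again
        return False
    return exists
```

Run periodically (for example from cron), this keeps the rewrite rule inactive except during sustained overload on both servers.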
Sounds like a reasonable workaround for now, Denis. We'll watch for weird behavior.
To add a new tack on a possible resolution to this problem, I suggest that repositories be allowed to define their own preferred refresh policies using a mechanism akin to the robots.txt file used by web crawlers. In effect, Mylyn is a special kind of web crawler: a bug crawler. The policy mechanism should allow settings based on user role (we are logged in as a known user most of the time), so a reporter or assignee can refresh more often than a cc or even a bug lurker. The mechanism should also allow different crawl intervals based on bug settings, with new and open bugs refreshed more often than verified and closed ones. This policy engine should go into the general tasks framework, so it applies to all connectors. I like working on making Mylyn well behaved (bug 205708), so I would be willing to work on this and supply a patch.
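A robots.txt-style policy could be as simple as a table of intervals keyed by role and status. The format, role names, status factors, and numbers below are invented for illustration; neither Mylyn nor Bugzilla defines such a file. The sketch picks the shortest interval among the user's roles on a task and stretches it for settled statuses.

```python
# Hypothetical policy a repository might publish (all values made up):
# base refresh interval in minutes per role, and a multiplier per status.
POLICY = {
    "default": 60,  # used when the user has no listed role on the task
    "roles": {"assignee": 10, "reporter": 15, "cc": 30},
    "status_factor": {"NEW": 1, "ASSIGNED": 1, "RESOLVED": 4,
                      "VERIFIED": 8, "CLOSED": 24},
}


def refresh_interval_minutes(policy, roles, status):
    """Shortest interval among the user's roles, scaled by task status."""
    base = min(
        (policy["roles"][role] for role in roles if role in policy["roles"]),
        default=policy["default"],
    )
    # Unknown statuses are treated like open ones (factor 1).
    return base * policy["status_factor"].get(status, 1)
```

Placing this lookup in the shared tasks framework, as suggested, would let every connector honor the same published policy.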
I like this idea for the large publicly hosted repositories.
Mylyn has been restructured, and our issue tracking has moved to GitHub [1]. We are closing ~14K Bugzilla issues to give the new team a fresh start. If you feel that this issue is still relevant, please create a new one on GitHub. [1] https://github.com/orgs/eclipse-mylyn