
Bug 339445

Summary: expose download information gathered from connector discover
Product: z_Archived
Reporter: Robert Munteanu <robert.munteanu>
Component: Mylyn
Assignee: Steffen Pingel <steffen.pingel>
Status: RESOLVED FIXED
QA Contact:
Severity: enhancement
Priority: P3
CC: steffen.pingel
Version: unspecified
Keywords: helpwanted
Target Milestone: ---
Hardware: All
OS: All
Whiteboard:
Attachments:
Description                                   Flags
First parser implementation                   none
Second version, compatible with Python 2.4    none

Description Robert Munteanu CLA 2011-03-09 17:50:27 EST
Cloned from: 337227: [api] add a download counter URL to the connector discovery
https://bugs.eclipse.org/bugs/show_bug.cgi?id=337227

For the MantisBT connector I would be interested in the download statistics related to the connector discovery. If needed, I can provide my own entry point for recording installations of the connector.
Comment 1 Steffen Pingel CLA 2011-04-03 21:21:47 EDT
We'll have to figure out where to track these. I'll probably set something up on mylyn.eclipse.org for now as it's tricky to evaluate the passed parameters on download.eclipse.org.
Comment 2 Steffen Pingel CLA 2011-04-13 14:37:21 EDT
Download information is now collected in the Apache logs of mylyn.eclipse.org. We would need some sort of script to create reports. It should be fairly straightforward to record the hits in a database and to generate some graphs.
Comment 3 Robert Munteanu CLA 2011-04-13 15:20:27 EDT
Can you attach a sample log file with a couple of entries? Time permitting, I'd be willing to take a shot at a log parser.
Comment 4 Steffen Pingel CLA 2011-04-13 15:38:38 EDT

# grep stats /var/log/apache2/access_log
w.x.y.z - - [11/Apr/2011:11:22:44 -0400] "HEAD /stats/mylyn/discovery/com.foglyn?id=com.foglyn&discovery=3.5.0&product=com.adobe.flexbuilder.standalone.product&os=win32&arch=x86&ws=win32&nl=en_US HTTP/1.1" 404 - "-" "Jakarta Commons-HttpClient/3.1"
w.x.y.z - - [12/Apr/2011:10:07:45 -0400] "HEAD /stats/mylyn/discovery/org.eclipse.mylyn.trac?id=org.eclipse.mylyn.trac_feature&discovery=3.5.0&product=org.eclipse.epp.package.jee.product&buildId=M20100909-0800&os=win32&arch=x86&ws=win32&nl=de_DE HTTP/1.0" 404 - "-" "Jakarta Commons-HttpClient/3.1"
w.x.y.z - - [13/Apr/2011:07:33:58 -0400] "HEAD /stats/mylyn/discovery/com.itsolut.mantis?id=com.itsolut.mantis_feature&discovery=3.5.0&product=com.springsource.sts.ide&buildId=2.6.0.201103161000-RELEASE&os=win32&arch=x86&ws=win32&nl=it_IT HTTP/1.1" 404 - "-" "Jakarta Commons-HttpClient/3.1"
w.x.y.z - - [13/Apr/2011:10:19:48 -0400] "HEAD /stats/mylyn/discovery/com.itsolut.mantis?id=com.itsolut.mantis_feature&discovery=3.5.0&product=org.eclipse.epp.package.cpp.product&buildId=M20110210-1200&os=win32&arch=x86&ws=win32&nl=ja_JP HTTP/1.1" 404 - "-" "Jakarta Commons-HttpClient/3.1"
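For illustration, a line in the format shown above could be split into its timestamp, the connector segment of the request path and the query parameters roughly as follows. This is a minimal sketch, not the attached parser: it targets Python 3 (not the Python 2.4 available on the server), the regular expression only covers the fields visible in these sample lines, and `parse_line` is a hypothetical name.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit, parse_qsl

# Covers the fields visible in the sample lines:
# host, ident, user, [timestamp], "METHOD target PROTOCOL", status, size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return (timestamp, connector path segment, parameter dict), or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    # Drop the trailing " -0400" UTC offset, as the attached parser does with ts[:-6]
    timestamp = datetime.strptime(match.group("ts")[:-6], "%d/%b/%Y:%H:%M:%S")
    url = urlsplit(match.group("target"))
    # Last path segment, e.g. "com.itsolut.mantis"
    connector = url.path.rsplit("/", 1)[-1]
    # Single-valued query parameters (id, discovery, product, buildId, os, arch, ws, nl)
    params = dict(parse_qsl(url.query))
    return timestamp, connector, params
```

Parameters that are absent from a request, such as `buildId` in the first sample line, simply do not appear in the returned dictionary.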
Comment 5 Robert Munteanu CLA 2011-04-13 15:49:18 EDT
OK, thanks.

What is the preferred implementation language for the parser script?
Comment 6 Steffen Pingel CLA 2011-04-13 16:10:52 EDT
We have perl and python available on the server and could probably make php available as well. Your choice :).
Comment 7 Robert Munteanu CLA 2011-04-13 16:36:58 EDT
Perl/Python should be familiar enough. Is the database backend MySQL?
Comment 8 Steffen Pingel CLA 2011-04-13 16:41:28 EDT
Yes, MySQL.
Comment 9 Robert Munteanu CLA 2011-04-13 18:33:06 EDT
OK, two more items to clear up and I should be set to go:

* which pieces of information will be stored from the request log?
* should I take into account the same file being parsed twice e.g. by taking into account the last recorded timestamp?
Comment 10 Steffen Pingel CLA 2011-04-15 07:13:24 EDT
(In reply to comment #9)
> OK. two more items to clear up and I should be set to go:
> 
> * which pieces of information will be stored from the request log?

All parameters of the request. It would be nice if we could generate some statistics to show the operating system and language distribution etc.
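As a rough illustration of such statistics, the share of requests per operating system (`os`) or language (`nl`) can be computed from the stored parameters. This sketch assumes each request's parameters are available as a dictionary; `distribution` is a hypothetical helper, and requests lacking the parameter are counted as "unknown".

```python
from collections import Counter


def distribution(records, key):
    """Return the share of requests per value of `key`, most common first.

    `records` is an iterable of parameter dicts; a missing or empty value
    is counted under "unknown".
    """
    counts = Counter(r.get(key) or "unknown" for r in records)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.most_common()}


# The four sample requests from comment 4, reduced to the relevant parameters
sample = [
    {"os": "win32", "nl": "en_US"},
    {"os": "win32", "nl": "de_DE", "buildId": "M20100909-0800"},
    {"os": "win32", "nl": "it_IT", "buildId": "2.6.0.201103161000-RELEASE"},
    {"os": "win32", "nl": "ja_JP", "buildId": "M20110210-1200"},
]
```

With the sample data, every request comes from `win32` and each of the four locales accounts for a quarter of the requests.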

> * should I take into account the same file being parsed twice e.g. by taking
> into account the last recorded timestamp?

That probably makes sense. The alternative would be to not parse the log but to record requests directly in a database.
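Skipping already recorded entries by timestamp could look like the following sketch. It assumes the parsed entries arrive as `(timestamp, params)` pairs and that the newest stored timestamp has been read from the database beforehand; `new_entries` is a hypothetical name.

```python
from datetime import datetime


def new_entries(entries, last_recorded):
    """Yield only the entries strictly newer than the last stored timestamp.

    `entries` is an iterable of (timestamp, params) pairs; `last_recorded`
    is None when nothing has been stored yet.
    """
    for timestamp, params in entries:
        if last_recorded is None or timestamp > last_recorded:
            yield timestamp, params


entries = [
    (datetime(2011, 4, 11, 11, 22, 44), {"id": "com.foglyn"}),
    (datetime(2011, 4, 12, 10, 7, 45), {"id": "org.eclipse.mylyn.trac_feature"}),
    (datetime(2011, 4, 13, 7, 33, 58), {"id": "com.itsolut.mantis_feature"}),
]
```

One trade-off to keep in mind: Apache timestamps have one-second resolution, so a strict `>` drops any unseen request logged in the same second as the last stored one, while `>=` would count the stored one twice.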
Comment 11 Robert Munteanu CLA 2011-04-15 09:39:37 EDT
(In reply to comment #10)
> (In reply to comment #9)
> > OK. two more items to clear up and I should be set to go:
> >
> > * which pieces of information will be stored from the request log?
> 
> All parameters of the request. It would be nice if we could generate some
> statistics to show the operating system and language distribution etc.

OK, so all parameters from e.g. @com.foglyn?id=com.foglyn&discovery=3.5.0&product=com.adobe.flexbuilder.standalone.product&os=win32&arch=x86&ws=win32&nl=en_US@ will be stored. Anything else, such as the IP address or user agent? To make sure that I don't parse entries twice, I would need to store the date of the request anyway.

Additionally, in the request above, what is the difference between the @com.foglyn@ in the request path and the one assigned to the id parameter?
Comment 12 Steffen Pingel CLA 2011-04-17 09:36:46 EDT
No need to store the IP address or user agent. We can freely choose the request path. Requests are mapped to different locations to make it potentially easier to use tools such as webalizer or awstats which we could also consider.
Comment 13 Robert Munteanu CLA 2011-04-19 17:51:31 EDT
Created attachment 193628 [details]
First parser implementation

First attempt at parsing the data using Python. This works just fine with the provided sample data. A few notes:

* the parser extracts all the fields, and skips previously recorded entries by timestamp;
* the columns are all nullable, as I am not sure which fields will always be set. In the provided sample file there is an entry with no build id;
* the script can be invoked using 'python src/DiscoveryParser.py' and it will run with the sample data and default MySQL connection credentials, but it is trivial to customize it and/or import it into another Python file;
* the code is also in a git repository at https://github.com/rombert/discovery-log-parser .
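The idea of nullable columns plus a timestamp used for skipping can be sketched with Python 3's built-in `sqlite3` module as a stand-in for MySQL. The table name `discovery_request` and the column layout are hypothetical, not taken from the attached script, and a MySQL driver would use its own DB-API placeholder style instead of `?`.

```python
import sqlite3
from datetime import datetime

# Hypothetical columns: the connector path segment plus the request parameters.
# None of them is declared NOT NULL, since e.g. buildId is missing from some requests.
COLUMNS = ("connector", "id", "discovery", "product", "buildId", "os", "arch", "ws", "nl")


def create_schema(conn):
    column_defs = ", ".join(f"{c} TEXT" for c in COLUMNS)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS discovery_request "
        f"(request_time TEXT NOT NULL, {column_defs})"
    )


def store(conn, timestamp, connector, params):
    # Missing parameters become NULL via params.get() returning None
    values = [timestamp.isoformat(sep=" "), connector]
    values += [params.get(c) for c in COLUMNS[1:]]
    placeholders = ", ".join("?" for _ in values)
    conn.execute(
        f"INSERT INTO discovery_request (request_time, {', '.join(COLUMNS)}) "
        f"VALUES ({placeholders})",
        values,
    )


def last_recorded(conn):
    """Newest stored request time, or None for an empty table."""
    row = conn.execute("SELECT MAX(request_time) FROM discovery_request").fetchone()
    return None if row[0] is None else datetime.fromisoformat(row[0])
```

Storing the time as fixed-width ISO text keeps `MAX()` correct under plain string comparison; a MySQL schema would more naturally use a `DATETIME` column.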

I have not added any license headers but I am happy to license it under the EPL. Do I only need to add the standard EPL header to the .py file? 

As for reading the data and presenting it in a web page, how should I proceed?
Comment 14 Robert Munteanu CLA 2011-05-18 15:49:46 EDT
Steffen, did you have the chance to try the script?
Comment 15 Steffen Pingel CLA 2011-05-21 08:23:34 EDT
Sorry for not getting back to you earlier, Robert. We are in the midst of wrapping up the Indigo release. From a quick glance, the script looks good to me. Is the idea to run it periodically (e.g. using cron)?

I got the following errors when running the script:

EE
======================================================================
ERROR: test_parseLines (__main__.LogParserTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_parse.py", line 7, in test_parseLines
    r.readFile('test/data/log.txt')
  File "/home/spingel/rombert-discovery-log-parser-a929a96/src/DiscoveryParser.py", line 16, in readFile
    self.parseLine(line)
  File "/home/spingel/rombert-discovery-log-parser-a929a96/src/DiscoveryParser.py", line 29, in parseLine
    request_datetime = datetime.strptime(ts[:-6], "%d/%b/%Y:%H:%M:%S")
AttributeError: type object 'datetime.datetime' has no attribute 'strptime'

======================================================================
ERROR: test_skip_already_parsed (__main__.LogParserTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_parse.py", line 24, in test_skip_already_parsed
    r.readFile('test/data/log.txt')
  File "/home/spingel/rombert-discovery-log-parser-a929a96/src/DiscoveryParser.py", line 16, in readFile
    self.parseLine(line)
  File "/home/spingel/rombert-discovery-log-parser-a929a96/src/DiscoveryParser.py", line 29, in parseLine
    request_datetime = datetime.strptime(ts[:-6], "%d/%b/%Y:%H:%M:%S")
AttributeError: type object 'datetime.datetime' has no attribute 'strptime'

We only have Python 2.4.2 available on the server. Maybe that's the problem?

For the front-end, a Python or Perl script that aggregates data from the database with some simple visualization, e.g. through http://code.google.com/apis/chart/, would probably work?
Comment 16 Robert Munteanu CLA 2011-05-21 16:16:34 EDT
(In reply to comment #15)
> Sorry for not getting back earlier Robert. We are in the midst of wrapping up
> the Indigo release. From taking a quick glance the script looks good to me. Is
> the idea to run it periodically (e.g. using cron)?
> 

That's fine, I'm just eager to see the data :-). Yes, the script is designed to be run from cron, inserting only the new data into the database.

> AttributeError: type object 'datetime.datetime' has no attribute 'strptime'
> 
> We only have Python 2.4.2 available on the server. Maybe that's the problem?

Yes, it most likely is: that method is only available since Python 2.5. I'll shortly submit an update which _should_ work on Python 2.4. I only have Python 2.7 to test with for now.
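The usual way around the missing `datetime.strptime` is to parse with `time.strptime`, which Python 2.4 already has, and build the `datetime` from the first six fields of the result; the Python documentation describes `datetime.strptime` as equivalent to exactly this construction. The sketch below is written and checked against Python 3, and `parse_log_time` is a hypothetical name; it keeps the attached parser's approach of dropping the UTC offset.

```python
import time
from datetime import datetime

LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S"


def parse_log_time(ts):
    """Parse an Apache timestamp such as '13/Apr/2011:07:33:58 -0400'.

    The trailing ' -0400' offset is cut off (ts[:-6]), and the datetime is
    built from time.strptime instead of datetime.strptime.
    """
    fields = time.strptime(ts[:-6], LOG_TIME_FORMAT)
    # (year, month, day, hour, minute, second)
    return datetime(*fields[:6])
```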

> 
> For the front-end a python or perl script that aggregates data from the database
> with some simple visualization, e.g. through http://code.google.com/apis/chart/
> would probably work?

Yes, it would. Do you prefer a script which generates the report on demand, i.e. accessible from a browser, or a cron script which pre-generates the data?
Comment 17 Robert Munteanu CLA 2011-05-21 16:45:27 EDT
Created attachment 196278 [details]
Second version, compatible with Python 2.4
Comment 18 Robert Munteanu CLA 2011-08-18 10:43:59 EDT
Steffen, it would be nice if you found the time to re-test the script. I would then start working on the visualisation.
Comment 19 Steffen Pingel CLA 2011-08-22 19:13:15 EDT
Thanks for reminding me. The second patch must have escaped my radar. I'll try to get around to it next week after the 3.6.2 release.
Comment 20 Steffen Pingel CLA 2013-08-12 13:06:46 EDT
Robert, I'm very sorry for dropping the ball on this. We have meanwhile lost the mylyn.eclipse.org server and hence the logs. I have set up something on mylyn.org where we can now see installs on a per connector/version basis: http://stats.mylyn.org/awstats.pl . It looks like Mantis is the second most popular open source connector :). I hope that offsets the lack of responsiveness on this bug a little.
Comment 21 Robert Munteanu CLA 2013-08-13 05:33:44 EDT
That's great Steffen, thanks!