Bug 364929 - [stats] support customizable download statistics
Summary: [stats] support customizable download statistics
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: p2
Version: unspecified
Hardware: PC All
Importance: P3 enhancement
Target Milestone: Juno M5
Assignee: Meng Xin Zhu CLA
Reported: 2011-11-28 02:42 EST by Meng Xin Zhu CLA
Modified: 2013-11-10 22:32 EST

Description Meng Xin Zhu CLA 2011-11-28 02:42:02 EST
p2 already has a simple download statistics mechanism [1], but it only collects general statistics, such as the number of downloads of a given binary and the download time. There is no way to collect additional statistics based on runtime information, such as the Eclipse package type or information about the host that Eclipse is running on.

[1] http://wiki.eclipse.org/Equinox_p2_download_stats
Comment 1 Meng Xin Zhu CLA 2011-11-28 02:50:57 EST
A feasible solution is to write the customized information into the profile. The profile can be initialized, or updated later, with the property 'download.stats.additional'. For example, if the jee package is materialized with the arguments '-profileProperties download.stats.additional=package=jee', p2 will request the URL 'http://your.stats.server/stats/test.plugin.1.bundle?package=jee' for statistics when downloading the artifacts.
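
To make the proposal concrete, here is a minimal sketch of how the stats URL could be assembled from the repository's stats URI, the artifact's stats name and the profile property. It is only an illustration under stated assumptions: the class StatsUrlSketch and the method buildStatsUrl are hypothetical names and not p2 API, and only the property name 'download.stats.additional' and the example URL come from the comment above.

```java
import java.util.Map;

// Hypothetical illustration only; this is not the p2 implementation.
final class StatsUrlSketch {

    // Property name as proposed in this bug.
    static final String ADDITIONAL_STATS_PROPERTY = "download.stats.additional";

    // Builds the URL that would be requested for statistics.
    // statsUri: base stats URI advertised by the repository (assumed input).
    // artifactStatsName: stats name of the artifact being downloaded.
    // profileProperties: properties of the running profile.
    static String buildStatsUrl(String statsUri, String artifactStatsName,
            Map<String, String> profileProperties) {
        String base = statsUri.endsWith("/") ? statsUri : statsUri + "/";
        String url = base + artifactStatsName;
        String extra = profileProperties.get(ADDITIONAL_STATS_PROPERTY);
        if (extra == null || extra.isEmpty()) {
            // No customized information: same URL as the existing mechanism.
            return url;
        }
        // Append the customized information as a query string.
        return url + (url.contains("?") ? "&" : "?") + extra;
    }

    public static void main(String[] args) {
        Map<String, String> profile = Map.of(ADDITIONAL_STATS_PROPERTY, "package=jee");
        // Prints http://your.stats.server/stats/test.plugin.1.bundle?package=jee
        System.out.println(buildStatsUrl("http://your.stats.server/stats",
                "test.plugin.1.bundle", profile));
    }
}
```

When the property is absent, the sketch returns the plain stats URL, so existing repositories would see no change in the requests they receive.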
Comment 2 Pascal Rapicault CLA 2011-12-01 20:52:29 EST
First things first, what is the final goal? Spy on the user, generate repositories on the fly, get usage stats? When do you need this information? Would it be something the foundation would use? How does that intersect with the Eclipse UDC?
Comment 3 Meng Xin Zhu CLA 2011-12-01 22:15:02 EST
(In reply to comment #2)
> First thing first, what is the final goal? Spy the user, generate repositories
> on the fly, get usage stats? When do you need this information? Would it be
> something the foundation use? How does that intersect with the Eclipse UDC?
It's not intended to spy on users ;). It just collects detailed package information, or other information that clients are interested in, when users install new software into a package. As you know, the Eclipse Foundation releases several download packages for different purposes, such as Java development, C/C++ development and testing. I think it would be interesting to have data on how much other software is combined with those basic release packages. We would know how often Java developers download Groovy or Scala, and how often Java or C++ developers download PHP, Perl, Python and other dynamic languages.

And the implementation won't generate repositories on the fly; it depends on how the release engineers create the installation package. When building the jee package, an additional profile property (such as p2.stats.parameters='package=jee&os=win32&ws=win32') will be set when initializing the profile of the jee Windows package. p2 will use this property as the query string when requesting the stats URL (the stats request itself has already been implemented by DJ). The stats servlet can then tell which installation package is installing which new software.
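
On the server side, the stats servlet would only need to split the query string into name/value pairs. The following is a rough sketch of that step, assuming the query string has the form shown above; StatsQuerySketch and parseQuery are hypothetical names, and a real servlet container would normally do this parsing itself.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of p2 or of any stats server.
final class StatsQuerySketch {

    // Splits a query string such as "package=jee&os=win32&ws=win32"
    // into an ordered map of decoded parameter names and values.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> result = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return result;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate stray '&' characters
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            result.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                    URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {package=jee, os=win32, ws=win32}
        System.out.println(parseQuery("package=jee&os=win32&ws=win32"));
    }
}
```

With the parameters separated like this, the server could count downloads per package, os or ws value without p2 having to know what the parameters mean.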
Comment 4 Pascal Rapicault CLA 2011-12-02 08:51:49 EST
Just to be clear, is that something you think the foundation would use or something Wind River would use :).

At this point the mechanism proposed allows the person who has an update site to gather information about the environment of the user (for example from my site I could know that you are running jee package). Of course the provider of the site could be nice and also decide to get the information sent to the foundation but I doubt this would be the desire.

I'm not objecting, I just want to be clear on what it is we are trying to achieve (which level of details we want) because there may be better ways to do this, like for example: setting a user agent, having a specific tool that sends the eclipse config over (a toned down version of the UDC).
Comment 5 Helmut J. Haigermoser CLA 2011-12-02 09:20:42 EST
(In reply to comment #4)
> Just to be clear, is that something you think the foundation would use or
> something Windriver use :). 
> 
> At this point the mechanism proposed allows the person who has an update site
> to gather information about the environment of the user (for example from my
> site I could know that you are running jee package). Of course the provider of
> the site could be nice and also decide to get the information sent to the
> foundation but I doubt this would be the desire.
> 
> I'm not objecting, I just want to be clear on what it is we are trying to
> achieve (which level of details we want) because there may be better ways to do
> this, like for example: setting a user agent, having a specific tool that sends
> the eclipse config over (a toned down version of the UDC).

Hi Pascal :)
This is definitely a feature Wind River had good use for, but we want to create a universal mechanism for all interested parties to plug into.

Our use case is simple, we want to collect information like this:
- how many users install a certain product
- who installed a certain product
- what products are frequently updated, which ones are kept at base versions

All this information is important and is frequently used to decide on the lifetime of a product, i.e. if a certain product is not installed a single time, why should we keep maintaining it? As you can see, this information is not Wind River specific; there are other parts we would plug in as well if we had a good contribution model...

So, long story short, we have a need for good statistics, and we are willing to contribute a feature to p2 to help us get this feature implemented! :)

HTH,
Helmut
Comment 6 Wayne Beaton CLA 2011-12-06 17:19:00 EST
(In reply to comment #5)
> Our use case is simple, we want to collect information like this:
> - how many users install a certain product
> - who installed a certain product
> - what products are frequently updated, which ones are kept at base versions

The UDC captures this sort of information. Unfortunately, we've decided to get out of the UDC business based on a general lack of interest and a specific inability for anybody to do anything useful with the data. The UDC server was shut down in September and the UDC client has been pulled from the Juno packages.

A couple of years ago, I showed some queries that I was working on that determined--for example--how projects paired "in the wild" (e.g. users who use PHP also use Web Tools). A few folks said "hey, that's interesting" and then nothing came from it. This is a pretty common pattern with regard to the usage data: "hey that's interesting, but I don't really care".

I've given the data to countless researchers who have produced papers that contain almost no real value to the Eclipse community. I've given filtered subsets of the data to a few member companies who have then done nothing with the data.

In short, I'm very cynical about the value of the data in relation to the cost of collecting, maintaining, and disseminating it. The "is this still being used" question is an interesting one, but we can get that from the existing download stats. Or am I missing something?
Comment 7 Pascal Rapicault CLA 2011-12-06 20:09:36 EST
Hi Helmut, thanks for your input. The staffing is not really my concern since I know you guys will implement what is required as you've always done. I just want to be sure that the support we put in place matches the needs therefore my desire to understand those :)

In your list of functionality (repeated below with inline questions) you mention the term "product". Are you using this term to purposefully differentiate it from features?

>Our use case is simple, we want to collect information like this:
>- how many users install a certain product
  Do you care about initial install? How is that any different than what we have today in Eclipse?

>- who installed a certain product
  Does that mean user name? IP address? What else?

>- what products are frequently updated, which ones are kept at base versions
Comment 8 Meng Xin Zhu CLA 2011-12-06 22:00:46 EST
(In reply to comment #6)
> In short, I'm very cynical about the value of the data in relation to the cost
> of collecting, maintaining, and disseminating it. The "is this still being used"
> question is an interesting one, but we can get that from the existing download
> stats. Or am I missing something?
What we want is to give p2 a capability that helps the webmaster of a repository collect detailed download stats. p2 is widely used by many Eclipse-based applications, so this would be a real plus for webmasters of p2 repositories who need download data. I thought the Eclipse Foundation could benefit from this enhancement as well, but it looks like the use case I assumed was already covered by the UDC and is no longer needed. In any case, the enhancement still makes it possible to collect download stats, and p2-based clients can use it to gather download data.
Comment 9 Meng Xin Zhu CLA 2011-12-06 22:14:03 EST
(In reply to comment #7)
> In your list of functionality (repeated below with inline questions) you mention
> the term "product". Are you using this term to purposefully differentiate it
> from features?
The term "product" in our application does not mean "feature"; we use a special namespace to define the "product" IU.
> 
> >Our use case is simple, we want to collect information like this:
> >- how many users install a certain product
> Do you care about initial install? How is that any different than what we have
> today in Eclipse?
It is no different from Eclipse. The installation still consists of the collect, install and configure phases.
> 
> >- who installed a certain product
> Does that mean user name? IP address? What else?
The installer knows the user name of whoever is installing our products. We are interested in how many installations come from a specific customer.

I pushed the initial changes and test cases [1] to a new git branch; feel free to comment. :)

[1] http://git.eclipse.org/c/equinox/rt.equinox.p2.git/log/?h=stats
Comment 10 Helmut J. Haigermoser CLA 2011-12-07 04:40:17 EST
(In reply to comment #6)

> In short, I'm very cynical about the value of the data in relation to the cost
> of collecting, maintaining, and disseminating it. The "is this still being
> used" question is an interesting one, but we can get that from the existing
> download stats. Or am I missing something?

Hi Wayne :)
Considering your experience with the current statistics project and usage I think we should be sure to design a feature that will actually be useful and helpful, rather than collecting data that nobody wants and just sits in a database never to be looked at.

We here at Wind River have very specific use cases in mind for these statistics and how we want to use them so maybe we should start by looking at the feature from that angle?

Currently we offer the installation and update of content via p2 repositories, which is easy and overall a great p2 success story. The statistics topic was introduced when product managers needed to decide about new features in our products. One criterion for planning the future of a product is knowing which of our products and updates are used by which customers. Products that never get installed or updated might not be worth investing time into, while products that key customers install and update would definitely require special attention and care.

Right now we can collect some stats, we'll be able to tell PM what artifacts get downloaded and which don't, same for updates. But what we cannot do is tell them who installed them, what customer, with what contract etc. So, naturally we thought we would help make the statistics framework extensible so that we could plug into that and not only report the download of an artifact, but some other information as well:
- the customer name 
- information about the contract
- the OS type used (os:ws:arch of the installation profile so we know if we still need a solaris installer for example)
- maybe the metadata that was actually installed
- ...

We definitely want to make sure customers don't feel like they are spied upon, so transparency and information about the reasons for collecting data should be included in the feature from the start. 

Let me know what you think! :)
Helmut
Comment 11 Wayne Beaton CLA 2011-12-07 16:29:06 EST
(In reply to comment #10)
> Considering your experience with the current statistics project and usage I
> think we should be sure to design a feature that will actually be useful and
> helpful, rather than collecting data that nobody wants and just sits in a
> database never to be looked at.

That's good, because when we sat down to design the UDC, our main design point was to build something that was pointless and unhelpful. :-)

> So, naturally
> we thought we would help make the statistics framework extensible so that we
> could plug into that and not only report the download of an artifact, but some
> other information as well:
> - the customer name 
> - information about the contract
> - the OS type used (os:ws:arch of the installation profile so we know if we
> still need a solaris installer for example)
> - maybe the metadata that was actually installed
> - ...

Our privacy policy prevents us from collecting user names and other private information. What information are you hoping that the Eclipse Foundation can collect? 

Or is this information to be collected by your servers rendering privacy concerns as something between you and your customers?
Comment 12 Helmut J. Haigermoser CLA 2011-12-12 04:14:01 EST
(In reply to comment #11)
> (In reply to comment #10)
> > Considering your experience with the current statistics project and usage I
> > think we should be sure to design a feature that will actually be useful and
> > helpful, rather than collecting data that nobody wants and just sits in a
> > database never to be looked at.
> 
> That's good, because when we sat down to design the UDC, our main design point
> was to build something that was pointless and unhelpful. :-)
Right. In this case we are not driven by engineers wanting to implement a nice technology; it's the potential users of this feature, product managers, who want us to implement it...

> 
> > So, naturally
> > we thought we would help make the statistics framework extensible so that we
> > could plug into that and not only report the download of an artifact, but some
> > other information as well:
> > - the customer name 
> > - information about the contract
> > - the OS type used (os:ws:arch of the installation profile so we know if we
> > still need a solaris installer for example)
> > - maybe the metadata that was actually installed
> > - ...
> 
> Our privacy policy prevents us from collecting user names and other private
> information. 
That's two sets of usernames we are talking about, we do have a website where customers need to register with a username/password to receive information on their specific products, we would want something similar with our installer: Enter username/password to get specific updates meant only for you or your company. 

> What information are you hoping that the Eclipse Foundation can collect? 
The root IUs actually installed would be good I guess, telling you what projects were actually installed, rather than what metadata repos were being accessed. OS information like the profile os/ws/arch property would be useful as well, dropping unused combinations could spare you support for unused Eclipse variants...

> Or is this information to be collected by your servers rendering privacy
> concerns as something between you and your customers?
We would like to use as much of the p2 framework as possible, I imagine we would bring up a separate screen for the user to acknowledge if we decided to collect private data, but at this point we are more interested in the public data, like the metadata that runs through p2's hands, and the mentioned username/password that has to be entered into the installer anyway. As said before, we want to make this as transparent as possible to avoid any concerns about us spying on users, I believe the best way to achieve this is to implement it in the open source, for everyone to look at and see...
Comment 13 Meng Xin Zhu CLA 2012-01-16 02:11:38 EST
Let me summarize:

The proposed change to p2 will NOT affect the download behavior of Eclipse, even if the property 'p2.statsURI' is specified in the repository. Which additional data can be collected while downloading artifacts from a p2 repository is decided by the release engineers of Eclipse. The variable additional data (such as package name and OS details) can be initialized by release engineers when building the Eclipse release packages. In short, Eclipse never collects any data without the users' awareness.
Comment 15 Ian Bull CLA 2012-01-19 23:57:18 EST
Meng, can you update the wiki and send a note to the mailing list about this change? This seems pretty noteworthy and I want to make sure people are aware of it.