Bug 261104

Summary: [publisher] Versioning Categories
Product: [Eclipse Project] Equinox
Reporter: Andrew Niefer <aniefer>
Component: p2
Assignee: Ian Bull <irbull>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: P3
CC: achim.demelt, david_williams, dj.houghton, irbull, jeffmcaffer, pascal, pwebster, susan, thomas
Version: unspecified
Flags: aniefer: review+
Target Milestone: 3.5 RC1
Hardware: PC
OS: Windows XP
Whiteboard:
Bug Depends on:
Bug Blocks: 274356

Attachments (Description: Flags)
Adds version numbers to categories: none
patch to adds seconds to the category timestamp.: none

Description Andrew Niefer CLA 2009-01-14 16:57:36 EST
Categories are currently generated without a Version.
Categories reference feature IUs using specific versions.

What are the semantics of the Category, how does UI use this version information?
What happens when multiple repos all have the same category, each with different contents in that category?

I believe Categories should be versioned if only to avoid having same IU + same version and different content.
Comment 1 Pascal Rapicault CLA 2009-01-14 23:10:52 EST
*** Bug 222784 has been marked as a duplicate of this bug. ***
Comment 2 John Arthorne CLA 2009-01-15 16:22:36 EST
Note that categories don't actually get installed into a user's profile so their lifecycle is much simpler than bundle IUs. I'm not sure what it would mean to have multiple versions of, say, the "C Development Tools" category. I suspect we would still want to collapse them and show a single category node in the UI with the category children combined.

Overall I see the motivation to avoid multiple IUs with the same ID/version, but it does introduce the problem of figuring out what version to give to a category at build time.

In the end I don't see that it buys us anything, but maybe I'm missing something...
Comment 3 Andrew Niefer CLA 2009-01-15 17:27:38 EST
The problem I see is more about just managing the repo/IUs.
Build 1 produces a category and publishes it in some repo.
Build 2 produces the same category and adds to it and wants to publish to the same repo.

It is fine to say the two are collapsed to a single category, but can the repo itself store both without them having different versions?

Given a set of repos, will queries return multiple ius for the same category if they all have the same version?  
Comment 4 John Arthorne CLA 2009-01-15 17:40:02 EST
> It is fine to say the two are collapsed to a single category, but can the repo
> itself store both without them having different versions?

This would be a problem, since the simple metadata repo stores the IUs in a map, and IU equality is based on id/version.

> Given a set of repos, will queries return multiple ius for the same category if
> they all have the same version?  

This case seems to work fine today. If I add multiple sites with the same category (say our M-build and I-build sites), it shows a single category with the combined children. We might just be getting lucky that categories are processed one at a time though.

Overall I think you're probably right that we need to add a version. Perhaps we could do something simple like a timestamp since the actual value of the version isn't important, and it is just for the purpose of avoiding duplicate IUs with the same id/version.
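
To illustrate the map concern above, here is a minimal sketch. It does not use the p2 API: `UnitKey` and `CategoryCollisionDemo` are hypothetical stand-ins that only assume equality is based on id/version, and the version strings are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for an IU whose equality is based on id/version only.
final class UnitKey {
    final String id;
    final String version;

    UnitKey(String id, String version) {
        this.id = id;
        this.version = version;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof UnitKey)) {
            return false;
        }
        UnitKey other = (UnitKey) o;
        return id.equals(other.id) && version.equals(other.version);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, version);
    }
}

class CategoryCollisionDemo {
    // Stores two builds of the same category and returns how many entries survive.
    static int storeTwoBuilds(String versionOfBuild1, String versionOfBuild2) {
        Map<UnitKey, String> repo = new HashMap<>();
        repo.put(new UnitKey("Java Tools", versionOfBuild1), "children from build 1");
        repo.put(new UnitKey("Java Tools", versionOfBuild2), "children from build 2");
        return repo.size();
    }

    public static void main(String[] args) {
        // Same id and same version: the second put replaces the first entry.
        System.out.println(storeTwoBuilds("0.0.0", "0.0.0"));
        // Distinct timestamp-style qualifiers: both entries are kept.
        System.out.println(storeTwoBuilds("0.0.0.200901151740", "0.0.0.200901151741"));
    }
}
```

With identical versions only one entry remains in the map, which is the case a timestamp qualifier is meant to avoid.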
Comment 5 David Williams CLA 2009-01-15 18:09:46 EST
I'm a little confused by this bug and discussion. Are you talking about categories that would, say, correspond to different releases of a set of features ... and show up on "install new software" type dialogs ... or something that is stored and visible in the end user's workspace (so that, for example, they could request a 'restore' to the previous version of a category)?

It attracted my attention because we've often found we had to have categories for releases, such as 
wtp 3.0.0
wtp 3.0.1
wtp 3.0.2
because under each category are too many features for end users to be able to say "show all versions" and then select all of them associated with one particular release. 

Or, are you talking more about some technical thing that doesn't affect end users' views?
Comment 6 Jeff McAffer CLA 2009-01-15 21:14:57 EST
There is a related concern.  When someone publishes something and somehow says that it should be in the Foo category, does that add it to some pre-existing Foo or does this create a new Foo?  If the former, which pre-existing Foo is used?  If the latter, is this new Foo then merged in the UI with one/some/all pre-existing Foo categories?

Like John I agree that there is a nub of some issue here.  However, it feels like adding versions to the categories is just asking for trouble.

Perhaps another option is to mess with the category id.  That is, what if all categories had an id that included some repo identifier.  That way Foo from repo R would be, say, R|Foo v0.0.0.  Foo from Q would actually be Q|Foo v0.0.0.  The UI would be the same as today but would just trim the stuff before the | <or whatever your fav syntax is>.  This gets us the uniqueness of IU identity without the bitter aftertaste of multiple versions.
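
A minimal sketch of the repo-qualified id idea, assuming `|` as the separator (the comment leaves the syntax open). `QualifiedCategoryId`, `qualify` and `displayId` are hypothetical names for illustration, not p2 API.

```java
class QualifiedCategoryId {
    // Assumed separator between the repo identifier and the category id.
    static final char SEPARATOR = '|';

    // Builds the unique IU id stored in the repository, e.g. "R|Foo".
    static String qualify(String repoId, String categoryId) {
        return repoId + SEPARATOR + categoryId;
    }

    // What a UI could show: everything after the first separator,
    // or the id unchanged when it carries no repo prefix.
    static String displayId(String qualifiedId) {
        int idx = qualifiedId.indexOf(SEPARATOR);
        return idx < 0 ? qualifiedId : qualifiedId.substring(idx + 1);
    }
}
```

Two repos publishing Foo would then hold distinct ids, while `displayId` maps both back to the same visible label.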
Comment 7 John Arthorne CLA 2009-01-16 08:19:22 EST
David, we're talking about the categories defined in site.xml that show up in the "available software" window. So, these are things that are shown to end users, but not installed. In update manager these were not versioned, but in p2 everything has a version so we currently use version 0.0.0 on all categories.

If you want different categories for different versions of your product/app/whatever, you can do that today by putting a version number in the category name.
Comment 8 Susan McCourt CLA 2009-01-19 15:18:33 EST
(In reply to comment #4)
> This case seems to work fine today. If I add multiple sites with the same
> category (say our M-build and I-build sites), it shows a single category with
> the combined children. We might just be getting lucky that categories are
> processed one at a time though.

Today, the UI merges all content from categories that have the same id.  It does not consider version.  Even if we were to start using versions, the UI would likely continue to collapse all versions into one visible category per id.  I don't see exposing category versions to users.  

If we want to add a version for completeness, I think that's fine, but I don't see promoting the use of version categories until someone really needs it.  
Comment 9 Susan McCourt CLA 2009-01-19 15:21:17 EST
(In reply to comment #6)
> Perhaps another option is to mess with the category id.  That is, what if all
> categories had an id that included some repo identifier.  That way Foo from
> repo R would be, say, R|Foo v0.0.0.  Foo from Q would actually be Q|Foo v0.0.0. 
> The UI would be the same as today but would just trim the stuff before the |
> <or whatever your fav syntax is>.  This gets us the uniqueness of IU identity
> without the bitter aftertaste of multiple versions.

If repos want to adopt some kind of "qualified category" convention to distinguish their IU's, that's fine, but I don't like the idea of the UI stripping it.  What would be the value?

Comment 10 Jeff McAffer CLA 2009-01-20 21:38:16 EST
Summary of some of the discussion in yesterday's p2 call.  The id of categories is currently the human readable name. This makes it hard to manage uniqueness as well as do translations.  Making the category IU id a "normal" id as used everywhere else would help, in that people could use different ids for their category but then have a property to capture the human readable name.  The name would then be translatable and the UI could coalesce the categories by name rather than id.

The topic of versioning the IUs did come up and honestly I'm not sure where we landed on it. At one point we seemed to get to "managing the ids would solve the problem" (see above) but right now I am not able to recreate that solution.  Someone else?
Comment 11 Pascal Rapicault CLA 2009-01-20 23:00:26 EST
At one point we had mentioned that the publisher would actually get the latest version of the category IU from the repo where it is publishing, create a new IU using the one found as a prototype, update the version, publish and remove the old IU.
This limited the number of categories available to limit confusion and avoided the mutable IU problem.
Does that ring a bell?
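
The flow described above can be sketched as follows. This is only an illustration under stated assumptions: `Category` and `publish` are hypothetical, the repo is modelled as a plain list, and it is assumed that the repo holds at most one category IU per id because the old one is removed on each publish.

```java
import java.util.ArrayList;
import java.util.List;

class CategoryRepublishSketch {
    // Hypothetical category record; not the p2 IInstallableUnit API.
    static final class Category {
        final String id;
        final String version;
        final List<String> children;

        Category(String id, String version, List<String> children) {
            this.id = id;
            this.version = version;
            this.children = children;
        }
    }

    // Publishes a category: reuse the existing IU as a prototype, bump the
    // version, add the new children, and drop the old IU from the repo.
    static void publish(List<Category> repo, String id, String newVersion, List<String> newChildren) {
        Category previous = null;
        for (Category c : repo) {
            if (c.id.equals(id)) {
                previous = c;
            }
        }
        List<String> children = new ArrayList<>();
        if (previous != null) {
            children.addAll(previous.children);
            repo.remove(previous);
        }
        for (String child : newChildren) {
            if (!children.contains(child)) {
                children.add(child);
            }
        }
        repo.add(new Category(id, newVersion, children));
    }
}
```

Because the old IU is replaced rather than mutated, no two IUs with the same id/version and different content ever coexist in the repo.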
Comment 12 Susan McCourt CLA 2009-01-21 11:52:30 EST
I recall that we decided:
- ids can be qualified by repo if the repo wants to do this (I think this helps in some mirroring situations where there are conflicts?)
- the name property should be used for human readable name
- may as well support versions since in practice we'll end up with outdated category IU's in mirrors that will be superseded by a newer one

From a UI point of view, we already show a name if there is one in lieu of id.  However we currently merge by id, so this opens the door to seemingly duplicate categories if the names are the same and id's are different.
Comment 13 Thomas Hallgren CLA 2009-01-26 15:25:14 EST
I would reconsider the use of IU's to represent categories altogether. In my opinion, the publisher should be able to determine in what category or categories he wants to place the IU that is being published. The current approach will require either that "IU categories" are merged when installed or that the first found such IU blocks out other categories. The first approach violates the concept that an IU is identified by its name, namespace and version. The second approach defeats the purpose of categorization altogether.

Categories are there solely to assist the user during installation right? A category is not installable and it cannot be managed by p2. Why not use a well known property instead where the IU publisher can provide a comma separated list of category names?
Comment 14 Andrew Niefer CLA 2009-01-26 16:32:47 EST
The problem with tagging IUs with properties is that only the original producer can set what the category is.  ie, Equinox tags org.eclipse.osgi as "OSGi Runtime", consumers have no choice but to accept that, and can't include osgi in their "Cool Runtimes" category.

I believe the problem of blocking/merging categories is addressed by making categories have proper unique Ids and versions.  With a property for the human readable display name.  The UI then groups by display name.

eg: We produce a category
org.eclipse.jdt with display name "Java Tools"
You produce a category
org.cloudsmith.java with display name "Java Tools".

Both are displayed in the UI under "Java Tools" but do not interfere with each other.
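
A minimal sketch of grouping by display name, assuming a hypothetical `Category` type with an id, a name property and a list of children. This is not the p2 UI code, only an illustration of the proposal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupByDisplayName {
    // Hypothetical category: unique id, human readable name, child IU ids.
    static final class Category {
        final String id;
        final String name;
        final List<String> children;

        Category(String id, String name, List<String> children) {
            this.id = id;
            this.name = name;
            this.children = children;
        }
    }

    // Merges the children of all categories that share a display name.
    static Map<String, List<String>> merge(List<Category> categories) {
        Map<String, List<String>> byName = new TreeMap<>();
        for (Category c : categories) {
            byName.computeIfAbsent(c.name, k -> new ArrayList<>()).addAll(c.children);
        }
        return byName;
    }
}
```

Here org.eclipse.jdt and org.cloudsmith.java remain two distinct IUs, yet a UI using `merge` would show a single "Java Tools" node containing the children of both.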
Comment 15 Susan McCourt CLA 2009-01-26 17:43:40 EST
(In reply to comment #14)
> Both are displayed in the UI under "Java Tools" but do not interfere with each
> other.

This part of the proposal is not yet implemented by the UI so we need to coordinate that part while fixing this bug.  (It is already true that the UI will show the name of the category in the list if there is one provided in lieu of the id, but merging is currently done by id).

Comment 16 Susan McCourt CLA 2009-01-28 17:09:55 EST
(In reply to comment #0)
> I believe Categories should be versioned if only to avoid having same IU + same
> version and different content.
> 
Bug 260950 comment 11 demonstrates this exact problem.  
Comment 17 Thomas Hallgren CLA 2009-01-28 17:43:27 EST
(In reply to comment #14)
> The problem with tagging IUs with properties is that only the original producer
> can set what the category is.
>
True with the assumption that IU embedded tags rules out all other types of tagging.

> I believe the problem of blocking/merging categories is addressed by making
> categories have proper unique Ids and versions.  With a property for the human
> readable display name.  The UI then groups by display name.
> 
Which in essence is the same as having an IU that tags its requirements with a specific type of tag whose ID is subject to differences in NLS translation. This type of tagging is IMO very inflexible and it doesn't scale very well.

Why not add a tagging mechanism to the MetadataRepository that is separate from the IU's? That way, special handling for contradictory entities such as uninstallable installable units would no longer be needed. No more conflicts and versioning problems. Thinking more broadly, the current categories are only one application where it could be put to good use. Far more elaborate filters based on tags and tag combinations could be added later on. Bug #262727 is one example where it would be very useful.
Comment 18 Jeff McAffer CLA 2009-01-28 20:10:07 EST
In general I am sympathetic to Thomas' points but the nice thing about "everything is an IU" is that "everything is an IU".  If we add a new notion then we need new queries, new API on the repo to find them all, new handling, new mirroring, new...  Then when we are busy writing all that people will say, why are we duplicating all the stuff we already have for IUs?  Why not just use IUs?  

I believe that fundamentally the current approach is the most flexible for all the seen and unseen usecases we have.
Comment 19 Thomas Hallgren CLA 2009-01-29 03:25:46 EST
(In reply to comment #18)
> ... need new queries, new API on the repo to find them all, new handling,
> new mirroring, new...  
> 
So instead, you implement (and enforce) special handling of one particular type of IU that is not really an IU but just a categorization of IU's. How is that superior to adding a clean tagging implementation? It seems to me you use more force trying to coerce this 'category' IU into something that it isn't than you benefit from it being an IU. The result is an unclean solution that everyone who bases their products on P2 must be aware of for all time to come.

But OK, let's make the "IU or not IU" a separate issue. A flexible tagging mechanism can be based on IU's (the 'categorizers' that Henrik mentioned in his posting to p2-dev). My main point is that the current solution is very specialized and that we could benefit largely from a more generic implementation.

Just to give an example:
An IU is tagged with the following tags, 'tools', 'database', 'linux', 'sql', 'gui', 'jboss'.

Imagine what we could do with an auto completion based filtering that is initialized with tags. You type in 'tools' and it would suggest new tags based on all IU's that have a 'tools' tag and more tags. As you continue adding new tags, the selection lessens. The order in which you enter the tags doesn't matter. At any time you can click OK and get the current selection of IU's.
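
A rough sketch of this kind of tag filtering, assuming tags are kept in a map from IU id to tag set. `TagFilterSketch`, `select` and `suggest` are hypothetical names and no such mechanism exists in p2 as far as this thread shows.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TagFilterSketch {
    // Returns the ids of all units that carry every selected tag.
    static Set<String> select(Map<String, Set<String>> tagsByUnit, Set<String> selectedTags) {
        Set<String> matches = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : tagsByUnit.entrySet()) {
            if (e.getValue().containsAll(selectedTags)) {
                matches.add(e.getKey());
            }
        }
        return matches;
    }

    // Suggests further tags: those found on the current selection
    // that have not been chosen yet.
    static Set<String> suggest(Map<String, Set<String>> tagsByUnit, Set<String> selectedTags) {
        Set<String> suggestions = new TreeSet<>();
        for (String unit : select(tagsByUnit, selectedTags)) {
            suggestions.addAll(tagsByUnit.get(unit));
        }
        suggestions.removeAll(selectedTags);
        return suggestions;
    }
}
```

Since `select` tests set containment, the order in which tags are entered does not matter, and every added tag can only narrow the selection.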
Comment 20 Ian Bull CLA 2009-02-06 14:30:19 EST
Andrew, I am trying my hand at this bug, or at least adding the qualifier (since that does block Bug 260950, and it demonstrates a problem with our assumption that IUs can be uniquely defined based on their name and version).

A few questions: Are categories only defined using site.xml?  i.e. does this only affect the SiteXMLAction?

I have attempted to fix this by storing the site URL in the SiteCategory.  Then, in the SiteXMLAction, I can:
 if (qualifier != null)
   cat.setId(qualifier + "." + categoryId); //$NON-NLS-1$
 else
   cat.setId(categoryId);

Does this seem reasonable? or were you thinking of something different here?
Comment 21 Andrew Niefer CLA 2009-02-06 16:06:24 EST
Ian, I had 3 things in mind:
1) give categories a proper id, and a human readable name
2) give categories a proper version
3) Have the UI group by human readable name instead of id

It looks like we already do (1), I don't know where the thought that we didn't do this came from.  This corresponds to 
cat.setId(categoryId);
cat.setProperty(IInstallableUnit.PROP_NAME, category.getLabel());
I don't think there is any need for an extra qualifier here.  When I look at the categories from the SDK I see for example:
<unit id='org.eclipse.platform.sdk.categoryIU'
<property name='org.eclipse.equinox.p2.name' value='Eclipse Platform SDK'/>

So I don't think there is anything to change from (1)

2) We need to decide what version to use here, it needs to be something other than 0.0.0

3) I don't know what the state of this is, I guess this is 260950?
Comment 22 Ian Bull CLA 2009-02-06 16:32:33 EST
(In reply to comment #21)

> 1) give categories a proper id, and a human readable name
> 
> It looks like we already do (1), I don't know where the thought that we didn't
> do this came from.  This corresponds to 
> cat.setId(categoryId);
> cat.setProperty(IInstallableUnit.PROP_NAME, category.getLabel());

So the problem (I think, Susan can correct me if I'm wrong) is that the category id is just the category name.  So if I create a category called "JDT" (v1) and you create a category called "JDT" (v1), when we gather up all the results (of say a query), these two IUs will be considered equal.  Then, when we look for the children of the category, one of these two will be suppressed.  What we should do is have
foo.com.JDT (with the name JDT) and bar.com.JDT (with the name JDT).  Then these two IUs will be unique, and we can combine them (because they have the same name, not id).

Number (3) should be easy, because we should just use the Name property.
Comment 23 Andrew Niefer CLA 2009-02-06 16:53:26 EST
The site.xml editor says:
"Provide a unique name, a label and a description for each category."
I was just treating name here as an id, so you wouldn't actually name it JDT, that's what the label is for.

But perhaps I am wrong and people aren't treating name as an id, we should look around for some examples.
Comment 24 Ian Bull CLA 2009-02-06 19:04:30 EST
(In reply to comment #23)
> The site.xml editor says:
> "Provide a unique name, a label and a description for each category."
> I was just treating name here as an id, so you wouldn't actually name it JDT,
> that's what the label is for.
If you look at something like the ganymede update site:
http://download.eclipse.org/releases/ganymede/

the categories just have names.  (no unique ID) for example:
<category name="Models and Model Development"/>

Comment 25 Thomas Hallgren CLA 2009-02-07 04:00:05 EST
Why do you want to put versions on categories? Isn't this just a mechanism for categorizing things in the UI? If so, why does this mechanism need to bother with versions?

Doesn't the content of the category change anyway depending on which metadata repositories are enabled? If it does, what is it that you want the different category versions to reflect?
Comment 26 Susan McCourt CLA 2009-02-09 13:12:26 EST
Sorry if I've caused confusion.  I made bug #260950 dependent on this bug because it's a specific case of the general problem:   today it is possible to have category IU's with the same id and version that aren't truly equal.  I was assuming we'd define the category IU semantics in this bug and then make any changes needed in PDE UI, p2 UI, update site generator, etc.

(In reply to comment #21)
> Ian, I had 3 things in mind:
> 1) give categories a proper id, and a human readable name
> 2) give categories a proper version
> 3) Have the UI group by human readable name instead of id
> 
> It looks like we already do (1), I don't know where the thought that we didn't
> do this came from.  This corresponds to 
> cat.setId(categoryId);
> cat.setProperty(IInstallableUnit.PROP_NAME, category.getLabel());
> I don't think there is any need for an extra qualifier here.  When I look at
> the categories from the SDK I see for example:
> <unit id='org.eclipse.platform.sdk.categoryIU'
> <property name='org.eclipse.equinox.p2.name' value='Eclipse Platform SDK'/>
> 
> So I don't think there is anything to change from (1)

The problem with #1 occurs with the update site generator, which doesn't try to use a unique id. The "Uncategorized" category generated for update sites is generated with the same id and version for all update sites, and we probably need to qualify the category id by repo location, so that each "Uncategorized" category has a unique id.  If this particular problem is to be solved separately from the general case, we should open another bug.  That is what Ian is focused on.

> 
> 2) We need to decide what version to use here, it needs to be something other
> than 0.0.0
> 
> 3) I don't know what the state of this is, I guess this is 260950?
> 
No.  Bug #260950 involves getting rid of UI collectors.  In trying to do so, it exposed the problem with the update site generator's category id.  We've always had this issue, but today the UI can detect and merge categories, and it would lose this ability to do so if we get rid of the collector.

I had planned to do 3) when this bug was fixed, but if Ian wants to fix the update site problem separately I'll do it then.


Comment 27 Andrew Niefer CLA 2009-02-10 12:05:25 EST
I have raised bug 264389 to support qualifiers on the category name.

For this bug, we should just allow the site action to be passed a version for the categories.

If no version is passed, we can leave it at 0.0.0.
Comment 28 Ian Bull CLA 2009-04-23 22:28:35 EDT
(In reply to comment #27)
> If no version is passed, we can leave it at 0.0.0.
> 
Do we want to leave it at 0.0.0, or 1.0.0.qualifier (where qualifier is the date / time-stamp)?


Comment 29 Andrew Niefer CLA 2009-04-24 10:33:31 EDT
I would suggest setting it to either 0.0.0.qualifier or 1.0.0.qualifier
Comment 30 Ian Bull CLA 2009-04-24 11:27:25 EDT
Andrew, do you know if there is a routine that will compute the date / time stamp?  I could do this, but there must be a static method somewhere in p2 for this.
Comment 31 Andrew Niefer CLA 2009-04-24 13:06:36 EDT
Look at org.eclipse.pde.build/QualifierReplacer.getDateQualifier

I don't know if there is anything in p2 to do the same.

Comment 32 Ian Bull CLA 2009-04-30 12:58:14 EDT
I'll grab this
Comment 33 Ian Bull CLA 2009-04-30 13:05:03 EDT
Created attachment 133970 [details]
Adds version numbers to categories

This adds an OSGi version to each category. The version is 0.0.0.qualifier
Comment 34 Pascal Rapicault CLA 2009-05-05 21:07:05 EDT
The code looks fine by me but I could not verify the behavior from the UI point of view (do we merge the categories?)
Comment 35 Susan McCourt CLA 2009-05-06 11:52:20 EDT
(In reply to comment #34)
> The code looks fine by me but I could not verify the behavior from the UI point
> of view (do we merge the categories?)
> 
We merge categories by name, but we won't truly be able to verify the proper category merging behavior until bug 274673 is fixed.
Comment 36 Andrew Niefer CLA 2009-05-06 17:56:50 EDT
Done, I also added an assertion to one of the pde.build tests on categories.
Comment 37 Ian Bull CLA 2009-05-11 14:39:16 EDT
I found a small problem with our versions.  We currently only supply qualifiers down to the minute.  If I publish two sites in succession (with the same category), the two sites could have the same version number (since I can publish two sites in less than a minute).  This is actually a bigger problem if we build scripts to publish composite repos (since we will likely be processing a number of categories per minute).

I would like to propose we add seconds to the qualifier. I have done this, and it appears to be much better.
Comment 38 Ian Bull CLA 2009-05-11 15:03:21 EDT
Created attachment 135198 [details]
patch to adds seconds to the category timestamp.

This patch adds seconds to the category timestamp.
Comment 39 Ian Bull CLA 2009-05-11 16:56:45 EDT
We talked on the call today about using System.currentTimeMillis().  Since we are comparing strings, we will have to prefix with 0's, to ensure we have all 64 bits.  (Actually, I think enough time has passed since 1970 that this isn't actually true, but in theory we should).  Also, is a long 64 bits on 64 bit machines?  (In theory a long should be two words, and a word is the number of datalines, so on a 64 bit machine, ints would be 64 and longs would be 128, but this may not be true in practice).

I would suggest we leave this as year + month + day + seconds. If we really care, we can add milliseconds.
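
To illustrate the zero-prefix point about comparing strings, here is a small sketch. `PaddedMillisSketch` is a hypothetical helper, and the width of 19 is chosen because `Long.MAX_VALUE` has 19 decimal digits; it assumes non-negative values.

```java
import java.util.Locale;

class PaddedMillisSketch {
    // Pads a non-negative millisecond value to 19 digits so that
    // string order matches numeric order.
    static String padded(long millis) {
        return String.format(Locale.ROOT, "%019d", millis);
    }

    public static void main(String[] args) {
        // Unpadded strings of different length compare in the wrong order.
        System.out.println("999".compareTo("1000") > 0);
        // Padded strings compare in numeric order.
        System.out.println(padded(999L).compareTo(padded(1000L)) < 0);
        System.out.println(padded(System.currentTimeMillis()));
    }
}
```

In practice, current values of `System.currentTimeMillis()` all have the same number of digits, which is why the padding matters mostly in theory, as noted above.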
Comment 40 Andrew Niefer CLA 2009-05-11 17:39:35 EDT
I personally like a formatted time better than just the current millis.

I've released the seconds, with a change to use the same Calendar.getInstance() for each time unit.  This avoids the (small) possibility of:

at 1:02:59.5  get(MINUTE) -> 02
at 1:03:00.1  get(SECOND) -> 00

leading to 010200 instead of 010259, potentially getting a smaller version if we are doing more than one per minute.  Using the same instance will get the time at the first Calendar.getInstance() instead of the time at each line.
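
A minimal sketch of reading every field from one `Calendar` instance. The class name and the exact `yyyyMMddHHmmss` layout are assumptions for illustration and not necessarily what the released patch produces.

```java
import java.util.Calendar;
import java.util.Locale;

class CategoryQualifierSketch {
    // Formats all time units from the same Calendar, so the fields
    // always describe one consistent instant.
    static String qualifier(Calendar now) {
        return String.format(Locale.ROOT, "%04d%02d%02d%02d%02d%02d",
                now.get(Calendar.YEAR),
                now.get(Calendar.MONTH) + 1, // Calendar months are zero-based
                now.get(Calendar.DAY_OF_MONTH),
                now.get(Calendar.HOUR_OF_DAY),
                now.get(Calendar.MINUTE),
                now.get(Calendar.SECOND));
    }

    // Captures the current time once and derives the whole qualifier from it.
    static String qualifier() {
        return qualifier(Calendar.getInstance());
    }
}
```

Calling `Calendar.getInstance()` separately for each field is what allows the minute and second to come from different instants.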

Comment 41 Ian Bull CLA 2009-05-11 17:43:40 EDT
Great catch. Thanks Andrew.
Comment 42 Thomas Hallgren CLA 2009-05-12 02:03:22 EDT
(In reply to comment #39)
> Also, is a long 64 bits on 64 bit
> machines?  (In theory a long should be two words, and a word is the number of
> datalines, so on a 64 bit machine, ints would be 64 and longs would be 128, but
> this may not be true in practice).
> 
One of the good things about Java is that the primitive types are insensitive to architecture. No more 'words' and 'datalines'.

The specification states that a Java int is 32 bits and a Java long is 64 bits. There are no exceptions to these rules.
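
The fixed widths can be checked directly from the standard library constants; the class name below is just for the example.

```java
class PrimitiveWidths {
    public static void main(String[] args) {
        // Bit widths are defined by the language, not by the hardware.
        System.out.println("int bits:  " + Integer.SIZE);
        System.out.println("long bits: " + Long.SIZE);
    }
}
```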
Comment 43 Ian Bull CLA 2009-05-12 14:29:51 EDT
*** Bug 274673 has been marked as a duplicate of this bug. ***
Comment 44 Susan McCourt CLA 2009-05-28 11:43:41 EDT
*** Bug 278208 has been marked as a duplicate of this bug. ***