This Bugzilla instance is deprecated, and most Eclipse projects now use GitHub or Eclipse GitLab. Please see the deprecation plan for details.
Bug 199818 - [api] arbitrary attributes in AbstractTask
Summary: [api] arbitrary attributes in AbstractTask
Status: RESOLVED FIXED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Mylyn (show other bugs)
Version: unspecified   Edit
Hardware: PC All
: P2 enhancement (vote)
Target Milestone: 3.0   Edit
Assignee: Steffen Pingel CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-08-13 21:50 EDT by Eugene Kuleshov CLA
Modified: 2008-05-14 23:24 EDT (History)
3 users (show)

See Also:


Attachments
proposed implementation (16.43 KB, patch)
2007-08-19 12:47 EDT, Eugene Kuleshov CLA
no flags Details | Diff
mylyn/context/zip (13.20 KB, application/octet-stream)
2007-08-19 12:47 EDT, Eugene Kuleshov CLA
no flags Details
updated patch (26.58 KB, patch)
2007-10-20 16:22 EDT, Eugene Kuleshov CLA
no flags Details | Diff
updated patch (21.87 KB, patch)
2007-10-22 00:19 EDT, Eugene Kuleshov CLA
no flags Details | Diff
mylyn/context/zip (17.50 KB, application/octet-stream)
2007-10-22 00:19 EDT, Eugene Kuleshov CLA
no flags Details
mylyn/context/zip (46.22 KB, application/octet-stream)
2008-05-13 22:41 EDT, Steffen Pingel CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Eugene Kuleshov CLA 2007-08-13 21:50:14 EDT
AbstractTask should allow to store arbitrary attributes. The most notable use cases for this:

-- store info for the task highlighters. bug 199817
-- store selected attributes that could be used by connector-specific presentation providers (i.e. group by component or target milestone)
Comment 1 Eugene Kuleshov CLA 2007-08-19 12:47:49 EDT
Created attachment 76386 [details]
proposed implementation

added attributes to the AbstractTaskContainer, support in the task list externalizer and additional test cases
Comment 2 Eugene Kuleshov CLA 2007-08-19 12:47:50 EDT
Created attachment 76387 [details]
mylyn/context/zip
Comment 3 Steffen Pingel CLA 2007-08-21 13:47:05 EDT
I have a few concerns with the implementation in the patch:

- For each item in the task list a HashMap is allocated even if no meta data is stored increasing the memory footprint of the task list. Could the attribute field be lazily initialized?
- There is no thread-safe way of setting attributes. I think all added methods should synchronize on a lock.
- The only way to set an attribute is by calling addAttributeValue() which will increase the size of the values list on each call. I would expect a setAttribute() method that clears the list and only adds that value.
Comment 4 Eugene Kuleshov CLA 2007-08-21 14:12:13 EDT
Good points. I'll use ConcurrentHashMapupdate and will initialize it lazily. Though lazy init won't really matter, once connectors will start storing stuff in there and Bugzilla connector will be the first in the line.
Comment 5 Eugene Kuleshov CLA 2007-08-21 14:31:19 EDT
Few notes from the conf call today:

-- it is important to allow to store metadata within task list data. Support backup, import/export, exchanging this data between users and also make sure that metadata is always in sync with the task data without additional code in connectors.

-- several major use cases: metadata specific to a single connector (because current schema is not flexible enough to support arbitrary task trackers); metadata specific to class of connectors (i.e. project + category for issue trackers are quite common property and it could be used by some common presentation in the Task List); global metadata (i.e. tags, planing info, task sequences, external linking and synchronization with the the 3rd party tools)

Concerns:

-- store too much in the task list xml
-- task list data vs. offline task data
-- public API
Comment 6 Mik Kersten CLA 2007-08-24 12:29:47 EDT
Let's chat about this again during the next conf call (added to agenda).  Setting for 3.0 because we know we need some solution to handle this API use case.
Comment 7 Mik Kersten CLA 2007-10-12 18:57:05 EDT
Sorry that I still haven't had a chance to respond to this.  Let's make sure to dedicate up to a third of the next call to a design discussion and then figure out how to proceed. 
Comment 8 Eugene Kuleshov CLA 2007-10-20 16:22:35 EDT
Created attachment 80828 [details]
updated patch

This patch incorporates requested changes
Comment 9 Steffen Pingel CLA 2007-10-21 20:01:52 EDT
Thanks for the updated patch. Here are my notes:

- if a value is retrieved that does not exist no additional memory overhead should be impossed, the current implementation will lazily create the map in that case
- it is not safe to call getAttributes() from multiple threads, it needs to be synchronized to avoid creating the attribute list more than once
- why is getAttributes() public?
- the since tags should be updated to version 2.2
- the patch contains unrelated changes for TaskList.java and it has also a lot of whitespace changes (which makes it difficult to review the semantic changes)

To address Mik's concerns of bloating the task list the size, the XML file could be checked during loading if its size exceeds a certain threshold (1 MB?) causing the loader to ignore the extended attributes. 

Storing multiple values per key makes the synchronization more difficult and also requires more memory and makes accessing values slower. Eugene, could you list use-cases where multiple values per key are required?
Comment 10 Eugene Kuleshov CLA 2007-10-22 00:19:52 EDT
Created attachment 80848 [details]
updated patch

(In reply to comment #9)

Generally, I don't like direction where this review process is going and how long does it take to get half dozen api methods in. My humble goal is to ensure that api is sufficient for known use cases. And as far as I am concenred, we are pretty much done here. Other clever people could fix the performance/memory issues after getting some profiling data (remember the 1st rule of premature optimizations - don't optimize). Anyways, I don't think I can spend any more time on this and this will be my last patch, so if someone else want to cary this thing on, please go ahead.

> - if a value is retrieved that does not exist no additional memory overhead
> should be impossed, the current implementation will lazily create the map in
> that case
> - it is not safe to call getAttributes() from multiple threads, it needs to be
> synchronized to avoid creating the attribute list more than once

Ok.

> - why is getAttributes() public?

To allow getting attributes. See the externalizer code.

> - the patch contains unrelated changes for TaskList.java and it has also a lot
> of whitespace changes (which makes it difficult to review the semantic changes)

I have multiple patches in my workspace and can't always control what team cvs provider includes into the patch. Please use task context to ignore the unrelated changes.

As for the whitespaces, I am using formatting to keep my code consistent with Mylyn formatting and it is not possible to keep others inconsistent code untouched.

> To address Mik's concerns of bloating the task list the size, the XML file could
> be checked during loading if its size exceeds a certain threshold (1 MB?)
> causing the loader to ignore the extended attributes.

-1 for the reasons discussed on the conf call. If you believe that further discussion is required, I'd suggest a separate ug report.

> Storing multiple values per key makes the synchronization more difficult and
> also requires more memory and makes accessing values slower. Eugene, could you
> list use-cases where multiple values per key are required?

For the use cases see comment #5. Personally, I can live or work around storing multiple values within single string entry.
Comment 11 Eugene Kuleshov CLA 2007-10-22 00:19:56 EDT
Created attachment 80849 [details]
mylyn/context/zip
Comment 12 Steffen Pingel CLA 2007-10-22 00:58:41 EDT
 (In reply to comment #10)
> Generally, I don't like direction where this review process is going and how
> long does it take to get half dozen api methods in. My humble goal is to ensure
> that api is sufficient for known use cases. 

I very much agree with that goal but I also want to make sure the implementation is correct and the API consistent, concise and easy to understand. In the past we have been quick to adopt API changes and merged patches with much less review. Often times that required subsequent fixing and API cleanup before major releases. Now that our user community and integrators community has grown significantly we have adjusted our development process towards stabilazation. I think thorough reviews and discussions are a good way to ensure that we maintain a clean API and don't destabilize but I am open to discuss other ways than time consuimg reviews to achieve that.
 
> I have multiple patches in my workspace and can't always control what team cvs
> provider includes into the patch. Please use task context to ignore the
> unrelated changes.

Okay. Please note that this does affect my motivation to review patches and will slow down the process even more.

> As for the whitespaces, I am using formatting to keep my code consistent with
> Mylyn formatting and it is not possible to keep others inconsistent code
> untouched.

The code in the repository should always be consistent with the formatting rules. It might be worth while to investigate automatted solutions to ensure that that is the case.
 
> > To address Mik's concerns of bloating the task list the size, the XML file
> could
> > be checked during loading if its size exceeds a certain threshold (1 MB?)
> > causing the loader to ignore the extended attributes.
> 
> -1 for the reasons discussed on the conf call. If you believe that further
> discussion is required, I'd suggest a separate ug report.

As far as I remember that solution was not discussed on the call and it seems to be a better approach than other suggestions that were made.

> > Storing multiple values per key makes the synchronization more difficult and
> > also requires more memory and makes accessing values slower. Eugene, could you
> > list use-cases where multiple values per key are required?
> 
> For the use cases see comment #5. Personally, I can live or work around storing
> multiple values within single string entry.

I am not sure how to interpret that answer. I can't derive the requirements for multiple values from comment #5. Since single values per key make the implementation much easier I would suggest to start with that and extends the API later if a use-case arises.
Comment 13 Eugene Kuleshov CLA 2007-10-22 01:13:13 EDT
(In reply to comment #12)
> ... I think thorough reviews and discussions are a good way
> to ensure that we maintain a clean API and don't destabilize but I am open to
> discuss other ways than time consuimg reviews to achieve that.

With all due respect, I don't observe "through review" on this bug report. It is been two months since first patch (api proposal) been submitted and there is no obvious progress, except your comments on implementation, statement about lack of motivation and suggestion to chat about this in comments #6 and #7 that are almost two months apart from each other.

Last comment made me even more disappointed, especially given the importance of this enhancement for the extensibility.

PS: if I am not mistaken, Rob gave an argument against checking the attribute size that it won't stop user to stick multiple attributes with smaller values to workaround such check. So, like I said, that it does not make sense to pollute api discussion with such implementation detail, and it better to discuss it in a separate bug report.
Comment 14 Steffen Pingel CLA 2007-10-22 01:59:43 EDT
 (In reply to comment #13)
[...]
> Last comment made me even more disappointed, especially given the importance of
> this enhancement for the extensibility.

Yes, it would be good to get this enhancement into the next release. I have commented on each of the posted patches within two days. To me the patch is still incomplete (e.g. there are no tests for removeAttribute() or clearAttributes()) and I'll wait for other opinions how to proceed further.

> PS: if I am not mistaken, Rob gave an argument against checking the attribute
> size that it won't stop user to stick multiple attributes with smaller values to
> workaround such check. So, like I said, that it does not make sense to pollute
> api discussion with such implementation detail, and it better to discuss it in a
> separate bug report.

That was my argument. I don't see an obvious way to work around the check of the total size of the task list during load or save.

There a few open questions that haven't been answered but might have been discussed in calls or might be documented else where (please just point me to the bug/wiki page):

- How are the extended attributes populated? Is that part of the connector plug-in (not sure it is a good idea to do that)? 
- It should also be possible for other plug-ins to provide extended attributes: How is the data kept synchronized? How can these plug-ins hook into the current synchronization procedures?

Comment 15 Eugene Kuleshov CLA 2007-10-22 02:25:07 EDT
(In reply to comment #14)
> To me the patch is still incomplete (e.g. there are no tests for removeAttribute() or
> clearAttributes()) and I'll wait for other opinions how to proceed further.

See comment #10

> There a few open questions that haven't been answered but might have been
> discussed in calls or might be documented else where (please just point me to
> the bug/wiki page):
> 
> - How are the extended attributes populated? Is that part of the connector
> plug-in (not sure it is a good idea to do that)?

That is use case specific.

> - It should also be possible for other plug-ins to provide extended attributes:
> How is the data kept synchronized? How can these plug-ins hook into the current
> synchronization procedures?

Good question. My assumption was that AbstractTask instances are not recreated during synchronization, so contributed attributes don't get erased/dropped. If this is not the case, that need to be fixed too.
Comment 16 maarten meijer CLA 2007-10-22 09:00:04 EDT
 (In reply to comment #10)
> As for the whitespaces, I am using formatting to keep my code consistent with
> Mylyn formatting and it is not possible to keep others inconsistent code
> untouched.
Perhaps you can raise a bug against the code formatter for this, as it appears a quite common use case.
option: only format code that I touched.
Comment 17 Eugene Kuleshov CLA 2007-10-22 10:17:58 EDT
(In reply to comment #16)
> Perhaps you can raise a bug against the code formatter for this, as it appears
> a quite common use case.
> option: only format code that I touched.

Just FYI. The screwups most likely came from the numerous merges I had to do on these classes.

PS: please let's not fade discussion on this bug report to stuff that is not related to the proposed api
Comment 18 Mik Kersten CLA 2007-10-23 12:26:50 EDT
Eugene: I'm not OK with the statement "I don't like direction where this review process is going and how long does it take to get half dozen API methods in".  This is the most core part of our API, the easiest to abuse, and given the total discussion on this bug report and conference calls to date has actually had less design discussion than most API things this core.  Steffen is putting effort into the review, and I can't see anything wrong with the questions that he has raised beyond the fact that they are slowing things down.

Please consider that if we get this wrong, we will have a framework/API *huge* problem on our hands.  For example, if connectors start storing a significant amount of things  from the Attributes section of a bug report in these key/value pairs we've made a very bad mistake.  In that case we should have instead provided constant time access to the objects in the attributes section.  That's very straightforward to do (e.g. keep a JavaModel style cache of all repository task attributes, max it at 10MB, and then presentations will be free to render things like products/components at will).  I'm not saying this is necessarily the right approach, but right now we need to pause this effort and I want a very clear listing of the use cases to figure out what the custom attributes will hold.  My main question is to figure out whether we should provide API access to all repository task attributes first and cache those in memory, in order to prevent integrators from using this mechanism for that, thereby forcing each connector implementor to have to update the String-based attributes whenever a task changes.
Comment 19 Eugene Kuleshov CLA 2007-10-23 13:02:53 EDT
(In reply to comment #18)
> Eugene: I'm not OK with the statement "I don't like direction where this review
> process is going and how long does it take to get half dozen API methods in". 
> This is the most core part of our API, the easiest to abuse, and given the
> total discussion on this bug report and conference calls to date has actually
> had less design discussion than most API things this core.  

After last meeting I've got an impression that everyone agreed how the API should look like, hence my comments. Apparently I was mistaken.

> Steffen is putting effort into the review, and I can't see anything 
> wrong with the questions that he has raised beyond the fact that they 
> are slowing things down.

I don't have problem with questions. What I have problem with is the lack of progress. I realize that you are afraid to change or introduce new API, and yes I do understand that there is certain risk, but yet something need to be done to allow extensibility, and waiting and posponding decision on that will only make things worse. Repeating warnings about danger of adding such extensibility is not helping progress either.

> Please consider that if we get this wrong, we will have a framework/API
> *huge* problem on our hands.

I think you too missing the point. The problems you refer to for the most part are the implementation details bound by implementation of the current task list store backend.

> I want a very clear listing of the use cases to figure out what the custom
> attributes will hold.

What is unclear with the use cases listed in comment #5?

I believe it is been discussed few times already, but it is not sufficient to provide access repository attributes. First of all repository attributes are being erased by connectors during synchronization, so non-connector attributes will be lost. Also connector already have to update string based attributes in AbstractTask upon synchronization, so we not introducing anything new here. Also those attributes are not part of the export/import/backup data.

BTW, presently available API also vulnerable to the sizing issues. For example, anyone can create category and add 100k tasks or subcategories in it, or create  an AbstractTask instance and add 1000k children in it.
Comment 20 Mik Kersten CLA 2007-10-24 15:23:49 EDT
(In reply to comment #19)
> I think you too missing the point. The problems you refer to for the most part
> are the implementation details bound by implementation of the current task list
> store backend.

Unless someone turns out with the very significant resources needed to get rid of those limitations, they are a key constraint.  So we need to focus on what's realistic and not on what's possible.  But as discussed on the call, it's not actually the space constraints I'm as worried about as an API that's confusing to integrators because it requires them to understand three separate data structures (AbstractTask, RepositoryTaskData and custom attributes), and worse yet makes them manually synchronize between custom attributes and RepositoryTaskData on every synch.

But I agree that we need a "little less conversation and a little more action" (http://en.wikipedia.org/wiki/A_Little_Less_Conversation ).  What I'm unwilling to have us do, at without a *much* more careful overall design, is create String-based extensibility for storing RepositoryTaskData contents because I believe that it should be readily accessible from AbstractTask (comment#18).  For cross-connector contents what we can do right now is provide an internal class for capturing custom attributes that exposes the map, and a method along the following lines on AbstractTask:

   CustomTaskAttributes internalGetCustomAttributes()
   
Eugene: we can make these changes based on your last patch and you can review before they are finalized.  First we need the following things:

1) A test for each method that you would eventually like to be API, i.e. your current additions to AbstractTask.  Since this is destined for internals, consider just exposing a Set so that you don't need separate methods for get/set of single/multiple values.  Also, get methods that return the first attribute are problematic.  Sets have the benefit that most clients won't have to bother checking for duplicate values (unless there is a use case for duplicate values).

2) We need a concrete set of use cases for this and at least a description of an example client.  The use cases need the following properties:
* Cannot be connector-specific, this is the realm of RepositoryTaskData which we will be improving
* Cannot be the tag use case, since if we support tags they will be a primary part of the AbstractTask API (i.e., will have methods)
Comment 21 Eugene Kuleshov CLA 2007-10-24 16:24:01 EDT
(In reply to comment #20)
> Eugene: we can make these changes based on your last patch and you can review
> before they are finalized.  First we need the following things:

Not sure if you are asking me to do that, but already said in comment #10 that I won't be able to work on further patches for this issue. I really believe that one can't design something up front and it is much more efficient to prototype simple possible solution and then morph that solution to solve issues discovered in the prototype.

Also, I don't think "direct access to repository task data" is a good idea, because consumers would have to know if data is managed by connector, or if they should use a separate API to access non-repository data and that is sounds like a bad abstraction to me. It will also get into the way of tools that need to work across multiple repository types. Besides, like I already mentioned, connectors already have to synchronize attributes in AbstractTask, including owner, change date, completed status and few others. 

All in all, I do agree that this synchronization and related API need to be reviewed (as I already said before, I think it was bad decision to have two parallel editable structures for task data), but I think it is out of scope of this particular issue.
Comment 22 Mik Kersten CLA 2007-10-24 17:49:28 EDT
Eugene, could you provide an answer to point (2) of comment#20?  Here's an example of one use cases I have thought through:

* An attribute on tasks that needs to be used by a planning/reporting tool that builds on Mylyn and needs to provide cross-repository planning facilities.  For example, a tool created for consultants to help with their tracking and billing could allow the consultant to annotate each task with the billing cycle that the task was involved in.  The reason that this planning tool would set this attribute in the Task List, and not in a separate file that will participate in import/export (e.g. .mylyn/planner.xml.zip) is that the provider of the planning tool standardized on an attribute format that can be used by another reporting tool that interoperates with a (not yet existing) facility for exporting Task List data to CSV.
Comment 23 Eugene Kuleshov CLA 2007-10-24 18:28:55 EDT
(In reply to comment #22)
> Eugene, could you provide an answer to point (2) of comment#20?

I believe those are quite obvious and generally I don't feel comfortable explaining all my use cases to representatives of the commercial tool vendor.

Anyways, out of the top of my head (it is really better to put on the wiki somewhere):

* Connector-specific attributes that need to be used by connector-specific presentation, for example grouping or model specific to a given connector or class of connector. One presentation/grouping that I miss a lot is group by repository + project and repository + project + component/category.
  I bringing this here because user may have to maintain local classification for repositories that don't support it in the model (essentially tag tasks locally).

* Attributes used by object-based action contribution and property testers (actions in the Task List driven by the presence of given attribute/property).

* Attributes to be used for connector-specific and connector-neutral decorations (tags, due dates, project, component, etc)

* Tagging is actually both, connector-specific and connector-neutral, because there could be a local tags, as well as repository-managed tags (i.e. JIRA and Google Code support tags natively, and keywords in Bugzilla and Issuezilla are tags in some sense).
  Tagging implementation based on common attributes gives an interesting advantage - it is easier to maintain several parallel classifications. For instance: tag-based project/component hierarchy for repositories that don't support it in the model (i.e. web connector or local tasks); GTD-like tagging/triaging and other parallel classifications that could co-exist.

* Various forms of linking features that 3rd party integrations could use. I.e. link task to particular use case or feature description on the wiki; link task to email or forum conversation; link task to version control changes. Those don't fit too well into the repository task data for the same reasons you outlined in your own use case.
Comment 24 Eugene Kuleshov CLA 2007-11-02 14:42:30 EDT
FYI. I am dropping patch submitted for this issue from my workspace.

BTW, it is almost two weeks since last comment on the bug report, so it doesn't seem like suggested approach is working at all.

I'd like to propose some alternative and suggest to use agile and incremental approach to address this issue. So, the proposal is to take the contributed patch (maybe simplify it to allow only single value per key), then build some functionality on top of it (i.e. rewrite highlighters support, implement custom tagging and add support for project/component into couple connectors). After that it will be easier to refactor this provisional api into something you guys consider appropriate, but in a mean time early adopters will be able to play with new functionality and immediately benefit from it.
Comment 25 Eugene Kuleshov CLA 2008-01-08 14:03:49 EST
Can someone clarify what is the strategy for getting this feature into Mylyn? For instance, I think that use cases for this issue are orthogonal to ones from bug 210814, so that issue should not be a dependency for this one.
Comment 26 Eugene Kuleshov CLA 2008-01-22 11:06:22 EST
Just to clarify. This bug is separate from bug 210814 and would enable attaching custom data not available from issue tracking repositories (i.e. stuff stored in the task data).
Comment 27 Mik Kersten CLA 2008-01-22 17:47:56 EST
This bug is not entirely orthogonal in that comments above identify that there are common use cases.  For example, storing a common "product" has been used as one of the key drivers and I still believe that it is important that we do not support using AbstractTask to store any RepositoryTaskData info (i.e. no connector-specific info). There are two other related API changes that I want us to consider as becoming first class in the API for 3.0 instead of using extensible String fields for them.  

1) Personal Planning information, e.g. AbstractTask.getPersonalPlanningInfo(): a class to encapsulate all fields and dates on the Planning page.  Extensibility use cases:
* Add a "client" field for a consultant-specific tool
* Add a "respond by" date for additional planning support

2) People, e.g. AbstractTask.getPeople(): a class to encapsulate all the collaborator fields on a task.  May have a separate shared and personal/local instance.  Extensibility use cases:
* Add "supervised by"

3) Tags, e.g. AbstractTask.getTags(): still some open questions (e.g. task 168363)

4) Associations, e.g. links to attachments, URLs, etc

Eugene: it would be helpful if you could list each of the use cases that you have which do not fit into 1-3, as well as the ones that do.  Some of your use caes in comment#23 were too generic for me to understand.  (3) and (4) above should cover some of the ones that you did list.
Comment 28 Eugene Kuleshov CLA 2008-01-22 19:10:32 EST
(In reply to comment #27)
> ... I still believe that it is important that we do not
> support using AbstractTask to store any RepositoryTaskData info (i.e. no
> connector-specific info). 

Agreed. this is covered by bug 210814

> There are two other related API changes that I want
> us to consider as becoming first class in the API for 3.0 instead of using
> extensible String fields for them.  
> 
> 1) Personal Planning information, e.g. AbstractTask.getPersonalPlanningInfo():
> a class to encapsulate all fields and dates on the Planning page. 
> Extensibility use cases:
> * Add a "client" field for a consultant-specific tool
> * Add a "respond by" date for additional planning support

Good use cases, but it hard to guess what other use cases may need to be supported for planning and in result PersonalPlanningInfo would need to provide support of storing arbitrary attributes too.

> 2) People, e.g. AbstractTask.getPeople(): a class to encapsulate all the
> collaborator fields on a task.  May have a separate shared and personal/local
> instance.  Extensibility use cases:
> * Add "supervised by"

A good one. Another useful one could be "ignored commenters"

> 3) Tags, e.g. AbstractTask.getTags(): still some open questions (e.g. task
> 168363)

This is one I am also very interested with. It would be really good to replace backend of the current context highlighting feature using this. I think string attributes would be sufficient to prototype it and get some mileage without betting on the api.

> 4) Associations, e.g. links to attachments, URLs, etc
> 
> Eugene: it would be helpful if you could list each of the use cases that you
> have which do not fit into 1-3, as well as the ones that do.

Unfortunately I can't disclosure exact use cases because of the vendor NDA, and I can only say that string attributes and fast access to task data would be sufficient.

By the way, there could be more then one vendor integration supporting particular feature, for example "client" or "respond by". So it would make sense to prefix vendor-specific stuff with the vendor's own name space and such scenario would be hard to support using strongly typed api.
Comment 29 Eugene Kuleshov CLA 2008-01-22 19:34:25 EST
(In reply to comment #20)
> Eugene: we can make these changes based on your last patch and you can review
> before they are finalized.  First we need the following things:
> 
> 1) A test for each method that you would eventually like to be API, i.e. your
> current additions to AbstractTask.  Since this is destined for internals,
> consider just exposing a Set so that you don't need separate methods for
> get/set of single/multiple values.  Also, get methods that return the first
> attribute are problematic.  Sets have the benefit that most clients won't have
> to bother checking for duplicate values (unless there is a use case for
> duplicate values).

In my understanding it been suggested to not support multiple values, so the API would look something like this:

  String getAttribute(String key);
  void setAttribute(String key, String value);
  void removeAttribute(String key);
  Map<String, String> getAttributes();  // this cover getting attribute names

Alternatively, getAttributes() could be mutable, then get/set/remove methods won't be needed, but the rest of Mylyn API returns immutable collections.

I can take another iteration on additions to the AbstractTask (actually AbstractTaskContainer) and corresponding test cases if that would help.
Comment 30 Steffen Pingel CLA 2008-01-22 21:06:44 EST
Are there other examples of Eclipse APIs that offer extensibility in a similar way? 

The closest I can think of are cheat sheets. ICheatSheetManager allows to store key/value pairs of type String that are persisted through mementos. That allows extensions to store additional state. The major difference to the task list though is that these properties are bound to the life-cycle of the cheat sheet and discarded when the cheat sheet is restarted or completed whereas it is unclear to me how to manage that for the task list.

I want to throw another idea into the discussion: Instead of allowing arbitrary attributes tasks core could define an extension point for registering additional attributes, e.g.:

 - id: unique namespace
 - key: attribute key
 - description (optional): explaining the purpose and format of the attribute

That would have the following advantage:

- each extension has a unique identifier (id + key) that is used to access the attributes (essentially making the namespace explicit) which would enable the task list to detect conflicts when reading extensions
- the additional attributes become discoverable for other contributors
- attribute values could be stored externally from the task list for each extension id and only be loaded when a plug-in registers the particular extension id

Accessing attributes would be almost the same except that the extension id (namespace) is passed as a parameter:

 String getAttribute(String id, String key);
 void setAttribute(String id, String key, String value);
 ...
Comment 31 Eugene Kuleshov CLA 2008-01-22 22:02:05 EST
(In reply to comment #30)
> Are there other examples of Eclipse APIs that offer extensibility in a similar
> way?
> 
> The closest I can think of are cheat sheets. ICheatSheetManager allows to store
> key/value pairs of type String that are persisted through mementos. That allows
> extensions to store additional state. The major difference to the task list
> though is that these properties are bound to the life-cycle of the cheat sheet
> and discarded when the cheat sheet is restarted or completed whereas it is
> unclear to me how to manage that for the task list.

Here are couple example of platform APIs allowing attaching arbitrary content:

- IResource.create/getMarker()
- IClasspathAttribute[] IClasspathEntry.getExtraAttributes() and JavaCore.newClasspathAttribute/newContainerEntry()

Not sure what is the concern about the life cycle. Can you please elaborate?
My last patch submitted on this bug report extends DelegatingTaskExternalizer to preserve custom attributes. That allow to reuse existing backup facility and other ways to import/export tasks.

> I want to throw another idea into the discussion: Instead of allowing arbitrary
> attributes tasks core could define an extension point for registering additional
> attributes, e.g.:
> 
> - id: unique namespace
> - key: attribute key
> - description (optional): explaining the purpose and format of the attribute
> 
> That would have the following advantage:
> 
> - each extension has a unique identifier (id + key) that is used to access the
> attributes (essentially making the namespace explicit) which would enable the
> task list to detect conflicts when reading extensions
> - the additional attributes become discoverable for other contributors

If do that, I'd suggest to make it just an id (sort of fqn similar to the marker ids). Though markers are quite different scenario, because there is a common UI that can show arbitrary markers and configuration UI have to know about available types, which havent been a requirement for arbitrary task attributes, but maybe I may missed some use cases.

> - attribute values could be stored externally from the task list for each
> extension id and only be loaded when a plug-in registers the particular
> extension id

But then if given extension is not installed, its attributes won't be read and essentially they will be lost on the next write of task list. 
If I am not mistaken current task list can preserve data for tasks if corresponding connectors are not available, i.e. when migrating to a different workbench and incrementally updating new plugins. Same issue would apply to sending tasks across the wire.
Comment 32 Steffen Pingel CLA 2008-01-25 00:47:51 EST
> Not sure what is the concern about the life cycle. Can you please elaborate?

My major concern is that the task list gets bloated with unmanaged data. Consider the case of someone installing a 3rd party extension such as a planning tool which adds custom data to each task. If the plug-in is uninstalled later the task list will keep loading the data into memory and the user won't have any way of purging or accessing the data.

> My last patch submitted on this bug report extends DelegatingTaskExternalizer to
> preserve custom attributes. That allow to reuse existing backup facility and
> other ways to import/export tasks.

That is certainly an important point. The current proposal for arbitrary attributes aims towards extending the task list model and only requires few changes to integrate well with the current implementation. The solution I proposed would store the task meta-data separately from the task list but there are already two other examples in the tasks API of that: contexts and task data. As part of generalizing the import/export API (bug 209327) and defining an exchange format arbitrary attributes could be taken into account.

 (In reply to comment #31)
> Here are couple example of platform APIs allowing attaching arbitrary content:
> 
> - IResource.create/getMarker()

I am not familiar enough with the API but it seems that concept is similar to the request on this bug except that markers work with resources instead of tasks. As far as I understand the API marker attributes are managed by the creator of the marker and (optionally) persistence is managed by the workbench. Interestingly the marker extension point allows to declare attributes that exist on a marker although it is optional.

> - IClasspathAttribute[] IClasspathEntry.getExtraAttributes() and
> JavaCore.newClasspathAttribute/newContainerEntry()

Also a good example.

> > - attribute values could be stored externally from the task list for each
> > extension id and only be loaded when a plug-in registers the particular
> > extension id
> 
> But then if given extension is not installed, its attributes won't be read and
> essentially they will be lost on the next write of task list.

They could be stored in a separate file and referenced by the task handle. The files would only have to participate in handle refactorings.
Comment 33 Eugene Kuleshov CLA 2008-01-25 04:35:04 EST
(In reply to comment #32)
> > - IResource.create/getMarker()
> I am not familiar enough with the API but it seems that concept is similar to
> the request on this bug except that markers work with resources instead of
> tasks. As far as I understand the API marker attributes are managed by the
> creator of the marker and (optionally) persistence is managed by the workbench.

Not really. Markers are created or deleted upon request and automatically persisted by the platform, but I don't think platform provides any facility to delete markers.

> They could be stored in a separate file and referenced by the task handle. The
> files would only have to participate in handle refactorings.

I've been trying to minimize amount of changes required to implement this feature, as well as amount of external files that need to be kept in sync with the master task list. So less code is better. To me it doesn't seem like this feature require such complexity and risks of blowing task data is not any bigger then with the current API. Nothing stops integrators to set 4mb string values to abstract task string fields such as summary, priority or notes.

More over, because this issue isn't resolved for a long time, I had to use notes field to store custom data, but it is inefficient, because of additional parsing and also somewhat fragile from the user point of view.
Comment 34 Mik Kersten CLA 2008-01-29 02:21:39 EST
I still don't have a much better understanding of use cases than I did on comment#27, but I think that's probably fine.  We all know that we need some extensibility and I'm glad that we have converged on the fact that the extensibility won't be used for storing connector-specific data.  What I would like us to do along with the post-2.3 refactorings is extract from AbstractTask the classes suggested in comment#27.  We then add extensibility to either those classes or to AbstractTask according to the best information that we have about how clients want to use this (we're still short on info there so more would be welcome).  

Eugene: either way you will get your extensibility for 3.0.  Please vote for this bug.  For 2.3 your approach of using Notes seems OK.  The recommended approach for any others interested in this is to use a secondary data structure as per Steffen's comments.
Comment 35 Steffen Pingel CLA 2008-05-13 22:41:19 EDT
ITask now extends IAttributeContainer which exposes methods for setting and getting String key/value pairs. The attributes are persisted in the task list. These attributes are intended to be used by connectors for storing state on tasks that can not be stored using other methods in ITask. Previously connectors had to extend AbstractTask which is now discouraged.

It is possible for 3rd party tools to use IAttributeContainer to store state. Since the framework currently does not guards against memory bloat, provide means to resolve/detect naming conflicts or manages the life-cycle of attributes that is not a supported use of the API at this point.
Comment 36 Steffen Pingel CLA 2008-05-13 22:41:31 EDT
Created attachment 100103 [details]
mylyn/context/zip
Comment 37 Mik Kersten CLA 2008-05-14 19:31:35 EDT
Steffen: please add the contents of comment#35 to: http://wiki.eclipse.org/index.php/Mylyn/Integrator_Reference

We might want a 3.0-specific section of pending documentation to add.
Comment 38 Steffen Pingel CLA 2008-05-14 20:05:34 EDT
I'll hopefully have some time over the next few evenings to assemble documentation from bug reports and the mailing list. What is your sense about the best format for this? 

I could imagine a howto style approach similar to the current reference, e.g. "How do I implement a task editor?" that does not explain much detail but points at the relevant classes / extension points for documentation. 

Alternatively the documentation could be composed of multiple articles/tutorials that explain the intention of the API and have more detailed step-by-step instructions for implementing a specific feature. A good example of the latter that I have looked at frequently is this article: http://www.eclipse.org/articles/Article-Concurrency/jobs-api.html .

Maybe start with refining the integrator reference for now and worry about more detailed articles later?
Comment 39 Mik Kersten CLA 2008-05-14 23:24:07 EDT
I think that we should put all our current effort on the API naming and Javadoc.  We can do a bit of up-front work on documentation to help other projects port, but we shouls be able to do that on demand.  An article would be extremely helpful, so we can chat about that, but it's a lot of work.