I have added an important Javadoc note to the respective classes/methods:

 * <p/>
 * <strong>Note</strong> the order of elements in the passed AnyMaps is important and reflected in the generated hash
 * and id. Usually this is not wanted but for performance reasons it is better to take the order as is and ensure
 * proper ordering in the crawler/agent.

However, while we use AnyMap for attachments, whose order remains stable because it is backed by a LinkedMap, this is not true for methods like org.eclipse.smila.connectivity.framework.util.ConnectivityHashFactory.createHash(Map<String, ?>): depending on the implementation of the attachments map, the order may or may not be stable. This should be refactored to take a SortedMap or, more generally, an OrderedMap to guarantee stability.

PS: The problem with unstable ordering of the elements is that a hash/id calculated in two different runs of the same crawler/agent might not come out the same, but it must for correct functionality.
I think we should just sort the keys of the given map inside the ConnectivityHashFactory.createHash() method and access the values in that order. In a crawl/agent context where IO is involved anyway this should not be a lot of overhead, especially as usually only very few attributes should contribute to the hash.
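A minimal sketch of that idea, not the actual SMILA implementation: the class name SortedKeyHashSketch, the choice of SHA-256, and the way values are turned into bytes are all assumptions made for illustration. It copies the given map into a TreeMap so the entries are always visited in sorted key order, whatever map implementation the caller passes in.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for ConnectivityHashFactory; only the sorting idea is taken from the discussion.
final class SortedKeyHashSketch {

  /** Hashes all entries in natural key order, independent of the map's own iteration order. */
  static String createHash(Map<String, ?> attachments) {
    MessageDigest digest;
    try {
      digest = MessageDigest.getInstance("SHA-256"); // assumption: any stable digest would do
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
    // The TreeMap copy sorts the keys once; null keys are not supported by this sketch.
    for (Map.Entry<String, ?> entry : new TreeMap<String, Object>(attachments).entrySet()) {
      update(digest, entry.getKey().getBytes(StandardCharsets.UTF_8));
      Object value = entry.getValue();
      // Assumption: values are byte[] or have a stable toString() representation.
      byte[] bytes = value instanceof byte[]
          ? (byte[]) value
          : String.valueOf(value).getBytes(StandardCharsets.UTF_8);
      update(digest, bytes);
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : digest.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }

  /** Feeds a length prefix plus the bytes, so that adjacent fields cannot run into each other. */
  private static void update(MessageDigest digest, byte[] bytes) {
    digest.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
    digest.update(bytes);
  }
}
```

With this, two maps holding the same entries in different insertion order yield the same hash. The extra cost is one sort over the keys per call, which should be small when only a few attributes contribute to the hash.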
Hm, I am not sure the overhead is really negligible in all cases. Maybe we could provide an overloaded version of the method that takes an ordered map, as I suggested, while the current method would just sort the entries by their keys. Ordering then becomes the responsibility of the agent/crawler writer. Maybe this issue should be reviewed in light of the new crawler API; it might become obsolete or more pressing with it. Any ideas?
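The overload idea could look roughly like the following sketch. The class name OverloadedHashSketch and the digest details are assumptions, and java.util.SortedMap is used here in place of an OrderedMap so the example needs only the JDK. Note that Java picks the overload from the static type of the argument at compile time, and that a SortedMap built with a custom comparator would be hashed in that comparator's order.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of the two overloads; not the real ConnectivityHashFactory.
final class OverloadedHashSketch {

  /** Convenience overload: accepts any map and sorts the keys itself. */
  static String createHash(Map<String, ?> values) {
    return hashInIterationOrder(new TreeMap<String, Object>(values));
  }

  /** Overload for callers that already hold a sorted map: no copy and no extra sort. */
  static String createHash(SortedMap<String, ?> values) {
    return hashInIterationOrder(values);
  }

  /** Hashes keys and values exactly in the order the map iterates them. */
  private static String hashInIterationOrder(Map<String, ?> values) {
    MessageDigest digest;
    try {
      digest = MessageDigest.getInstance("SHA-256"); // assumption: any stable digest would do
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
    for (Map.Entry<String, ?> entry : values.entrySet()) {
      // Assumption: values have a stable toString(); a zero byte separates the fields.
      digest.update(entry.getKey().getBytes(StandardCharsets.UTF_8));
      digest.update((byte) 0);
      digest.update(String.valueOf((Object) entry.getValue()).getBytes(StandardCharsets.UTF_8));
      digest.update((byte) 0);
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : digest.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }
}
```

A crawler that keeps its attributes in a TreeMap would hit the second overload and skip the sort, while every other caller gets the safe behaviour by default.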
Well ... regarding optimization I learned a simple rule: Don't do it ... (before you really know it's necessary) ;-) So for the moment I would try to keep it simple for the caller. If this really proves to be a performance problem we can still do something about it.
The Connectivity framework was replaced by the new Importing framework, so there is no ConnectivityHashFactory any more. I don't think we have a performance problem there, so I am closing the issue.