| Summary: | TPTP hashtable implementation experiences table-entry loss in certain cases | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | z_Archived | Reporter: | Jonathan West <jgwest> | ||||||
| Component: | TPTP | Assignee: | Jonathan West <jgwest> | ||||||
| Status: | CLOSED FIXED | QA Contact: | |||||||
| Severity: | critical | ||||||||
| Priority: | P1 | CC: | jkubasta, sluiman, stanislav.v.polevic, xubing | ||||||
| Version: | unspecified | ||||||||
| Target Milestone: | --- | ||||||||
| Hardware: | PC | ||||||||
| OS: | Windows XP | ||||||||
| Whiteboard: | closed460 | ||||||||
| Bug Depends on: | |||||||||
| Bug Blocks: | 221278 | ||||||||
| Attachments: |
|
||||||||
Joanna, can you assign to me? Harm, Jonathan spoke to me about this change. Hash table is incorrect but may be too risky to include in 4.5. Any strong opinion either way? Created attachment 102551 [details] Bug 234460 patch I don't want to forestall a discussion of this bug, but, since TP2 is close, Stanislav can you review this patch? If approved, we don't want to have requested a review too late and thus missed the TP2 start.. Thanks! (In reply to comment #2) > Harm, Jonathan spoke to me about this change. Hash table is incorrect but may > be too risky to include in 4.5. Any strong opinion either way? > I honestly don't see why there is such concern about the risk. Jonathan has done a great job of investigation.The error seems rather obvious, and it clearly fails when it fails. I would fix the code, and temporarily reduce the threshold to make testing easier. The perfmon component is deprecated so it seems there is only one component affected by the change. So I would apply the fix and pick a size slightly larger than typical needs (it seems the current number is fine) An intelligent set of test will confirms the fix . And do this all ASAP.... As per Harm's comments above, I will test functionality with a small initial hashtable size (one :) ), and produce a patch for check-in that leaves the initial hashtable size at 101. Thus, testing during TP2 (and for the 4.5.0 release) will use the hashtable implementation with an initial hashtable capacity of 101. Created attachment 103054 [details] Bug 234460 - Patch 2 Stanislav, can you review this second patch? Jonathan, your patch looks good. Patch committed to Head with PMC approval. Copyright updated on check in As of TPTP 4.6.0, TPTP is in maintenance mode and focusing on improving quality by resolving relevant enhancements/defects and increasing test coverage through test creation, automation, Build Verification Tests (BVTs), and expanded run-time execution. As part of the TPTP Bugzilla housecleaning process (see http://wiki.eclipse.org/Bugzilla_Housecleaning_Processes), this enhancement/defect is verified/closed by the Project Lead since this enhancement/defect has been resolved and unverified for more than 1 year and considered to be fixed. If this enhancement/defect is still unresolved and reproducible in the latest TPTP release (http://www.eclipse.org/tptp/home/downloads/), please re-open. |
There are four hashtable.c implementations in the agent controller, each of which originated from a file of the same name that was contributed to TPTP by Scapa as part of Perfmon. Three of the four implementations are used by Perfmon soley, while the fourth is provided for use to the AC codebase by the TPTPUtil component. This HashTable implementation is used in the tptpProcessController, namedPipeTL, sharedMemTL, socketTL, and TPTPAgentCompTL/TPTPClientCompTL through BaseTL. The HashTable has no fixed size, and grows when a certain load factor is reached (hardcoded to 75% of capacity, initial capacity is 101). Unfortunately, the _TPTPUtil version of HashTable.c_ contains a bug in it's tableRehash(...) function: Whenever the hashtable rehashes (after a certain number of entries are added to it) one of its entries is unintentionally dropped from the table. That is, when the HashTable reaches its load factor, the array size of the table is doubled, all entries are rehashed and copied, and in doing so one entry fails to get copied to the new array. The _original version_ of the TPTP code, and all other implementations _presently_ in use (e.g. PerfMon) look like this: for (i = oldCapacity; i-- > 0;) { for (old = oldMap[i]; old != NULL; ) { (.... rehash code: copy oldMap[i] into newMap ...) } oldMap[i] = NULL; } The _present version_ of the TPTP hashtable.c file looks like this: for (i = oldCapacity-1; i-- > 0;) { for (old = oldMap[i]; old != NULL; ) { (.... rehash code: copy oldMap[i] into newMap ...) } oldMap[i] = NULL; } Notice that 'oldCapacity' has become 'oldCapacity-1'. The contributor misread the for statement semantics: for(a;b;c) { d } will execute a,b,(d,c) and not a,d,c,b. Thus i will start at oldCapacity-2, and not oldCapacity-1 as it should. The change was made as part of bug 134361 by Kevin P O'Leary at Intel in April 2006. Every hashtable in use has an initialCapacity of 101, and will be rehashed when 76 entries are contained in the table. Most of the AC tables will never reach this capacity. However, BaseTL (and therefore ClientCompTL) does not remove connections from its hashtable when they are completed, and thus will inevitably trigger this behaviour after 76 or so Hyades client connections. Unfortunately, this is the cause of bug 221278, and has also been observed as a contributing factor to other critical/blocker bugs as well. Because we are late into the TPTP release period, my recommendation to reduce the risk associated with this change is as follows: 1. Fix the rehash function (return oldCapacity to its original value) 2. Increase the capacity of the hashtable, in order to reduce the incidence of the rehash code being called (thereby reducing the inherent risk of late delivery of this change into the code) 3. Trace through the hashtable code, by hand, to ensure correctness. For the first recommendation, this bug blocks (at least) 1 critical bug, and thus IMHO should be delivered into 4.5. For the second recommendation, the thinking here is that by increasing the hashtable size ten fold (to 1001, or perhaps more) we nearly eliminate calls to the the rehash function for most users. This is a temporary measure until we can run the corrected rehash function through a full development test pass (post 4.5). There are only ten or so hashtable instances in the agent controller, and this increase would add approximately (4 bytes per pointer * 900 additional entries * 10 entries) 36,000 bytes, or about 35 kilobytes to the agent controller footprint (double on 64-bit systems). The third is an additional check to ensure code correctness.