
Bug 336772

Summary: BinaryResourceImpl better reporting/handling for bad data
Product: [Modeling] EMF
Reporter: Miles Parker <milesparker>
Component: Core
Assignee: Ed Merks <Ed.Merks>
Status: RESOLVED WONTFIX
QA Contact:
Severity: enhancement    
Priority: P3    
Version: 2.7.0   
Target Milestone: ---   
Hardware: Macintosh   
OS: Mac OS X   
Whiteboard:

Description Miles Parker CLA 2011-02-09 17:54:49 EST
As BinaryResource is much more of an "all or nothing" kind of format (i.e., when something goes wrong, the whole file is typically not recoverable), it would be helpful to provide some support for diagnosing when things do go wrong. For example, I am getting the following error:

java.lang.NumberFormatException: For input string: "â×sü
longitudeÀU9XbNwoeId⏒㔈ࡷ潥呹灥ٓ瑡瑥ऊ灬慣敎慭攌䥮摩慮愬⁕匊๳潵牣敃潮瑥湴ࡉ湤楡湡ఋ捯"
	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
	at java.lang.Double.parseDouble(Double.java:510)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.readDouble(BinaryResourceImpl.java:1875)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadFeatureValue(BinaryResourceImpl.java:1790)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadEObject(BinaryResourceImpl.java:1695)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadFeatureValue(BinaryResourceImpl.java:1718)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadEObject(BinaryResourceImpl.java:1695)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadEObjects(BinaryResourceImpl.java:1439)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadFeatureValue(BinaryResourceImpl.java:1728)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadEObject(BinaryResourceImpl.java:1695)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadFeatureValue(BinaryResourceImpl.java:1718)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadEObject(BinaryResourceImpl.java:1695)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadFeatureValue(BinaryResourceImpl.java:1718)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadEObject(BinaryResourceImpl.java:1695)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadEObjects(BinaryResourceImpl.java:1439)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadFeatureValue(BinaryResourceImpl.java:1728)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadEObject(BinaryResourceImpl.java:1695)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadEObjects(BinaryResourceImpl.java:1439)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadFeatureValue(BinaryResourceImpl.java:1728)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadEObject(BinaryResourceImpl.java:1695)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadFeatureValue(BinaryResourceImpl.java:1718)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadEObject(BinaryResourceImpl.java:1695)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadEObjects(BinaryResourceImpl.java:1439)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadFeatureValue(BinaryResourceImpl.java:1728)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadEObject(BinaryResourceImpl.java:1695)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl$EObjectInputStream.loadResource(BinaryResourceImpl.java:1422)
	at org.eclipse.emf.ecore.resource.impl.BinaryResourceImpl.doLoad(BinaryResourceImpl.java:193)
	at org.eclipse.emf.ecore.resource.impl.ResourceImpl.load(ResourceImpl.java:1497)
...

First, as longitude and woeId are both features, my guess is that "longitude" is the attribute being munched, but it would be helpful to see the feature ID in the exception.

Second, given that there appears to be some kind of encoding error in there, I wonder if in these cases we could trap formatting exceptions such as this one and report them as warnings rather than failing the read altogether.
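
To make the two requests concrete, here is a minimal sketch of the idea in plain Java. It is not how BinaryResourceImpl is implemented; the class `FeatureParseDemo`, the method `parseDoubleFeature`, and the `warnings` list are all hypothetical names used only for illustration. It shows a parse failure being tagged with the feature name and recorded as a warning instead of aborting. Note that this only helps if the bytes following the bad value are still correctly aligned.

```java
import java.util.ArrayList;
import java.util.List;

class FeatureParseDemo {
    // Hypothetical collector for non-fatal load problems (not an EMF API).
    static final List<String> warnings = new ArrayList<>();

    // Parses a double literal for the named feature. On failure, records a
    // warning that names the feature and returns null instead of throwing.
    static Double parseDoubleFeature(String featureName, String literal) {
        try {
            return Double.parseDouble(literal);
        } catch (NumberFormatException e) {
            warnings.add("Could not parse feature '" + featureName + "': " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseDoubleFeature("longitude", "-86.13"));
        System.out.println(parseDoubleFeature("longitude", "not-a-number"));
        System.out.println(warnings);
    }
}
```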
Comment 1 Ed Merks CLA 2011-02-10 04:21:09 EST
I'm not sure what can really be done.  It's not human readable so how is one going to fix some ugly bytes?  I can't imagine anything that's terribly useful except to use the debugger.  What really would you do with such information and why couldn't you gather it with the debugger?
Comment 2 Miles Parker CLA 2011-02-10 12:18:29 EST
(In reply to comment #1)
> I'm not sure what can really be done.  It's not human readable so how is one
> going to fix some ugly bytes?  I can't imagine anything that's terribly useful
> except to use the debugger.  What really would you do with such information and
> why couldn't you gather it with the debugger?

I'm thinking of an end-user scenario here, where the suggestion of "just use the debugger" would not go over so well. ;) What I want to do in a context where we do have some bad bytes is to recover the file itself while excising the bad piece of it and reporting that it had been chomped. Does that make more sense?

(To be clear, "慣敎慭攌䥮摩慮愬⁕匊๳潵牣敃潮瑥湴ࡉ湤楡湡ఋ捯" *is* probably human readable, just not by me (without a dictionary) and perhaps not by you.)
Comment 3 Miles Parker CLA 2011-02-10 19:02:13 EST
I wonder if perhaps it would work better to not use the OPTION_STYLE_BINARY_FLOATING_POINT option? Parsing doubles is the only area where this has been an issue and I wonder if moving between the XML and Binary file encoding might be messing something up.
Comment 4 Ed Merks CLA 2011-02-11 03:38:48 EST
I don't think a corrupted byte stream can be fixed by humans.  One wrong byte and you have only noise. There shouldn't be a problem with mismatched options because the important options that affect processing are stored and should be respected.  There was a bug that was committed in this regard though and it sounds like you might be using the version that had that bug.
Comment 5 Miles Parker CLA 2011-02-11 13:29:51 EST
(In reply to comment #4)
> I don't think a corrupted byte stream can be fixed by humans.  One wrong byte
> and you have only noise. There shouldn't be a problem with mismatched options

If it is a mangled byte rather than a missing one then everything else will remain aligned. It should be possible then to simply look for the next expected tag, right? I'm not saying that we should do this now, just a thought.

> because the important options that affect processing are stored and should be
> respected.  There was a bug that was committed in this regard though and it
> sounds like you might be using the version that had that bug.

I'll check. I was building on M4 and I'll see if it happens under M5.
Comment 6 Ed Merks CLA 2011-02-11 20:39:55 EST
No, the assumption that a mangled byte leaves the rest of the stream intact isn't valid.  Many of the bytes record information about how many bytes follow, i.e., the size of a list, the length of a string.  Even int values (for object IDs, for example) are often written compressed in such a way that smaller values use fewer bytes so even here, a wrong byte will wreak havoc from which there is no recovery.
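
The effect of a single wrong length byte can be illustrated with a toy length-prefixed format. This is only a sketch under stated assumptions: the format below (one count byte, then one length byte per string) and the names `LengthPrefixDemo`, `write`, and `read` are invented for illustration and are not the actual BinaryResourceImpl encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LengthPrefixDemo {
    // Toy format (hypothetical): a count byte, then for each string
    // one length byte followed by that many UTF-8 bytes.
    static byte[] write(String... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(values.length);
        for (String value : values) {
            byte[] encoded = value.getBytes(StandardCharsets.UTF_8);
            out.writeByte(encoded.length);
            out.write(encoded);
        }
        return bytes.toByteArray();
    }

    static List<String> read(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int count = in.readUnsignedByte();
        List<String> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            byte[] encoded = new byte[in.readUnsignedByte()];
            in.readFully(encoded); // throws EOFException if the stream runs out
            result.add(new String(encoded, StandardCharsets.UTF_8));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = write("longitude", "woeId", "42.5");
        System.out.println(read(data)); // prints [longitude, woeId, 42.5]

        data[1] = 12; // corrupt one byte: the first string's length, 9 -> 12
        try {
            System.out.println(read(data));
        } catch (IOException e) {
            // The first string swallows the next length byte, so a text byte
            // is then read as a length and the reader runs off the end.
            System.out.println("Stream desynchronized: " + e);
        }
    }
}
```

After the corruption, the reader consumes three extra bytes for the first string, then interprets the letter 'e' (101) as the next length, so every later value is misread or unreachable. Scanning ahead for "the next expected tag" is not possible in such a format because nothing marks where a value begins.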
Comment 7 Miles Parker CLA 2011-03-03 16:51:28 EST
Hi Ed,

I'm good with this resolution if you want to go ahead and close it as INVALID or whatever. I see you fixed the underlying issue, thanks.

Miles
Comment 8 Ed Merks CLA 2011-03-03 16:54:22 EST
It's hard to do much that would be truly useful...