Community
Participate
Working Groups
File org.eclipse.egit.core/src/org/eclipse/egit/core/internal/storage/CommitFileRevision.java Line 61: new RevWalk(db).parseBody(rc); You shouldn't do this. I can't promise you that parsing the body of another RevWalk's RevCommit with a new RevWalk will continue to work in the future. If you need to ensure the buffer is parsed, you should be passing around the RevWalk that created the RevCommit you passed around, and call parseBody() on that. parseBody() knows whether or not the buffer is loaded. Its essentially a no-op if the buffer is already loaded. Therefore you don't need the rc.getRawBuffer() conditional check around it. I know we don't promise that getRawBuffer() implies the other things are accessible. That is, just because getRawBuffer() returns non-null does not imply that getFullMessage() or getAuthorIdent() will work. Only parseBody() promises that.
Dariusz, can you look at Shawn's concerns?
Shawn, is there any way to 'disconnect' RevCommit object from RevWalk? As far as I understood how RevWalk works, it disposes 'buffer' array when it isn't usable any more for walking. I'm wondering does we, on application level, cache complete RevObject instance. If not ... the only solution for this problem would be passing RevWalk and ObjectId over all 'abstraction levels' in EGit.
(In reply to comment #2) > Shawn, is there any way to 'disconnect' RevCommit object from RevWalk? Nope. Well, you can get one by parsing the commit and then never using that in a traversal. I.e.: RevWalk rw = new RevWalk(db); RevCommit c = rw.parseCommit(objectId); ... now never use rw ... Another option is to use rw.setRetainBody(true) so the walker won't discard the body its parsed. But if you do big history operations involving hundreds of commits, and most of them you don't really need the body of, this can be a big waste of memory. > As far as I understood how RevWalk works, it disposes 'buffer' array when it > isn't usable any more for walking. I'm wondering does we, on application level, > cache complete RevObject instance. If not ... the only solution for this > problem would be passing RevWalk and ObjectId over all 'abstraction levels' in > EGit. Yes. It is a bit common to pass around both. Or, do a rw.parseBody(c) before you pass the RevCommit and then just ensure you never perform another traversal on that RevWalk.
(In reply to comment #3) > Yes. It is a bit common to pass around both. Or, do a rw.parseBody(c) before > you pass the RevCommit and then just ensure you never perform another traversal > on that RevWalk. This will be quite tricky since in GitModelRepository we use LogCommand to obtain list of commit's. It is even more complicated because we use two log's: * common ancestor - source commit * common ancestor - destination commit this solution was suggested by Stefan Lay (AFAIR) to obtain complete list of commits between source and destination. LogCommad internally use RevWalk, so actually we have two separate instances of RevWalk ... maybe we could merge them into one? And then pass it to RevModelCommit constructor.
(In reply to comment #4) > > This will be quite tricky since in GitModelRepository we use LogCommand to > obtain list of commit's. It is even more complicated because we use two log's: > * common ancestor - source commit > * common ancestor - destination commit > this solution was suggested by Stefan Lay (AFAIR) to obtain complete list of > commits between source and destination. LogCommad internally use RevWalk, so > actually we have two separate instances of RevWalk ... maybe we could merge > them into one? And then pass it to RevModelCommit constructor. Yes. If you use the left/right approach I described you can use just one RevWalk instance to get both sets of commits and the common ancestor in a single traversal. This will be faster, as its only one walk and not 3. Since you are interested in all of the commits returned, their bodies would be saved by the RevWalk, and you'd be OK to use them without needing parseBody(), so long as you don't start a new traversal on that RevWalk. Which you probably don't have to do, since you got everything in 1 traversal. :-)
(In reply to comment #5) > Since you are interested in all of the commits returned, their bodies would be > saved by the RevWalk, and you'd be OK to use them without needing parseBody(), > so long as you don't start a new traversal on that RevWalk. Which you probably > don't have to do, since you got everything in 1 traversal. OK, but what about memory usage if we would have more then 1k commits in traversal? We already had problem with NPE during parsing RevCommit on oldest commit in list, I'm afraid that we can once again hit this issue.
(In reply to comment #6) > (In reply to comment #5) > > Since you are interested in all of the commits returned, their bodies would be > > saved by the RevWalk, and you'd be OK to use them without needing parseBody(), > > so long as you don't start a new traversal on that RevWalk. Which you probably > > don't have to do, since you got everything in 1 traversal. > > OK, but what about memory usage if we would have more then 1k commits in > traversal? But that implies there are 1k commits different between the two sides. So no matter what you do, you'd want 1k commits loaded so you can present them to the user, no? If you abort the RevWalk when you find the common ancestor (the commit with both local and remote flags on it), the walker won't go too far back in history. So even a project with 888k commits will only need to parse, load, and cache the commits that are different between the two sides, which is probably under 1k. The remainder of the old history of the project won't even be looked at. > We already had problem with NPE during parsing RevCommit on oldest commit in > list, I'm afraid that we can once again hit this issue. I'm not sure I understand. Do you know what the root cause of that was? JGit throws NPE for a lot of reasons, but the most common one during parseCommit() or parseBody() was the parameter was null. :-)
(In reply to comment #7) > I'm not sure I understand. Do you know what the root cause of that was? JGit > throws NPE for a lot of reasons, but the most common one during parseCommit() > or parseBody() was the parameter was null. :-) Yes I know the cause of NPE, the 'buffer' was null, because it was disposed by RevWalk. But I don't know why it was disposed.
(In reply to comment #8) > > Yes I know the cause of NPE, the 'buffer' was null, because it was disposed by > RevWalk. But I don't know why it was disposed. We dispose uninteresting commits, unless you ask for boundary sorting. Was that "last" commit not actually returned by the RevWalk, but was instead someone else's parent? In such a case we don't promise that the buffer is parsed. Did you markUninteresting anything on that traversal? Without topo sorting its possible for a commit to be output, but then later be determined to be uninteresting, and we might have discarded the buffer then, after you had already saved it. Its the price you pay for not enforcing topo sorting on your result, you may get slightly inaccurate results, but its far faster, and usually correct.
(In reply to comment #9) > Was that "last" commit not actually returned by the RevWalk, but was instead > someone else's parent? All commits that are displayed in git change set comes from LogCommand, with use RevWalk internally. Since there were used two separate LogCommand's it is possible that this "last commit" was created by different RevWalk ... and this seams to be the root cause of this NPE. If we use only one LogCommand with single RevWalk this problem will occur, am I right?
After merging commit 8a77b0e we can remove "new RevWalk(db).parseBody(rc)" Change set with does that is here: http://egit.eclipse.org/r/#change,1437
Fixed with commit: 75a371e156977f26f72a620880584e6332db0f42