| Summary: | Egit 0.9.3 has troubles with Non-ASCII/Unicode filenames | | |
|---|---|---|---|
| Product: | [Technology] EGit | Reporter: | Daniel Stein <daniel.stein> |
| Component: | Core | Assignee: | Project Inbox <egit.core-inbox> |
| Status: | CLOSED WONTFIX | QA Contact: | |
| Severity: | major | | |
| Priority: | P3 | CC: | caniszczyk, manuel.doninger, matthias.sohn, mikhail.ksenzov, remy.suen, robin.rosenberg, rootcn, sh2ka |
| Version: | 0.9.0 | | |
| Target Milestone: | --- | | |
| Hardware: | PC | | |
| OS: | Windows XP | | |
| Whiteboard: | | | |
| Attachments: | | | |

Description
Daniel Stein
I agree, this should be investigated for 0.10.

Created attachment 179353
Screenshot which shows the switch problem

Hi,
thanks for your interest. I investigated the problem further. I'm using msysgit 1.7.1.0 and want to switch to EGit. In msysgit I have set the variable core.quotepath to deal with file names with umlauts. If I import the project shown in the screenshot from a repo created by msysgit, EGit at first shows only the folder marked with (1.) and says the files are untracked. If I then switch to another branch, EGit adds the folder marked with (2.). Funnily, the second folder is named correctly. So it seems that msysgit also has trouble with umlauts. But I think EGit should not add folders by itself...
Another problem is that I must use msysgit because of git-svn...

Could you try with the latest EGit nightly and try to reproduce the problem using EGit only (without using msysgit)? Since the nightly update site is currently broken, you may install it from Hudson by pointing p2 at https://hudson.eclipse.org/hudson/job/egit/lastSuccessfulBuild/artifact/org.eclipse.egit-updatesite/target/site/ instead. I tried and everything worked just fine (will attach a screenshot). I am on a Mac, hence I can't try with msysgit. Note that msysgit has an open issue in that area: http://code.google.com/p/msysgit/issues/detail?id=80

Hi Matthias,
I had the same problem and I tried the snapshot. Now files with umlauts are marked as "added" (with 0.9.3 they were marked as "untracked"). My Git remote repo is stored on a Red Hat server, and I clone this repository on a Windows machine. I don't have the time today to do more testing; maybe I will find the time tomorrow or on Saturday.
Regards,
Manuel

I have to correct my last comment: with EGit 0.9.3, files with umlauts are marked as "added" after cloning the repository, and the same happens with the nightly build. This problem also occurs if I clone a repository that was created on a Windows computer. But if I add and commit the files marked as "added" and make a new clone, the files are handled correctly by EGit.

Another amendment: after committing a file with an umlaut, EGit shows the icon overlay for "tracked", but if I open the commit window, the file status there says "unknown". So my last comment about "handling correctly" was wrong.

We should look at this again in the 0.11 timeframe.

(In reply to comment #2)
> In msysgit I have set the variable
> core.quotepath to deal with file-names with Umlauts. If I import the project

Minor note: core.quotepath only affects display and does not change how things work.

*** Bug 332613 has been marked as a duplicate of this bug. ***

See http://egit.eclipse.org/r/#change,335 for a suggestion on how to work around this problem. It may alleviate the problem for some, but it is not a real solution.

C Git and JGit are incompatible.

C Git follows the philosophy of storing file names "as-is", i.e. in their raw encoding. This is utter nonsense, since on Windows the file name encoding is UTF-16, which obviously is not the encoding being stored. We can accept the explanation that the encoding is user or machine specific. C Git commits file names in the eight-bit locale of the user, which may be Latin-1, Latin-2 etc., Cyrillic, various legacy Chinese encodings, or UTF-8, either composed (modern Linux) or decomposed (OS X). If you create a repo using C Git in one encoding and use it (e.g. a clone) on another machine, you will have some kind of problem because of this. Operations on the index are one thing, and operations on the file system are another.

I'm quite convinced that the ONLY long-term solution is to go for one of the UTF-8 forms in tree objects, ref names etc. Hence, I decided a long time ago to always commit as if JGit's encoding were UTF-8. As it turns out, this is the composed form on Linux and Windows. On OS X it is the decomposed form.

The solution needs to come from some kind of agreement, to get enough mass behind what is right. JGit can only get a cross-implementation solution halfway. C Git needs adaptation too. Discussions about patches for "msys" Git are going on, whereby C Git would be able to read our UTF-8 based file names. See http://groups.google.com/group/msysgit/browse_thread/thread/d4414235850ce181/95bfcc1718fd3f1e?lnk=gst&q=blees#95bfcc1718fd3f1e

Maybe we can just ignore the problem and it goes away :)

Fixing C Git is the proper way of dealing with this problem. Fixing msys Git will not fix C Git on Unix systems with legacy encodings, but I guess there are not that many of those left.

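To make the incompatibility described above concrete, here is a minimal Python sketch. It is not taken from JGit or C Git; it only illustrates how one visible file name turns into different byte sequences depending on the encoding and the Unicode normalization form. The file name and the selection of encodings are assumptions chosen for illustration.

```python
import unicodedata

# Hypothetical file name ("Übung.txt"), used only for illustration.
name = "\u00dcbung.txt"

# Composed (NFC) and decomposed (NFD) forms of the same visible name.
nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)

# Byte sequences that differently configured tools might store for this name.
candidates = {
    "latin-1": nfc.encode("latin-1"),
    "utf-8 (composed)": nfc.encode("utf-8"),
    "utf-8 (decomposed)": nfd.encode("utf-8"),
}

for label, raw in candidates.items():
    print(f"{label:20} {raw!r}")

# Count how many distinct byte strings the same visible name produced.
distinct = len(set(candidates.values()))
print("distinct byte sequences:", distinct)
```

If path entries are compared byte by byte, each of these sequences counts as a different path, which would be consistent with the "untracked" and "added" states reported earlier in this thread.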
Now it's a big problem to use EGit, because it commits all files with Cyrillic names using the UTF-8 charset. If I revert to some previous commit and look at the folder with these files, I see 2 sets of files: set 1 has the files with cp1251 file names, set 2 has the same files but with corrupted file names. I think that files under Windows must use the native charset for file names: cp1251 for Cyrillic, or another one. For example, SmartGIT works well with cp1251 file names!!! And it uses the Java runtime, just like EGit and Eclipse. So why can't EGit do that? The Eclipse platform works with these files correctly!!! Why does EGit corrupt file names? Eclipse is a cross-platform IDE, but somehow it works correctly with native file names on various operating systems. Maybe the Eclipse platform can help EGit select the correct charset for working with the file system. Right now only EGit has this problem; other Git clients work well. Note this!

(In reply to comment #12)
> Now it's a big problem to use egit because it commits all files with cyrillic
> names with UTF-8 charset and if I revert to some previous commit and look at
> the folder with these files then I will see 2 sets of files: 1 set - files with
> cp1251 filenames, 2 set - the same files but with corrupted filenames. I think
> that files under Windows must have native charset of filenames - cp1251 for
> cyrillic or another. For example, SmartGIT works good with cp1251 filenames!!!

Windows uses UTF-16 for file names. Applications may choose to work with other encodings representing a subset of UTF-16. Msys and C Git are examples of such applications. EGit can work with any Windows file name.

> And it uses java runtime as egit and eclipse. So why egit can't do that.
> Eclipse platform works with these files correctly!!! Why egit corrupts
> filenames?

Please read the other comments and follow the links. There is even a set of patches for Msys Git that you may want to try.

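The "two sets of files" symptom is what one would expect when the same name is stored as UTF-8 bytes by one tool and read as cp1251 by another. A minimal Python sketch of that mismatch follows; the Cyrillic file name is hypothetical, and the pairing of encodings is an assumption based on the comment above, not something verified against EGit or msysgit.

```python
# Hypothetical Cyrillic file name ("файл.txt"), used only for illustration.
name = "\u0444\u0430\u0439\u043b.txt"

utf8_bytes = name.encode("utf-8")      # what a UTF-8 based tool would store
cp1251_bytes = name.encode("cp1251")   # what a cp1251 based tool would store

# Reading the UTF-8 bytes as cp1251 succeeds but yields a garbled name.
garbled = utf8_bytes.decode("cp1251")
print(repr(name), "->", repr(garbled))

# Reading the cp1251 bytes as UTF-8 is rejected as invalid input.
try:
    cp1251_bytes.decode("utf-8")
    cp1251_is_valid_utf8 = True
except UnicodeDecodeError:
    cp1251_is_valid_utf8 = False
print("cp1251 bytes valid as UTF-8:", cp1251_is_valid_utf8)
```

Under these assumptions, a checkout can leave both the original name and its garbled twin in the working directory, since the two byte sequences never compare equal.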
> Eclipse is cross platform IDE, but somehow it works correctly with native
> filenames on various operating systems. May be eclipse platform can help egit
> to select correct charset for working with filesystem. Now only egit has this
> problem - other git clients work well. Note this!

That actually depends on your use case. Use C Git to create a repo on Windows and try to work with it on a Mac or Linux (using C Git). This will work to some extent, but look ugly. Working across Windows and Linux works OK when using only EGit. Sharing repos between Russian/Swedish/etc. Windows machines will (I assume) work fine with only EGit, but not with C Git. I haven't tried, but Cygwin 1.7 could possibly be compatible with EGit. This means Cygwin Git will be incompatible with msys Git. C Git/EGit on OS X is incompatible with them all. It's not even fully compatible with itself, due to the decomposing (...) nature of the OS X file system. Pick your poison.

Posted patches http://egit.eclipse.org/r/3573 and http://egit.eclipse.org/r/3571, for people with insights into coding.

Status update: See http://markmail.org/message/jux476xzhaz6muoi for a Unicode enabled version of Git for Windows.

Now that Git has fixed this for Windows and the rest of the world is mostly UTF-8, I think we can declare this a WON'T FIX. I abandoned the patches I had in Gerrit.

(In reply to comment #16)
> Now that Git has fixed this for Windows and the rest of the world
> is mostly UTF-8, I think we can declare this a WON'T FIX. I abandoned
> the patches I had in Gerrit.

+1

*** Bug 352522 has been marked as a duplicate of this bug. ***