Build Identifier: M20090917-0800

Hi everyone. I'm new to egit. I have been using git and the git CLI for quite some time now and never had any trouble. Today I tried egit. In my git repo there are file names that contain umlauts (vowel mutations like ä or ö). If I check out a branch of my repo with egit, all files whose names contain umlauts are marked as "untracked files". The CLI doesn't behave like this. I think it is very important for the acceptance of egit in Germany that egit supports file names with umlauts.

Reproducible: Always

Steps to Reproduce:
1. git cli: Clone a repo which contains files whose names have umlauts.
2. egit: Import projects from this repo.
3. git cli: git status says nothing to commit and no untracked files.
4. egit: Check out a branch.
5. git cli: git status says files whose file names contain umlauts are untracked files...
I agree, this should be investigated for 0.10
Created attachment 179353 [details] Screenshot which shows the switch problem

Hi, thanks for your interest. I investigated the problem further. I'm using msysgit 1.7.1.0 and want to switch to egit. In msysgit I have set the variable core.quotepath to deal with file names with umlauts. If I import the project shown in the screenshot from a repo created by msysgit, at first egit only shows the folder marked with (1.) and says the files are untracked. If I then switch to another branch, egit adds the folder marked with (2.). Funnily enough, the second folder is named correctly. So it seems that msysgit also has trouble with umlauts. But I think egit should not add folders by itself... Another problem is that I must use msysgit because of git-svn...
Could you try with the latest egit nightly (since the nightly update site is currently broken, you may install it from Hudson by pointing p2 at https://hudson.eclipse.org/hudson/job/egit/lastSuccessfulBuild/artifact/org.eclipse.egit-updatesite/target/site/) and try to reproduce the problem using egit only (without using msysgit)? I tried and everything worked just fine (will attach a screenshot). I am on a Mac, hence I can't try with msysgit. Note that msysgit has an open issue in that area: http://code.google.com/p/msysgit/issues/detail?id=80
Hi Matthias, I had the same problem and I tried the snapshot. Now files with umlauts are marked as "added" (with 0.9.3 they were marked as "untracked"). My remote Git repo is stored on a Red Hat server, and I clone this repository on a Windows machine. I don't have the time today to do more testing; maybe I'll find the time tomorrow or on Saturday. Regards, Manuel
I have to correct my last comment: with EGit 0.9.3, files with umlauts are marked as "added" after cloning the repository, same as with the nightly build. This problem also occurs if I clone a repository which was created on a Windows computer. But if I add and commit the files marked as "added" and make a new clone, the files are handled correctly by EGit.
Another amendment: after committing a file with an umlaut, EGit shows the icon overlay for "tracked", but if I open the commit window, the file status there says "unknown". So my last comment about "handling correctly" was wrong.
Should look again at 0.11 timeframe
(In reply to comment #2)
> In msysgit I have set the variable
> core.quotepath to deal with file-names with Umlauts. If I import the project

Minor note: core.quotepath only affects display and does not change how things work.
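To illustrate the display-only nature of core.quotepath: with quoting enabled, C Git prints bytes above 0x7F in a path as octal escapes inside double quotes, while the bytes stored in the repository stay the same. The following is a minimal Java sketch of that display rule, not code from Git or JGit. The class `QuotePathSketch` and the method `quote` are hypothetical names, and the sketch deliberately ignores control characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only (not part of Git or JGit).
final class QuotePathSketch {

    // Renders raw path bytes the way a quoting display would:
    // bytes >= 0x80 become \ooo octal escapes, '"' and '\' are backslash-escaped,
    // and the result is wrapped in double quotes only if something was escaped.
    // Simplification: control characters are passed through unchanged.
    static String quote(byte[] rawPath) {
        StringBuilder sb = new StringBuilder();
        boolean escaped = false;
        for (byte b : rawPath) {
            int v = b & 0xFF;
            if (v >= 0x80) {
                sb.append('\\').append(String.format("%03o", v));
                escaped = true;
            } else if (v == '"' || v == '\\') {
                sb.append('\\').append((char) v);
                escaped = true;
            } else {
                sb.append((char) v);
            }
        }
        return escaped ? "\"" + sb + "\"" : sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8Name = "\u00E4.txt".getBytes(StandardCharsets.UTF_8); // ä.txt
        System.out.println(quote(utf8Name)); // "\303\244.txt"
        System.out.println(quote("a.txt".getBytes(StandardCharsets.UTF_8))); // a.txt
    }
}
```

The input bytes are never modified, which is the point of the note above: quoting changes what is printed, not what is stored or compared.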
*** Bug 332613 has been marked as a duplicate of this bug. ***
See http://egit.eclipse.org/r/#change,335 for a suggestion on how to work around this problem. The change may alleviate the problem for some, but it is not a real solution.

C Git and JGit are incompatible. C Git follows the philosophy that it stores file names "as-is", i.e. in the raw encoding. This is utter nonsense, since on Windows the encoding is UTF-16, which obviously is not the encoding used. We can accept the explanation that it is user or machine specific. C Git commits the file names in the eight-bit locale of the user, which may be Latin-1, Latin-2 etc., Cyrillic, various legacy Chinese encodings, UTF-8 composed (modern Linux) or decomposed (OS X). If you create a repo using C Git in one encoding and use it (e.g. a clone) on another machine, you will have some kind of problem due to this. Operations on the index are one thing and operations on the file system are another.

I'm quite convinced that the ONLY long-term solution is to go for one of the UTF-8 forms in tree objects, ref names etc. Hence, I made a decision a long time ago to always commit as if JGit's encoding was UTF-8. As it turns out, this is a composed form on Linux and Windows. On OS X this will be the decomposed form.

The solution to this needs to come from some kind of agreement to get enough mass around what is right. JGit can only get a cross-implementation solution halfway there. C Git needs adaptation too.
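To make the byte-level difference concrete, here is a minimal Java sketch (not JGit code; `FilenameEncodings`, `hex` and `encode` are names made up for illustration). It prints the bytes that a tree entry would contain for the single character ä under three of the encodings mentioned above: Latin-1, UTF-8 composed (NFC) and UTF-8 decomposed (NFD).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical helper, for illustration only (not part of JGit).
final class FilenameEncodings {

    // Formats bytes as space-separated upper-case hex, e.g. "C3 A4".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Normalizes the name to the given Unicode form, then encodes it.
    static byte[] encode(String name, Normalizer.Form form, Charset charset) {
        return Normalizer.normalize(name, form).getBytes(charset);
    }

    public static void main(String[] args) {
        String name = "\u00E4"; // ä
        System.out.println("Latin-1:     "
                + hex(encode(name, Normalizer.Form.NFC, StandardCharsets.ISO_8859_1))); // E4
        System.out.println("UTF-8 (NFC): "
                + hex(encode(name, Normalizer.Form.NFC, StandardCharsets.UTF_8)));      // C3 A4
        System.out.println("UTF-8 (NFD): "
                + hex(encode(name, Normalizer.Form.NFD, StandardCharsets.UTF_8)));      // 61 CC 88
    }
}
```

Three different byte sequences for the same visible name means that a byte-wise comparison between the index and the working tree can report a file as untracked or added even though the user sees the same name in both places.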
Discussions around patches for "msys" git are going on, whereby C Git would be able to read our UTF-8 based file names. See http://groups.google.com/group/msysgit/browse_thread/thread/d4414235850ce181/95bfcc1718fd3f1e?lnk=gst&q=blees#95bfcc1718fd3f1e Maybe we can just ignore the problem and it goes away :) Fixing C Git is the proper way of dealing with this problem. Fixing msys git will not fix C Git on Unix systems with legacy encodings, but I guess there are not that many instances of these anymore.
Right now it's a big problem to use egit, because it commits all files with Cyrillic names using the UTF-8 charset. If I revert to some previous commit and look at the folder with these files, I see two sets of files: one set with cp1251 filenames, and a second set of the same files but with corrupted filenames. I think that files under Windows must use the native charset for filenames: cp1251 for Cyrillic, or another one. For example, SmartGIT works well with cp1251 filenames!!! And it uses the Java runtime, just like egit and Eclipse. So why can't egit do that? The Eclipse platform works with these files correctly!!! Why does egit corrupt filenames? Eclipse is a cross-platform IDE, but somehow it works correctly with native filenames on various operating systems. Maybe the Eclipse platform can help egit select the correct charset for working with the filesystem. Right now only egit has this problem; other git clients work well. Note this!
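The "corrupted filenames" symptom is what one would expect when name bytes written in one charset are read back in another. The following minimal Java sketch reproduces that effect in isolation; it is not EGit or C Git code, `FilenameMojibake` and `reinterpret` are hypothetical names, and it assumes the JVM ships the windows-1251 charset (common JDKs do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only (not part of EGit or C Git).
final class FilenameMojibake {

    // Encodes the name with one charset and decodes the same bytes with another,
    // which is what happens when two tools disagree about the name encoding.
    static String reinterpret(String name, Charset writtenAs, Charset readAs) {
        return new String(name.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        // Assumption: windows-1251 (cp1251) is available in this JVM.
        Charset cp1251 = Charset.forName("windows-1251");
        String name = "\u0444\u0430\u0439\u043B.txt"; // файл.txt

        // Name committed as cp1251 bytes, read by a tool that assumes UTF-8:
        // the Cyrillic bytes are not valid UTF-8 and become replacement characters.
        System.out.println(reinterpret(name, cp1251, StandardCharsets.UTF_8));

        // Name committed as UTF-8 bytes, read by a tool that assumes cp1251:
        // every UTF-8 byte is shown as a separate cp1251 character.
        System.out.println(reinterpret(name, StandardCharsets.UTF_8, cp1251));

        // Both sides agreeing on the charset round-trips cleanly.
        System.out.println(reinterpret(name, cp1251, cp1251).equals(name));
    }
}
```

Either mismatch yields a name that differs from the original, so a checkout can end up with both the original file and a second, differently named copy.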
(In reply to comment #12)
> Now it's a big problem to use egit because it commits all files with cyrillic
> names with UTF-8 charset and if I revert to some previous commit and look at
> the folder with these files then I will see 2 sets of files: 1 set - files with
> cp1251 filenames, 2 set - the same files but with corrupted filenames. I think
> that files under Windows must have native charset of filenames - cp1251 for
> cyrillic or another. For example, SmartGIT works good with cp1251 filenames!!!

Windows uses UTF-16 for filenames. Applications may choose to work with other encodings representing a subset of UTF-16. Msys and C Git are examples of such applications. EGit can work with any Windows file name.

> And it uses java runtime as egit and eclipse. So why egit can't do that.
> Eclipse platform works with these files correctly!!! Why egit corrupts
> filenames?

Please read the other comments and follow the links. There is even a set of patches for Msys Git that you may want to try.

> Eclipse is cross platform IDE, but somehow it works correctly with native
> filenames on various operating systems. May be eclipse platform can help egit
> to select correct charset for working with filesystem. Now only egit has this
> problem - other git clients work well. Note this!

That actually depends on your use case. Use C Git to create a repo on Windows and try to work with it on a Mac or Linux (using C Git). This will work to some extent, but look ugly. Working across Windows and Linux works OK when using only EGit. Sharing repos between Russian/Swedish/etc. Windows machines will (I assume) work fine with only EGit, but not with C Git. I haven't tried, but cygwin 1.7 could possibly be compatible with EGit. This means cygwin git will be incompatible with msys git. C Git/EGit on OS X is incompatible with them all. It's not even fully compatible with itself due to the decomposing (...) nature of the OS X file system. Pick your poison.
Posted patches http://egit.eclipse.org/r/3573 and http://egit.eclipse.org/r/3571, for people with insights into coding.
Status update: See http://markmail.org/message/jux476xzhaz6muoi for a Unicode enabled version of Git for Windows
Now that Git has fixed this for Windows and the rest of the world is mostly UTF-8, I think we can declare this a WON'T FIX. I abandoned the patches I had in Gerrit.
(In reply to comment #16)
> Now that Git has fixed this for Windows and the rest of the world
> is mostly UTF-8, I think we can declare this a WON'T FIX. I abandoned
> the patches I had in Gerrit.

+1
*** Bug 352522 has been marked as a duplicate of this bug. ***