| Summary: | list of committer ids, names, and email addresses for Equinox and Eclipse | | |
|---|---|---|---|
| Product: | [Eclipse Project] Platform | Reporter: | Kim Moir <kim.moir> |
| Component: | Releng | Assignee: | Platform-Releng-Inbox <platform-releng-inbox> |
| Status: | RESOLVED WONTFIX | QA Contact: | |
| Severity: | normal | | |
| Priority: | P3 | CC: | bokowski, john.arthorne, pwebster |
| Version: | 3.7 | | |
| Target Milestone: | 3.8 | | |
| Hardware: | PC | | |
| OS: | Windows XP | | |
| Whiteboard: | | | |
| Bug Depends on: | | | |
| Bug Blocks: | 345479 | | |
Description
Kim Moir
I'm not sure we can provide that to you because of our privacy policy. We would have to do the import for you. Of course, there is an ldap server called ldapmaster...

One other thing to consider -- are all the committers, past and present, willing to have their real email addresses out in the 'open' in the author and committer fields? CVS and SVN do not store email addresses with commits. FWIW, the Eclipse Git update hooks only need to see the committer id.

I don't know the technical details of the migration script, but if it could live without email addresses it seems fine to me to omit that information. Committer real name shouldn't be considered personal information - it appears all over the place if you are a committer.

Of course we can get that info from "getent passwd <userid>"

> Committer real name shouldn't be considered personal information - it appears
> all over the place if you are a committer.
Right... but real names provided to the Foundation do not appear anywhere to anyone if you're not a committer. Bugzilla is special here -- individuals sign up for a Bugzilla account on their own.
Information you obtain by virtue of being a committer does not make it public information.
I was just following the process that the CDT team used in bug 316208. Their config file lists the email addresses:
https://bugs.eclipse.org/bugs/attachment.cgi?id=195267

From the cvs2git.options:

    # CVS uses unix login names as author names whereas git requires
    # author names to be of the form "foo <bar>". The default is to set
    # the git author to "cvsauthor <cvsauthor>". author_transforms can be
    # used to map cvsauthor names (e.g., "jrandom") to a true name and
    # email address (e.g., "J. Random <jrandom@example.com>" for the
    # example shown). All strings should be either Unicode strings (i.e.,
    # with "u" as a prefix) or 8-bit strings in the utf-8 encoding. The
    # values can either be strings in the form "name <email>" or tuples
    # (name, email). Please substitute your own project's usernames here
    # to use with the author_transforms option of GitOutputOption below.

Also, the git migration guide in the wiki mentions email addresses:
http://wiki.eclipse.org/Git#Committers_new_to_Git

In any case, I can try a test migration using just the username and real name. I'll just write a script to call getent passwd <userid> for all the committer ids so I have a list for the migration.

So what are the requirements from Eclipse Foundation from an IP perspective? Should each commit list the id and the email address? Or is an id and name sufficient?

Paul has written a script which parses the cvs history for a list of committer ids and then queries the ldap server for the associated email addresses. Of course, the query only returns the list of email addresses for current committers. If it is required to have the committer email addresses included, would it be sufficient to include valid ones for current committers and then generate ones for inactive committers as userid@eclipse.org?

(In reply to comment #6)
> So what are the requirements from Eclipse Foundation from an IP perspective?
> Should each commit list the id and the email address? Or is an id and name
> sufficient?
Our hook is happy to see either your Eclipse committer ID or your committer email address. Any one of those two is sufficient. We can get email addresses for current committers but not past ones.
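The cvs2git.options comment quoted earlier in this thread describes two accepted value forms for author_transforms and a default for unmapped authors. A minimal sketch of what such a mapping could look like follows. The entries reuse the placeholder from that comment and an id/name pair already mentioned in this bug; `format_author` is a hypothetical helper written only for illustration and is not part of cvs2git.

```python
# Sketch of an author_transforms mapping in the two value forms the
# cvs2git.options comment describes. Keys are CVS login names.
author_transforms = {
    # String form: "name <email>"
    'jrandom': 'J. Random <jrandom@example.com>',
    # Tuple form: (name, email). Here the email slot holds only the
    # committer id, as proposed in this bug.
    'pwebster': ('Paul Webster', 'pwebster'),
}


def format_author(cvs_author, transforms):
    """Hypothetical helper (not part of cvs2git): return the git author
    string that a given CVS login would map to."""
    value = transforms.get(cvs_author)
    if value is None:
        # Default described in the comment: "cvsauthor <cvsauthor>".
        return '%s <%s>' % (cvs_author, cvs_author)
    if isinstance(value, tuple):
        name, email = value
        return '%s <%s>' % (name, email)
    # Already a "name <email>" string.
    return value
```

With this mapping, an id that has no entry (for example a past committer) keeps its CVS login in both the name and email positions, so no address is disclosed.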
Should we just assume we'll use committer IDs for everything generated? Then those that don't mind exposing their email for current commits will have configured their git config, although their Juno commits will look a little different than their Indigo ones.

    # Collect every author id recorded in the RCS ",v" files.
    find tmpcvs/eclipse.platform.ui -name "*,v" \
      -exec grep "^date.*author " {} \; >author.raw
    grep -v "Binary file" author.raw \
      | sed 's/^date.*author //g' \
      | sed 's/;.*$//g' | sort -u >author.ids

    # Emit one author_transforms entry per id.
    for ID in $( cat author.ids ); do
      ldapsearch -x -b "ou=people,dc=eclipse,dc=org" -s one "(uid=$ID)" >tmp.txt
      # Default to the bare id when LDAP has no single matching entry.
      EMAIL=$ID
      NAME=$ID
      if grep "numEntries: 1\$" tmp.txt >/dev/null; then
        NAME=$( grep ^cn: tmp.txt | sed 's/cn: //g' )
        # Email lookup is disabled so that addresses are not published.
        #EMAIL=$( grep ^mail: tmp.txt | sed 's/mail: //g' )
      fi
      echo "'$ID' : ('$NAME', '$EMAIL')," >>author.py
    done

For me that will generate a mapping like:
'pwebster' : ('Paul Webster', 'pwebster'),
For someone unknown:
'airvine' : ('airvine', 'airvine'),
PW
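The mapping step above can also be sketched in Python, which avoids quoting problems when a real name contains an apostrophe. This is only an illustration under assumptions: `build_author_entries` and the `names` dictionary are hypothetical stand-ins for the LDAP lookup, and the email slot is filled with the committer id, as in the output shown above.

```python
def build_author_entries(ids, names):
    """Return author_transforms lines for the given CVS ids.

    ids   -- iterable of CVS login names
    names -- dict of id -> real name for ids the directory knows
             (hypothetical stand-in for the LDAP lookup)
    """
    lines = []
    for cvs_id in sorted(set(ids)):
        # Unknown ids fall back to the id itself, like the shell loop.
        name = names.get(cvs_id, cvs_id)
        # repr() emits a valid Python literal even if the name contains
        # a quote character; the email slot stays the committer id.
        lines.append('%r : (%r, %r),' % (cvs_id, name, cvs_id))
    return lines


if __name__ == '__main__':
    for line in build_author_entries(['pwebster', 'airvine'],
                                     {'pwebster': 'Paul Webster'}):
        print(line)
```

The returned lines can be pasted into the author_transforms dictionary of the options file. Ids are de-duplicated and sorted, so rerunning the sketch on the same input gives the same result.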
(In reply to comment #8)
> Should we just assume we'll use committer IDs for everything generated? Then
> those that don't mind exposing their email for current commits will have
> configured their git config, although their Juno commits will look a little
> different than their Indigo ones.

Yes, I think we should do that. Because of privacy rules we should not be disclosing committers' email addresses for this purpose without their consent. If they choose to put an email address in their git config for future commits, it will be up to them. The email field isn't going to be consistent anyway, because committers can change emails/companies over time.

I think we can close this. Paul wrote some shell scripts to parse the username etc. from cvs history; we won't be specifying email addresses in the initial conversion.