Bug 349350 - list of committer ids, names, and email addresses for Equinox and Eclipse
Status: RESOLVED WONTFIX
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Releng
Version: 3.7
Hardware: PC Windows XP
Importance: P3 normal
Target Milestone: 3.8
Assignee: Platform-Releng-Inbox CLA
Blocks: 345479
Reported: 2011-06-14 14:06 EDT by Kim Moir CLA
Modified: 2011-06-28 15:33 EDT
Description Kim Moir CLA 2011-06-14 14:06:43 EDT
Not sure if this is the right bucket, but it's related to our git migration so I'll start here :-)

We need a list of all our past and present committers (id, first and last name, and email address) to include in our cvs2git.options file during our migration process.

As line 3 of the page linked below shows, we have had 240 committers in Eclipse.
I also need the information for rt.equinox committers.

http://dash.eclipse.org/dash/commits/web-api/commit-committers-by-project.php
Comment 1 Denis Roy CLA 2011-06-14 14:33:56 EDT
I'm not sure we can provide that to you because of our privacy policy.  We would have to do the import for you.

Of course, there is an ldap server called ldapmaster...
Comment 2 Denis Roy CLA 2011-06-14 14:35:46 EDT
One other thing to consider -- are all the committers, past and present, willing to have their real email addresses out in the 'open' in the author and committer fields?  CVS and SVN do not store email addresses with commits.

FWIW, the Eclipse Git update hooks only need to see the committer id.
Comment 3 John Arthorne CLA 2011-06-14 14:46:23 EDT
I don't know the technical details of the migration script, but if it could live without email addresses it seems fine to me to omit that information. Committer real name shouldn't be considered personal information - it appears all over the place if you are a committer. Of course we can get that info from "getent passwd <userid>"
Comment 4 Denis Roy CLA 2011-06-14 14:50:02 EDT
> Committer real name shouldn't be considered personal information - it appears
> all over the place if you are a committer. 

Right... but real names provided to the Foundation do not appear anywhere to anyone if you're not a committer.  Bugzilla is special here -- individuals sign up for a Bugzilla account on their own.

Obtaining information by virtue of being a committer does not make it public information.
Comment 5 Kim Moir CLA 2011-06-14 15:44:04 EDT
I was just following the process that the CDT team used in bug 316208.  Their config file lists the email addresses:

https://bugs.eclipse.org/bugs/attachment.cgi?id=195267

From the cvs2git.options file:
# CVS uses unix login names as author names whereas git requires
# author names to be of the form "foo <bar>".  The default is to set
# the git author to "cvsauthor <cvsauthor>".  author_transforms can be
# used to map cvsauthor names (e.g., "jrandom") to a true name and
# email address (e.g., "J. Random <jrandom@example.com>" for the
# example shown).  All strings should be either Unicode strings (i.e.,
# with "u" as a prefix) or 8-bit strings in the utf-8 encoding.  The
# values can either be strings in the form "name <email>" or tuples
# (name, email).  Please substitute your own project's usernames here
# to use with the author_transforms option of GitOutputOption below.
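
As an illustration of the comment above, here is a minimal sketch of what such an author map could look like. The logins, names, and addresses are placeholders, and `git_author` is a hypothetical helper (not part of cvs2git) that only mimics the documented default of "cvsauthor <cvsauthor>" for logins missing from the map.

```python
# Hypothetical author map in the shape the cvs2git.options comment describes.
# Names and addresses are placeholders, not real committer data.
author_transforms = {
    "jrandom": ("J. Random", "jrandom@example.com"),  # tuple form: (name, email)
    "mmouse": "Minnie Mouse <mmouse@example.com>",    # string form: "name <email>"
}


def git_author(cvs_author, transforms):
    """Return the 'name <email>' string git would see for a CVS login.

    Illustrative helper only: logins missing from the map fall back to
    the documented default of 'cvsauthor <cvsauthor>'.
    """
    entry = transforms.get(cvs_author)
    if entry is None:
        return "%s <%s>" % (cvs_author, cvs_author)
    if isinstance(entry, tuple):
        name, email = entry
        return "%s <%s>" % (name, email)
    return entry
```

In a real options file the dictionary would be handed to the author_transforms option of GitOutputOption, as the quoted comment says; the helper exists only to show the resulting author strings.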

Also, the git migration guide in the wiki mentions email addresses:
http://wiki.eclipse.org/Git#Committers_new_to_Git

In any case, I can try a test migration using just the username and real name.  I'll just write a script to call getent passwd <userid> for all the committer ids so I have a list for the migration.
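
A rough sketch of that lookup, assuming the standard passwd(5) layout in which the fifth colon-separated field (GECOS) begins with the real name. Both function names are hypothetical, and the fallback to the bare id is an assumption about how unknown committers might be handled.

```python
import subprocess


def real_name_from_passwd(line):
    """Extract the real name from one passwd(5)-style line.

    The fifth colon-separated field (GECOS) conventionally starts with
    the full name, optionally followed by comma-separated extras.
    Returns None when the line is malformed or the name is empty.
    """
    fields = line.rstrip("\n").split(":")
    if len(fields) < 5:
        return None
    name = fields[4].split(",")[0].strip()
    return name or None


def lookup_real_name(userid):
    """Run `getent passwd <userid>`; fall back to the id when nothing is found."""
    result = subprocess.run(
        ["getent", "passwd", userid],
        capture_output=True, text=True, check=False,
    )
    name = real_name_from_passwd(result.stdout) if result.returncode == 0 else None
    return name or userid
```

Calling `lookup_real_name` for each committer id would give the id-to-name list needed for the test migration, without touching email addresses.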
Comment 6 Kim Moir CLA 2011-06-21 22:10:51 EDT
So what are the requirements from the Eclipse Foundation from an IP perspective?  Should each commit list the id and the email address?  Or is an id and name sufficient?

Paul has written a script which parses the cvs history for a list of committer ids and then queries the ldap server for the associated email addresses.  Of course, the query only returns email addresses for current committers.  If committer email addresses must be included, would it be sufficient to include valid ones for current committers and generate ones of the form userid@eclipse.org for inactive committers?
Comment 7 Denis Roy CLA 2011-06-22 10:35:24 EDT
(In reply to comment #6)
> So what are the requirements from Eclipse Foundation from an IP perspective? 
> Should each commit list the id and the email address?  Or is a id and name
> sufficient?

Our hook is happy to see either your Eclipse committer ID or your committer email address.  Any one of those two is sufficient.
Comment 8 Paul Webster CLA 2011-06-23 07:47:51 EDT
We can get email addresses for current committers but not past ones.

Should we just assume we'll use committer IDs for everything generated?  Then those that don't mind exposing their email for current commits will have configured their git config, although their Juno commits will look a little different than their Indigo ones.

  # Collect every author id recorded in the RCS ",v" files.
  find tmpcvs/eclipse.platform.ui -name "*,v" \
    -exec grep "^date.*author " {} \; >author.raw
  grep -v "Binary file" author.raw \
    | sed 's/^date.*author //' \
    | sed 's/;.*$//' | sort -u >author.ids
  # Look up each id in LDAP; keep the bare id when there is no single match.
  while read -r ID; do
    ldapsearch -x -b "ou=people,dc=eclipse,dc=org" -s one "(uid=$ID)" >tmp.txt
    EMAIL=$ID
    NAME=$ID
    if grep -q "numEntries: 1\$" tmp.txt; then
      NAME=$( grep '^cn:' tmp.txt | sed 's/^cn: //' )
      # Email lookup left disabled so that addresses are not exposed.
      #EMAIL=$( grep '^mail:' tmp.txt | sed 's/^mail: //' )
    fi
    echo "'$ID' : ('$NAME', '$EMAIL')," >>author.py
  done <author.ids

For me that will generate a mapping like:
'pwebster' : ('Paul Webster', 'pwebster'),

For someone unknown:
'airvine' : ('airvine', 'airvine'),

PW
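
For comparison, the id extraction and mapping output of the script above could be sketched in Python roughly as follows. This is not the script that was used: the function names and the `names` dictionary are hypothetical, and the regular expression assumes RCS delta headers of the form "date ...; author id; state ...;". Using `repr()` is a deliberate difference from the shell version, so that a name containing a quote still yields a valid Python literal.

```python
import re

# Matches RCS delta headers such as
# "date 2011.06.14.18.06.43;  author pwebster;  state Exp;"
AUTHOR_RE = re.compile(r"^date\s[^;]*;\s*author\s+([^;\s]+);", re.MULTILINE)


def author_ids(rcs_text):
    """Return the sorted, de-duplicated author ids found in RCS ',v' text."""
    return sorted(set(AUTHOR_RE.findall(rcs_text)))


def mapping_line(userid, names):
    """Format one author map entry; unknown ids fall back to the bare id.

    `names` is a hypothetical dict of id -> real name (for example, built
    from LDAP or getent lookups). The committer id doubles as the email
    field, matching the mapping shown above.
    """
    name = names.get(userid, userid)
    return "%r : (%r, %r)," % (userid, name, userid)
```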
Comment 9 John Arthorne CLA 2011-06-23 08:45:15 EDT
(In reply to comment #8)
> Should we just assume we'll use committer IDs for everything generated?  Then
> those that don't mind exposing their email for current commits will have
> configured their git config, although their Juno commits will look a little
> different than their Indigo ones.

Yes I think we should do that. Because of privacy rules we should not be disclosing committers' email addresses for this purpose without their consent. If they choose to put an email address in their git config for future commits, it will be up to them. Over a period of time the email field isn't going to be consistent anyway because committers can change emails/companies over time.
Comment 10 Kim Moir CLA 2011-06-28 15:33:01 EDT
I think we can close this.  Paul wrote some shell scripts to parse the username etc. from the cvs history, and we won't be specifying email addresses in the initial conversion.