
Bug 521548

Summary: [feed] Eclipse News Feed contains invalid <pubDate> entries
Product: Community
Reporter: Andreas Sewe <sewe>
Component: Website
Assignee: phoenix.ui <phoenix.ui-inbox>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: P3
CC: chris.guindon, eclipse, roxanne.joncas, stephanie.swart
Version: unspecified
Target Milestone: ---
Hardware: All
OS: All
See Also: https://bugs.eclipse.org/bugs/show_bug.cgi?id=521546
https://git.eclipse.org/r/104107
https://git.eclipse.org/c/www.eclipse.org/community.git/commit/?id=9fc97041826c6bf2afe209a02264b239fc03ac9b
https://git.eclipse.org/r/105678
https://git.eclipse.org/c/www.eclipse.org/community.git/commit/?id=219563ed1d6cdfb8a1d883169be68a34e84ed8bf
https://git.eclipse.org/r/105718
https://git.eclipse.org/r/105937
https://git.eclipse.org/c/www.eclipse.org/community.git/commit/?id=1de875c2abde02ddbdc9a93095df106e7db0ad32
Whiteboard:

Description Andreas Sewe CLA 2017-08-29 11:33:32 EDT
The Eclipse News feed at [1] contains numerous malformed <pubDate>s [2]. While one can work around this by trying one format after the other, one issue is particularly nasty:

  Sept 28, 2016 11:39:00 am EST
  Tue, 25 Sept 2007 11:30:00 EST

The four-letter "Sept" is not even recognized by Java's SimpleDateFormat as an alternative to "Sep" or "September".

I realize that going through the huge existing feed is a lot of work (but see Bug 521539). Still, fixing at least the "Sept"s would already make writing a more resilient parser less hacky.

[1] <http://www.eclipse.org/home/eclipsenews.rss>
[2] <https://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fwww.eclipse.org%2Fhome%2Feclipsenews.rss>
[3] <https://git.eclipse.org/r/#/c/103877/1/plugins/org.eclipse.recommenders.news.impl/src/org/eclipse/recommenders/news/impl/poll/DefaultFeedItemStore.java>
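
If only the "Sept" spelling needs fixing, a narrow sed rewrite may be enough. The following is a minimal sketch, not the change that was eventually submitted: fix_sept_stream is a hypothetical helper name, and it assumes each <pubDate> element sits on a single line. It leaves every other malformed date (for example the "am" in the first sample above) untouched.

```shell
# Sketch only: fix_sept_stream is a hypothetical helper, not part of community.git.
# Assumes each <pubDate>...</pubDate> element is on one line.
fix_sept_stream() {
  # Reads feed XML on stdin and writes the result to stdout.
  # Only text between <pubDate> and the next "<" is rewritten, so a "Sept"
  # in a title or description is left alone.
  sed -E 's/(<pubDate>[^<]*)Sept /\1Sep /'
}
```

Possible usage: fix_sept_stream < 2005newsarchive.rss > fixed.rss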
Comment 1 Andreas Sewe CLA 2017-08-29 11:35:31 EDT
FYI, Bug 521548 comment 0 contains a list of all malformed <pubDate> variations I have discovered.
Comment 2 Christopher Guindon CLA 2017-08-29 13:20:04 EDT
(In reply to Andreas Sewe from comment #1)
> FYI, Bug 521548 comment 0 contains a list of all malformed <pubDate>
> variations I have discovered.

Andreas,
Please feel free to submit a patch to fix this and I will be happy to review it and make sure that this change does not break our homepage.

http://git.eclipse.org/c/gerrit/www.eclipse.org/community.git/tree/news

Gerrit:
https://git.eclipse.org/r/#/admin/projects/www.eclipse.org/community
Comment 3 Andreas Sewe CLA 2017-08-30 05:10:08 EDT
(In reply to Christopher Guindon from comment #2)
> (In reply to Andreas Sewe from comment #1)
> > FYI, Bug 521548 comment 0 contains a list of all malformed <pubDate>
> > variations I have discovered.
> 
> Andreas,
> Please feel free to submit a patch to fix this and I will be happy to review
> it and make sure that this change does not break our homepage.
> 
> http://git.eclipse.org/c/gerrit/www.eclipse.org/community.git/tree/news

Thanks for the pointers, Chris, but AFAICT community.git contains only the scripts that render the RSS as HTML.

I would rather fix the problem at the source, i.e., in the RSS feed. Do you know where I can find the raw RSS (which is then consumed by feedburner)? I couldn't locate it in community.git.
Comment 4 Christopher Guindon CLA 2017-08-30 10:25:31 EDT
(In reply to Andreas Sewe from comment #3)
> (In reply to Christopher Guindon from comment #2)
> > (In reply to Andreas Sewe from comment #1)
> > > FYI, Bug 521548 comment 0 contains a list of all malformed <pubDate>
> > > variations I have discovered.
> > 
> > Andreas,
> > Please feel free to submit a patch to fix this and I will be happy to review
> > it and make sure that this change does not break our homepage.
> > 
> > http://git.eclipse.org/c/gerrit/www.eclipse.org/community.git/tree/news
> 
> Thanks for the pointers, Chris, but AFAICT community.git contains only the
> scripts that render the RSS as HTML.
> 
> I would rather fix the problem at the source, i.e., in the RSS feed. Do you
> know where I can find the raw RSS (which is then consumed by feedburner)? I
> couldn't locate it in community.git.

+1

The RSS feeds are in the community/news folder:
http://git.eclipse.org/c/gerrit/www.eclipse.org/community.git/tree/news/2005inthenewsarchive.rss

http://git.eclipse.org/c/gerrit/www.eclipse.org/community.git/tree/news/2005newsarchive.rss
Comment 5 Eclipse Genie CLA 2017-08-31 11:57:14 EDT
New Gerrit change created: https://git.eclipse.org/r/104107
Comment 6 Andreas Sewe CLA 2017-08-31 12:02:24 EDT
(In reply to Eclipse Genie from comment #5)
> New Gerrit change created: https://git.eclipse.org/r/104107

Thankfully, GNU date(1) is extremely flexible when it comes to parsing dates. This (almost) one-liner fixes the RSS feed:

  IFS=$'\n';
  for i in $(grep '<pubDate>' 2005newsarchive.rss | sed 's/^.*<pubDate>//; s/<\/pubDate>.*$//') ; do
    sed -i 's/'$i'/'$(LOCALE=C TZ=EST date --date "$i" "+%a, %d %b %Y %H:%M:%S %Z")'/' 2005newsarchive.rss;
  done
  unset IFS

According to the W3C feed validator [1] this fixes 429 wrong dates (which should all be in RFC 822 format) and one incorrect day of the week: As it turns out, the Wednesday in "Wed, 25 Jun 2009 14:52:00 EST" is wrong; it should have been Thursday. ;-)

[1] <https://validator.w3.org/feed/>
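
A more defensive variant of the same idea, as a sketch under stated assumptions rather than the script that produced the Gerrit change: normalize_pubdate and fix_feed are hypothetical helper names, GNU date and GNU sed are required, and the date strings are assumed to contain no "/" or other characters that are special to sed. It quotes every expansion, skips values that date(1) cannot parse, and sets LC_ALL=C instead of LOCALE=C, since LOCALE is, as far as I know, not a variable that date(1) consults.

```shell
# Sketch only: normalize_pubdate and fix_feed are hypothetical helper names.
# Requires GNU date and GNU sed (-i without a suffix is GNU-specific).

normalize_pubdate() {
  # LC_ALL=C forces English day and month names; TZ=EST5 is the POSIX
  # spelling of a fixed UTC-5 zone whose abbreviation prints as "EST".
  LC_ALL=C TZ=EST5 date --date "$1" '+%a, %d %b %Y %H:%M:%S %Z'
}

fix_feed() {
  # $1: path to the feed file, which is edited in place.
  local file=$1 old new
  grep -o '<pubDate>[^<]*</pubDate>' "$file" \
    | sed 's/^<pubDate>//; s/<\/pubDate>$//' \
    | sort -u \
    | while IFS= read -r old; do
        # Leave values alone that date(1) cannot parse.
        new=$(normalize_pubdate "$old") || continue
        # Nothing to do when the value is already in the target format.
        [ "$old" = "$new" ] && continue
        # Assumption: $old holds no "/" or regex metacharacters that matter.
        sed -i "s/<pubDate>$old<\/pubDate>/<pubDate>$new<\/pubDate>/g" "$file"
      done
}
```

Possible usage: fix_feed 2005newsarchive.rss, followed by a review of the resulting diff before committing.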
Comment 7 Christopher Guindon CLA 2017-08-31 18:18:46 EDT
(In reply to Andreas Sewe from comment #6)
> (In reply to Eclipse Genie from comment #5)
> > New Gerrit change created: https://git.eclipse.org/r/104107
> 
> Thankfully, GNU date(1) is extremely flexible when it comes to parsing
> dates. This (almost) one-liner fixes the RSS feed:
> 
>   IFS=$'\n';
>   for i in $(grep '<pubDate>' 2005newsarchive.rss | sed 's/^.*<pubDate>//;
> s/<\/pubDate>.*$//') ; do
>     sed -i 's/'$i'/'$(LOCALE=C TZ=EST date --date "$i" "+%a, %d %b %Y
> %H:%M:%S %Z")'/' 2005newsarchive.rss;
>   done
>   unset IFS
> 
> According to the W3C feed validator [1] this fixes 429 wrong dates (which
> should all be in RFC 822 format) and one incorrect day of the week: As it
> turns out, the Wednesday in "Wed, 25 Jun 2009 14:52:00 EST" is wrong; it
> should have been Thursday. ;-)
> 
> [1] <https://validator.w3.org/feed/>

This is great! I added myself as a reviewer.

I am going to test your patch on our website tomorrow to validate that this change does not break our home page.

If needed, I will make a patch to fix issues on eclipse.org.
Comment 8 Andreas Sewe CLA 2017-09-01 03:58:24 EDT
(In reply to Christopher Guindon from comment #7)
> This is great! I added myself as a reviewer.

If you need me to split the change in two, to avoid the 1000 lines limit, please say so. I don't think this large change is problematic from a legal standpoint, however, as it was just date(1) doing all the "creative" work.

> I am going to test your patch on our website tomorrow to validate that this
> change does not break our home page.
> 
> If needed, I will make a patch to fix issues on eclipse.org.

Not sure I understand. Doesn't this patch change the feed content on eclipse.org? What additional steps are needed?
Comment 10 Christopher Guindon CLA 2017-09-01 09:41:44 EDT
(In reply to Andreas Sewe from comment #8)
> (In reply to Christopher Guindon from comment #7)
> > This is great! I added myself as a reviewer.
> 
> If you need me to split the change in two, to avoid the 1000 lines limit,
> please say so. I don't think this large change is problematic from a legal
> standpoint, however, as it was just date(1) doing all the "creative" work.

No CQ needed for this patch!

> 
> > I am going to test your patch on our website tomorrow to validate that this
> > change does not break our home page.
> > 
> > If needed, I will make a patch to fix issues on eclipse.org.
> 
> Not sure I understand. Doesn't this patch change the feed content on
> eclipse.org? What additional steps are needed?

Before merging this patch, I wanted to make sure that the dates were still being parsed properly by PHP.

Everything is working as expected and your patch was merged!

Thank you very much for doing this!
Comment 11 Holger Voormann CLA 2017-09-25 03:17:37 EDT
The problem still exists for:
http://www.eclipse.org/home/eclipsenews.rss

Unfortunately, in the Eclipse IDE for Java Developers this RSS feed has been added in Oxygen.1, causing a NullPointerException to pop up after 5 minutes.
Comment 12 Andreas Sewe CLA 2017-09-25 03:28:15 EDT
(In reply to Holger Voormann from comment #11)
> The problem still exists for:
> http://www.eclipse.org/home/eclipsenews.rss
> 
> Unfortunately, in the Eclipse IDE for Java Developers this RSS feed has been
> added in Oxygen.1, causing a NullPointerException to pop up after 5 minutes.

Hi Holger. Thank you for the report.

Unfortunately, I cannot reproduce. Can you please attach the contents of the following file to this bug:

  $WORKSPACE/.metadata/.plugins/org.eclipse.recommenders.news.impl/downloads/http%3A%2F%2Fwww.eclipse.org%2Fhome%2Feclipsenews.rss

Also, can you please report which version of the org.eclipse.recommenders.news.* plugins you have installed (About Eclipse > Installation Details > Plug-ins)?
Comment 13 Holger Voormann CLA 2017-09-25 04:22:08 EDT
(In reply to Andreas Sewe from comment #12)
I need to correct myself: "Eclipse News" (http://www.eclipse.org/home/eclipsenews.rss) works in Oxygen.1, but adding this RSS feed to an Oxygen.0 (eclipse.buildId=4.7.0.I20170612-0950) package will cause an error dialog to pop up:

An internal error occurred during: "Polling news feeds".
java.lang.NullPointerException

The error log contains errors like the following:
java.text.ParseException: Unparseable date: "Thu, 07 Sept 2017 9:00:00 EST"
	at java.text.DateFormat.parse(Unknown Source)
	at org.eclipse.mylyn.internal.commons.notifications.feed.FeedEntry.getDate(FeedEntry.java:50)
	...

This issue was fixed by changing the Java code of the feed reader in Oxygen.1, not by changing the content of the RSS feed (currently, http://www.eclipse.org/home/eclipsenews.rss contains "Thu, 07 Sept 2017 9:00:00 EST"), right?
Comment 14 Eclipse Genie CLA 2017-09-25 04:24:42 EDT
New Gerrit change created: https://git.eclipse.org/r/105678
Comment 15 Andreas Sewe CLA 2017-09-25 04:36:31 EDT
(In reply to Holger Voormann from comment #13)
> This issue was fixed by changing the Java code of the feed reader in
> Oxygen.1, not by changing the content of the RSS feed (currently,
> http://www.eclipse.org/home/eclipsenews.rss contains "Thu, 07 Sept 2017
> 9:00:00 EST"), right?

Right, the fix for Bug 521546 makes the feed parser more resilient, but is obviously an ugly kludge.

However, I have also tried to fix the problem upstream in this bug (comment 5), but unfortunately news items published since then again use date formats that do not conform to RFC 822:

  Thu, 21 Sept 2017 8:30:00 EST

It should be "Sep" rather than "Sept".

(In reply to Eclipse Genie from comment #14)
> New Gerrit change created: https://git.eclipse.org/r/105678

This change fixes the feed again.

@Chris: Can you please point Roxanne, Stephanie, and Eric, who seem to maintain the feed's contents, to the W3C Feed Validator? You can simply copy and paste a feed [1] and get quick feedback on whether it is valid (currently, it complains about invalid <pubDate>s and the non-standard <pressrelease> tag).

[1] <https://validator.w3.org/feed/#validate_by_input>
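
For a quick local check before pushing a new item, something like the following could complement the validator. It is a rough sketch, not a replacement for it: check_pubdates is a hypothetical helper name, and the pattern is a simplified reading of the RFC 822 date grammar with four-digit years. It also rejects single-digit hours such as "8:30:00", which is my reading of the grammar.

```shell
# Sketch only: check_pubdates is a hypothetical helper; the pattern is a
# simplified reading of the RFC 822 date grammar (with four-digit years).
check_pubdates() {
  # Reads feed XML on stdin and prints every <pubDate> value that does not
  # match the pattern. Returns 0 if all values match, 1 otherwise.
  local days='Mon|Tue|Wed|Thu|Fri|Sat|Sun'
  local months='Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec'
  local zone='UT|GMT|[ECMP][SD]T|[A-IK-Z]|[+-][0-9]{4}'
  local re="^(($days), )?[0-9]{1,2} ($months) [0-9]{4} [0-9]{2}:[0-9]{2}(:[0-9]{2})? ($zone)\$"
  # grep -v succeeds when it prints at least one offending value, so the
  # pipeline status is negated to make "no offenders" the success case.
  ! grep -o '<pubDate>[^<]*</pubDate>' \
    | sed 's/^<pubDate>//; s/<\/pubDate>$//' \
    | grep -Ev "$re"
}
```

Possible usage: check_pubdates < 2005newsarchive.rss || echo "invalid <pubDate> values found"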
Comment 16 Holger Voormann CLA 2017-09-25 05:02:42 EDT
(In reply to Andreas Sewe from comment #15)
Thanks for the quick response and fixing the RSS feed content.

The "Eclipse News" RSS feed now also works in Oxygen.0.
Comment 17 Andreas Sewe CLA 2017-09-25 06:34:18 EDT
(In reply to Holger Voormann from comment #16)
> (In reply to Andreas Sewe from comment #15)
> Thanks for the quick response and fixing the RSS feed content.

You're welcome.

> The "Eclipse News" RSS feed now also works in Oxygen.0.

Really? The change in Gerrit [1] hasn't been merged yet.

[1] <https://git.eclipse.org/r/#/c/105678/>
Comment 18 Holger Voormann CLA 2017-09-25 06:53:20 EDT
(In reply to Andreas Sewe from comment #17)
> Really? The change in Gerrit [1] hasn't been merged yet.
> 
Yes, right. I was wrong again. In Oxygen.0, I forgot that I had removed the feed by pressing "Restore Defaults" in the Preferences dialog, and I confused the Planet Eclipse feed with the Eclipse News feed. Sorry.
Comment 20 Holger Voormann CLA 2017-09-25 09:47:07 EDT
(In reply to Eclipse Genie from comment #19)
The date of one of today's entries still contains "Sept" instead of "Sep":
http://git.eclipse.org/c/www.eclipse.org/community.git/tree/news/2005newsarchive.rss?id=219563ed1d6cdfb8a1d883169be68a34e84ed8bf#n16
Comment 21 Eclipse Genie CLA 2017-09-25 10:19:56 EDT
New Gerrit change created: https://git.eclipse.org/r/105718
Comment 22 Andreas Sewe CLA 2017-09-25 10:21:21 EDT
(In reply to Holger Voormann from comment #20)
> (In reply to Eclipse Genie from comment #19)
> One date of a today's entry still contains "Sept" instead of "Sep":
> http://git.eclipse.org/c/www.eclipse.org/community.git/tree/news/
> 2005newsarchive.rss?id=219563ed1d6cdfb8a1d883169be68a34e84ed8bf#n16

Thanks for notifying us. That "Sept" must have slipped through while my change was still in Gerrit.

Another fix:

(In reply to Eclipse Genie from comment #21)
> New Gerrit change created: https://git.eclipse.org/r/105718
Comment 23 Holger Voormann CLA 2017-09-28 04:16:31 EDT
A new entry with the invalid date
  <pubDate>Tue, 26 Sept 2017 14:30:00 EST</pubDate>
has been added to
  https://www.eclipse.org/community/news/2005newsarchive.rss
Comment 24 Christopher Guindon CLA 2017-09-28 09:34:15 EDT
(In reply to Holger Voormann from comment #23)
> A new entry with the invalid date
>   <pubDate>Tue, 26 Sept 2017 14:30:00 EST</pubDate>
> has been added to
>   https://www.eclipse.org/community/news/2005newsarchive.rss

I am adding Stephanie to this bug to keep her in the loop.
Comment 25 Christopher Guindon CLA 2017-09-28 10:03:14 EDT
(In reply to Christopher Guindon from comment #24)
> (In reply to Holger Voormann from comment #23)
> > A new entry with the invalid date
> >   <pubDate>Tue, 26 Sept 2017 14:30:00 EST</pubDate>
> > has been added to
> >   https://www.eclipse.org/community/news/2005newsarchive.rss
> 
> I am adding Stephanie to this bug to keep her in the loop.

Also, adding Roxanne since she added this news item this morning.
Comment 26 Eclipse Genie CLA 2017-09-28 10:12:12 EDT
New Gerrit change created: https://git.eclipse.org/r/105937
Comment 28 Christopher Guindon CLA 2017-10-10 11:47:24 EDT
(In reply to Eclipse Genie from comment #27)
> Gerrit change https://git.eclipse.org/r/105937 was merged to [master].
> Commit:
> http://git.eclipse.org/c/www.eclipse.org/community.git/commit/
> ?id=1de875c2abde02ddbdc9a93095df106e7db0ad32

Closing this bug again!