| Summary: | [feed] Eclipse News Feed contains invalid <pubDate> entries | ||
|---|---|---|---|
| Product: | Community | Reporter: | Andreas Sewe <sewe> |
| Component: | Website | Assignee: | phoenix.ui <phoenix.ui-inbox> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | normal | ||
| Priority: | P3 | CC: | chris.guindon, eclipse, roxanne.joncas, stephanie.swart |
| Version: | unspecified | ||
| Target Milestone: | --- | ||
| Hardware: | All | ||
| OS: | All | ||
| See Also: | https://bugs.eclipse.org/bugs/show_bug.cgi?id=521546 https://git.eclipse.org/r/104107 https://git.eclipse.org/c/www.eclipse.org/community.git/commit/?id=9fc97041826c6bf2afe209a02264b239fc03ac9b https://git.eclipse.org/r/105678 https://git.eclipse.org/c/www.eclipse.org/community.git/commit/?id=219563ed1d6cdfb8a1d883169be68a34e84ed8bf https://git.eclipse.org/r/105718 https://git.eclipse.org/r/105937 https://git.eclipse.org/c/www.eclipse.org/community.git/commit/?id=1de875c2abde02ddbdc9a93095df106e7db0ad32 | ||
| Whiteboard: | |||
|
Description
Andreas Sewe

FYI, Bug 521548 comment 0 contains a list of all malformed <pubDate> variations I have discovered.

(In reply to Andreas Sewe from comment #1)
> FYI, Bug 521548 comment 0 contains a list of all malformed <pubDate>
> variations I have discovered.

Andreas,
Please feel free to submit a patch to fix this and I will be happy to review it and make sure that this change does not break our homepage.

http://git.eclipse.org/c/gerrit/www.eclipse.org/community.git/tree/news
Gerrit: https://git.eclipse.org/r/#/admin/projects/www.eclipse.org/community

(In reply to Christopher Guindon from comment #2)
> Andreas,
> Please feel free to submit a patch to fix this and I will be happy to review
> it and make sure that this change does not break our homepage.
>
> http://git.eclipse.org/c/gerrit/www.eclipse.org/community.git/tree/news

Thanks for the pointers, Chris, but AFAICT community.git contains only the scripts that render the RSS as HTML.

I would rather fix the problem at the source, i.e., in the RSS feed. Do you know where I can find the raw RSS (which is then consumed by feedburner)? I couldn't locate it in community.git.

(In reply to Andreas Sewe from comment #3)
> Thanks for the pointers, Chris, but AFAICT community.git contains only the
> scripts that render the RSS as HTML.
>
> I would rather fix the problem at the source, i.e., in the RSS feed. Do you
> know where I can find the raw RSS (which is then consumed by feedburner)? I
> couldn't locate it in community.git.

+1

The RSS feeds are in the community/news folder:
http://git.eclipse.org/c/gerrit/www.eclipse.org/community.git/tree/news/2005inthenewsarchive.rss
http://git.eclipse.org/c/gerrit/www.eclipse.org/community.git/tree/news/2005newsarchive.rss

New Gerrit change created: https://git.eclipse.org/r/104107

(In reply to Eclipse Genie from comment #5)
> New Gerrit change created: https://git.eclipse.org/r/104107

Thankfully, GNU date(1) is extremely flexible when it comes to parsing dates. This (almost) one-liner fixes the RSS feed:

    IFS=$'\n';
    for i in $(grep '<pubDate>' 2005newsarchive.rss | sed 's/^.*<pubDate>//; s/<\/pubDate>.*$//') ; do
      sed -i 's/'$i'/'$(LOCALE=C TZ=EST date --date "$i" "+%a, %d %b %Y %H:%M:%S %Z")'/' 2005newsarchive.rss;
    done
    unset IFS

According to the W3C feed validator [1] this fixes 429 wrong dates (which should all be in RFC 822 format) and one incorrect day of the week: As it turns out, the Wednesday in "Wed, 25 Jun 2009 14:52:00 EST" is wrong; it should have been Thursday. ;-)

[1] <https://validator.w3.org/feed/>

(In reply to Andreas Sewe from comment #6)
> Thankfully, GNU date(1) is extremely flexible when it comes to parsing
> dates. This (almost) one-liner fixes the RSS feed:
>
> IFS=$'\n';
> for i in $(grep '<pubDate>' 2005newsarchive.rss | sed 's/^.*<pubDate>//;
> s/<\/pubDate>.*$//') ; do
> sed -i 's/'$i'/'$(LOCALE=C TZ=EST date --date "$i" "+%a, %d %b %Y
> %H:%M:%S %Z")'/' 2005newsarchive.rss;
> done
> unset IFS
>
> According to the W3C feed validator [1] this fixes 429 wrong dates (which
> should all be in RFC 822 format) and one incorrect day of the week: As it
> turns out, the Wednesday in "Wed, 25 Jun 2009 14:52:00 EST" is wrong; it
> should have been Thursday. ;-)
>
> [1] <https://validator.w3.org/feed/>

This is great! I added myself as a reviewer.

I am going to test your patch on our website tomorrow to validate that this change does not break our home page.

If needed, I will make a patch to fix issues on eclipse.org.

(In reply to Christopher Guindon from comment #7)
> This is great! I added myself as a reviewer.

If you need me to split the change in two, to avoid the 1000 lines limit, please say so. I don't think this large change is problematic from a legal standpoint, however, as it was just date(1) doing all the "creative" work.

> I am going to test your patch on our website tomorrow to validate that this
> change does not break our home page.
>
> If needed, I will make a patch to fix issues on eclipse.org.

Not sure I understand. Doesn't this patch change the feed content on eclipse.org? What additional steps are needed?

Gerrit change https://git.eclipse.org/r/104107 was merged to [master].
Commit: http://git.eclipse.org/c/www.eclipse.org/community.git/commit/?id=9fc97041826c6bf2afe209a02264b239fc03ac9b

(In reply to Andreas Sewe from comment #8)
> If you need me to split the change in two, to avoid the 1000 lines limit,
> please say so. I don't think this large change is problematic from a legal
> standpoint, however, as it was just date(1) doing all the "creative" work.

No CQ needed for this patch!

> Not sure I understand. Doesn't this patch change the feed content on
> eclipse.org? What additional steps are needed?

Before merging this patch, I wanted to make sure that the dates were still being parsed properly by PHP. Everything is working as expected and your patch was merged!

Thank you very much for doing this!

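The date(1) approach from comment 6 can also be sketched as a small function that normalizes a single value, which makes it easier to try on individual dates before touching the feed. This is only a sketch, not the command that produced the merged change: it assumes GNU date (for `--date`), the name `to_rfc822` is hypothetical, and it substitutes `LC_ALL=C` and `TZ=EST5` for the `LOCALE=C TZ=EST` used in the thread. As far as I know `LOCALE` is not a variable that date consults, and `EST5` is the POSIX spelling of a fixed UTC-5 zone that does not depend on an installed time zone database.

```shell
#!/usr/bin/env bash
# Sketch: normalize one <pubDate> value to the layout "Thu, 07 Sep 2017 09:00:00 EST".
# Assumes GNU date; to_rfc822 is a hypothetical helper name, not part of the feed tooling.
to_rfc822() {
  # LC_ALL=C forces English day and month abbreviations.
  # TZ=EST5 is a fixed UTC-5 zone whose abbreviation prints as EST.
  LC_ALL=C TZ=EST5 date --date "$1" "+%a, %d %b %Y %H:%M:%S %Z"
}

# One of the malformed values quoted in this bug ("Sept", single-digit hour).
to_rfc822 "Thu, 07 Sept 2017 9:00:00 EST"
# prints: Thu, 07 Sep 2017 09:00:00 EST
```

Because the weekday is regenerated from the parsed date through `%a`, a wrong day name in the input should be replaced on output as well, which matches the corrected "Wed, 25 Jun 2009" entry described in comment 6.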
The problem still exists for: http://www.eclipse.org/home/eclipsenews.rss

Unfortunately, in the Eclipse IDE for Java Developers this RSS feed has been added in Oxygen.1, causing a NullPointerException to pop up after 5 minutes.

(In reply to Holger Voormann from comment #11)
> The problem still exists for:
> http://www.eclipse.org/home/eclipsenews.rss
>
> Unfortunately, in the Eclipse IDE for Java Developers this RSS feed has been
> added in Oxygen.1, causing a NullPointerException to pop up after 5 minutes.

Hi Holger. Thank you for the report. Unfortunately, I cannot reproduce. Can you please attach the contents of the following file to this bug:

$WORKSPACE/.metadata/.plugins/org.eclipse.recommenders.news.impl/downloads/http%3A%2F%2Fwww.eclipse.org%2Fhome%2Feclipsenews.rss

Also, can you please report which version of the org.eclipse.recommenders.news.* plugins you have installed (About Eclipse > Installation Details > Plug-ins)?

(In reply to Andreas Sewe from comment #12)

I need to correct myself, "Eclipse News" (http://www.eclipse.org/home/eclipsenews.rss) works in Oxygen.1, but adding this RSS feed to an Oxygen.0 (eclipse.buildId=4.7.0.I20170612-0950) package will cause an error dialog to pop up:

An internal error occurred during: "Polling news feeds".
java.lang.NullPointerException

The error log contains errors like the following:

java.text.ParseException: Unparseable date: "Thu, 07 Sept 2017 9:00:00 EST"
    at java.text.DateFormat.parse(Unknown Source)
    at org.eclipse.mylyn.internal.commons.notifications.feed.FeedEntry.getDate(FeedEntry.java:50)
    ...

This issue was fixed by changing the Java code of the feed reader in Oxygen.1, not by changing the content of the RSS feed (currently, http://www.eclipse.org/home/eclipsenews.rss contains "Thu, 07 Sept 2017 9:00:00 EST"), right?

New Gerrit change created: https://git.eclipse.org/r/105678

(In reply to Holger Voormann from comment #13)
> This issue was fixed by changing the Java code of the feed reader in
> Oxygen.1, not by changing the content of the RSS feed (currently,
> http://www.eclipse.org/home/eclipsenews.rss contains "Thu, 07 Sept 2017
> 9:00:00 EST"), right?

Right, the fix for Bug 521546 makes the feed parser more resilient, but is obviously an ugly kludge. However, I have tried to fix the problem upstream as well in this Bug (comment 5), but unfortunately news items published since then again use date formats that do not conform to the standard (RFC 822):

Thu, 21 Sept 2017 8:30:00 EST

It should be "Sep" rather than "Sept".

(In reply to Eclipse Genie from comment #14)
> New Gerrit change created: https://git.eclipse.org/r/105678

This change fixes the feed again.

@Chris: Can you please point Roxanne, Stephanie, and Eric, who seem to maintain the feed's contents, to the W3C Feed Validator? You can simply copy and paste a feed [1] and get quick feedback whether the feed is valid or not (currently, it complains about invalid <pubDate>s and the non-standard <pressrelease> tag).

[1] <https://validator.w3.org/feed/#validate_by_input>

(In reply to Andreas Sewe from comment #15)

Thanks for the quick response and fixing the RSS feed content.

The "Eclipse News" RSS feed now also works in Oxygen.0.

(In reply to Holger Voormann from comment #16)
> (In reply to Andreas Sewe from comment #15)
> Thanks for the quick response and fixing the RSS feed content.

You're welcome.

> The "Eclipse News" RSS feed now also works in Oxygen.0.

Really? The change in Gerrit [1] hasn't been merged yet.

[1] <https://git.eclipse.org/r/#/c/105678/>

(In reply to Andreas Sewe from comment #17)
> Really? The change in Gerrit [1] hasn't been merged yet.

Yes, right. I was wrong again. In Oxygen.0, I forgot that I removed it by pressing "Restore Defaults" in the Preferences dialog and confused the Planet Eclipse feed with the Eclipse News feed. Sorry.

Gerrit change https://git.eclipse.org/r/105678 was merged to [master].
Commit: http://git.eclipse.org/c/www.eclipse.org/community.git/commit/?id=219563ed1d6cdfb8a1d883169be68a34e84ed8bf

(In reply to Eclipse Genie from comment #19)

The date of one of today's entries still contains "Sept" instead of "Sep":
http://git.eclipse.org/c/www.eclipse.org/community.git/tree/news/2005newsarchive.rss?id=219563ed1d6cdfb8a1d883169be68a34e84ed8bf#n16

New Gerrit change created: https://git.eclipse.org/r/105718

(In reply to Holger Voormann from comment #20)
> (In reply to Eclipse Genie from comment #19)
> The date of one of today's entries still contains "Sept" instead of "Sep":
> http://git.eclipse.org/c/www.eclipse.org/community.git/tree/news/2005newsarchive.rss?id=219563ed1d6cdfb8a1d883169be68a34e84ed8bf#n16

Thanks for notifying us. That "Sept" must have slipped through while my change was still in Gerrit.

Another fix:

(In reply to Eclipse Genie from comment #21)
> New Gerrit change created: https://git.eclipse.org/r/105718

A new entry with the invalid date
<pubDate>Tue, 26 Sept 2017 14:30:00 EST</pubDate>
has been added to
https://www.eclipse.org/community/news/2005newsarchive.rss

(In reply to Holger Voormann from comment #23)
> A new entry with the invalid date
> <pubDate>Tue, 26 Sept 2017 14:30:00 EST</pubDate>
> has been added to
> https://www.eclipse.org/community/news/2005newsarchive.rss

I am adding Stephanie to this bug to keep her in the loop.

(In reply to Christopher Guindon from comment #24)
> I am adding Stephanie to this bug to keep her in the loop.

Also, adding Roxanne since she added this news item this morning.

New Gerrit change created: https://git.eclipse.org/r/105937

Gerrit change https://git.eclipse.org/r/105937 was merged to [master].
Commit: http://git.eclipse.org/c/www.eclipse.org/community.git/commit/?id=1de875c2abde02ddbdc9a93095df106e7db0ad32

(In reply to Eclipse Genie from comment #27)
> Gerrit change https://git.eclipse.org/r/105937 was merged to [master].
> Commit:
> http://git.eclipse.org/c/www.eclipse.org/community.git/commit/?id=1de875c2abde02ddbdc9a93095df106e7db0ad32

Closing this bug again! |
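Since malformed dates such as "Sept" kept reappearing with new entries, a local check before committing might catch them earlier than the W3C validator round trip. The following is a rough sketch only: `check_pubdates` is a hypothetical helper, it assumes each `<pubDate>` element sits on a single line, and its pattern is deliberately stricter than RFC 822 itself (two-digit day, four-digit year, seconds, and a three-letter zone are all required). It is not a replacement for the validator.

```shell
#!/usr/bin/env bash
# Sketch: print every <pubDate> value that does not look like
# "Thu, 21 Sep 2017 08:30:00 EST". check_pubdates is a hypothetical helper.
check_pubdates() {
  local days='(Mon|Tue|Wed|Thu|Fri|Sat|Sun)'
  local months='(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)'
  local pattern="^${days}, [0-9]{2} ${months} [0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2} [A-Z]{3}$"
  # Extract the text between the tags, then keep only values that fail the pattern.
  # Note: the final grep exits with status 1 when every value passes.
  grep -o '<pubDate>[^<]*</pubDate>' "$1" \
    | sed 's/<pubDate>//; s/<\/pubDate>//' \
    | grep -Ev "$pattern"
}

# Demonstration on a throwaway file; the first value is quoted from this bug,
# the second is an assumed corrected form of the same date.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<item><pubDate>Thu, 21 Sept 2017 8:30:00 EST</pubDate></item>
<item><pubDate>Thu, 21 Sep 2017 08:30:00 EST</pubDate></item>
EOF
check_pubdates "$sample"
rm -f "$sample"
```

Run against the real feed, this would be `check_pubdates 2005newsarchive.rss`; any line it prints is a candidate for correction.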