Build Identifier: 20110301-1815

[01 Dec 2011 13:49:19] [main] [Crawler.java:793] [INFO] add one task
[01 Dec 2011 13:49:19] [main] [Crawler.java:131] [INFO] encoded URL: http://a_b.com
[01 Dec 2011 13:49:19] [main] [HttpExchange.java:597] [DEBUG] URI = http://a_b.com
Exception in thread "main" java.lang.NullPointerException
    at org.eclipse.jetty.client.Address.<init>(Address.java:46)
    at org.eclipse.jetty.client.HttpExchange.setURI(HttpExchange.java:605)
    at org.eclipse.jetty.client.HttpExchange.setURL(HttpExchange.java:414)
    at cn.vobile.colander.Crawler.CrawlExchange.<init>(CrawlExchange.java:75)
    at cn.vobile.colander.Crawler.Crawler.addCrawlRequest(Crawler.java:160)
    at cn.vobile.colander.Crawler.Crawler.addCrawlRequest(Crawler.java:178)
    at cn.vobile.colander.Crawler.Crawler.main(Crawler.java:794)

Reproducible: Always

Steps to Reproduce:
1. Create an HttpClient.
2. Create a ContentExchange.
3. Call setURL("http://a_b.com/").
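The trace shows the NullPointerException coming out of the Address constructor. A plausible cause, consistent with the discussion further down, is that java.net.URI reports no host at all for "a_b.com". A minimal JDK-only sketch of that behavior (no Jetty classes; the class name NullHostDemo is hypothetical, and the getAuthority() line is an addition used only to show the string is still kept):

```java
import java.net.URI;

public class NullHostDemo {
    // Returns the host that java.net.URI reports for the given URL string.
    // For "http://a_b.com/" this is null: "_" is not allowed in a server-based
    // host name, so URI parses the authority as registry-based instead.
    static String uriHost(String url) {
        return URI.create(url).getHost();
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://a_b.com/");
        System.out.println("host = " + uri.getHost());           // null
        System.out.println("authority = " + uri.getAuthority()); // a_b.com
    }
}
```

Passing that null host on to a constructor that dereferences it would produce exactly this kind of NullPointerException.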
Junwei,

A bad host name such as "a_b" will now throw an IllegalArgumentException instead of an NPE. Fixed for 7.6.0.

Jan
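For illustration, a fail-fast check of this kind might look like the sketch below. It is a hypothetical helper (HostCheck and requireHost are made-up names), not Jetty's actual 7.6.0 change; it only assumes the host is taken from java.net.URI.getHost().

```java
import java.net.URI;

public class HostCheck {
    // Hypothetical helper: fail fast with an IllegalArgumentException when
    // java.net.URI cannot extract a host, instead of passing null along and
    // hitting a NullPointerException later.
    static String requireHost(URI uri) {
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("No valid host in URI: " + uri);
        }
        return host;
    }

    public static void main(String[] args) {
        System.out.println(requireHost(URI.create("http://example.com/"))); // example.com
        try {
            requireHost(URI.create("http://a_b.com/"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // No valid host in URI: http://a_b.com/
        }
    }
}
```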
(In reply to comment #1)

Hi Jan,

Actually, URLs with "_" work in browsers, so I don't think we should just throw an exception for such URLs; we should support them. For example, I have this URL: http://basic_sounds.blogspot.com/
It should be supported.
(In reply to comment #3)

Hi Junwei,

Well, this is an interesting situation with the Java URI and URL classes. If you do:

    URI uri = URI.create("http://basic_sounds.blogspot.com");
    uri.getHost();

you get null. On the other hand, if you do:

    URI uri = URI.create("http://basic_sounds.blogspot.com");
    URL url = uri.toURL();
    url.getHost();

you get "basic_sounds.blogspot.com".

If you read about valid hostnames, "_" is not a valid character: http://en.wikipedia.org/wiki/Hostname

So I don't think we can second-guess the URI class and try to accommodate invalid characters in hostnames. Sun (now Oracle) were pretty clear about what is and is not considered a valid name here: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5049974

What you can do instead is use a different Jetty API to work around the invalid hostname. This will work (where exchange is your HttpExchange instance):

    exchange.setRequestURI("http://basic_sounds.blogspot.com");
    exchange.setAddress(new Address("basic_sounds.blogspot.com", 80));

regards
Jan
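The same idea can be sketched with the JDK alone: URI.toURL() keeps the underscore host, and the host and port taken from the resulting URL are the kind of values the workaround passes to Address explicitly. LenientHost and hostAndPort are hypothetical names, and falling back to getDefaultPort() (80 for http) when the URL has no port is an assumption chosen to match the 80 used above.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class LenientHost {
    // Hypothetical helper: converts via URI.toURL(), whose URL parser accepts
    // "_" in the host, and returns {host, port}. If no port is given, it falls
    // back to the scheme's default port (80 for http).
    static String[] hostAndPort(String spec) {
        try {
            URL url = URI.create(spec).toURL();
            int port = url.getPort() != -1 ? url.getPort() : url.getDefaultPort();
            return new String[] { url.getHost(), Integer.toString(port) };
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot convert to URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        String spec = "http://basic_sounds.blogspot.com";
        // java.net.URI rejects "_" in a server-based host, so this prints null
        System.out.println("URI host: " + URI.create(spec).getHost());
        String[] hp = hostAndPort(spec);
        // These values could then be used for something like new Address(host, port)
        System.out.println("URL host: " + hp[0] + ", port: " + hp[1]);
    }
}
```

Note that this only recovers the host string; it does not make "basic_sounds" a valid hostname, it just sidesteps URI's stricter parsing.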
OK, I see. Thank you very much.