
Bug 514719 (Webkit2_gettext)

Summary: [Browser][Webkit2] port Browser.getText() to webkit2
Product: [Eclipse Project] Platform
Reporter: Simon Delisle <simon.delisle>
Component: SWT
Assignee: Leo Ufimtsev <lufimtse>
Status: VERIFIED FIXED
QA Contact: Leo Ufimtsev <lufimtse>
Severity: major
Priority: P2
CC: akurtakov, lufimtse, malaperle, platform-swt-inbox
Version: 4.7
Target Milestone: ---
Hardware: PC
OS: Linux
URL: https://etherpad.openstack.org/p/EclipseBug514719
See Also: https://git.eclipse.org/r/96984
https://git.eclipse.org/r/97926
https://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=2cc5d79e31d0f238b27ff78360cbac4007e4c7cd
https://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=dc4f695412958c9ff9c22dfc4b079ccf87095d00
https://git.eclipse.org/r/100648
https://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=1a34eb77413b2fbc007bc59fabb14e23b8a08b61
https://git.eclipse.org/r/100735
https://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=02bba2e33ea7631524421cd05121c5bf93950768
https://bugs.eclipse.org/bugs/show_bug.cgi?id=519177
https://bugs.eclipse.org/bugs/show_bug.cgi?id=535392
Whiteboard:
Bug Depends on:    
Bug Blocks: 441568, 516838    
Attachments: Browser Snippet (flags: none)

Description Simon Delisle CLA 2017-04-04 11:21:41 EDT
Created attachment 267632 [details]
Browser Snippet

Hi,
org.eclipse.swt.browser.Browser.getText() returns an empty String with the following configuration: Webkit2 and GTK3. The same code with webkit1 and GTK3 returns the appropriate String.

I included a code snippet to reproduce the bug. If you click on the "Get HTML" button, the text field should show the HTML of the current page. With webkit2 the text box stays empty.

OS: Ubuntu 16.04
Comment 1 Leo Ufimtsev CLA 2017-04-05 17:11:36 EDT
(In reply to Simon Delisle from comment #0)
> Created attachment 267632 [details]
> Browser Snippet
> 
> Hi,
> org.eclipse.swt.browser.Browser.getText() return an empty String with the
> following configuration: Webkit2 and GTK3. The same code with webkit1 and
> GTK3 return the appropriate String.
> 
> I included a code snippet to reproduce the bug. If you click on "Get HTML"
> button the text field should show the HTML of the current page. Using
> webkit2 the text box will be empty.
> 
> OS: Ubuntu 16.04

A+ for quality of bug report. Thank you for the snippet.

I looked into it. getText() is making a call to:
webkit_web_data_source_get_encoding(..)
which is a webkit1-only function.

I will need to find the Webkit2 equivalent. I'll be working on this shortly.
Comment 2 Leo Ufimtsev CLA 2017-04-05 17:12:18 EDT
Note to self:
- I need to go through all webkit functions and find the webkit1-only functions that are called on webkit2. I'll probably implement some mechanism for this with assertions.
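
One possible shape for such an assertion mechanism is sketched below in plain Java. It is only an illustration and not SWT code: the class name, the `webkit2` flag, and `checkCallable` are hypothetical, and the only function name taken from this bug is `webkit_web_data_source_get_encoding`.

```java
import java.util.Set;

// Hypothetical guard, not part of SWT: fails fast when a WebKit1-only
// native function is about to be called while running on WebKit2.
final class WebKitApiGuard {
    // Hypothetical flag; a real binding would detect the loaded WebKit version.
    static boolean webkit2 = true;

    // Hypothetical registry of WebKit1-only functions. The single entry is
    // the function identified in comment 1 of this bug.
    private static final Set<String> WEBKIT1_ONLY =
            Set.of("webkit_web_data_source_get_encoding");

    // Call this before invoking the named native function.
    static void checkCallable(String nativeFunction) {
        if (webkit2 && WEBKIT1_ONLY.contains(nativeFunction)) {
            throw new IllegalStateException(
                    nativeFunction + " is WebKit1-only but was called on WebKit2");
        }
    }
}
```

With a guard like this, a stray WebKit1-only call would surface as an exception in tests rather than as a silently empty result.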
Comment 3 Leo Ufimtsev CLA 2017-04-07 15:05:14 EDT
Note to self:

Webkit2 equivalent:

 > WebKitWebResource * webkit_web_view_get_main_resource (WebKitWebView *web_view);
 > https://webkitgtk.org/reference/webkit2gtk/stable/WebKitWebView.html#webkit-web-view-save

Then via async call with callback:

> webkit_web_resource_get_data (...)
> webkit_web_resource_get_data_finish ()
> https://webkitgtk.org/reference/webkit2gtk/stable/WebKitWebResource.html#webkit-web-resource-get-data 


An example of this implementation is Epiphany src/window-commands.c:save_temp_source_replace_cb(..)

> resource = webkit_web_view_get_main_resource (WEBKIT_WEB_VIEW (view));
> webkit_web_resource_get_data (resource, NULL, (GAsyncReadyCallback)get_main_resource_data_cb, ostream);

I'll probably do something similar.

I'll have to run a g_main_context_iteration() loop to wait for the callback to finish, as is done in evaluate().
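
The idea of turning an asynchronous call with a callback into a blocking call by iterating the main loop can be modelled in plain Java. This is a JDK-only sketch under stated assumptions, not the SWT or WebKitGTK implementation: the event queue stands in for the GLib main context, `getDataAsync` stands in for webkit_web_resource_get_data(), and every name in it is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical model of "block on an async callback by pumping the event loop".
final class SyncOverAsyncSketch {
    // Stand-in for the toolkit's main loop: a queue of pending events.
    private final Queue<Runnable> pendingEvents = new ArrayDeque<>();

    // Stand-in for the async native call: the callback is not run here,
    // it is queued and only runs when the loop dispatches it.
    void getDataAsync(Consumer<byte[]> callback) {
        byte[] data = "<!DOCTYPE html><html><body>hi</body></html>"
                .getBytes(StandardCharsets.UTF_8);
        pendingEvents.add(() -> callback.accept(data));
    }

    // Dispatches one pending event, if any; returns whether one was run.
    boolean iterate() {
        Runnable event = pendingEvents.poll();
        if (event == null) {
            return false;
        }
        event.run();
        return true;
    }

    // Blocking wrapper: start the request, then pump events until the
    // callback has stored its result.
    String getTextBlocking() {
        byte[][] result = new byte[1][];
        boolean[] done = {false};
        getDataAsync(bytes -> {
            result[0] = bytes;
            done[0] = true;
        });
        while (!done[0]) {
            if (!iterate()) {
                // A real loop would block waiting for events; this model fails instead.
                throw new IllegalStateException("event loop drained before callback fired");
            }
        }
        return new String(result[0], StandardCharsets.UTF_8);
    }
}
```

In a real toolkit the loop would block until the next event arrives instead of failing on an empty queue, and it would likely need a timeout so that a callback that never fires cannot hang the caller.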
Comment 4 Eclipse Genie CLA 2017-05-12 14:51:51 EDT
New Gerrit change created: https://git.eclipse.org/r/96984
Comment 5 Leo Ufimtsev CLA 2017-05-12 15:25:10 EDT
Related commit in which Epiphany does the port to webkit2:
https://mail.gnome.org/archives/commits-list/2012-June/msg03725.html


(In reply to Leo Ufimtsev from comment #3)
>  > WebKitWebResource * webkit_web_view_get_main_resource (WebKitWebView *web_view);
> https://webkitgtk.org/reference/webkit2gtk/stable/WebKitWebView.html#webkit-web-view-save

Wrong link. Meant to link to:
https://webkitgtk.org/reference/webkit2gtk/stable/WebKitWebResource.html#webkit-web-resource-get-data
Comment 6 Leo Ufimtsev CLA 2017-05-16 18:01:45 EDT
Note to self:
#############
Maybe consider the following api instead:

webkit_web_view_save(...) (as it's designed to save the MHTML content):
https://webkitgtk.org/reference/webkit2gtk/2.7.1/WebKitWebView.html#webkit-web-view-save

As implemented in:
https://bugs.webkit.org/show_bug.cgi?id=89873  (note: it has tests that I could use to see how it works).

The list mentioned that Epiphany's resource approach is a long way of doing this. This API was introduced to simplify things.

Need to research this approach further.
Comment 7 Leo Ufimtsev CLA 2017-05-17 17:06:56 EDT
(In reply to Simon Delisle from comment #0)

> Hi,
> org.eclipse.swt.browser.Browser.getText() return an empty String with the
> following configuration: Webkit2 and GTK3. The same code with webkit1 and
> GTK3 return the appropriate String.

@Simon, btw, what's the use-case here?
I'm wondering whether document.documentElement.outerHTML is suitable for the getText() API, or whether it needs some other kind of HTML/text. The Browser.getText() javadoc is somewhat vague about this.
Comment 8 Leo Ufimtsev CLA 2017-05-17 17:24:08 EDT
> @Simon, btw, what's the use-case here?

For example, if you give me a few examples of web pages along with the strings that you want back, I can make sure that the solution covers those use cases.

At the moment I haven't found many instances of getText().
Comment 9 Leo Ufimtsev CLA 2017-05-22 16:11:22 EDT
Update:

Note:
- The original getText() API returns the HTML content in its original form (Webkit1/Cocoa..).
-- I.e., <!DOCTYPE ..> is preserved, <script> tags are preserved, and the HTML is as it was before javascript was executed.

Solutions:
1) webkit_web_view_save() produces an MHTML file that differs from the source HTML. For example, <script> tags are filtered out and the HTML has already been processed by javascript. Thus it is not very suitable.

2) I've experimented with using javascript "document.documentElement.outerHTML":
- it produces fairly good HTML (script tags preserved),
- but the HTML has been processed by javascript (e.g. setting the background adds a 'style' tag to the HTML),
- it might be a backup solution, but it's not consistent with the way getText() originally worked.

3) Gtk developers suggested webkit_web_resource_get_data(), as it saves raw HTML
- I've experimented with this approach. So far I get 'webkit is not a resource' error.
- I'm researching how I can get this approach to work.
Comment 10 Eclipse Genie CLA 2017-05-24 16:28:04 EDT
New Gerrit change created: https://git.eclipse.org/r/97926
Comment 11 Leo Ufimtsev CLA 2017-05-24 16:53:15 EDT
getText() implemented.

I'll investigate when we can merge this and will post an update.
Comment 12 Leo Ufimtsev CLA 2017-05-24 17:57:02 EDT
Awaiting code unfreeze.
Comment 15 Alexander Kurtakov CLA 2017-06-28 04:35:28 EDT
In master now.
Comment 16 Leo Ufimtsev CLA 2017-06-29 15:17:21 EDT
Looks like we have some failing tests on Win32:
http://download.eclipse.org/eclipse/downloads/drops4/I20170629-0425/testresults/html/org.eclipse.swt.tests_ep48I-unit-win32_win32.win32.x86_8.0.html

test_getText   
java.lang.AssertionError: Test did not return correct string.

etc... 

I need to investigate.
Comment 17 Eclipse Genie CLA 2017-07-04 11:30:27 EDT
New Gerrit change created: https://git.eclipse.org/r/100648
Comment 19 Leo Ufimtsev CLA 2017-07-04 12:15:42 EDT
Merged a patch to fix the getText() jUnit failures on win32. Need to verify with tomorrow's builds.
Comment 20 Eclipse Genie CLA 2017-07-05 10:33:50 EDT
New Gerrit change created: https://git.eclipse.org/r/100735
Comment 22 Leo Ufimtsev CLA 2017-07-05 10:39:51 EDT
All tests pass on Win32/Cocoa/Linux again.