(In reply to Simon Delisle from comment #0)
> Created attachment 267632
> Browser Snippet
>
> Hi,
> org.eclipse.swt.browser.Browser.getText() return an empty String with the
> following configuration: Webkit2 and GTK3. The same code with webkit1 and
> GTK3 return the appropriate String.
>
> I included a code snippet to reproduce the bug. If you click on "Get HTML"
> button the text field should show the HTML of the current page. Using
> webkit2 the text box will be empty.
>
> OS: Ubuntu 16.04

A+ for the quality of the bug report, and thank you for the snippet.

I looked into it. getText() calls webkit_web_data_source_get_encoding(..), which is a webkit1-only function, so I will need to find the Webkit2 equivalent. I'll be working on this shortly.

Note to self:
- I need to go through all webkit functions and find the webkit1-only functions that are called when running on webkit2, and probably implement some mechanism (e.g. assertions) to catch them.

Note to self: Webkit2 equivalent:
> WebKitWebResource * webkit_web_view_get_main_resource (WebKitWebView *web_view);
> https://webkitgtk.org/reference/webkit2gtk/stable/WebKitWebView.html#webkit-web-view-save

Then via an async call with a callback:
> webkit_web_resource_get_data (...)
> webkit_web_resource_get_data_finish ()
> https://webkitgtk.org/reference/webkit2gtk/stable/WebKitWebResource.html#webkit-web-resource-get-data

An example of this implementation is Epiphany's src/window-commands.c:save_temp_source_replace_cb(..):
> resource = webkit_web_view_get_main_resource (WEBKIT_WEB_VIEW (view));
> webkit_web_resource_get_data (resource, NULL, (GAsyncReadyCallback)get_main_resource_data_cb, ostream);

I'll probably do something similar. I'll have to run a g_main_context_iteration loop to wait for the callback to finish, like in evaluate.
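The "async call plus wait loop" plan above can be sketched in plain Java. This is a minimal sketch of the pattern, not SWT's actual code: FakeMainContext is a hypothetical stand-in for a GLib main context driven by g_main_context_iteration, and getDataAsync is a hypothetical stand-in for webkit_web_resource_get_data() with its GAsyncReadyCallback. It only shows the shape: start the async request, then keep dispatching pending events until the callback has set a "finished" flag.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

class SyncOverAsync {

    /** Hypothetical stand-in for a GLib main context: a queue of pending callbacks. */
    static final class FakeMainContext {
        private final Queue<Runnable> pending = new ArrayDeque<>();

        void post(Runnable r) {
            pending.add(r);
        }

        /** Dispatches at most one pending callback; returns true if one was run. */
        boolean iteration() {
            Runnable r = pending.poll();
            if (r == null) {
                return false;
            }
            r.run();
            return true;
        }
    }

    /** Hypothetical async API: the callback fires later, from the main loop, not immediately. */
    static void getDataAsync(FakeMainContext ctx, byte[] resource, Consumer<byte[]> callback) {
        ctx.post(() -> callback.accept(resource.clone()));
    }

    /** Synchronous wrapper: start the async call, then spin the loop until the callback has run. */
    static byte[] getDataSync(FakeMainContext ctx, byte[] resource) {
        final byte[][] result = new byte[1][];
        final boolean[] finished = {false};
        getDataAsync(ctx, resource, data -> {
            result[0] = data;
            finished[0] = true;
        });
        while (!finished[0]) {
            // Nothing left to dispatch but no callback yet: bail out instead of spinning forever.
            if (!ctx.iteration()) {
                throw new IllegalStateException("callback never fired");
            }
        }
        return result[0];
    }

    public static void main(String[] args) {
        FakeMainContext ctx = new FakeMainContext();
        byte[] html = "<!DOCTYPE html><html></html>".getBytes(StandardCharsets.UTF_8);
        byte[] data = getDataSync(ctx, html);
        System.out.println(new String(data, StandardCharsets.UTF_8));
    }
}
```

A real main-context iteration can be asked to block until an event arrives, so a production wrapper would more likely guard the loop with a timeout than throw as soon as the queue is empty.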
New Gerrit change created: https://git.eclipse.org/r/96984

Related commit in which Epiphany was ported to webkit2:
https://mail.gnome.org/archives/commits-list/2012-June/msg03725.html

(In reply to Leo Ufimtsev from comment #3)
> > WebKitWebResource * webkit_web_view_get_main_resource (WebKitWebView *web_view);
> https://webkitgtk.org/reference/webkit2gtk/stable/WebKitWebView.html#webkit-web-view-save

Wrong link. I meant to link to:
https://webkitgtk.org/reference/webkit2gtk/stable/WebKitWebResource.html#webkit-web-resource-get-data

Note to self:
#############
Maybe consider the following API instead: webkit_web_view_save(...), as it's designed to save the MHTML content:
https://webkitgtk.org/reference/webkit2gtk/2.7.1/WebKitWebView.html#webkit-web-view-save
As implemented in:
https://bugs.webkit.org/show_bug.cgi?id=89873
(Note: it has unit tests that I could use to see how it works.)
The mailing list mentioned that Epiphany's resource-based approach is the long way of doing this, and this API was introduced to simplify things. I need to research this approach further.

(In reply to Simon Delisle from comment #0)
> Hi,
> org.eclipse.swt.browser.Browser.getText() return an empty String with the
> following configuration: Webkit2 and GTK3. The same code with webkit1 and
> GTK3 return the appropriate String.

@Simon, btw, what's the use-case here? I'm wondering whether document.documentElement.outerHTML is suitable for the getText() API, or whether it needs some other kind of HTML/text. The Browser.getText() javadoc is somewhat vague about this.
> @Simon, btw, what's the use-case here?
For example, if you give me a few webpages along with the strings you expect getText() to return, I can make sure the solution covers those use cases.
So far I haven't found many usages of getText().
Update:

Note:
- The original getText() API (Webkit1/Cocoa...) returns the HTML content in its original form, i.e. <!DOCTYPE ..> is preserved, <script> tags are preserved, and the HTML is the version from before JavaScript was executed.

Solutions:
1) webkit_web_view_save() produces an MHTML file that differs from the source HTML. For example, <script> tags are filtered out and the HTML has already been processed by JavaScript. Thus it is not very suitable.
2) I've experimented with the JavaScript expression "document.documentElement.outerHTML":
   - it produces fairly good HTML (script tags are preserved),
   - but the HTML has already been processed by JavaScript (e.g. setting the background adds a 'style' attribute to the HTML),
   - it might serve as a backup solution, but it's not consistent with the way getText() originally worked.
3) GTK developers suggested webkit_web_resource_get_data(), as it returns the raw HTML.
   - I've experimented with this approach. So far I get a 'webkit is not a resource' error.
   - I'm researching how to get this approach to work.

New Gerrit change created: https://git.eclipse.org/r/97926

getText() implemented. I'll investigate when we can merge this and will post an update.

Awaiting code unfreeze.

Gerrit change https://git.eclipse.org/r/96984 was merged to [master].
Commit: http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=2cc5d79e31d0f238b27ff78360cbac4007e4c7cd

Gerrit change https://git.eclipse.org/r/97926 was merged to [master].
Commit: http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=dc4f695412958c9ff9c22dfc4b079ccf87095d00

In master now.

Looks like we have some failing tests on Win32:
http://download.eclipse.org/eclipse/downloads/drops4/I20170629-0425/testresults/html/org.eclipse.swt.tests_ep48I-unit-win32_win32.win32.x86_8.0.html
test_getText
java.lang.AssertionError: Test did not return correct string.
etc...
I need to investigate.

New Gerrit change created: https://git.eclipse.org/r/100648

Gerrit change https://git.eclipse.org/r/100648 was merged to [master].
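One detail a webkit2 replacement has to handle: webkit_web_resource_get_data() hands back raw bytes, while the webkit1 getText() path called webkit_web_data_source_get_encoding(), presumably to decode those bytes correctly. A minimal sketch of that decoding step follows, assuming the caller has somehow obtained an encoding name (how to get it from webkit2 is left open here). RawHtmlDecoder is a hypothetical helper, not SWT code, and falling back to UTF-8 for a missing or unsupported name is also an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RawHtmlDecoder {

    /**
     * Decodes raw page bytes into a String. encodingName is whatever the caller
     * learned about the page encoding (hypothetical source); null, blank,
     * malformed or unsupported names fall back to UTF-8 (an assumption).
     */
    static String decode(byte[] data, String encodingName) {
        Charset charset = StandardCharsets.UTF_8;
        if (encodingName != null && !encodingName.trim().isEmpty()) {
            try {
                charset = Charset.forName(encodingName.trim());
            } catch (IllegalArgumentException e) {
                // IllegalCharsetNameException and UnsupportedCharsetException
                // both extend IllegalArgumentException: keep the UTF-8 fallback.
            }
        }
        // new String(byte[], Charset) replaces invalid sequences with U+FFFD instead of throwing.
        return new String(data, charset);
    }

    public static void main(String[] args) {
        byte[] latin1 = "<p>caf\u00e9</p>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decode(latin1, "ISO-8859-1").equals("<p>caf\u00e9</p>")); // true
        System.out.println(decode(latin1, null).equals("<p>caf\u00e9</p>"));         // false: lone 0xE9 is invalid UTF-8
    }
}
```

Decoding with the wrong charset does not fail loudly, it silently produces replacement characters, which is why knowing the page encoding matters for getText() returning the "appropriate String".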
Commit: http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=1a34eb77413b2fbc007bc59fabb14e23b8a08b61

Merged a patch to fix the getText() JUnit failures on Win32. Need to verify with tomorrow's builds.

New Gerrit change created: https://git.eclipse.org/r/100735

Gerrit change https://git.eclipse.org/r/100735 was merged to [master].
Commit: http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=02bba2e33ea7631524421cd05121c5bf93950768

All tests pass on Win32/Cocoa/Linux again.
Created attachment 267632
Browser Snippet

Hi,
org.eclipse.swt.browser.Browser.getText() returns an empty String with the following configuration: Webkit2 and GTK3. The same code with webkit1 and GTK3 returns the appropriate String.

I included a code snippet to reproduce the bug. If you click on the "Get HTML" button, the text field should show the HTML of the current page. Using webkit2, the text box will be empty.

OS: Ubuntu 16.04