
Bug 104099

Summary: Search should search in attachments
Product: Community
Reporter: Bjorn Freeman-Benson <bjorn.freeman-benson>
Component: Bugzilla
Assignee: Eclipse Webmaster <webmaster>
Status: RESOLVED WONTFIX
QA Contact:
Severity: enhancement
Priority: P5
CC: denis.roy, gunnar, kai-uwe_maetzel, recoskie
Version: unspecified
Keywords: helpwanted
Target Milestone: ---
Hardware: All
OS: All
Whiteboard:

Description Bjorn Freeman-Benson CLA 2005-07-15 15:59:34 EDT
Searching is a problem because stack traces, which are a good means of finding
duplicates, are often in attachments, and attachments are not searched. Search
in attachments must therefore be supported as well.
Comment 1 Eclipse Webmaster CLA 2005-10-28 16:25:01 EDT
Unless this is a feature of Bugzilla I'm not aware of, this is beyond the scope
of what I can do.

Opening a defect against Bugzilla would be advisable.

D.
Comment 2 Kai-Uwe Maetzel CLA 2005-10-31 03:28:48 EST
This issue causes a considerable amount of manual work for projects with a high
number of problem reports. This request focuses on lowering the administrative
burden on the projects and making them more productive in their direct project
work. Please consult Mike on this subject before resolving.

This issue is closely related to bug 104100.
Comment 3 Eclipse Webmaster CLA 2006-02-13 15:55:21 EST
Please comment/vote for this bug at Mozilla.org:

https://bugzilla.mozilla.org/show_bug.cgi?id=142188

D.
Comment 4 Eclipse Webmaster CLA 2006-03-17 15:17:54 EST
This is absolutely not a good idea. I regularly see Bugzilla searches that take 60+ seconds, some requiring temporary tables on disk and causing table locks for other searches. Below is an example taken 5 minutes ago. The second column is the query runtime, in seconds.

Query   | 75       | Copying to tmp table                   | SELECT bugs.bug_id, bugs.bug_severity, bugs.priority, bugs.bug_status, bugs.resolution, map_products |
Query   | 50       | Locked                                 | SELECT DISTINCT bugs.bug_id, MIN(group_control_map.membercontrol) FROM bugs INNER JOIN bug_group_map |
Query   | 48       | Locked                                 | SELECT bugs.bug_id, bugs.bug_severity, bugs.priority, bugs.bug_status, bugs.resolution, map_products |
Query   | 47       | Locked                                 | SELECT bug_id FROM bugs WHERE bug_id = 74003                                                         |
Query   | 45       | Locked                                 | SELECT bugs.bug_id, bugs.bug_severity, bugs.priority, bugs.bug_status, bugs.resolution, map_products |
Query   | 45       | Locked                                 | SELECT bugs.

As long as MySQL is unable to perform row locking, the more data a SELECT touches, the greater the odds that it locks tables for everyone else. Performing full-text searches will only compound the problem, and Bugzilla is slow enough as it is.

As an alternative, projects can use PHP to programmatically access Bugzilla databases.

D.
Comment 5 Kai-Uwe Maetzel CLA 2006-03-17 16:46:33 EST
If the described approach does not work, and the database can be programmatically accessed using PHP, then I propose using PHP to provide the search functionality we request. Each project should not have to invent its own way to search the content of attachments.
Comment 6 Eclipse Webmaster CLA 2006-03-18 13:51:39 EST
Kai, I feel you are abusing your Bugzilla privileges by constantly reopening this bug. Please read comment #4 very carefully; it doesn't matter what language you use:

full text search will kill bugzilla performance using Bugzilla
full text search will kill bugzilla performance using Perl
full text search will kill bugzilla performance using PHP
full text search will kill bugzilla performance using C
full text search will kill bugzilla performance using Assembler

There is a reason Mozilla hasn't implemented this since 2002!! When this feature can be used on bugzilla.mozilla.org without sacrificing their performance, then we'll implement it here.

D.
Comment 7 Gunnar Wagenknecht CLA 2006-03-18 14:08:47 EST
The only way I can think of to implement this in a performance friendly way is by using an external search engine that is capable of indexing database entries. It would go through the database and build an index. A search would then be performed not in the database but in an index that links to Bugzilla IDs.

However, I'm afraid that there is no existing solution. It has to be implemented. Denis, can the search engine running on Eclipse.org be extended to support this? What technology is it based on?
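The index-based approach described above can be sketched as an inverted index. This is a minimal illustration assuming Python and a batch export of (bug ID, text) pairs taken from comments and attachments; `tokenize`, `build_index` and `search` are hypothetical names, not part of Bugzilla or the Eclipse.org search engine, and the bug IDs in the example are made up. Queries consult the index instead of scanning database tables.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: keeps word characters, dots and dollar signs
# together so Java frame names like "org.example.Foo.bar" stay one token.
TOKEN_RE = re.compile(r"[\w.$]+")


def tokenize(text):
    """Return lower-cased tokens suitable for matching stack-trace lines."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def build_index(documents):
    """Build an inverted index mapping token -> set of bug IDs.

    `documents` is an iterable of (bug_id, text) pairs, assumed to be
    exported from the bug database in an off-peak batch job.
    """
    index = defaultdict(set)
    for bug_id, text in documents:
        for token in tokenize(text):
            index[token].add(bug_id)
    return index


def search(index, query):
    """Return the sorted bug IDs whose text contains every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # Copy the first posting set so intersecting never mutates the index.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)


if __name__ == "__main__":
    docs = [
        (1, "java.lang.NullPointerException\n\tat org.example.Foo.bar(Foo.java:42)"),
        (2, "java.lang.NullPointerException\n\tat org.example.Baz.qux(Baz.java:7)"),
        (3, "Unrelated comment text"),
    ]
    idx = build_index(docs)
    print(search(idx, "java.lang.NullPointerException"))  # [1, 2]
    print(search(idx, "java.lang.NullPointerException org.example.Foo.bar"))  # [1]
```

Building the index still reads every comment and attachment, so this does not by itself avoid load on the database; it only moves the searches off it.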
Comment 8 Eclipse Webmaster CLA 2006-03-27 10:53:01 EST
Reopening, as there seems to be too much insistence on getting this done.

(In reply to comment #7)
> The only way I can think of to implement this in a performance friendly way is
> by using an external search engine that is capable of indexing database
> entries.

I don't think that would be much better. In any case, one could use Google to see whether a search engine would satisfy this requirement; Google should have everything indexed at this point. A couple of months ago I contacted Google and asked them to be less aggressive with their Bugzilla indexing; they were putting servers in the red. Actually, the same goes for the numerous folks who run recursive wgets in a loop to fetch all the bugs.

It seems that using our search engine would result in a duplicate copy of the Bugzilla data, while degrading Bugzilla performance during indexing.

One solution would be a second slave database server dedicated to Bugzilla attachment searches. We still wouldn't have any code that actually performs the search, but this would remove the hardware bottleneck.

D.
Comment 9 Gunnar Wagenknecht CLA 2006-03-27 11:03:18 EST
(In reply to comment #8)
> It seems that using our search engine would result in a duplicate copy of the
> Bugzilla data, while degrading Bugzilla performance during indexing.

I think that's the nature of indexing: you create an indexed list of references to items. And indexing has a lot of advantages. Bugzilla entries AND attachments would be searchable from within the main Eclipse.org search feature.

The Eclipse search engine could run in low-traffic hours. But I'm wondering whether there is a general way to handle aggressive robots. Maybe something like catching aggressive search robots at the Apache level and sending them to a proxy that caches content for 24 hours or longer?

Comment 10 Eclipse Webmaster CLA 2006-03-27 11:15:18 EST
(In reply to comment #9)
> But I'm wondering whether there is a general way to handle aggressive robots.
> Maybe something like catching aggressive search robots at the Apache level and
> sending them to a proxy that caches content for 24 hours or longer?

Don't forget we run a cluster of 3-5 servers, so unlike in a single-server environment, Apache on any given host is not aware of its surroundings.

I agree that a Squid cache would be our next step, but if we go this route, we'd enable it site-wide.

> The Eclipse search engine could run in low-traffic hours.

I'll tell you what: I've enabled indexing for Bugzilla content. It's set to run tonight at 8:00 PM Eastern. Let's see what happens.

D.
Comment 11 Eclipse Webmaster CLA 2006-05-23 17:09:23 EDT
I've discontinued indexing the Bugzilla content for the following reasons:

- Bugzilla doesn't have a low-traffic period. Because tables can be locked for any amount of time by any query, some folks get frustrated by the induced lag. Committers have been quite vocal in expressing that Bugzilla functionality and performance are critical; attachment search, however, doesn't appear to be.

- It seems redundant to index content that already has a good search front end; it's just missing a feature.

- Indexing Bugzilla never really worked; the search engine indexer kept returning errors. I believe it has to do with the HTTPS port mapping we're doing to enable our wildcard certificate to work on the same IP address.


I'll close this as LATER to keep track of the Mozilla bug. We [the Foundation] currently don't have any concrete plans to do anything with this. I suggest you vote for the actual Mozilla enhancement (link below), or convince The Powers Above Me that this needs to be a priority. Right now, I'm way too scared to do anything that will tamper with Bugzilla.

https://bugzilla.mozilla.org/show_bug.cgi?id=142188

D.
Comment 12 Eclipse Webmaster CLA 2006-06-21 13:41:54 EDT
*** Bug 147474 has been marked as a duplicate of this bug. ***
Comment 13 Denis Roy CLA 2009-08-18 11:23:48 EDT
Reopening LATER/REMIND bugs.
Comment 14 Denis Roy CLA 2018-11-05 16:01:47 EST
I don't think this is a good idea.