Bug 147101

Summary: Report generation performance is sluggish
Product: z_Archived
Component: BIRT
Version: 2.1.0
Target Milestone: 2.3.0
Status: RESOLVED FIXED
Severity: critical
Priority: P2
Hardware: PC
OS: Windows XP
Reporter: Shanon McArthur <reporting.tools>
Assignee: Lin Zhu <lzhu>
QA Contact:
CC: Birt-ReportViewer-inbox, foxm, girisha, kgeis, ksunithaghm, lzhu, matt.bates, sachin, stilkari, surendramohite, vivekk, wenfeng.fwd, xiang.li, xxue
Whiteboard:

Attachments:
- Report to run against 1million rows (flags: none)
- slow ClassicModels crosstab report (flags: none)

Description Shanon McArthur CLA 2006-06-14 14:41:24 EDT
The overall performance of report generation appears to be quite sluggish.
Here are numbers that we measured using Oracle as the DB.
We use Oracle 10g with the ojdbc14.jar thin driver.

select rownum, datetime_id, another_id, text from
(
  select max(datetime_id) as datetime_id, another_id, text
  from some_oracle_schema.sample_table
  where text < 'c%'
  group by another_id, text
)

Query resulting in 8465 rows


Using PL/SQL developer:
-----------------------
Execution: approx 1 second
Iterating through all rows via PL/SQL developer UI: 10 seconds

BIRT:
-----
Preview page (via dataset editor): shows all 8465 rows in 53 seconds
Preview tab (in the designer): shows all 8465 rows in 79 seconds
Generation of rptdocument from rptdesign using API: 55 seconds
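
For context on the last figure, a run of this kind goes through the report engine API roughly as follows. This is only a minimal sketch against the BIRT 2.x engine API; the .rptdesign/.rptdocument paths are placeholders and the exact setup may differ slightly between releases.

import org.eclipse.birt.core.framework.Platform;
import org.eclipse.birt.report.engine.api.EngineConfig;
import org.eclipse.birt.report.engine.api.IReportEngine;
import org.eclipse.birt.report.engine.api.IReportEngineFactory;
import org.eclipse.birt.report.engine.api.IReportRunnable;
import org.eclipse.birt.report.engine.api.IRunTask;

public class GenerateDocument {
    public static void main(String[] args) throws Exception {
        // Start the platform and create a report engine.
        EngineConfig config = new EngineConfig();
        Platform.startup(config);
        IReportEngineFactory factory = (IReportEngineFactory) Platform
                .createFactoryObject(IReportEngineFactory.EXTENSION_REPORT_ENGINE_FACTORY);
        IReportEngine engine = factory.createReportEngine(config);

        // Open the design and run it into a report document, timing the step.
        IReportRunnable design = engine.openReportDesign("sample.rptdesign"); // placeholder path
        IRunTask task = engine.createRunTask(design);
        long start = System.currentTimeMillis();
        task.run("sample.rptdocument"); // placeholder output path
        task.close();
        System.out.println("rptdocument generation: "
                + (System.currentTimeMillis() - start) + " ms");

        engine.destroy();
        Platform.shutdown();
    }
}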
Comment 1 Wenfeng Li CLA 2006-06-14 15:34:43 EDT
Please pass this on to the engine team to enhance the data engine's data retrieval performance.
Comment 2 xiaofeng zhang CLA 2006-06-14 22:42:33 EDT
Could you please attach your .rptdesign file? Thanks.
Comment 3 Wenfeng Li CLA 2006-07-05 23:30:31 EDT
Please try out the 2.1.0 release and let us know what the new numbers are.
Comment 4 Wei Wang CLA 2006-07-18 22:57:45 EDT
Improved the performance of some classes:
*HtmlWriter* *XMLWriter*

Leaving this bug open.
Comment 5 Vivek Kulkarni CLA 2006-08-03 10:02:04 EDT
Hello,
I am using the BIRT Viewer v2.1 and have observed performance issues when the data sets are large (more than, say, 20,000 records).

Here is my use case where I observed performance issues:
Database - DB2 v8.1 (this in fact does not matter)
SQL - select <columns> from <table> (this may have millions of records in a normal use case)
Report - tabulated report with pagination.

When I run this SQL directly against DB2, a large result set with over 300,000 rows is returned in approximately 20 seconds. I ran the DataSet from the BIRT Report Designer on my desktop against the same database, and it also returned in about 20 seconds; however, the preview window only displays the first 500 rows. When we actually run the report that uses this same DataSet, it takes over 5 minutes for the first page to render.

I am not sure, but it appears that BIRT uses disk caching instead of lazy-loading pages for large result sets.

Any help/feedback on this is highly appreciated.
Comment 6 Vivek Kulkarni CLA 2006-08-03 10:16:13 EDT
Adding Birt-ReportViewer-inbox to the CC list, since the issue mentioned in comment #5 is with the Report Viewer.
Comment 7 Gary Xue CLA 2006-08-03 14:05:33 EDT
Vivek - BIRT always generates the entire report before you can see the first page rendered; therefore you can expect the first-page rendering time to be directly proportional to the total size of the data set. This performance characteristic is not likely to change in the near future. If your reports typically run against large data sets and end-user response time is critical, you should consider generating the reports in batch jobs and only rendering them for the end user on demand.

In this release we will continue to look into making incremental changes to improve the generation and rendering performance.
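
As a rough sketch of the batch-then-render approach suggested above (BIRT 2.x engine API; the file names, output format, and surrounding engine setup are placeholders, not taken from this bug): generation and rendering are separate tasks, so the expensive run can be done offline and single pages rendered on demand.

import org.eclipse.birt.report.engine.api.HTMLRenderOption;
import org.eclipse.birt.report.engine.api.IRenderTask;
import org.eclipse.birt.report.engine.api.IReportDocument;
import org.eclipse.birt.report.engine.api.IReportEngine;
import org.eclipse.birt.report.engine.api.IReportRunnable;
import org.eclipse.birt.report.engine.api.IRunTask;

public class BatchThenRender {

    // Batch job: run the report once into a report document (the expensive step).
    static void generateDocument(IReportEngine engine) throws Exception {
        IReportRunnable design = engine.openReportDesign("large.rptdesign"); // placeholder
        IRunTask run = engine.createRunTask(design);
        run.run("large.rptdocument"); // placeholder
        run.close();
    }

    // On demand: render a single page of the pre-generated document for the user.
    static void renderPage(IReportEngine engine, long pageNumber) throws Exception {
        IReportDocument document = engine.openReportDocument("large.rptdocument");
        IRenderTask render = engine.createRenderTask(document);

        HTMLRenderOption options = new HTMLRenderOption();
        options.setOutputFormat("html");
        options.setOutputFileName("page-" + pageNumber + ".html"); // placeholder
        render.setRenderOption(options);

        render.setPageNumber(pageNumber); // render only the requested page
        render.render();
        render.close();
        document.close();
    }
}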
Comment 8 Vivek Kulkarni CLA 2006-08-04 01:55:14 EDT
Hello Gary,
Thanks for your reply. It is true that I intend to generate the reports against a large data set; in fact, I have a data warehouse where my facts and dimensions may have millions of records.
I do not think it will be feasible in my use case to generate the reports in batch jobs and render them for the end user on demand. The reason is that the report date range is input from the end user and could be of any length; in practice the user would at least like to see reports covering anywhere from 1 day to 6 months.

I would really appreciate it if you could let me know whether the alternative below is feasible with BIRT:
1. Find the data set classes which are responsible for getting the data from the database.
2. Instead of caching the data set, determine the size of the result set and the number of pages that can be rendered.
3. Render the first page.
4. Render the subsequent pages on demand by iterating through the result set.

The challenge for me is a short timeframe to understand the BIRT architecture and source and then make an attempt to change it.
So I would really appreciate it if you could point me to the relevant classes or provide some high-level pseudo code for reference.

Note: I am using the Report Viewer for rapid report development instead of the BIRT APIs.

(Please excuse me if I am asking too much )
Comment 9 Sachin Dharmapurikar CLA 2006-08-08 03:12:05 EDT
I am in a similar situation to the one in comment 8. I would prefer lazy-loading pagination instead of batch execution.

Any workaround?
Comment 10 Wenfeng Li CLA 2006-08-08 16:17:38 EDT
By lazy loading, do you mean retrieving the rows from the DB only when the user asks for those rows?

BIRT will not be able to calculate report paging without retrieving the data rows. And if there are calculations, such as totals, in the first page of the report that are not pushed to the DB, BIRT cannot calculate them until it has retrieved all the data rows.

While BIRT cannot have a general solution for lazy loading data rows, you can design reports using BIRT's hyperlink and scripting features to achieve the lazy-loading optimization based on your application logic. For example, when the user selects a 2-year time duration, you can design a report that takes a starting date as a report parameter and only returns one month's worth of data; on the report, you would also create a hyperlink to get the next month's report by setting the starting-date parameter to the current starting date + 1 month.
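
To make the one-month-window idea concrete, here is a purely illustrative sketch using plain JDBC rather than a BIRT data set; the connection URL and credentials are hypothetical, while the table and column names are borrowed from the query in the original description. The report's hyperlink would simply re-run the report with the start parameter advanced by one month.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.time.LocalDate;

public class MonthWindowQuery {
    public static void main(String[] args) throws Exception {
        // Report parameter: the starting date of the one-month window.
        LocalDate start = LocalDate.parse(args[0]); // e.g. 2006-08-01
        LocalDate next = start.plusMonths(1);       // the hyperlink would pass this value

        String sql = "select datetime_id, another_id, text "
                   + "from some_oracle_schema.sample_table "
                   + "where datetime_id >= ? and datetime_id < ?";

        try (Connection con = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//dbhost:1521/orcl", "user", "password"); // hypothetical
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setTimestamp(1, Timestamp.valueOf(start.atStartOfDay()));
            ps.setTimestamp(2, Timestamp.valueOf(next.atStartOfDay()));
            try (ResultSet rs = ps.executeQuery()) {
                int rows = 0;
                while (rs.next()) {
                    rows++; // the report would bind these rows instead of counting them
                }
                System.out.println(rows + " rows in the month starting " + start);
            }
        }
    }
}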



Comment 11 Sachin Dharmapurikar CLA 2006-08-08 23:17:44 EDT
(In reply to comment #10)

I agree with you. Thanks for the detailed explanation of the issue. Yes, we can achieve the optimization using scripting and hyperlinking to some extent. In my case I run reports against a huge amount of data, say 1 million rows. The report that I am running against 100K rows is taking a considerably long time. Is there a performance issue? Can I fix it? I have applied standard database tuning such as indexing and key assignment.

Do you have any plans to improve the performance?
Comment 12 Wenfeng Li CLA 2006-08-09 02:47:22 EDT
We are indeed working on improving the performance of the report engine. There is also room to improve performance by tuning the report design. If you can attach your report design, we can check whether the report can be tweaked for better performance.

How long does it take to generate a report from 1 million rows and from 100K rows? And how long does it take to view report pages after the report is generated?
Comment 13 Sachin Dharmapurikar CLA 2006-08-09 07:03:51 EDT
Created attachment 47602 [details]
Report to run against 1million rows
Comment 14 Sachin Dharmapurikar CLA 2006-08-09 07:04:30 EDT
(In reply to comment #12)

Wei, I looked into the details of BIRT and found that there are a few ways to optimize the report. Here are the statistics you asked for:

100K rows - 24.3 sec
1 million rows - 37.1 sec

The reports are attached for your reference. I was comparing the performance with my old reporting module which was written by me. It used to take around 11 seconds for 1 million rows.

Still, I think it's OK to compromise on performance for a little luxury :)
Comment 15 Vivek Kulkarni CLA 2006-08-09 07:10:43 EDT
In reply to comment #10:
I do not think that calculating the number of pages would be a big deal. Couldn't you simply query the result set to find the total record count and then decide on the pages?
Comment 16 Vivek Kulkarni CLA 2006-08-09 07:34:15 EDT
In reply to comment #14:

Sachin, I would appreciate it if you could tell me about the optimizations you tried to get these 1 million records in 37 seconds.

I have a Linux machine (RHES3) with 2 GB Memory and my report takes 10 to 15 mins to return 1 million records.

Query - select <cols> from <table>
This query takes a little over 10 seconds on the database.
My BIRT report is a table list with pagination
  
Comment 17 Sachin Dharmapurikar CLA 2006-08-09 08:42:16 EDT
(In reply to comment #16)

Vivek, my setup is:
- AMD Athlon 2800+ machine with 1 GB RAM.
- I ran the benchmark on Windows XP Pro.
- I used the MySQL 4.3.1 engine, and the report is attached above.

The sample table contained exactly 1 million records, with indexes placed on the columns that appear in the where clause or are primary keys.

Although I am not a database expert, the performance depends on the database column sizes. The total table size of my data was 100 MB in all for 1 million records.

As you can see in the attached report, there are no direct select * values in the table; instead they are aggregated or calculated.

Hope this information helps you.
Comment 18 Wenfeng Li CLA 2006-08-09 14:43:46 EDT
(In reply to comment #15)
> In reply to comment #10:
> I do not think that calculating the number of pages would be a big deal.
> Couldn't you simply query the result set to find the total record count and
> then decide on the pages?

The total record count is not sufficient to decide the page count in many cases. For example, if pagination is based on grouping (a page break for every customer, or a page break for every month), the BIRT engine will need the count of the distinct values of the group key. Another example is conditional formatting: BIRT allows a report to set a boolean expression on a value in the data row, and the row is shown if the expression returns true and hidden if it returns false. Other examples are master-detail reports, charts, etc., where data row values have an impact on page breaks.
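
As a hedged illustration of that point (plain JDBC, hypothetical connection URL; the orders/customernumber names are assumed from the Classic Models sample schema): the total row count alone only sizes a fixed-rows-per-page layout, while a page-break-per-customer layout needs a separate count of distinct group-key values.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PageCountNeedsData {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:mysql://localhost/classicmodels", "user", "password"); // hypothetical
             Statement st = con.createStatement()) {

            // Enough for a plain "N rows per page" layout...
            long totalRows = scalar(st, "select count(*) from orders");

            // ...but a "page break for every customer" layout also needs the
            // number of distinct group keys, i.e. another aggregate over the data.
            long customerPages = scalar(st, "select count(distinct customernumber) from orders");

            System.out.println(totalRows + " rows would span at least "
                    + customerPages + " pages when breaking per customer");
        }
    }

    private static long scalar(Statement st, String sql) throws Exception {
        try (ResultSet rs = st.executeQuery(sql)) {
            rs.next();
            return rs.getLong(1);
        }
    }
}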
Comment 19 Wenfeng Li CLA 2006-08-24 02:17:36 EDT
continue performance enhancement tasks in 2.2.0
Comment 20 Wei Yan CLA 2007-03-28 03:45:01 EDT
continue to optimize the performance in RC0
Comment 21 Wenfeng Li CLA 2007-05-30 20:54:53 EDT
BIRT 2.2.0 has incorporated many performance enhancements in report generation and viewing. Can you run your report again and let us know of any performance difference you see?

Keeping this bug open, since there are still more optimizations that we would like to do.
Comment 22 Ken Geis CLA 2007-08-01 17:27:57 EDT
I cross my fingers for some work on this between now and 2.2.1.  If there is any way I can help, I will.

I haven't run very large tabular reports, but I am running a crosstab against an Oracle Data Source.  It has about 10,000 rows in the data set.  One dimension has three levels, and the other has one level.  The dimension with one level does have 35 members at that level.  I think this may be a cause of performance problems.

Anyway, as I watch the CPU when previewing or running the report, it's showing some usage for the first 20-30 seconds, then it pegs the CPU for the next five minutes or so.  When I take the query from the data set and run it in Aqua Data Studio, it takes about 7 seconds to execute in the database and another 2 seconds to display.
Comment 23 Wenfeng Li CLA 2007-08-01 18:09:41 EDT
(In reply to comment #22)
> I cross my fingers for some work on this between now and 2.2.1.  If there is
> any way I can help, I will.
> 
> I haven't run very large tabular reports, but I am running a crosstab against
> an Oracle Data Source.  It has about 10,000 rows in the data set.  One
> dimension has three levels, and the other has one level.  The dimension with
> one level does have 35 members at that level.  I think this may be a cause of
> performance problems.
> 
> Anyway, as I watch the CPU when previewing or running the report, it's showing
> some usage for the first 20-30 seconds, then it pegs the CPU for the next five
> minutes or so.  When I take the query from the data set and run it in Aqua Data
> Studio, it takes about 7 seconds to execute in the database and another 2
> seconds to display.
> 


Ken

Can you attach the report design to this Bugzilla entry? It would be a great help if you could recreate a similar report using the Classic Models sample database that is bundled with BIRT.
Comment 24 Wenfeng Li CLA 2007-08-01 18:41:45 EDT
(In reply to comment #14)
> (In reply to comment #12)
> 
> Wei, I looked into the details of BIRT and found that there are a few ways to
> optimize the report. Here are the statistics you asked for:
> 
> 100K rows - 24.3 sec
> 1 million rows - 37.1 sec
> 
> The reports are attached for your reference. I was comparing the performance
> with my old reporting module which was written by me. It used to take around 11
> seconds for 1 million rows.



Sachin, can you run the same report again in your environment with BIRT 2.2.0? We would like to see whether there is any performance improvement over your previous run.
Comment 25 Ken Geis CLA 2007-08-01 19:11:33 EDT
Created attachment 75177 [details]
slow ClassicModels crosstab report

I put together a report on the Classic Models database that is similar in size (number of rows) and shape (data types) to the Oracle query I'm working with.  It takes about 35-40 seconds to render on a nice Xeon workstation.  That isn't the many minutes of the report I referred to earlier, but it's more than I expect, and it's a good self-contained sample.
Comment 26 Lin Zhu CLA 2007-08-06 05:00:01 EDT
Hi Ken,

Thank you for providing a sample report, which helped us identify a performance problem in BIRT. Based on that report I've made some performance enhancements, so the performance should improve. Could you please try the latest BIRT 2.2.1?

Thanks.
Lin
Comment 27 Ken Geis CLA 2007-08-13 19:52:09 EDT
(In reply to comment #26)
> Hi Ken,
> 
> Thank you for providing a sample report, which helped us identify a
> performance problem in BIRT. Based on that report I've made some performance
> enhancements, so the performance should improve. Could you please try the
> latest BIRT 2.2.1?

I tried 2.2.0 and 2.2.1-20070810.  The report I submitted went from 41 seconds to 6 seconds on my machine.  That's great!  Unfortunately, it apparently doesn't model my real world report very well.

I ran the crosstab report I had problems with (not the one I submitted).  It took 45 seconds on 2.2.0 and 35 seconds on 2.2.1.
Comment 28 Wenfeng Li CLA 2007-09-13 20:26:23 EDT
Scheduled for 2.3: improve performance of the crosstab report item.
Comment 29 Lin Zhu CLA 2008-05-29 04:51:14 EDT
The xtab performance problem is tracked by bug 185035. Meanwhile, in 2.3 we've made some enhancements to xtab. Please try the latest 2.3 RC2 build to see if that helps in your case.

For this bug, as the originally reported problem has been fixed, it will be marked as resolved.