| Summary: | Report generation performance is sluggish | | |
|---|---|---|---|
| Product: | z_Archived | Reporter: | Shanon McArthur <reporting.tools> |
| Component: | BIRT | Assignee: | Lin Zhu <lzhu> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | critical | | |
| Priority: | P2 | CC: | Birt-ReportViewer-inbox, foxm, girisha, kgeis, ksunithaghm, lzhu, matt.bates, sachin, stilkari, surendramohite, vivekk, wenfeng.fwd, xiang.li, xxue |
| Version: | 2.1.0 | | |
| Target Milestone: | 2.3.0 | | |
| Hardware: | PC | | |
| OS: | Windows XP | | |
| Whiteboard: | | | |
| Attachments: | | | |
Description
Shanon McArthur
Please pass on to the engine team after enhancing the data engine's data retrieval performance.

Could you please attach your .rptdesign file? Thanks.

Please try out the 2.1.0 release and let us know what the new number is.

Improve the performance of some classes: *HtmlWriter*, *XMLWriter*. Leave this bug open.

Hello, I am using the BIRT Viewer v2.1 and have observed performance issues when the data sets are large (more than, say, 20,000 records). Here is my use case where I observed the performance issues:

Database - DB2 v8.1 (this in fact does not matter)
SQL - select <columns> from <table> (this may have millions of records in a normal use case)
Report - tabulated report with pagination

When I run this SQL directly against DB2, a large result set with over 300,000 rows is returned in approximately 20 seconds. I ran the data set from the BIRT Report Designer on my desktop against the same database and it also returned in about 20 seconds; however, the preview window only displays the first 500 rows. When we actually run the report that uses this same data set, it takes over 5 minutes for the first page to render. I am not sure, but it appears that BIRT uses disk caching instead of lazy loading of pages for large result sets. Any help/feedback on this is highly appreciated.

Adding Birt-ReportViewer-inbox to the CC list since the issue mentioned in comment #5 is with the Report Viewer.

Vivek - BIRT always generates the entire report before you can see the first page rendered, so you can expect the first-page rendering time to be directly proportional to the total size of the data set. This performance characteristic is not likely to change in the near future. If your reports typically run against large data sets and end-user response time is critical, you should consider generating the reports in batch jobs and only rendering them for the end user on demand. In this release we will continue to look into making incremental changes to improve the generation and rendering performance.
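The generate-then-render behavior described above can be sketched in plain Java. This is not BIRT's actual code, only an illustration under assumed names: generation consumes the entire data set up front (which is why first-page latency grows with data size), while rendering any page afterwards is a cheap lookup - the rationale for running generation as a batch job and rendering on demand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical sketch, not BIRT internals: a report document is fully
// generated before any page can be rendered.
public class GenerateThenRender {
    private final List<List<String>> pages = new ArrayList<>();

    // "Generation": every row is consumed and paginated up front, so the
    // first page cannot appear until the whole data set is processed.
    public void generate(int totalRows, int rowsPerPage, IntFunction<String> fetchRow) {
        List<String> current = new ArrayList<>();
        for (int i = 0; i < totalRows; i++) {
            current.add(fetchRow.apply(i));
            if (current.size() == rowsPerPage) {
                pages.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) pages.add(current);
    }

    // "Rendering": after generation, any page is a constant-time lookup.
    public List<String> renderPage(int pageNumber) {
        return pages.get(pageNumber - 1);
    }

    public int pageCount() {
        return pages.size();
    }
}
```

In this model, batch-generating the document once and rendering pages on demand moves the expensive pass out of the interactive path.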
Hello Gary, thanks for your reply. It is true that I intend to generate the reports against a large data set. In fact, I have a data warehouse where my facts and dimensions may have millions of records. I do not think it will be feasible in my use case to generate the reports in batch jobs and render them for the end user on demand. The reason is that the report date is input from the end user and could be of any range; in practice the user would at least like to see reports spanning from 1 day to 6 months. I would really appreciate it if you could let me know whether the alternative below is feasible in BIRT:

1. Find the data set classes which are responsible for getting the data from the database.
2. Instead of caching the data set, determine the size of the result set and the number of pages that can be rendered.
3. Render the first page.
4. Render the subsequent pages on demand by iterating through the result set.

The challenge for me is a short timeframe to understand the BIRT architecture and source and then make an attempt to change it. So, I would really appreciate it if you could point me to the relevant classes or provide some high-level pseudocode for reference. Note: I am using the Report Viewer for rapid report development instead of the BIRT APIs. (Please excuse me if I am asking too much.)

I am also in a similar situation as [comment 8](https://bugs.eclipse.org/bugs/show_bug.cgi?id=147101#c8). I would prefer lazy-loading pagination instead of batch execution. Any workaround?

By lazy loading, do you mean retrieving the rows from the DB only when the user asks for those rows? BIRT will not be able to calculate report paging without retrieving the data rows. And if there are calculations such as totals on the first page of the report that are not pushed to the DB, BIRT cannot calculate them until it has retrieved all the data rows.
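The four-step lazy-loading scheme proposed above can be sketched as follows. All names here are illustrative, not BIRT APIs: the row count stands in for a cheap `SELECT COUNT(*)`, and the per-page fetch stands in for a `LIMIT`/`OFFSET` query, so only the requested page's rows ever leave the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical sketch of the proposed on-demand pager: count once, derive
// the page count, and fetch only the rows belonging to the requested page.
public class DemandPager {
    private final int totalRows;
    private final int rowsPerPage;
    private final IntFunction<String> fetchRow; // stands in for a DB cursor

    public DemandPager(int totalRows, int rowsPerPage, IntFunction<String> fetchRow) {
        this.totalRows = totalRows;
        this.rowsPerPage = rowsPerPage;
        this.fetchRow = fetchRow;
    }

    // Step 2: page count derived from the total record count, no row caching.
    public int pageCount() {
        return (totalRows + rowsPerPage - 1) / rowsPerPage;
    }

    // Steps 3-4: materialize only the requested page, first or subsequent.
    public List<String> page(int pageNumber) {
        int from = (pageNumber - 1) * rowsPerPage;
        int to = Math.min(from + rowsPerPage, totalRows);
        List<String> rows = new ArrayList<>();
        for (int i = from; i < to; i++) rows.add(fetchRow.apply(i));
        return rows;
    }
}
```

As the reply above notes, this only works when nothing on a page (totals, group breaks, visibility rules) depends on rows that have not been fetched yet.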
While BIRT cannot have a general solution to lazy loading data rows, you can design reports using BIRT's hyperlink and scripting features to achieve the lazy-loading optimization based on your application logic. For example, when the user selects a two-year time duration, you can design a report that takes a starting date as a report parameter and only returns one month's worth of data; on the report, you would also create a hyperlink to get the next month's report by setting the starting-date parameter to the current starting date + 1 month.

(In reply to comment #10) I agree with you. Thanks for the detailed explanation of the issue. Yes, we can achieve the optimization using scripting and hyperlinking to some extent. In my case I run reports against huge amounts of data, say 1 million rows. The report which I am running against 100K rows is taking a considerably long time. Is there a performance issue? Can I fix it? I have applied standard database tuning like indexing and key assignment. Do you have any plans to improve the performance?

We are indeed working on improving the performance of the report engine. There is also room to improve performance by tuning the report design. If you can attach your report design, we can check whether the report can be tweaked for better performance. How long does it take you to generate a report from 1 million rows and from 100K rows? And how long does it take to view report pages after the report is generated?

Created attachment 47602 [details]
Report to run against 1million rows
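The month-window workaround described above can be sketched with `java.time`. The class and method names are hypothetical; only the date arithmetic and the shape of the filter are the point - the report accepts a starting date parameter, queries one month of data, and the "next month" hyperlink simply passes start + 1 month back into the same report.

```java
import java.time.LocalDate;

// Hypothetical helper for the hyperlink-based month-window pagination
// described in the comment above.
public class MonthWindow {
    // Exclusive end of the one-month window the data set query should cover.
    public static LocalDate windowEnd(LocalDate start) {
        return start.plusMonths(1);
    }

    // Parameter value the "next month" hyperlink passes to the report.
    public static LocalDate nextStart(LocalDate start) {
        return start.plusMonths(1);
    }

    // e.g. a filter like: WHERE order_date >= :start AND order_date < :end
    public static String whereClause(LocalDate start) {
        return "order_date >= '" + start + "' AND order_date < '" + windowEnd(start) + "'";
    }
}
```

Each report run then touches only one month of rows, regardless of the total date range the user navigates through.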
(In reply to comment #12) Wei, I looked into the details of BIRT and came to know that there are a few ways to optimize the report. Here are the statistics you asked for:

100K rows - 24.3 sec
1 million rows - 37.1 sec

The reports are attached for your reference. I was comparing the performance with my old reporting module, which I wrote myself. It used to take around 11 seconds for 1 million rows. Still, I think it's OK to trade some performance for a little luxury. :)

In reply to comment #10: I do not think that calculating the number of pages will be a big deal. You could easily query the result set to find the total record count and then decide on the pages.

In reply to comment #14: Sachin, I would appreciate it if you told me about the optimizations you tried to get these 1 million records in 37 seconds. I have a Linux machine (RHES3) with 2 GB memory, and my report takes 10 to 15 minutes to return 1 million records. Query - select <cols> from <table>. This query takes a little over 10 seconds on the database. My BIRT report is a table list with pagination.

(In reply to comment #16) Vivek, I have:
- An AMD Athlon 2800+ machine with 1 GB RAM.
- I ran the benchmark on Windows XP Pro.
- I used the MySQL 4.3.1 engine, and the report is attached above.

The sample table contained exactly 1 million records, with indexes placed on the columns that appear in the WHERE clause or are primary keys. Although I am not a database expert, the performance depends on the database column size. The total size of my data was 100 MB in all for 1 million records. As you can see in the attached report, there are no direct select * values in the table; instead they are aggregated or calculated. Hope this information helps you.

(In reply to comment #15)
> In reply to comment #10:
> I do not think that calculating the number of pages will be a big deal. You may
> easily query the result set to find the total record count and then decide about
> the pages?
Total record count is not sufficient to decide the page count in many cases. For example, if pagination is based on grouping - a page break for every customer, or a page break for every month - the BIRT engine will need to get the count of the distinct values of the group key. Another example is conditional formatting: BIRT allows a report to set a boolean expression on the value in the data row, and if the expression returns true the row is shown, if false it is hidden. Other examples are master-detail reports, charts, etc., where data row values have an impact on page breaks.

Continue performance enhancement tasks in 2.2.0.

Continue to optimize the performance in RC0.

BIRT 2.2.0 has incorporated many performance enhancements in report generation and viewing. Can you help run your report again and let us know of any performance difference you see? Keeping this bug open since there are still more optimizations that we would like to do.

I'm crossing my fingers for some work on this between now and 2.2.1. If there is any way I can help, I will. I haven't run very large tabular reports, but I am running a crosstab against an Oracle data source. It has about 10,000 rows in the data set. One dimension has three levels, and the other has one level. The dimension with one level does have 35 members at that level. I think this may be a cause of the performance problems. Anyway, as I watch the CPU when previewing or running the report, it shows some usage for the first 20-30 seconds, then it pegs the CPU for the next five minutes or so. When I take the query from the data set and run it in Aqua Data Studio, it takes about 7 seconds to execute in the database and another 2 seconds to display.

(In reply to comment #22)
> I cross my fingers for some work on this between now and 2.2.1. If there is
> any way I can help, I will.
>
> I haven't run very large tabular reports, but I am running a crosstab against
> an Oracle Data Source. It has about 10,000 rows in the data set. One
> dimension has three levels, and the other has one level. The dimension with
> one level does have 35 members at that level. I think this may be a cause of
> performance problems.
>
> Anyway, as I watch the CPU when previewing or running the report, it's showing
> some usage for the first 20-30 seconds, then it pegs the CPU for the next five
> minutes or so. When I take the query from the data set and run it in Aqua Data
> Studio, it takes about 7 seconds to execute in the database and another 2
> seconds to display.

Ken, can you attach the report design to this Bugzilla? It would be a great help if you could recreate a similar report using the sample model car database that is bundled with BIRT.

(In reply to comment #14)
> (In reply to comment #12)
>
> Wei, I looked into details of BIRT and came to know that there are few ways to
> optimize the report. Here are the statistics demanded by you -
>
> 100K rows - 24.3 sec
> 1 million rows - 37.1 sec
>
> The reports are attached for your reference. I was comparing the performance
> with my old reporting module which was written by me. It used to take around 11
> seconds for 1 million rows.

Sachin, can you help run the same report again in your environment with BIRT 2.2.0? We would like to see if there is any performance improvement from your previous run.

Created attachment 75177 [details]
slow ClassicModels crosstab report
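The earlier point about page counts - that the total record count alone cannot determine page breaks when pagination follows grouping - can be sketched in a few lines. This is an illustration with made-up names, not BIRT code: with "page break for every customer," the page count is the number of distinct group-key values, which cannot be known without scanning the key column.

```java
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration: group-driven pagination depends on distinct
// group-key values, not on the raw row count.
public class GroupPageCount {
    // One page per distinct group key (e.g. per customer), however many
    // detail rows each group contains.
    public static int pagesByGroup(List<String> groupKeyPerRow) {
        return new LinkedHashSet<>(groupKeyPerRow).size();
    }
}
```

Five rows split across two customers yield two pages, while five rows from five customers yield five; the row count is the same, so a `COUNT(*)` alone cannot answer the question.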
I put together a report on the Classic Models database that is similar to the size (#rows) and shape (data types) of the Oracle query I'm working with. It takes about 35-40 seconds to render on a nice Xeon workstation. This isn't the many minutes for the report I referred to earlier, but it's more than I expect, and it's a good self-contained sample.
Hi Ken, thank you for providing a sample report, which helped us identify a performance problem in BIRT. Based on that report I have made some performance enhancements, so you can expect the performance to improve. Could you please try the latest BIRT 2.2.1? Thanks. Lin

(In reply to comment #26)
> Hi Ken,
>
> Thank you for providing a sample report which help us to identify a performance
> problem in BIRT. Based on that report I've made some performance
> enhancement so that it is expectable that the performance will be enhanced.
> Could you please have a try of latest BIRT 2.2.1?

I tried 2.2.0 and 2.2.1-20070810. The report I submitted went from 41 seconds to 6 seconds on my machine. That's great! Unfortunately, it apparently doesn't model my real-world report very well. I ran the crosstab report I had problems with (not the one I submitted). It took 45 seconds on 2.2.0 and 35 seconds on 2.2.1.

Scheduled for 2.3 to improve performance of the crosstab report item.

The xtab performance problem is tracked by bug 185035. Meanwhile, in 2.3 we have made some enhancements to xtab. Please try the latest 2.3 RC2 build to see if that helps in your case. For this bug, as the originally reported problem has been fixed, it will be marked as resolved.