Recently I made a few tests at a client's site on SQL performance with large volumes of data. I found quite interesting results that I summarise below.
2. Application Description
The client is a financial company which stores large volumes of data for a Basel application which provides a month by month situation of the contracts. The data are used for some standard reporting and also for non standard OLAP inquiries requested especially by Credit control users
The application updates 7 different physical files which are joined together in a join LF called BASUNICO1L which has a total record length of 1789 characters. The main keys of this file are the processing period (YYYYMM) and the contract number
3. Test Environment
The tests were made in a test environment where the files contained the data of about 6 months. The total number of records for the join LF were approximately 11 million and those of the period used for the test were about 2 millions.
The first set of tests were made to test the times needed to copy all records of one period from the join logical file to a physical file with the same record layout. The record were written by using three different approaches as follows:
A simple copyfile (CPYF) which includes a selection such as INCREL(*IF PERIOD *EQ 200806) A traditional (file oriented) Cobol program (TSTFIL1) which used a START to position the file pointer to the first record of the period and then entered into a loop of READ and WRITE operations to write to the output file all records of the requested period. A Cobol program with embedded SQL (TSTSQL1) which wrote the output records with a simple SQL Insert of the selected records.
A Cobol program with embedded SQL (TSTSQL3) which created the output file as an SQL MQT table. A second set of test was used instead to test the times needed to read some fields of the files by using different approaches as follows:
Traditional file oriented Cobol program (TSTFIL5) which performed a START to position the cursor and then used sequential READ operations of all records of the selected period. Cobol program with embedded SQL (TSTSQL5) which included an SQL cursor to read all lines of the selected period. I decided to perform the tests in the following different conditions:
– In an environment without additional indexes (i.e only the access paths of the files) – In an environment that could use also additional SQL indexes – In an environment that used an SQL logical view instead of the original join logical file – The results are described in the following points
4. Tests based only on the join logical file
The test made in the original environment to copy about 2 million records without SQL indexes were the following:
– The CPYF took 19 minutes and 35 seconds – The traditional file oriented Cobol program took 9 minutes and 20 seconds – The Cobol program with embedded SQL took 28 minutes
5. Tests done after the creation of SQL indexes
I checked the index advisor file QSYS2/SYSIXADV after the initial tests and I noticed that there was a record suggesting to create an index on the period field. I created both a vector index and a radix index and then started the new set of tests. The results were the following:
– The CPYF took 7 minutes and 52 seconds – The traditional file oriented Cobol program took 7 minutes and 7 seconds -The Cobol program with embedded SQL took 13 minutes and 20 secons – There has been a clear benefit especially for the Cobol program with embedded SQL. By loooking at the log, I noticed that the optimiser had chosen to use the vector index during that execution.
It is interesting that if you use the commands WRKOBJ or DSPFD or DSPDBR, the indexes appear as logical files with a special SQL type attribute which has the value INDEX.
It is also interesting that the space used for the indexes is much less than what is required for a logical file. In my test environment one of the main the PF used about 500 Mega bytes, a LF used about 257 Mega bytes, the radix index used occupied about 175 Mega bytes and the radix index only 22 Mega bytes.
6. Test of the file creation with as an MQT (Materialised query table)
I tried to create the target file by using a Cobol program with embedded SQL that produced the output table as an MQT instead of the previous INSERT and I found that the time were very good
The program took just 57 seconds to produce the results.
7. Reading Test
I compared the time spent by two programs to read from all records of a selected period to sum up a total The first program was a Cobol traditional file oriented program, whereas the second one was an SQL Cobol which included an SQL cursor to read all selected records.
The first program took 13 minutes and 17 seconds to complete, whereas the second one took on the first execution just 7 minutes and 14 seconds to complete and much less in following ones.
8, Using SQL Views instead of Logical files
I tried to use an SQL view equivalent to the logical file and repeat the tests above by using the SQL view. The results were not significantly different than those based on the logical file, however I found that a view required much less space than an equivalent LF.
The SQL views appear again to the system as special logical files with the SQL type attribute containing the value VIEW. The space occupied by the view logical files is much less than that required by a logical file. In my test environment a view equivalent to a join logical file of about 405 Mega bytes. required less than 1 Mega byte.
The tests described above seem to demonstrate what follows:
The fastest option to extract the data was the Cobol program with an embedded SQL to create a MQT table.
The traditional file oriented Cobol program was faster than the corresponding Cobol program which included an embedded SQL INSERT statement. The execution time of the Cobol SQL program was significantly affected by the creation of the SQL vector index.
The traditional file oriented Cobol program that read all records of a selected period by using START and READ NEXT operations was slower than the Cobol program which read the records with an SQL cursor.
The results agree with some points of an IBM Redbook Modernizing IBM eServer iSeries Application Data Access – A Roadmap Cornerstone ( redbooks.ibm.com/abstracts/sg246393.html?Open ). where it is written that:
The SQL insert operations are slower than Cobol write statement because SQL operations include more validations than write oprrations into a DDS PF SQL does faster reads than HLL operations. The main reason is that a cursor reading an SQL table does not have the extra data cleansing code like a DDS PF reading. Using SQL views instead of logical files should allow a significant reduction of the space occupation. The final conclusion is that a wise use of SQL can bring significant improvements in applications performance.