Index and Sort facilities Tutorial
Basic Overview of INDEX
The index facility allows you to create multiple indexes, or ways to access information in a data file. The index will re-order access to a file based upon a key (which can be any part or parts of the file, for example, by name or date or both) which is associated with the relative access number. It doesn't re-order the file itself, but only the order of accessing the files.
How to set an Index file up
The index command creates or re-creates an index file. It can be used in READY mode, or from a procedure file. This is useful for re-indexing a file after changes have been made through another program.
In the program we've been working on so far, you could create an index file based on a particular part of the file. Perhaps name, city, or Item 1 purchase.
The index command's syntax is as follows:
INDEX data-filename index-filename 1st key start pos 1st key length action
The available parameters for action are: LISTDUPKEYS, DUPKEYS, NATIVE, REORG, and REPLACE.
The following line would create an INDEX.FIL of the file called MASTER.FIL:
INDEX MASTER.FIL INDEX.FIL 10 5 REPLACE
This will create (or recreate if it already exists) an index file for the Master File, since REPLACE is specified. The key used is the section from position 10 to 15 of the file.
To create an index file, with a key of city, for the ORDERS.int file we are working on would be:
INDEX ORDERS.int cityindex.int 91 15 REPLACE
What would the index file be like if we based it on last name? Do this now.
Now what if there were two customers with the same last name? Using the DUPKEYS parameter will send you a message if there are duplicates:
INDEX ORDERS.int lastindex.int 31 30 REPLACE DUPKEYS
and the LISTDUPKEYS parameter will display a list of such duplicates:
INDEX ORDERS.int lastindex.int 31 30 REPLACE LISTDUPKEYS
To handle this, you can create an index file using two or more keys, perhaps with first and last name like so:
INDEX ORDERS.int firstandlast.int 31/1 30/30 REPLACE
the slash (/) separates the first and second second keys' start point and size (number of positions in the file).
Next, just to practice, create an index file based on state and city.
As you can imagine, it's important for files to have unique keys. Using time and date within a record is one way of differentiating each entry.
After you have created an index file, we can assume that new records are going to be added to the original file, and so of course we'd like to update the index at the same time. In order to do this, return to the original program and add the following syntax to the OPEN statement:
… kfname=indexname,kps=key-start-position,kln=key-length... keyed
where index name is the name of your index file, kps indicates key start position, and kln is the key length.
in our case it would be:
so that the complete OPEN statement in the main program is:
Open #1: "name=orders.int, recl=118,USE,kfname=lastfirst.int,kps=31/1,kln=30/30", internal, OUTIN, keyed
Since we created more than one index file, an OPEN statement must be added to each to open the main file with the index and add new records to each index file. The second, for example, will be:
Open #1: "name=orders.int, kfname=items.int,kps=116/117/118,kln=1/1/1", internal, OUTIN, keyed
Since INDEX is a command and not a statement, it needs the EXECUTE statement to run within a program. After you have added or made changes to the data file, you will need to replace the index file with a new one. For example, this one replaces the lastname/firstname key file.
5000 EXECUTE “INDEX ORDERS.int lastfirst.int 31/1 30/30 REPLACE”
Purpose of Index
The main reason we create index files is so that the user can quickly access any record by key, instead of sorting through all the records every time. On large master files, this saves time. Write a quick program, NAMEFIND to access records based on last names. This will involve the following steps:
- Prompting the user for a last name
- Opening the file
- Reading it by key (the last name input)
- Displaying the complete record
A more complicated program might then accept changes, print a label, or any number of other actions.
But wait! What if two people have the last name? Add lines to the program to solve this problem if the first search does not provide the correct person. See the solutions page for help.
Basic Overview of SORT
The sort command allows BR to create a sorted output file from a randomly ordered file of records, and to rearrange the records in a variety of ways. You can either create a second copy of the original but in sorted order, or you can just create an index (much like the index command itself).
How does it work?
The syntax for the SORT command is very simple:
Let's go ahead and create a sort control file, and then sort our ORDERS.INT data file. Sort files can be created within BR as an internal file, or run as a PROC file. Since text editor based PROC files are easier to work with, we will use this method. Open any text editor to begin, and write SORT on the first line. Save it as SAMPLESORT. Since we're using it as a PROC file, to run our sort, you will use this syntax from READY mode:
The sort control file itself, however, has far more parts to it. We'll build ours as we go along.
The Sort Control File
The SORT command can be run from READY mode, a PROC file, or with the EXECUTE command, and creates a file according to the specifications listed. These are Comment, File, Alts, Record, Sum and Mask:
1. ! - a comment in the sort file will display a message to the screen. Add one to your file now:
! Creating a sort file, don't you worry!
2. FILE (required) - After naming the input and output files (and optionally, paths and directories), you can choose the type of output file ("A" and “B” will make address-out sorts, where the 'address' is the relative record number. "R" calls for a record-out sort, which means that each entire record is written to the output file), the collating sequence, optional replacement of previous output files with the same name, and file sharing rules for the input file. Next, the collate sequence can be “A” for alternate, or “N” for native. Then you can specify to REPLACE the old file, or add file sharing specs. Other than the file names, all these are optional. Go ahead and create the FILE line:
Commas must be present for every skipped optional parameter.
3. ALTS – This reorders part of the collating sequence or can set certain characters equal to a new collating value (i.e. it could make a set of information in uppercase equal a set in lowercase).TO begin with, RO means that a set of characters is to be reordered, or EQ means that a new, single collating value is to be assigned to the specified characters (the value and the characters are set equal).
Following RO, you must specify the new starting value (0 to 255) and the character sequence, which is a list of up to 28 characters, enclosed in quotations, according to which the file will be reordered. The value which is specified as the "new starting value" will be assigned to the first character specified in the "character sequence". The next sequential value will be assigned to the second character in the "character sequence", and so on. If you want to reorder more than 28 characters, you must use an additional ALTS specification.
Following EQ, specify the new value (again, from 0 to 255) assigned to the character, and character sequence (again, in quotes), which is the character or set of characters which is to be given a new value.
ALTS RO 1,”VERYQUICKFOX”
Will reorder strings accordingly, instead of alphabetically.
ALTS EQ 100,”9876543210”
Will reorder strings that begin with numbers in descending order, beginning at 100.
Create an ALTS for our working model now.
And example might be: ALTS RO 1,"VERYQUICKFOX"
4. RECORD - specifically includes or eliminates certain records from the sorting procedure.
“I” means include, while “O” means omit a record according to a specified select field. The keywords AND and OR can further narrow or widen the inclusion according to other specific qualities. RECORD has the following syntax:
RECORD I/O, start position, field length, field type, lower limit, upper limit, AND/OR
AND and OR are optional.
The following examples will include only customers from Texas and Louisiana:
RECORD I,106,2,C,"TX","TX",OR RECORD I,106,2,C,"LA","LA",OR
Just in case any states were entered in lowercase, also provide the lowercase examples:
RECORD I,106,2,C,"tx","tx",OR RECORD I,106,2,C,"la","la",OR
The following example will omit records that do not have a first name:
RECORD O 1,30,C,” “,”A”
Choose any variety of RECORD to narrow down your output sort file.
5. SUM - displays the total number of records in the input file, how many were sorted, and how many are in the output file when a file has been sorted. It requires the user to press enter in order to continue. It is the simplest of the parameters:
6. Finally, MASK is required, and identifies up to ten sort fields and determines how they should be sorted, either ascending (A) or descending (D). The syntax is as follows:
MASK start position, field length, form spec, A/D
and can be repeated up to ten times, separated by commas. The form spec can be the following: B (Binary); BL (Binary low); BH (Binary high); C (Character); D (Double-precision); L (Long); N (Numeric); PD (Packed decimal); S (Single-precision) and ZD (Zoned decimal).
Just a simple last name then first name sort can be controlled like this:
Go ahead run your sort control file from ready mode, by typing PROC SAMPLESORT.
If you have trouble with it, go over everything and check for punctuation and spelling errors.
Why do we need this? As you can see, SORT provides more options for creating index files and reorganizing your current data files. For very large files, this saves time while accessing information.