BTree

From BR Wiki
Jump to navigation Jump to search

The BTree index facility is included. It is referred to as BTree2. BTree2 provides significantly faster (25 - 35%) performance in shared file processing. We have had a few reliability problems with the present BTree in situations where many concurrent updates are performed. Additionally, our users have experienced other media, network, and disk controller weaknesses that have corrupted indexes. So we have added diagnostic capabilities to BTree2.

A new BRConfig.sys statement is supported:

BTREE_VERIFY    (master-filename)  [ OFF ]

This statement causes BR to separately audit the structure of the current index branch after each operation that alters an index relating to the named file. Obviously some performance penalties are in effect when processing files in diagnostic mode. However, it can pinpoint what update operations are failing when failures occur.

Also, the INDEX command has a new keyword, VERIFY, that can be used in place of REPLACE or REORG. If used with a BTREE2 file, the index will be audited instead of being rebuilt. Furthermore, REORG, in relation to BTREE2 INDEX commands, means "audit and if needed rebuild."

To activate this feature, set OPTION 22 in your BRConfig.sys file. The system will automatically create new indexes in BTREE2 format and will keep track of which files utilize BTREE2 indexes. Note that BR does not permit mixing Btree index types within related files (multiple opens of the same master file with different indexes).

Indexing and sorting were significantly improved with respect to speed.

The INDEX command may now specify BADKEYS or LISTBADKEYS to effect the same processing as DUPKEYS and LISTDUPKEYS performs. Bad keys are keys with invalid Y2K data in them. If both LISTBADKEYS and LISTDUPKEYS are specified, then all bad keys are listed ahead of dupkeys. They can be distinguished by whether or not they are duplicate.

The creation of indexes can be interrupted with Ctrl-A, and can then be resumed with GO.

Structure

Index files are special internal files that can be read, written to, and processed as normal Business Rules internal files for special processing needs (although this is rarely done).

An index file generally consists of two parts: a primary (or sorted) area and an overflow (or unsorted) area. Immediately after the index file is created or reorganized, all the records in the master file are represented by sorted entries in the primary area of the index file. Whenever a record is added to the master file, or a key field is changed, an entry is also sent to the overflow area of the index file. These entries are not in sorted order. This means that processing time can be improved by periodically reorganizing the index file so that all records are in the sorted portion of the index file (see especially the REORG parameter on the Index_command).

An index file record is composed of the master record's key plus the master record's relative record number, in binary (B4) format. The following statement reads the key file of a master file with a 10-character key:

80 READ #1, USING "FORM C 10,B 4": KEY$,RECNBR

Although the above information can be helpful in unusual cases, most programmers do not need to know details about the record layout of an index file because the built-in features of the index facility are quite sufficient.

Sample Illustration

The following table illustrates the set-up of an index file as compared to a master file's relative record number and key field. The key fields from the first seven records of the master file, as demonstrated by the relative record number (first column), are shown. The key field of each record is a two-character state name (second column). When the index procedure has been performed on the master file, an index file is created which contains the state names in alphabetical order, followed by their relative position in the master file (third column).

If you had the master file shown in the above table, but not the index file, the retrieval of a particular record would have to be done sequentially. For example, if you wrote a program that needed to access the record for New York, the program would have to read each record sequentially, starting at the beginning of the master file, until it found the record for New York. In this example, that would require the program to read only five records to locate the proper one, but in long master files, the process can be very time-consuming and cumbersome.

If, on the other hand, the master file has an index file, retrieval can be done using the key-indexed access method. To find the record for New York using this method, the program goes to the index file and searches it for New York. This search does not take long, because the keys are arranged in ascending (alphabetical) order, and a binary search is used. The program then reads the corresponding relative record number for New York, which is 5. This number tells it to go directly to the fifth record in the master file; it does not have to search the master file from the beginning.

Creating With Index

There are two ways to create index files: with the Index_command and with the OPEN internal statement. This section addresses the use of INDEX. See the OPEN internal statement (particularly the KPS and KLN parameters) for more information about the second method. Before you run INDEX, you must know the four required parameters: the name of the master file, the name of the index file, the starting position(s) and the length(s) of the key section(s) of the key field you wish to use. See the Index_command for complete details of all the required and optional parameters.

Of the optional parameters, the most frequently used is REPLACE, which indicates the old index file should be deleted and replaced by this new file.

The REORG parameter greatly speeds up the Index_command because it reads only the index file (not the master file) and does not resort the portion of the index file which is already sorted. In the special case where an index file is already completely sorted, using the REORG parameter means INDEX will run very fast because it quits as soon as it determines that there is nothing to sort.

Other optional parameters specify where workspace should be found (usually defaulted to the current directory), whether a warning message should be provided for duplicate keys (DUPKEYS), whether duplicate keys should be listed (LISTDUPKEYS), whether the screen or something else should be used for LISTDUPKEYS and whether the alternate collating sequence should be overridden (NATIVE).

When using the Index_command from within a procedure file, Business Rules also accepts System/23 format statements that control the indexing process. We do not recommend, however, that you use System/23 format when writing new programs.

Once the Index_command is entered -or once the procedure containing the Index_command has begun - Business Rules performs a CLEAR operation to free up memory and then creates or recreates the index file as specified. (In the special case where INDEX is issued with the EXECUTE statement, no CLEAR operation is performed.)

The creation of indexes can be interrupted with Ctrl-A, and can then be resumed with GO.

Finding records with the KEY and SEARCH parameters

KEY and SEARCH parameters

When an internal file is opened with the KEYED parameter, DELETE, READ, RESTORE file and REWRITE statements will accept an optional clause for using the associated index file. In the technical discussions for these statements in the Statements chapter, this optional clause is referred to as the "KEY=" clause. In the READ and RESTORE statements, the "KEY=" clause can have any of the following four forms: KEY=n$, KEY>=n$, SEARCH=n$, and SEARCH>=n$. Only KEY= can be used with DELETE statements.

All forms of the KEY= clause will return the first match, which is the record with the lowest record number when there are duplicate keys. As used in this section, n$ can be any string expression. String constants, simple variables, subscripted variables, and string functions can be used. These elements may be combined with one another and used in concatenation and substring operations.

Line 300 illustrates the use of string functions to take the numeric value in the variable ACCT and make it a 5-character string by padding it with blanks on the left:

00300 RESTORE #1, KEY=LPAD$(STR$(ACCT),5):

The four forms of "KEY=" are constructed from the two keywords KEY and SEARCH, and the two operators = and >=. These keywords and operators are explained below.

KEY indicates that the length of string n$ must be identical to the length of the key field (i.e., KLN(file-num)).
SEARCH indicates that the length of n$ can be less than or equal to the length of the key field; the search process looks only at enough characters to match the length of n$.
= indicates n$ must be an exact match.
>= indicates that the next record in the key sequence is to be used when an exact match cannot be made.


Consider the following four examples for an index file. The date is a 6-character key field and contains the year in the first two positions:

00400 READ #1,KEY="880101":  NOKEY 920
00410 READ #1,KEY>="880101":  NOKEY 920
00420 READ #1,SEARCH="88": NOKEY 920
00430 READ #1,SEARCH>="88": NOKEY 920

Line 400 reads the first record with a key field of 880101; if no exact match exists, the NOKEY error occurs. Line 410 reads the first record with a key field of 880101; if no exact match exists, this statement reads the record with the next highest value in the key field, e.g., 880103. The NOKEY error occurs in this case only when no record in the entire file has a key field with a value greater than or equal to 880101.

Line 420 reads the first record with 88 in the first 2 positions of the key field. If there is no match for this criterion, the NOKEY error occurs -even if records exist with 87 or 89 in these positions. Line 430 also reads the first record with 88 in the first two positions of the key field; if there is no match for this criterion, the statement goes on the read the record with the next highest value in the first two positions of the key field. The NOKEY error occurs only when no record in the entire file has a key field with a value greater than or equal to 88 in the first two positions.