Sort Facility


The Business Rules (BR) sort facility allows you to produce a sorted output file from a randomly ordered file of records, and to rearrange the records in a variety of ways.

A simple Sort Tutorial is now available.

Sort Command

The SORT command is the instruction that tells BR to begin the sorting procedure. Its only parameter is an optional control file name that contains all the specifications about how a particular file, or set of files, is to be sorted.

BR will execute the SORT command in any of the following 3 ways:

  1. From READY mode. When followed by the name of the sort control file, SORT can be executed from READY mode.
  2. From a procedure file. When followed by either a sort control file name or by the specifications that make up a single sort group, SORT can be executed from a procedure file.
  3. With the EXECUTE statement. When the name of the sort control file is included, SORT may be specified by the EXECUTE statement.

Comments and Examples

The following example causes the sort specifications in ALPHABET.FIL to be executed (when ALPHABET.FIL is a sort control file in the form of an internal file):

SORT ALPHABET.FIL

Alternatively, sort control files can be created as PROC files, which are easier to edit. For example:

PROC SAMPLESORT

This command will run the procedure file SAMPLESORT, which begins with the SORT command, and contains all the sort control information. The SAMPLESORT procedure file might look like this:

Sort
! Creating a sort file, don't you worry!
FILE orders.int,,,samplesort2,,,,,R,,REPLACE,SHR
ALTS RO,1,"VERYQUICKFOX"
RECORD I,106,2,C,"TX","TX",OR
RECORD I,106,2,C,"LA","LA",OR
RECORD I,106,2,C,"tx","tx",OR
RECORD I,106,2,C,"la","la",OR
SUM 
MASK 31,30,C,A,1,30,C,A

SORT tells BR to run a sort as the first step of the procedure. The comment will show up on the operator's screen. Only FILE and MASK are required. Each parameter will be described in detail below, but this particular SORT will return a file named samplesort2 of the data in the ORDERS.INT file, containing only the customer information from Texas and Louisiana arranged alphabetically according to "VERYQUICKFOX". Commas are necessary when skipping optional parameters within each line.

Syntax

SORT [<control file>]

Defaults

1.) Look to the next lines in the procedure file for the sort specifications.

Parameters

The "control file" parameter specifies the name (and path, if required) of the sort control file to be executed. The sort control file contains up to six different types of specifications:

  1. COMMENT (use ! as the first character of a comment)
  2. FILE
  3. ALTS
  4. RECORD
  5. SUM
  6. MASK

FILE and MASK are required, others are optional.

SORT can be used without a control file name only from within a procedure file. All required sort specifications must then immediately follow the SORT command.

Sort control files can be created in a text editor as a procedure file, or as an internal file using WRITE. The simplest way is using a procedure file, since you can easily view and edit the file.

Technical Considerations

The SORT command performs a Clear operation (unless a program is active) and begins execution of the sort control file. Up to 20 RECORD specifications may be used per sort group.

A user-defined collating sequence may be accessed through the FILE specification's collating sequence parameter. See COLLATE ALTERNATE in BRConfig.sys for more information.

Types of Sort Specifications

Six different types of sort specifications are allowed in each sort group. FILE and MASK are required. The following table summarizes some information about the sort specifications:

Below are six sections describing each SORT specification in detail.

Comment

The optional Comment (!) specification displays a message to the operator on the screen. Usually it's a description of the current sort. Comments appear exactly as you enter them.

Comments and Examples

BR displays up to twenty lines worth of Comments per sort group. Comments or Comment lines exceeding this amount are ignored. If a Comment is longer than 43 characters, it will wrap to the next line on the screen. The following is an example of SORT's Comment:

! Now sorting customer file

Syntax

Parameters

After the required ! symbol, you may include an optional message.

Message is the information to be displayed on the screen exactly as it is entered.

Technical Considerations

1) Comment specifications may be placed anywhere before the MASK specification in the sort group.

SUM

SUM is an optional specification that causes BR to display a summary of record counts after a file has been sorted. The way the titles are displayed is currently under development.

Comments and Examples

SUM should be used only when an operator will be in attendance during the sort, as execution will not continue between sort groups until <ENTER> has been pressed.

SUM displays three record counts:

  1. the total number of records in the input file not including deleted records.
  2. the number of records that were included in the sort.
  3. the number of records that were written to the output file.

The following example replaces the first and last messages printed by the SUM specification. The "Records read" and "Records selected" titles are left unchanged, but the single space in the fourth position causes the title for "Records output" to be omitted.

   SUM Sort of HIST.FIL completed,,, ,Press <CR> to continue
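The way the five SUM titles fall back to their defaults can be modeled with a short Python sketch. This is only an illustration of the rules described in this section, not BR code: the function name `sum_titles` is hypothetical, the default texts are copied from the Defaults list below, and it assumes titles are simply separated by commas.

```python
# Default titles as listed under Defaults in this section.
DEFAULT_TITLES = [
    "Sort successfully completed.",
    "Records read",
    "Records selected",
    "Records output",
    "Press any key to continue.",
]

def sum_titles(line):
    """Return the five titles a SUM specification would show (None = suppressed)."""
    text = line.strip("\r\n").lstrip()
    if text[:3].upper() != "SUM":
        raise ValueError("not a SUM specification")
    rest = text[3:]
    if rest.startswith(" "):
        rest = rest[1:]  # drop the single separator after the keyword
    # Assumption: titles are split on every comma.
    given = rest.split(",") if rest else []
    titles = []
    for position, default in enumerate(DEFAULT_TITLES):
        value = given[position] if position < len(given) else ""
        if value == "":
            titles.append(default)      # omitted: keep the default title
        elif value.strip() == "":
            titles.append(None)         # a space: suppress the title
        else:
            titles.append(value[:70])   # titles over 70 characters are truncated
    return titles

print(sum_titles("SUM Sort of HIST.FIL completed,,, ,Press <CR> to continue"))
```

In this model the example above keeps "Records read" and "Records selected", suppresses the fourth title, and replaces the first and last.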

Syntax


Defaults

  1. Display "Sort successfully completed."
  2. Display "Records read"
  3. Display "Records selected"
  4. Display "Records output"
  5. Display "Press any key to continue."

Parameters

Title1 is a replacement for the message "Sort successfully completed", which will automatically be displayed unless you specify otherwise.
Title2 is a replacement for the title "Records read". Unless you specify otherwise, BR displays this title next to a count of the total number of records read from the input file.
Title3 is a replacement for the title "Records selected". Unless you specify otherwise, BR displays this title next to the total number of records that were included in the sort.
Title4 is a replacement for the title "Records output", which is displayed next to the total number of records that were written to the output file.
Title5 is a replacement for the message "Press Enter to continue", which is automatically displayed unless you specify otherwise.


Technical Considerations

  1. To entirely suppress a message, enter a space for the title. There is no way to suppress record counts when SUM is specified.
  2. BR truncates any title parameter that is longer than 70 characters.
  3. The number of records selected displayed on the third line and the number of records output displayed on the fourth line should always be the same.

FILE

The FILE specification is a required specification that identifies the following:

  1. Names of input and output files.
  2. Directory for workspace.
  3. Type of output file.
  4. Desired collating sequence.
  5. Optional replacement of previous output files with the same name.
  6. File sharing rules for the input file.

Comments and Examples

The following specification indicates that MASTER is the file to be sorted. It can be found in the VOL subdirectory on drive C. SORTOUT is the name of the output file. This can also be found in the VOL subdirectory on drive C. The WORK subdirectory, on drive D, is the designated work space. An address-out sort (PD3 format) is to be performed, and the native collating sequence is to be used (unless the BRConfig.sys file or CONFIG command has specified ALTERNATE):

FILE MASTER,VOL,C,SORTOUT,VOL,C,WORK,D,A,N

In the following example, SORT.IN is the file to be sorted. It is located in the current directory on the current drive. SORT[WSID].OUT is the output file. The current drive and directory will be used for the work space. In the name SORT[WSID].OUT, the [WSID] will be replaced with the current workstation ID. A record-out sort using the alternate collating sequence is called for. If SORT[WSID].OUT already exists, BR will replace it with the newly sorted file.

FILE SORT.IN,,,SORT[WSID].OUT,,,,,R,A,REPLACE
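Because the FILE parameters are positional, it can help to see which value lands in which slot. The following Python sketch splits a FILE line into twelve named positions and fills in the defaults listed in this section. The field names and the function `parse_file_spec` are labels invented for this illustration; they are not BR keywords.

```python
# Hypothetical labels for the twelve positional FILE parameters.
FILE_FIELDS = [
    "input_name", "input_path", "input_drive",
    "output_name", "output_path", "output_drive",
    "work_path", "work_drive",
    "sort_type", "collate", "replace", "share",
]

def parse_file_spec(line):
    """Split a FILE specification into its positional parameters."""
    keyword, _, rest = line.strip().partition(" ")
    if keyword.upper() != "FILE":
        raise ValueError("not a FILE specification")
    values = [value.strip() for value in rest.split(",")]
    if len(values) > len(FILE_FIELDS):
        raise ValueError("too many parameters")
    # Pad so parameters left off the end read as omitted.
    values += [""] * (len(FILE_FIELDS) - len(values))
    spec = dict(zip(FILE_FIELDS, values))
    # Defaults from this section. An empty path, drive or collate value
    # means "current directory/drive" or "follow BRConfig.sys".
    spec["sort_type"] = spec["sort_type"] or "R"
    spec["replace"] = spec["replace"].upper() == "REPLACE"
    spec["share"] = spec["share"] or "SHRI"
    return spec

spec = parse_file_spec("FILE SORT.IN,,,SORT[WSID].OUT,,,,,R,A,REPLACE")
print(spec["output_name"], spec["sort_type"], spec["collate"], spec["replace"], spec["share"])
```

Counting positions this way also makes it easy to spot a missing comma, which would shift every later parameter one slot to the left.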

Syntax

Defaults

  1. Use the current directory.
  2. Use the current drive.
  3. Record-out sort (FILE spec)
  4. Default according to the COLLATE specification in the BRConfig.sys file.
  5. Do not replace existing file by same name.
  6. Use SHRI.

Parameters

Input file name is the name of the internal file containing the records to be sorted. "Path" is the sequence of directories that leads to this file. "Drive" is the drive on which the subdirectory path and file can be found.
Output file name is the name of the file that is to contain the sorted records (or just the record numbers of the records in sorted order). "Path" is the sequence of directories that leads to this file, and "drive" is the drive on which the subdirectory path and file can be found.
Workpath is the sequence of subdirectories that leads to the disk area where BR should create a temporary work space. "Workdrive" is the drive on which this subdirectory path can be found. There is no need to specify a name for the work file, as BR handles this internally. See the Technical Considerations section below for formulas to help you estimate work file space.
FILE spec allows you to choose from three different sort storage methods. "A" calls for an address-out sort, which is stored in PD3 format. "B" calls for an address-out sort, which is stored in B4 format. (Note: Not BH4) In both cases, the "address" being stored is the relative record number. "R" calls for a record-out sort, which means that each entire record is written to the output file. See the Technical Considerations section below for formulas to help you estimate address-out and record-out space. Address-out sorts execute much more quickly than record-out sorts, and the B storage method is faster than the A storage method. With the A method, PD3 can hold values only up to 99,999, so any record number above 99,999 will be stored as 0 in the PD3 value. When possible, use "B" to avoid problems with large data sets.
Collating sequence allows you to specify a collating sequence for the sort. "A" calls for the alternate collating sequence to be used, and "N" calls for the native sequence to be used. In most instances, the FILE specification collating sequence will override the BRConfig.sys file's COLLATE specification (for the sort only). There is one exception, however: if the FILE specification is N, and the BRConfig.sys specification is COLLATE ALTERNATE, the alternate collating sequence will be used. The following table should help clarify when each sequence is used. NOTE that the column labeled BRConfig.sys refers to the specification currently in use, whether it was actually specified by the BRConfig.sys file or by the CONFIG command:

*Note the exception with this option

REPLACE replaces an old sort output file (if it exists under the specified name) with the new output file. This eliminates the need to free or drop the file before sorting. REPLACE should only be used with caution, as the existence of the output file may indicate that a previous error has occurred. You may wish to restrict the use of this parameter to WSID temporary files and to address-out files.
Share spec allows you to specify file sharing rules for the input file. Any one of the NOSHR, SHRI, SHR or SHRU keywords is accepted. Keep in mind that the sort output file will not be in sorted order if SHR is specified and a record's sort field is modified by another workstation during the sort. (See "share specs" for more information about share parameters.)

All commas must be included in the FILE specification, even if some parameters are omitted.
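The collating-sequence rule described under Parameters, including its one exception, can be written as a small decision function. This Python sketch only restates that prose; the function name and the "NATIVE"/"ALTERNATE" strings are labels chosen for the illustration.

```python
def effective_collate(file_spec, config):
    """Model which collating sequence a sort uses.

    file_spec: "A", "N" or "" (parameter omitted in the FILE specification).
    config:    "NATIVE" or "ALTERNATE", the COLLATE setting currently in
               effect (from BRConfig.sys or the CONFIG command).
    """
    if file_spec == "A":
        return "ALTERNATE"  # A overrides the configuration
    if file_spec == "N":
        # The documented exception: N does not override COLLATE ALTERNATE.
        return "ALTERNATE" if config == "ALTERNATE" else "NATIVE"
    if file_spec == "":
        return config       # omitted: default to the configuration
    raise ValueError("collating sequence must be A, N or omitted")

for file_spec in ("A", "N", ""):
    for config in ("NATIVE", "ALTERNATE"):
        print(repr(file_spec), config, "->", effective_collate(file_spec, config))
```

Read this way, the native sequence is used only when the configuration in effect is native and the FILE specification does not ask for "A".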

Technical Considerations

1) To estimate the amount of space BR will need for its temporary work space, use the following formula (applies to both address-out and record-out sorts):

Number of records selected x 2
x Sort key length + 4
= N [round up to a multiple of 1024]

2) To estimate the byte size of a PD3-formatted output file for an address-out sort (specified with the "A" parameter), use the following formula:

Number of records selected
x 4
+ 15
= N [round up to a multiple of 512]

3) The record limit for a PD3-formatted address-out sort ("A"), is 99,999 records.

4) To estimate the byte size of a B4-formatted output file for an address-out sort (specified with the "B" parameter), use the following formula:

Number of records selected
x 5
+ 16
= N [round up to a multiple of 512]

5) The record limit for a B4-formatted address-out sort ("B"), is 2.147 billion (2**31-1) records.

6) To estimate the byte size of the output file for a record-out sort (specified with the "R" parameter), use the following formula:

Number of records selected
x Record length + 1
= N [round up to a multiple of 512]

7) BR automatically clears the controls (including collating sequence) set by one sort group before it begins executing the next sort group.
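The space estimates in items 1, 2, 4 and 6 above can be tried out with a short Python sketch. The stacked formulas do not show their grouping, so this sketch assumes the work space is records x 2 x (sort key length + 4) and the record-out size is records x (record length + 1); treat both groupings as assumptions. The function names are invented for the illustration.

```python
def round_up(value, multiple):
    """Round value up to the next multiple (integer arithmetic only)."""
    return -(-value // multiple) * multiple

def work_space_bytes(records, key_length):
    # Assumed grouping: records x 2 x (sort key length + 4)
    return round_up(records * 2 * (key_length + 4), 1024)

def address_out_pd3_bytes(records):
    if records > 99_999:
        raise ValueError('PD3 ("A") address-out sorts are limited to 99,999 records')
    return round_up(records * 4 + 15, 512)

def address_out_b4_bytes(records):
    if records > 2**31 - 1:
        raise ValueError('B4 ("B") address-out sorts are limited to 2**31-1 records')
    return round_up(records * 5 + 16, 512)

def record_out_bytes(records, record_length):
    # Assumed grouping: records x (record length + 1)
    return round_up(records * (record_length + 1), 512)

# 1,000 selected records, a 10-byte sort key and 128-byte records.
print(work_space_bytes(1000, 10))    # 28672
print(address_out_pd3_bytes(1000))   # 4096
print(address_out_b4_bytes(1000))    # 5120
print(record_out_bytes(1000, 128))   # 129024
```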

ALTS

The optional ALTS specification allows you to either reorder part of the collating sequence or to set certain characters equal to a new collating value.

Comments and Examples

ALTS is frequently used to set the values of uppercase letters to the same values as their lowercase counterparts.

ALTS specifications take effect only during character comparisons when C is specified as the format type.

The following example reorders the collating values for the uppercase letters A through Z to match the collating values for their lowercase counterparts:

ALTS RO,97,"ABCDEFGHIJKLMNOPQRSTUVWXYZ"

The following example reorders lowercase letters so that vowels are sorted before consonants:

ALTS RO,97,"aeioubcdfghjklmnpqrstvwxyz"

The following specification sets the collating value of each of the nine characters 1 through 9 to 48. These values come into effect only when the sort is performed on C field types. It does not affect the sorting of the digits 1 through 9 when they appear in N fields:

ALTS EQ,48,"123456789"
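The effect of RO and EQ on character comparisons can be modeled as edits to a 256-entry table of collating values. The Python sketch below is an illustration only: it assumes the native collating value of a character equals its character code, and the function names are hypothetical. The limits it checks (28 characters per RO, values no higher than 255) are the ones given under Parameters.

```python
def native_table():
    # Assumption: native collating value = character code (0..255).
    return list(range(256))

def apply_ro(table, start, chars):
    """Model ALTS RO: assign consecutive values beginning at start."""
    if len(chars) > 28:
        raise ValueError("RO accepts at most 28 characters")
    if not 0 <= start <= 255 or start + len(chars) - 1 > 255:
        raise ValueError("collating values must stay within 0..255")
    for offset, ch in enumerate(chars):
        table[ord(ch)] = start + offset

def apply_eq(table, value, chars):
    """Model ALTS EQ: give every listed character the same value."""
    if not 0 <= value <= 255:
        raise ValueError("collating value must be within 0..255")
    for ch in chars:
        table[ord(ch)] = value

def collate_key(text, table):
    """Sort key for a C field under the given collating table."""
    return [table[ord(ch)] for ch in text]

table = native_table()
apply_ro(table, 97, "ABCDEFGHIJKLMNOPQRSTUVWXYZ")  # uppercase sorts like lowercase
words = ["banana", "Cherry", "apple"]
print(sorted(words))                                        # native order
print(sorted(words, key=lambda w: collate_key(w, table)))   # with the ALTS RO applied
```

In the native order "Cherry" comes first because uppercase codes are lower; with the RO table applied, the list sorts as apple, banana, Cherry.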

Syntax

Parameters

In the top route of the syntax diagram, "RO" indicates that a set of characters is to be reordered.

"New starting value" is the first value to be assigned in the reordering process. It must be a number from 0 to 255, and it must not be allowed to increment beyond 255. Thus if you enter a value of 253, only three characters may be reordered.

"Character sequence" is a list of up to 28 characters which are to be reordered. It must be enclosed in quotation marks. The value which is specified as the "new starting value" will be assigned to the first character specified in the "character sequence". The next sequential value will be assigned to the second character in the "character sequence", and so on. If you wish to reorder more than 28 characters, you must use an additional ALTS specification.

In the lower route of the syntax diagram, "EQ" indicates that a new, single collating value is to be assigned to the specified characters (the value and the characters are set equal).

"New value" is a value from 0 to 255 which is to be assigned to the character. "Character sequence" is the character or set of characters which is to be given a new value; it must be enclosed within quotation marks.

Technical Considerations

1) If the ALTERNATE collating sequence is used for the sort, the following implied ALTS specification is automatically registered before any other ALTS specification is executed. Subsequent and overlapping ALTS specifications will take precedence over this specification:

     ALTS RO,176,"0123456789"

2) There is no limit to the number of ALTS specifications in a single sort group.

3) BR automatically clears all controls (including collating sequence and ALTS reordering) set by one sort group before it begins executing the next sort group.

RECORD

The optional RECORD specification allows you to specifically include or eliminate particular records from the sorting procedure.

Comments and Examples

Each RECORD specification tells BR to examine a specific field, called the select field, and determine whether or not the record should be included in the sort. If the value of the select field falls inside the specified limits and I (include) has been specified, BR includes that record in the sort.

If the value of the select field falls outside the specified limits and O (omit) has been specified, BR includes that record in the sort.

When two or more RECORD specifications are connected by the keyword "AND" (also the default), a single record must meet the qualifications of both before it will be included in the sort.

The RECORD specification in the following example tells BR to include only the records that fall inside the specified limits. The select field starts in position one of the record and is four characters in length. It is in packed decimal (PD) format. If the select field is equal to or greater than 100, and not more than 999, it will be included in the sort.

RECORD I,1,4,PD,"100","999"

The following six RECORD specifications demonstrate the use of AND and OR. If a record passes the test in the first specification, it is automatically included in the sort. If it does not pass this test, it is evaluated according to the next three record specifications. If it does not receive a true evaluation for each, it may still be able to pass the requirements of the last two specifications. If it does not pass any of the three tests, it is not included in the sort:

RECORD I,40,5,C,"55024","55036",OR
RECORD I,49,15,C,"DAKOTA","DAKOTA",AND
RECORD I,20,15,C,"OUST RD","OUST RD",AND
RECORD I,15,5,C,"47315","47722",OR
RECORD I,49,15,C,"DAKOTA","DAKOTA",AND
RECORD I,20,15,C,"RIDGE RD","RIDGE RD"
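The way AND and OR divide RECORD specifications into alternative sets can be sketched in Python. This is a model of the behavior described here, not BR code: it handles C fields only, it assumes shorter limits are padded with spaces to the field length, and the tuple layout and function names are invented for the illustration. It uses a smaller record layout than the example above (state in positions 1-2, zip code in positions 3-7).

```python
def passes(record, spec):
    """Test one RECORD spec: (mode, start, length, lower, upper, connector)."""
    mode, start, length, lower, upper = spec[:5]
    field = record[start - 1:start - 1 + length]
    # Assumption: limits shorter than the field are padded with spaces.
    inside = lower.ljust(length) <= field <= upper.ljust(length)
    return inside if mode == "I" else not inside

def split_sets(specs):
    """A spec ending in OR closes the current set of requirements."""
    sets, current = [], []
    for spec in specs:
        current.append(spec)
        if spec[5] == "OR":
            sets.append(current)
            current = []
    if current:
        sets.append(current)
    return sets

def selected(record, specs):
    """A record is sorted if it meets every spec of at least one set."""
    return any(all(passes(record, spec) for spec in group)
               for group in split_sets(specs))

specs = [
    ("I", 1, 2, "TX", "TX", "OR"),         # set 1: any Texas record
    ("I", 1, 2, "LA", "LA", "AND"),        # set 2: Louisiana ...
    ("I", 3, 5, "70000", "70999", "AND"),  # ... with a zip code in this range
]
for record in ("TX75001", "LA70112", "LA71101", "MN55024"):
    print(record, selected(record, specs))
```

Python's `any` stops at the first set that passes, which mirrors the advice to list the most likely conditions first.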

The following record specification uses comparison fields in each individual record as the select field's lower and upper limits. An example of how it would be used is to identify all inventory items which are either understocked or overstocked. The select field (which represents the current inventory amount), is described as the six-character field that starts at position 28. If the value of this field is less than the value of the field that starts at position 52 (the minimum inventory amount) or greater than the value of the field that starts at position 59 (the maximum inventory amount), it will be included in the sort.

RECORD O,28,6,N,52,59
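As a rough model of this specification, the Python sketch below reads the three six-character numeric fields from a record and reports whether the record would be included. It is an illustration under simplifying assumptions: N fields are treated as right-justified integers, `make_record` is a hypothetical helper that builds a test record, and limits equal to the select field count as "inside" (see the Technical Considerations for O).

```python
def field(record, start, length):
    """Return the field at a 1-based start position."""
    return record[start - 1:start - 1 + length]

def included_by_omit(record, start, length, lower_pos, upper_pos):
    """Model RECORD O with comparison fields: keep records outside the limits."""
    value = int(field(record, start, length))
    low = int(field(record, lower_pos, length))
    high = int(field(record, upper_pos, length))
    return not (low <= value <= high)

def make_record(current, minimum, maximum):
    """Hypothetical test record: amounts at positions 28, 52 and 59."""
    chars = [" "] * 70
    for pos, amount in ((28, current), (52, minimum), (59, maximum)):
        chars[pos - 1:pos + 5] = f"{amount:6d}"
    return "".join(chars)

# RECORD O,28,6,N,52,59
for current in (5, 10, 50, 150):
    record = make_record(current, 10, 100)
    print(current, included_by_omit(record, 28, 6, 52, 59))
```

With a minimum of 10 and a maximum of 100, stock levels of 5 and 150 are included (understocked and overstocked), while 10 and 50 are left out.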

Syntax

Defaults

1) Use AND.

Parameters

RECORD's first parameter must be either "I" (include) or "O" (omit). If you choose I, BR will include all records with select fields that fall inside the values you specify. If you choose O, BR will omit all records with select fields that fall inside the values you specify, and include those that fall outside.

"Start pos" is the starting position of the select field, and "field length" is its character length. "Form spec" indicates the format of the field (C, N, PD, etc.; see Format Specifications in the File Operations section for more information).

BR compares the data in the specified field to lower and upper limits before determining whether or not its record should be included in the sort.

"Lower limit start pos" identifies the starting position of another field in the record. BR will use the value of this comparison field as the lower limit of the select field. The format type of the comparison field must be identical to that of the select field, and the field length must be the same.

"Upper limit start pos" identifies the starting position of the field which contains the upper limit value for the select field. The format type of this comparison field must be identical to that of the select field, and the field length must be the same.

"Lower limit" and "upper limit" are constants enclosed within quotation marks. BR will consider these to be the low and high limit values for the select field in determining whether or not the record should be included. BR will not accept strings greater than 40 characters in length for this parameter. It is important to distinguish between uppercase and lowercase letters when using these options, as BR makes a letter-for-letter comparison on character fields.

"OR" and "AND" are the RECORD specifications' last two parameters. If all the RECORD specifications in a sort group are joined by "AND", a record must pass the requirements of each before it will be included in the sort. The following two specifications, for instance, require that the value of a certain four-character field in the record fall outside 2000 and 8000 and that a certain two-character field is equal to MN. If a record doesn't pass both these tests, it is not included in the sort.

RECORD O,3,4,C,"2000","8000",AND
RECORD I,28,2,C,"MN","MN"

The "OR" parameter gives a second try to records that don't pass the first set of requirements. When it is included at the end of a RECORD specification, "OR" tells BR that the next RECORD specification begins a new set of requirements. If the current record passes the first set of requirements, BR does not check to see if it will pass any others. If the record does not pass the first set, however, BR sequentially checks to see if it will pass any other sets. As soon as it passes the requirements for one set of RECORD specifications, BR includes it in the sort.

When passing lower and upper limits to a RECORD specification from a program, be sure to enclose the incoming variables in the appropriate quote sequence. This is just an example:

LET Data_Record$='RECORD I,'&Str$(Recpos)&',8,C,"'&V_File$&'","'&V_File$&'"'

In the above example we are passing the record position and the lower and upper file numbers we want included in the filter.
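The same quoting concern applies in any language that generates the control file. As a hedged illustration, this Python sketch assembles a RECORD line and places the limits inside double quotes. The function `record_spec` is hypothetical; the 40-character check reflects the limit stated above, and rejecting embedded double quotes is a precaution of this sketch, since this page does not describe how to escape them.

```python
def record_spec(mode, start, length, form, lower, upper, connector=""):
    """Build a RECORD specification line with quoted constant limits."""
    if mode not in ("I", "O"):
        raise ValueError('first parameter must be "I" or "O"')
    if connector not in ("", "AND", "OR"):
        raise ValueError("connector must be AND, OR or omitted")
    for limit in (lower, upper):
        if len(limit) > 40:
            raise ValueError("limits may not exceed 40 characters")
        if '"' in limit:
            # Precaution in this sketch: escaping rules are not documented here.
            raise ValueError("limit contains a double quote")
    parts = [mode, str(start), str(length), form, f'"{lower}"', f'"{upper}"']
    if connector:
        parts.append(connector)
    return "RECORD " + ",".join(parts)

print(record_spec("I", 106, 2, "C", "TX", "TX", "OR"))
# RECORD I,106,2,C,"TX","TX",OR
```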

Technical Considerations

1) BR allows up to 20 RECORD specifications per sort group. (On releases prior to 3.21b, the limit was ten.)

2) When the OR parameter is used, processing will be faster if the first conditions are the ones most likely to result in a true evaluation.

3) The two different types of lower and upper limit parameters may be specified in the same RECORD specification.

4) When I is specified, the RECORD specification's parameters are inclusive. This means that a record will be included in the sort if the select field's value is equal to the upper or lower limit value or if it is between the upper and lower limit values.

5) When O is specified, RECORD's upper and lower limits are exclusive; the select field value may not be equal to either the upper or lower limit if it is to be included in the sort.

MASK

MASK is a required specification that identifies up to ten sort fields and the manner (ascending or descending) in which they should be sorted.

Comments and Examples

A sort field contains the specific information to be sorted. When you wish to organize records in descending order by zip code, for example, the sort field is the zip code. The following MASK specification defines two sort fields. The first starts in position 1 and is four characters long; it is in packed decimal (PD) format, and it is to be sorted in ascending order. The second field starts in position 10 and is three characters long. It is in character (C) format, and is also to be sorted in ascending order.

MASK 1,4,PD,A,10,3,C,A
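To show how several sort fields and directions combine, here is a minimal Python model of a MASK applied to fixed-width records. It is a sketch, not BR's algorithm: it handles C fields only, compares them in plain character order, and assumes relative record numbers start at 1. The function name `mask_sort` and the sample records are invented for the illustration.

```python
def mask_sort(records, fields):
    """Return record numbers in sorted order (like an address-out sort).

    fields: list of (start, length, direction), direction "A" or "D",
    given most significant first, as in a MASK specification.
    """
    order = list(range(len(records)))
    # Sort by the least significant field first. Python's sort is stable,
    # so each earlier (more significant) field keeps ties in place.
    for start, length, direction in reversed(fields):
        order.sort(key=lambda i: records[i][start - 1:start - 1 + length],
                   reverse=(direction == "D"))
    return [i + 1 for i in order]  # assumed 1-based relative record numbers

records = ["TXSmith", "LAJones", "TXAdams", "LABrown"]
# Comparable to: MASK 1,2,C,A,3,5,C,D
addresses = mask_sort(records, [(1, 2, "A"), (3, 5, "D")])
print(addresses)                             # [2, 4, 1, 3]
print([records[n - 1] for n in addresses])   # the record-out equivalent
```

The first list corresponds to what an address-out sort stores (record numbers only); the second shows the full records a record-out sort would write.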

Syntax

Parameters

"Start position" is the starting record position of the field to be sorted, and "field length" is its character length.

"Form spec" indicates the format, or data type, of the field. The format specifications which can be used in the MASK specification include the following: B (Binary); BL (Binary low); BH (Binary high); C (Character); D (Double-precision); L (Long); N (Numeric); PD (Packed decimal); S (Single-precision) and ZD (Zoned decimal). See Format Specifications in the File Operations section for more information about each of these format types.

The last parameter must be either "A" to sort the specified field in ascending order, or "D" to sort it in descending order.

Technical Considerations

  1. The following field lengths are required for the D, S & L format specs: D is 8; S is 4; & L is 9.
  2. The total sort key length may not exceed 32,767 bytes.
  3. Up to ten fields may be specified with the MASK specification.
  4. MASK must always be the last specification in a sort group.
  5. The existence of MASK flags the end of the current sort group. BR will automatically treat any specifications that follow as part of the next sort group.

Sort Specification

The ! (for a comment), FILE, ALTS, RECORD, SUM and MASK specifications each control a different aspect of the sorting procedure. A set of these specifications makes up a sort group.

Sort Group

A sort group contains all the necessary specifications for controlling the sorting of a single file. If only one file is to be sorted, a single sort group may simply follow a SORT command when it is specified in a procedure file. When several files are to be sorted, the sort group for each file may be placed together in a single sort control file.

Sort Control File

A sort control file is an internal file made of one or more sort groups. BR produces, one after the other, an output file for each sort group. When multiple sort groups exist in the sort control file, all are executed; BR cannot single a particular sort group out and run it alone. The sort control file can also be created as a PROC file in a text editor. See the example near the top of this page.

The location of the MASK specification signals the end of each sort group in the control file. BR automatically clears all controls set by the previous sort group before it begins executing the next one.

When a control file contains more than one sort group and an error occurs during one of the sorts, only the sorts that have already been successfully completed will be processed.
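The rule that MASK closes each sort group can be illustrated with a short Python sketch that divides the lines of a control file into groups and checks that each one has the required FILE specification. The function name and the sample specifications are hypothetical, and the sketch assumes one specification per line.

```python
def split_sort_groups(lines):
    """Divide control-file lines into sort groups, each ending at MASK."""
    groups, current = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        current.append(line)
        if line.upper().startswith("MASK"):
            if not any(s.upper().startswith("FILE") for s in current):
                raise ValueError("sort group has no FILE specification")
            groups.append(current)
            current = []
    if current:
        raise ValueError("last sort group has no MASK specification")
    return groups

control = [
    "! First group",
    "FILE CUSTOMER.DAT,,,TEMP1,,,,,A,N",
    "MASK 1,5,C,A",
    "FILE ORDERS.DAT,,,TEMP2,,,,,R",
    'RECORD I,1,4,PD,"100","999"',
    "MASK 10,3,C,D",
]
for number, group in enumerate(split_sort_groups(control), start=1):
    print(number, group)
```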

Creating the Sort Control File

The sort control file can be created by a BR program.


Creating the Sort File with a Program

The most common reason for using a program to create a sort control file is to allow the user to select the information to be sorted. The following example demonstrates a program that asks for the range of customer numbers that are to be sorted. This information is then used to create the sort control file's RECORD specification.

00080 !
00090 PRINT FIELDS "20,1,c": "Enter beginning and ending customer"
00100 INPUT FIELDS "21,5,c 5,u;21,15,c 5,u": BC$,EC$
00110 OPEN #1:"name=CUST[WSID].SRT,recl=128,replace",INTERNAL,OUTPUT
00120 FORM C 128
00130 WRITE #1,USING 120: "! Sorting Customer File . . ."
00140 WRITE #1,USING 120: "FILE CUSTOMER.DAT,,,TEMP[WSID],,,,,A,N,REPLACE,SHR"
00150 WRITE #1,USING 120: "RECORD I,1,5,C,'" & BC$ & "','" &EC$ & "'"
00160 WRITE #1,USING 120: "MASK 1,5,C,A"
00170 CLOSE #1:
00180 !
00190 EXECUTE "SORT CUST[WSID].SRT"

For further explanation, see the Sort Facility or the Sort Control File Tutorial.


Sort Baseyear Processing

If a Y is appended to the A/D (ascending/descending) indicator of sort MASK statements or the I/O (include/exclude) field of RECORD statements, the field is subject to BASEYEAR processing.

If such a field is in display format, then the first two characters are assumed to be a BASEYEAR dependent value. If the field is packed (BH or PD) then the storage format is assumed to be YYMMDD, YYMM or YY format depending on the length of the field, similar to INDEX (see above table).

The Y2K sorting and indexing features interpret year zero as zero (instead of 2000) if the month and day are zero. Leading blanks in baseyear sensitive fields are replaced with zeroes for sorting purposes, provided the remainder of the field contains only numeric data.

Prior to release 3.83e, BR ignored the digits to the left of the rightmost six digits of a date value unless century was specified in the mask.

Example
... DAYS(1001021,"YMD") produced the same result as DAYS(001021,"YMD")

If BR is unsuccessful applying the "YMD" mask, it now tries "CYMD". This change has inconvenienced some dealers, so OPTION 18 is provided to ignore the presence of digits to the left of the rightmost six digits.

Improving Sort Speed

All of the following factors can affect the speed of a sorting operation:

  1. Number of records to be sorted. The fewer the records, the faster the sort.
  2. Type of sort. Address-out sorts take less time to run than record-out sorts because they require fewer disk reads and writes. Also, the B storage method for address-out sorts is faster than the A storage method.
  3. RECORD specification(s). When RECORD specifications cause many records to be omitted from the sort, the sort runs faster.
  4. Length of sort field. The shorter the sort field, the faster the sort.
  5. Order of records in the input file. Files with records that are already close to the desired order sort more quickly than more randomly ordered files.
  6. Record size. The smaller the input records the faster the sort.
  7. Storage size of the computer. In general, the greater the storage, the faster the sort. If the computer does not have enough storage, it must use work files: the time required to read from and write to these files adds to the amount of time the sort takes.