Str2Mat

From BR Wiki
Revision as of 01:45, 12 July 2015

The Str2Mat Internal Function will split a string variable based on a delimiter and place the resulting strings into an array which STR2MAT dynamically re-dimensions. The string to mat and mat to string functions have been extended to ease parsing of CSV and XML data (as of 4.3).

STR2MAT(<string variable>, MAT <array name>, [MAT] <delimiter$>, [<quote-type:trim>])

Parameters

"String Variable" is the variable that contains the data to be converted into an array.

"MAT array-name" is the name of the array into which the variable will be placed.

"Delimiter$" is a string containing the character (or characters) that separates the string variable into the items to be placed in the array, for example a comma (","). As of 4.3, the delimiter can also be an array.

"Quote:Trim" is an optional parameter that handles quotes within the string variable. Any delimiter (such as a comma) that occurs within the specified quotes will not split the data into separate array elements. Quote-type can be Q, QUOTES, ('), or ("), and is case insensitive. Q and QUOTES mean that double quotes will be used. The trim flags can be :LTRM, :TRIM or :RTRM, and denote post-processing of extracted elements. The leading colon is only present when quote-type is specified (as of 4.3).

Defaults

  1. The default delimiter searches for combinations of line feed and carriage return characters, including:
  • line feed + carriage return
  • carriage return

Further Explanation

1. When the same delimiter occurs more than once in a row, BR honors every occurrence, creating an empty string element in the resulting array for each occurrence after the first. Consider the following example:

00010 let namelist$="Mary,John,,Salomi,Thomas,,,David,Sonia"
00020 str2mat(namelist$,mat customer$,",")
00030 print mat customer$

Output:

Mary
John

Salomi
Thomas


David
Sonia

Customer$(3), Customer$(6), and Customer$(7) would have a value of "".

2. If the delimiter is "", every character will be put in a separate element of the array.

3. Str2Mat performs the opposite action of Mat2Str

4. Str2Mat dynamically redimensions the array (mat customer$ in the above example) as needed to include all of the items from the source string variable. It returns the number of elements in the final array.

5. When the delimiter is an array, each of its elements signifies the start of a new element in the final array. However, two different delimiters that occur consecutively do not create a blank element in the array. For example:

dim namelist$*256,customer$(7),delim$(2)
let namelist$="Mary,Jo.hn,,.Salomi.Thomas,,,David,,.Sonia"
let delim$(1)=","
let delim$(2)="."
str2mat(namelist$,mat customer$,Mat delim$)
print mat customer$

will return:

[image: readc2.jpg]

To restate this: when elements of a delimiter array occur adjacent to each other within the source string, they are grouped as one separator substring. When the same delimiter occurs twice in succession, it creates a null element in the final array output.
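The delimiter rules above can be modeled outside BR. The following Python sketch is one reading of those rules, not BR's implementation: the function name split_like_str2mat is hypothetical, and the treatment of a trailing delimiter (it yields a final empty element here) is an assumption this page does not confirm.

```python
def split_like_str2mat(source, delimiters):
    """Model the delimiter rules described on this page (hypothetical helper)."""
    if isinstance(delimiters, str):
        delimiters = [delimiters]
    if delimiters == [""]:
        # Rule 2: an empty delimiter puts every character in its own element.
        return list(source)
    # Try longer delimiters first so a short one cannot shadow a long one.
    ordered = sorted((d for d in delimiters if d), key=len, reverse=True)
    elements = []
    current = ""
    in_run = False   # True while scanning a run of adjacent delimiters
    used = set()     # delimiters already consumed in the current run
    i = 0
    while i < len(source):
        match = next((d for d in ordered if source.startswith(d, i)), None)
        if match is None:
            current += source[i]
            in_run = False
            i += 1
            continue
        if not in_run:
            # First delimiter after data closes the current element.
            elements.append(current)
            current = ""
            in_run = True
            used = {match}
        elif match in used:
            # The same delimiter again: emit a null element, start a new run.
            elements.append("")
            used = {match}
        else:
            # A different adjacent delimiter joins the same separator run.
            used.add(match)
        i += len(match)
    # Assumption: whatever follows the last delimiter becomes the last element.
    elements.append(current)
    return elements


names = split_like_str2mat("Mary,John,,Salomi,Thomas,,,David,Sonia", ",")
print(len(names), names)
mixed = split_like_str2mat("Mary,Jo.hn,,.Salomi.Thomas,,,David,,.Sonia", [",", "."])
print(len(mixed), mixed)
```

For the single comma delimiter the sketch returns nine elements with empty strings in positions 3, 6 and 7, matching the output listed under point 1. For the two-element delimiter array it returns Mary, Jo, hn, an empty element, Salomi, Thomas, two empty elements, David, an empty element, and Sonia. That second result follows from the rule as worded here and has not been checked against BR.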

CSV Parsing (4.3)

The following code snippet demonstrates how to open a CSV/Tab file, read in the fields from the header, and then loop through the records.

01000    dim CSV_LINE$*999,CSV_FILE$*256, CSV_DELIM$*1,CSV_HEADER$*999,CSV_FIELDS$(1)*40,CSV_DATA$(1)*60
01020    form C," "
01040    let CSV_FILE$="Sample_File.tab" : let TAB$=CHR$(9)
01060    open #(CSV_HANDLE:=10): "name="&CSV_FILE$&",shr",display,input 
01080    linput #CSV_HANDLE: CSV_HEADER$
01100    let CSV_DELIM$=TAB$
01120    if POS(CSV_HEADER$,TAB$) <= 0 then 
01140       let CSV_DELIM$=","
01160    end if 
01180    let STR2MAT(CSV_HEADER$,MAT CSV_FIELDS$,CSV_DELIM$,"QUOTES:TRIM")
01200    print using 1020: MAT CSV_FIELDS$
01220    do 
01240       linput #CSV_HANDLE: CSV_LINE$ eof Exit_Csv
01260       let STR2MAT(CSV_LINE$,MAT CSV_DATA$,CSV_DELIM$,"Q:trim")
01280       print using 1020: MAT CSV_DATA$
01300    loop 
01320 Exit_Csv: !

You might wish to copy any CSV file to Sample_File.tab and run this program to view the content.
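
For readers more familiar with other languages, the same flow (read the header, choose tab or comma, then split each record with quote handling and trimming) can be sketched in Python. This is only an analogue: the function name read_delimited is hypothetical, and Python's csv module applies its own quoting rules rather than BR's "Q:TRIM" processing.

```python
import csv


def read_delimited(text):
    """Return (fields, rows) from delimited text (hypothetical helper).

    The delimiter is a tab if the header line contains one, otherwise a comma,
    mirroring the POS(CSV_HEADER$,TAB$) test in the BR program above.
    """
    lines = text.splitlines()
    if not lines:
        return [], []
    delimiter = "\t" if "\t" in lines[0] else ","
    reader = csv.reader(lines, delimiter=delimiter, skipinitialspace=True)
    # Trim each cell after quotes have been removed by the csv module.
    rows = [[cell.strip() for cell in row] for row in reader]
    return rows[0], rows[1:]


sample = 'Name,City\n"Smith, John", Dallas\n'
fields, records = read_delimited(sample)
print(fields)
print(records)
```

The comma inside the quoted name stays part of the data, which is the same effect the quote-type flag has in the BR program.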

XML Parsing Enhancements

STR2MAT may also be used to parse XML data.

This is a bit more complex than parsing CSV files, but still useful.

The following example will parse XML$ into "MAT XML_LINE$":

 10 DIM XML$*999999,XML_LINE$(1)*32000
 20 XML$="<XML><NODE><ITEM>ITEM VALUE</ITEM></NODE></XML>"
 100 LET Str2mat(XML$,Mat XML_LINE$,">","TRIM")

This makes the parsing of XML a bit more convenient. Given the following XML sample:

 <XML>
  <NODE>
    <ITEM>ITEM VALUE</ITEM>
  </NODE>
 </XML>

the function will parse the data into these elements:

 <XML
 <NODE
 <ITEM
 ITEM VALUE</ITEM
 </NODE
 </XML

If the node names are known, a more complete technique is available: you may use an array of Delimiter$ values to parse the data. Take the following example:

100    dim XML$*999999,XML_LINE$(1)*32000,DELIM$(4)*32
110    let XML$="<XML><NODE><ITEM>ITEM VALUE</ITEM><ITEM2>ITEM2 VALUE</ITEM2></NODE></XML>"
120    read MAT DELIM$
130    data </XML>,</NODE>,</ITEM>,</ITEM2>
140    let STR2MAT(XML$,MAT XML_LINE$,MAT DELIM$,"TRIM")
150    print MAT XML_LINE$

This program would return the following results:

 <XML><NODE><ITEM>ITEM VALUE
 <ITEM2>ITEM2 VALUE

Notice that "Nested Nodes" are listed before the initial data; this may be used to identify the node.
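
Both XML examples can be approximated in Python with a regular expression that treats any run of adjacent separators as a single split point. This is a sketch of the grouping idea only: split_on_separator_runs is a hypothetical name, the regex does not reproduce BR's null-element rule for a repeated identical separator, and dropping the empty piece after a trailing separator is an assumption chosen to match the listings above.

```python
import re


def split_on_separator_runs(source, separators, trim=True):
    """Split source wherever one or more listed separators occur in a row."""
    if not separators:
        return [source]
    # Longest first, so "</ITEM2>" is tried before "</ITEM>".
    ordered = sorted(separators, key=len, reverse=True)
    pattern = re.compile("(?:" + "|".join(re.escape(s) for s in ordered) + ")+")
    parts = pattern.split(source)
    if trim:
        parts = [part.strip() for part in parts]
    # re.split leaves an empty string after a trailing separator; drop it.
    if parts and parts[-1] == "":
        parts.pop()
    return parts


xml_simple = "<XML><NODE><ITEM>ITEM VALUE</ITEM></NODE></XML>"
print(split_on_separator_runs(xml_simple, [">"]))

xml_nodes = "<XML><NODE><ITEM>ITEM VALUE</ITEM><ITEM2>ITEM2 VALUE</ITEM2></NODE></XML>"
closing_tags = ["</XML>", "</NODE>", "</ITEM>", "</ITEM2>"]
print(split_on_separator_runs(xml_nodes, closing_tags))
```

The first call yields the six elements listed for the ">" delimiter, and the second yields the two elements shown for the closing-tag delimiters.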

Quote Processing

Quotation marks suppress the recognition of separators in accordance with the following rules.

Standard BR Quote Processing

When examining str$ left to right, the first character (and the first character after each separator) is checked to see if it is either (') or ("). If it is either of those, it activates quotation processing, which suppresses the recognition of separators until quotation processing is deactivated. The first character thus becomes the governing quote type until quotation processing is deactivated.

The string is copied until it ends or until an odd number of successive occurrences of the governing quote type is encountered. During this processing, two adjacent occurrences of the governing quote character denote an embedded occurrence of the quote character.

Examples

  • "abc,def" -> abc,def where the comma is not recognized as a separator and is part of the data
  • abc"def -> abc"def naturally embedded quotes may occur anywhere within a string after the first character
  • "abc"def" -> abcdef" quotation processing is deactivated by the center quote mark
  • "abcdef" -> abcdef normal data
  • "abc'def" -> abc'def the single quote is treated like any other character while double quotes govern
  • 'abc"def' -> abc"def double quotes are treated like any other character while single quotes govern
  • "abc""def" -> abc"def pairs of governing quotes denote a single embedded quote
  • "abc"""def" -> abc"def" the third successive occurrence deactivates quote processing
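
The quote rules and examples above can be checked with a small Python model. It is a sketch under stated assumptions: unquote_element is a hypothetical name, it processes one already-isolated element, and it does not model how separators are recognized again once quote processing is deactivated.

```python
def unquote_element(raw):
    """Apply the quote rules described above to a single element."""
    if not raw or raw[0] not in "'\"":
        # No leading quote: quotation processing is never activated.
        return raw
    quote = raw[0]   # the governing quote type
    out = []
    active = True
    i = 1
    while i < len(raw):
        if active and raw[i] == quote:
            # Count the run of successive governing quote characters.
            run = 0
            while i + run < len(raw) and raw[i + run] == quote:
                run += 1
            # Each adjacent pair denotes one embedded quote character.
            out.append(quote * (run // 2))
            # An odd run deactivates quotation processing.
            if run % 2 == 1:
                active = False
            i += run
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)


for sample in ['"abc,def"', '"abc"def"', '"abc""def"', '"abc"""def"']:
    print(sample, "->", unquote_element(sample))
```

Run against the eight examples in the list, this model reproduces each result shown there.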

MAT2STR( MAT zzz$, str$ [, sep$ [, flags$]] )

Where flags$ is in the format:

[ quote-type ] [ :LTRM ] | [ :TRIM ] | [ :RTRM ]

Where quote-type can be Q, QUOTES, ('), or ("), case insensitive. Quote-type denotes that each element should be enclosed in quotation marks. The trim flags denote pre-processing of array elements and the leading colon is only present when quote-type is specified.

If Q or QUOTES is specified, BR automatically determines which quote type to apply as follows:

First the element is unpacked. That is, if it is contained in quotes, the quotes are stripped and embedded pairs are singled. Next the element is scanned left to right for either type of quote character (single or double). If a quote character is encountered the element is enclosed in the alternate quote type and embedded occurrences of that quote type are doubled. If no quote character is encountered then double quotes are applied.

Examples

Quote Type is Q or QUOTES

abcdef -> "abcdef"
abc'def -> "abc'def"
abc"def -> 'abc"def'
abc""def -> 'abc""def' embedded quotes are left intact when quotes are not active
'abcdef -> "'abcdef"

Quote Type is ' (quote type single)

abcdef -> 'abcdef'
'abcdef -> '''abcdef'   single quotes get doubled when embedded in single quotes
"abcdef -> '"abcdef'   leading double quote is treated normally

Quote type double mirrors quote type single.
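
The quoting rules for MAT2STR can be modeled the same way. The Python sketch below is an interpretation of the description above, not BR code: quote_element and unpack are hypothetical names, and the unpack step is assumed to apply only when an element both starts and ends with the same quote character.

```python
def unpack(element):
    """Strip enclosing quotes and single any embedded pairs (assumed reading)."""
    if len(element) >= 2 and element[0] in "'\"" and element[-1] == element[0]:
        quote = element[0]
        return element[1:-1].replace(quote * 2, quote)
    return element


def quote_element(element, quote_type="Q"):
    """Enclose one element in quotes following the rules described above."""
    if quote_type.upper() in ("Q", "QUOTES"):
        text = unpack(element)
        # The first quote character found decides; use the alternate type.
        first = next((ch for ch in text if ch in "'\""), None)
        enclosing = "'" if first == '"' else '"'
    elif quote_type in ("'", '"'):
        text = element
        enclosing = quote_type
    else:
        raise ValueError("quote_type must be Q, QUOTES, ' or \"")
    # Embedded occurrences of the enclosing quote type are doubled.
    return enclosing + text.replace(enclosing, enclosing * 2) + enclosing


for sample in ["abcdef", "abc'def", 'abc"def', "'abcdef"]:
    print(sample, "->", quote_element(sample, "Q"))
print("'abcdef", "->", quote_element("'abcdef", "'"))
```

The model agrees with every example listed above for quote type Q and for quote type single.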

MAT2STR and STR2MAT trim outside of quotes but not inside of quotes. Also MAT2STR always adds quotes when quotes are present in the data.

When using MAT2STR on a 2 dimensional array, the first delimiter is used for individual elements and the second delimiter at the end of each row. This principle also applies to three to seven dimensions.

Example: Given the following two-dimensional array zzz$ containing the values-

   1            2
   3            4

The following statements-

   10 Sep$(1)=","
   20 Sep$(2)=hex$("0D0A") ! CRLF
   30 MAT2STR( MAT zzz$, str$, MAT Sep$ )
   40 PRINT str$

Will produce-

   1,2
   3,4
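
A minimal Python sketch of the two-delimiter idea follows. The name mat2str_2d is hypothetical, and the sketch reads "at the end of each row" literally, so the row delimiter also follows the last row; whether BR emits that final delimiter is not confirmed here.

```python
def mat2str_2d(rows, element_sep, row_sep):
    """Join a two-dimensional list: element_sep within a row, row_sep after it."""
    return "".join(
        element_sep.join(str(value) for value in row) + row_sep
        for row in rows
    )


zzz = [["1", "2"], ["3", "4"]]
joined = mat2str_2d(zzz, ",", "\r\n")
print(repr(joined))
```

Printing the joined string shows 1,2 on one line and 3,4 on the next, as in the output above.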

Sample Program

The following program demonstrates quote processing and trimming:

00010 ! Rep Str2mat
00020    dim LINE$*400,DESC$(5)*30,SEP$(1)*20,QTYPE$(2)*20
00030 ! 
00040    print NEWPAGE
00050    let LINE$='"  TEST1",, " TEST,,3","TEST4" ,,"TE""S""T6 "'
00060    print LINE$;TAB(1);"no augmentation or quote recognition"
00070    print "note column 3 gets split up and quotes are data"
00080    let STR2MAT(LINE$,MAT DESC$,',')
00090    let MAT2STR(MAT DESC$,LINE$,',')
00100    for X = 1 to UDIM(DESC$) !:
            print DESC$(X), LEN(DESC$(X)) !:
         next X !:
         print LINE$
00110    linput Z$
00120 ! 
00130    print LINE$;TAB(1);"strip quotes and trim outside the quotes"
00140    print "convert commas to tildes"
00150    let STR2MAT(LINE$,MAT DESC$,',',"Q:TRIM")
00160    let MAT2STR(MAT DESC$,LINE$,'~')
00170    for X = 1 to UDIM(DESC$) !:
            print DESC$(X), LEN(DESC$(X)) !:
         next X !:
         print LINE$
00180    linput Z$
00190 ! 
00200    let LINE$='"  TEST1",, " TEST,,3","TEST4" ,,"TE""S""T6 "'
00210    print LINE$;TAB(1);"strip quotes and notrim"
00220    print "convert commas to tildes"
00230    print "column 3 is broken up because the leading quote is embedded without trim"
00240    let STR2MAT(LINE$,MAT DESC$,',',"Q")
00250    let MAT2STR(MAT DESC$,LINE$,'~')
00260    for X = 1 to UDIM(DESC$) !:
            print DESC$(X), LEN(DESC$(X)) !:
         next X !:
         print LINE$