OWA_PATTERN

Anon

Hi all,
I'm working with Oracle8i, and was displeased to find that you cannot use regular expressions in your queries with the same ease provided by MySQL (with rlike/regexp). Our DBA has installed this OWA_PATTERN package for me, which, by the docs, seems to be what I need.. I just can't figure out how to make it go.. (and it's driving me insane)

If anyone has any experience with/advice on/tips about this, it would be greatly appreciated. Thanks, and happy hacking!

-Kip

rogue

Kip DeMane wrote:

If anyone has any experience with/advice on/tips about this, it would be greatly appreciated. Thanks, and happy hacking!

-Kip

--------------------------------------------Hi Kip,

If you extract OWA_PATTERN out of the database, the package spec has all the info you need:

--------------------------------------------The package OWA_PATTERN is a "regular expression" pattern matching package. There are 3 fundamental subprograms in OWA_PATTERN.
The are: AMATCH, MATCH, and CHANGE.

MATCH provides the ability to determine if a pattern exists in a string.

AMATCH provides more flexibilty to specify WHERE in the string to search for the pattern and also gives more information in return by indicating WHERE in the string the end of the pattern was found.

CHANGE provides the ability to change occurances of a matched pattern to a new string.

The algorithms used here are derived from "Software Tools" by Brian Kernighan. These algorithms have been extended to support most of Perl's pattern matching functionality.

The regular expression elements that are supported are:

Assertions:

^ Matches the beginning of a line (or string)
$ Matches the end of a line (or string)

Quantifiers:

{n,m} Must match at least n times, but not more than m times
{n,} Must match at least n times
{n} Must match exactly n times.
* 0 or more occurances
+ 1 or more occurances
? 0 or 1 occurance(s)

Legal atoms:

. matches any character except \n

A list of characters in square brackets [] is a class of characters, for example [0-9] indicates match any character from 0 to 9.

\n matches newlines
\t matches tabs
\d matches digits [0-9]
\D matches non-digits [^0-9]
\w matches word characters (alphanumeric) [0-9a-z_A-Z]
\W matches non-word characters [^0-9a-z_A-Z]
\s matches whitespace characters [ \t\n]
\S matches non-whitespace characters [^ \t\n]
\b matches on "word" boundaries (between \w and \W)

A backslashed x followed by two hexadecimal digits, such as \x7f, matches the character having that hexadecimal value.

A backslashed 2 or 3 digit octal number such as \033 matches the character with the specified value.

Any other "backslashed" character matches itself.

Valid flags passed to CHANGE, MATCH, AMATCH:

i - perform pattern matching in a case-insensitive manner.
g - perform all changes globally (all occurances)

Replacements

& can be used in the substitution string to "re-place" that which has been matched.

For example: change('Oracle 7.1.3', '\d.\d.\d', 'Version &');

         yields: Oracle Version 7.1.3

<n> can be used to do backreferences, meaning to replace portions of the matched string:

change('Matt Bookman','(Matt) (Bookman)','\2, \1')
--> Bookman, Matt

Match Extraction

One can extract the matched values from the parenthesized patterns, for example:

declare
string varchar2(32767);
components owa_text.vc_arr;
begin
string := 'Today is 01/04/72';
if (owa_pattern.match(string, '(\d\d)/(\d\d)/(\d\d)', components))
then
htp.print('The month is '||components(1));
htp.print('The day is '||components(2));
htp.print('The year is '||components(3));
end if;
end;

Possible future enhancements:

\B - match on non-"word" boundaries (between \w and \w, or \W and \W)
"or" character matches:
change(text,'(Unix|unix)','UNIX') would change both occurances
Using control character references:

A backslashed c followed by a single character, such as \cD, matches
the corresponding control character.

-- No support for:
-- \b == Backspace (in a character class)
-- \r == Carriage return
-- \f == Form feed
-- Modified support for:
-- \s == A whitespace charcter -> [ \t\n\r\f]

-- \S == A non-whitespace character

Hope that helps ;-)

smd