java and regex
_underdog
11-25-2002, 06:29 PM
I am trying to use the new regex functionality in JDK 1.4.
I am reading a line of data that is pipe delimited and I want to use the String.split(String regex) function. I am not familiar with regular expressions and I believe '|' is a special character in regex. I did:
myString.split("|")
This returned an array where it split on each character. How would I split on a pipe character?
Dru Lee Parsec
11-25-2002, 06:53 PM
this returned an array where it split on each character.
Really?? From the docs on String.split(), it looks like you did it correctly. An alternative would be to use the StringTokenizer class, but "split" is supposed to be much faster than a StringTokenizer.
I don't have 1.4 here at work. I'll test "split" tonight at home on 1.4.1
Strike
11-25-2002, 07:07 PM
"|" should not be a special character in a regex, though I don't claim to be an expert on Java regex syntax. It's not a special character in any of the other flavors of regex that I know (perl, posix, python, etc.)
_underdog
11-25-2002, 07:14 PM
Here is an example of what I am trying to do:
public class RegexTest {
    public static void main(String[] args) {
        String myString = "one|two|three|four|five";
        String[] strings = myString.split("|");
        for (int i = 0; i < strings.length; i++) {
            System.out.println(strings[i]);
        }
    }
}
The results were:
o
n
e
|
t
w
o
|
t
h
r
e
e
|
f
o
u
r
|
f
i
v
e
"A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by the "|" in this way. This can be used inside groups (see below) as well. REs separated by "|" are tried from left to right, and the first one that allows the complete pattern to match is considered the accepted branch. This means that if A matches, B will never be tested, even if it would produce a longer overall match. In other words, the "|" operator is never greedy. To match a literal "|", use \|, or enclose it inside a character class, as in [|]."
You sure about that, Strike?
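To connect the quoted rule to the split() result above: a bare "|" reads as "empty pattern OR empty pattern", so it matches the empty string at every position. The sketch below lists where each match starts. The class and method names (EmptyAlternation, matchStarts) are made up for illustration, and it assumes a JDK newer than 1.4, since it uses generics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyAlternation {
    // Hypothetical helper: returns the start offset of every match of regex in input.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // A bare "|" produces a zero-length match at every position,
        // which is why split("|") cuts between every pair of characters.
        System.out.println(matchStarts("|", "a|b"));
        // A character class matches only the literal pipe.
        System.out.println(matchStarts("[|]", "a|b"));
    }
}
```

The first call reports a match at every offset, including the end of the string, while the second reports only the offset of the pipe itself.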
Strike
11-25-2002, 07:19 PM
doh, I lose
I usually just use []s for selecting multiple things (because usually I don't need to pick more than one char), but most assuredly | is a special character. BUT, I'm pretty sure that if it is NOT within ()s it is not treated as such. Java's policy on that, however, is one I don't know.
Strike
11-25-2002, 07:25 PM
okay, I lose^2
it's a special character outside ()s in most languages too (python, notably):
>>> pipe_re = re.compile(".*|.*")
>>> pipe_re.match("abcd|efgh")
<_sre.SRE_Match object at 0x8171790>
>>> pipe_solo_re = re.compile("|")
>>> pipe_solo_re.match("|")
<_sre.SRE_Match object at 0x8180510>
>>> a_or_b = re.compile("a|b")
>>> a_or_b.match("a")
<_sre.SRE_Match object at 0x81709c8>
But, by itself, it should work (as shown above).
----edit----
crap, I didn't realize that was from the Python re module docs, kmj. How dare I question those? :(
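A rough Java counterpart of the Python session above, as a sketch: getting a match object back only says that something matched, not that the pipe itself did. Printing the matched text makes the difference visible. The names MatchedText and matchedPrefix are invented for this example; lookingAt() is used because, like Python's re.match, it anchors at the start of the input only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchedText {
    // Hypothetical helper: returns the text matched at the start of input,
    // or null if the pattern does not match there.
    static String matchedPrefix(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The same three pattern/input pairs as in the Python session.
        String[][] cases = { {".*|.*", "abcd|efgh"}, {"|", "|"}, {"a|b", "a"} };
        for (String[] c : cases) {
            System.out.println(c[0] + " on " + c[1]
                    + " matched \"" + matchedPrefix(c[0], c[1]) + "\"");
        }
    }
}
```

For the solo "|" pattern the match succeeds, but the matched text is the empty string: the first (empty) alternative wins before the pipe in the input is ever consumed. In the ".*|.*" case the pipe in the input is swallowed by the first ".*", not matched as a literal.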
_underdog
11-25-2002, 07:26 PM
Brackets worked:
String[] strings = myString.split("[|]");
Thanks a lot. Just so you know, if you try "\|" you get an illegal escape character error when you try to compile.
jemfinch
11-25-2002, 07:40 PM
Yeah, you should get an illegal escape character error when you try to compile. "\|" isn't a valid string escape sequence. You need to escape the pipe for the regexp, not for the string literal, so double the backslash: use "\\|".
Jeremy
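For reference, a minimal sketch of the options discussed in this thread for splitting on a literal pipe, side by side. The class and method names are made up for the example. Pattern.quote is an addition not mentioned by anyone above, and it assumes Java 5 or later, so it is not available on the JDK 1.4 the original question targets.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PipeSplit {
    // Character class: inside [] the pipe is an ordinary character.
    static String[] splitCharClass(String line) {
        return line.split("[|]");
    }

    // Escaped pipe: "\\|" in Java source is the two-character regex \|.
    static String[] splitEscaped(String line) {
        return line.split("\\|");
    }

    // Pattern.quote (Java 5+, an assumption beyond the thread's JDK 1.4)
    // turns any string into a literal pattern.
    static String[] splitQuoted(String line) {
        return line.split(Pattern.quote("|"));
    }

    public static void main(String[] args) {
        String line = "one|two|three|four|five";
        System.out.println(Arrays.toString(splitCharClass(line)));
        System.out.println(Arrays.toString(splitEscaped(line)));
        System.out.println(Arrays.toString(splitQuoted(line)));
    }
}
```

All three calls should give the same five fields. Which one to use is mostly a matter of taste: the character class avoids backslashes entirely, while the escaped form is closer to what other regex flavors use.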
Or (as I prefer, in Python) use r'\|'. I always use raw strings for regexes.