In today's lecture we look at the StringBuffer class and Java's regular expression mechanism.
StringBuffer
A String encapsulates an immutable sequence of characters. Even though the String class provides transformer methods to change characters or substrings in a string, these methods always return a new String object instead of changing the existing object.
Sometimes the client would prefer to change the existing sequence of characters. StringBuffer and StringBuilder objects encapsulate a mutable sequence of characters.
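The difference between the two kinds of object can be seen in a short sketch. The class and method names below (MutabilityDemo, afterTransform, afterAppend) are made up for illustration and are not part of the course code.

```java
import java.io.PrintStream;

public class MutabilityDemo {
    // Calls a String transformer method and ignores its result;
    // the original String object is left unchanged.
    static String afterTransform() {
        String s = "hello";
        s.toUpperCase();          // returns a new String, discarded here
        return s;                 // still "hello"
    }

    // Appends to a StringBuffer; the same object now holds the longer text.
    static String afterAppend() {
        StringBuffer sb = new StringBuffer("hello");
        sb.append(" world");      // modifies sb in place
        return sb.toString();     // "hello world"
    }

    public static void main(String[] args) {
        PrintStream output = System.out;
        output.println(afterTransform());
        output.println(afterAppend());
    }
}
```

To keep the result of a String transformer you must assign it to a variable, whereas a string buffer keeps the change itself.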
For CSE1020 it does not matter whether you use StringBuffer or StringBuilder: their APIs are essentially identical, the main difference being that StringBuffer's methods are synchronized while StringBuilder's are not. We will prefer StringBuffer to stay consistent with the textbook, but the StringBuffer documentation actually recommends StringBuilder as the preferred choice for common usage.
StringBuffer
Generally, you should prefer using ordinary strings, but sometimes string buffers produce simpler code or more efficient code.
A standard example is reading in a text file one line at a time:
    import java.io.File;
    import java.io.PrintStream;
    import java.util.Scanner;
    import javax.swing.JFileChooser;

    public class ReadFile {
        public static void main(String[] args) throws java.io.IOException {
            PrintStream output = System.out;
            JFileChooser chooser = new JFileChooser();
            int returnVal = chooser.showOpenDialog(null);
            if (returnVal == JFileChooser.APPROVE_OPTION) {
                File file = chooser.getSelectedFile();
                Scanner input = new Scanner(file);
                String text = "";
                int lines = 0;
                for (; input.hasNextLine(); lines++) {
                    String line = input.nextLine();
                    text = text + line;
                }
                output.println("File has " + lines + " lines");
            }
        }
    }

On my laptop PC, this program takes about 10 seconds to read in a file with 10,000 words where each word is on a separate line. The cost comes from the concatenation: each text + line creates a new String and copies every character read so far.
Instead of concatenating to a string, we can append the text to a string buffer.
The client can append anything to the end of a string buffer using the overloaded append methods; it is the string representation of the argument that is appended to the string buffer.

    import java.io.File;
    import java.io.PrintStream;
    import java.util.Scanner;
    import javax.swing.JFileChooser;

    public class ReadFile2 {
        public static void main(String[] args) throws java.io.IOException {
            PrintStream output = System.out;
            JFileChooser chooser = new JFileChooser();
            int returnVal = chooser.showOpenDialog(null);
            if (returnVal == JFileChooser.APPROVE_OPTION) {
                File file = chooser.getSelectedFile();
                Scanner input = new Scanner(file);
                StringBuffer text = new StringBuffer();
                int lines = 0;
                for (; input.hasNextLine(); lines++) {
                    String line = input.nextLine();
                    text.append(line);
                }
                output.println("File has " + lines + " lines");
            }
        }
    }

On my laptop PC, this program takes less than 1 second to read in a file with 10,000 words where each word is on a separate line.
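The overloaded append methods mentioned above accept more than strings. The following is a small sketch (the class name AppendDemo and the method build are invented for this example) that appends values of several types and shows that their string representations end up in the buffer.

```java
import java.io.PrintStream;

public class AppendDemo {
    // Builds one string by appending values of different types.
    static String build() {
        StringBuffer sb = new StringBuffer();
        sb.append("Total: ");   // String
        sb.append(3);           // int, appended as "3"
        sb.append(' ');         // char
        sb.append(2.5);         // double, appended as "2.5"
        sb.append(' ');
        sb.append(true);        // boolean, appended as "true"
        return sb.toString();
    }

    public static void main(String[] args) {
        PrintStream output = System.out;
        output.println(build());
    }
}
```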
The client can insert anything into a string buffer at any valid position using the overloaded insert methods; it is the string representation of the argument that is inserted into the string buffer.

    StringBuffer s = new StringBuffer("I had breakfast.");
    output.println(s.toString());
    s.insert(6, " eggs for ");
    output.println(s.toString());
    int numEggs = 2;
    s.insert(6, numEggs);
    output.println(s.toString());

The above code fragment prints:

    I had breakfast.
    I had  eggs for breakfast.
    I had 2 eggs for breakfast.

Note the two spaces in the second line: the inserted text begins with a space and is placed right after the existing space at index 5.
The client can delete a single character from a string buffer using the deleteCharAt method. A range of characters can be deleted using the delete method.

    StringBuffer s = new StringBuffer("I had 2 eggs for breakfast.");
    output.println(s.toString());
    s.delete(6, 8);
    output.println(s.toString());
    s.delete(6, 15);
    output.println(s.toString());

The above code fragment prints:

    I had 2 eggs for breakfast.
    I had eggs for breakfast.
    I had breakfast.

    import java.io.PrintStream;

    public class StringBufferExample {
        public static void main(String[] args) {
            PrintStream output = System.out;
            StringBuffer s = new StringBuffer("I had breakfast.");
            output.println(s.toString());
            s.insert(6, " eggs for ");
            output.println(s.toString());
            int numEggs = 2;
            s.insert(6, numEggs);
            output.println(s.toString());
            s.delete(6, 8);
            output.println(s.toString());
            s.delete(6, 15);
            output.println(s.toString());
        }
    }
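The examples above only use delete. As a complement, here is a minimal sketch of deleteCharAt; the class name DeleteCharAtDemo and the helper removeAt are hypothetical, and the helper assumes the index is valid for the given text.

```java
import java.io.PrintStream;

public class DeleteCharAtDemo {
    // Returns a copy of text with the character at index removed.
    // Assumes 0 <= index < text.length(); otherwise deleteCharAt throws.
    static String removeAt(String text, int index) {
        StringBuffer s = new StringBuffer(text);
        s.deleteCharAt(index);
        return s.toString();
    }

    public static void main(String[] args) {
        PrintStream output = System.out;
        String sentence = "I had 2 eggs for breakfast.";
        // Index 11 holds the 's' of "eggs".
        output.println(removeAt(sentence, 11));
        // The last index holds the final period.
        output.println(removeAt(sentence, sentence.length() - 1));
    }
}
```

Calling deleteCharAt(i) has the same effect as delete(i, i + 1).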
StringBuffer Summary
You should consider using a string buffer or string builder if:
In Java, a regular expression (or regex) is a string that describes a pattern of characters in a concise unambiguous fashion. Regexes are typically used for pattern matching. Some examples are determining if a string:
The term regular expression means something else in formal language theory (where the term was invented) which you will learn about in CSE2001: Introduction to Theory of Computation.
The most basic form of pattern matching supported by the Java regex API is the matching of a string literal. For example, the regex "foo" matches the string "foo".

    String s = "foo";
    String regex = "foo";
    output.println(s.matches(regex));

The above code fragment will print true. Notice that this example of matching is equivalent to using equals.

    import java.io.PrintStream;

    public class Regex1 {
        public static void main(String[] args) {
            PrintStream output = System.out;
            String s = "foo";
            String regex = "foo";
            output.println(s.matches(regex));
        }
    }
Suppose that we are now interested in matching "foo" followed by any character (including whitespace). In the Java regex API, the period '.' is used to match any character. The regex "foo." means the string "foo" followed by any character.

    String regex = "foo.";
    output.println("foo".matches(regex));
    output.println("goo".matches(regex));
    output.println("hello".matches(regex));
    output.println("foofighter".matches(regex));
    output.println();
    output.println("foot".matches(regex));
    output.println("fool".matches(regex));
    output.println("foo9".matches(regex));
    output.println("foo ".matches(regex));

The above code fragment will print

    false
    false
    false
    false

    true
    true
    true
    true

    import java.io.PrintStream;

    public class Regex2 {
        public static void main(String[] args) {
            PrintStream output = System.out;
            String regex = "foo.";
            output.println("foo".matches(regex));
            output.println("goo".matches(regex));
            output.println("hello".matches(regex));
            output.println("foofighter".matches(regex));
            output.println();
            output.println("foot".matches(regex));
            output.println("fool".matches(regex));
            output.println("foo9".matches(regex));
            output.println("foo ".matches(regex));
        }
    }
The period '.' in a regular expression is a metacharacter, that is, a character with special meaning interpreted by the matcher. The full set of metacharacters is

    ( [ { \ ^ - $ | ] } ) ? * + .

and we will see examples of most if not all of them in the following slides.
Sometimes, you will want a metacharacter to be treated as a normal character. For example, suppose you wanted to match only the string "foo.". The following code does not work:

    String regex = "foo.";
    boolean matches = s.matches(regex);

because regex will also match strings such as "foo!", "food", and "fooy". To match only "foo." you must put a backslash before the metacharacter. Because the backslash is itself an escape character in Java string literals, it is written as "\\" in source code.

    String regex = "foo\\.";
    output.println("food".matches(regex));
    output.println("foo.".matches(regex));

The above code fragment prints:

    false
    true

    import java.io.PrintStream;

    public class Regex3 {
        public static void main(String[] args) {
            PrintStream output = System.out;
            String regex = "foo\\.";
            output.println("food".matches(regex));
            output.println("foo.".matches(regex));
        }
    }
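When a whole string should be treated literally, escaping each metacharacter by hand can be avoided with the standard library method Pattern.quote, which returns a regex that matches its argument literally. This is an alternative to the backslash approach shown above, not something the slides use; the class name QuoteDemo and the helper matchesLiterally are invented for this sketch.

```java
import java.io.PrintStream;
import java.util.regex.Pattern;

public class QuoteDemo {
    // True only if s is exactly the given literal text,
    // with every metacharacter in literal treated as an ordinary character.
    static boolean matchesLiterally(String s, String literal) {
        return s.matches(Pattern.quote(literal));
    }

    public static void main(String[] args) {
        PrintStream output = System.out;
        output.println(matchesLiterally("food", "foo."));
        output.println(matchesLiterally("foo.", "foo."));
    }
}
```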
Suppose that now we are interested in matching any string that starts with "foo". Such strings can be defined as:

    "foo" followed by zero or more characters

We already know that '.' means any character. The metacharacter '*' means zero or more times, so ".*" means any character zero or more times.

    String regex = "foo.*";
    output.println("xfoo".matches(regex));
    output.println();
    output.println("foo".matches(regex));
    output.println("foo.".matches(regex));
    output.println("foobar".matches(regex));
    output.println("foofighter".matches(regex));

The above code fragment prints:

    false

    true
    true
    true
    true

    import java.io.PrintStream;

    public class Regex4 {
        public static void main(String[] args) {
            PrintStream output = System.out;
            String regex = "foo.*";
            output.println("xfoo".matches(regex));
            output.println();
            output.println("foo".matches(regex));
            output.println("foo.".matches(regex));
            output.println("foobar".matches(regex));
            output.println("foofighter".matches(regex));
        }
    }
Suppose you have a string that represents a simple (no hyphens or multiword names) last name. You want to know if the name starts with a letter between 'A' and 'M'. If you allow one-letter names, then such strings are defined as:

    one character in the range A-M followed by zero or more lowercase letters

The string "[A-M]" means one character in the range A-M. The string "[a-z]" means one character in the range a-z. "*" means zero or more times.

    String regex = "[A-M][a-z]*";
    output.println("Newton".matches(regex));
    output.println();
    output.println("Gauss".matches(regex));
    output.println("Bernoulli".matches(regex));
    output.println("Aabbccdd".matches(regex));

The above code fragment prints:

    false

    true
    true
    true

    import java.io.PrintStream;

    public class Regex5 {
        public static void main(String[] args) {
            PrintStream output = System.out;
            String regex = "[A-M][a-z]*";
            output.println("Newton".matches(regex));
            output.println();
            output.println("Gauss".matches(regex));
            output.println("Bernoulli".matches(regex));
            output.println("Aabbccdd".matches(regex));
        }
    }
Suppose that in the previous example of matching last names you don't care about the case of the first letter (the name can start with an upper or lowercase letter). You could write the regular expression as a union of ranges.
The string "[a-m[A-M]]" means one character in the range "a-m" or "A-M".

    String regex = "[a-m[A-M]][a-z]*";
    output.println("Newton".matches(regex));
    output.println();
    output.println("gauss".matches(regex));
    output.println("Bernoulli".matches(regex));
    output.println("Aabbccdd".matches(regex));

The above code fragment prints:

    false

    true
    true
    true

    import java.io.PrintStream;

    public class Regex6 {
        public static void main(String[] args) {
            PrintStream output = System.out;
            String regex = "[a-m[A-M]][a-z]*";
            output.println("Newton".matches(regex));
            output.println();
            output.println("gauss".matches(regex));
            output.println("Bernoulli".matches(regex));
            output.println("Aabbccdd".matches(regex));
        }
    }
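A union of ranges can also be written without nesting, by listing both ranges inside a single pair of brackets, as in "[a-mA-M]". The sketch below (class name RangeUnionDemo and its helpers are made up for illustration) checks that the nested and flat forms give the same answer for a few names.

```java
import java.io.PrintStream;

public class RangeUnionDemo {
    static final String NESTED = "[a-m[A-M]][a-z]*";  // form used above
    static final String FLAT = "[a-mA-M][a-z]*";      // same union, written flat

    // True if the name starts with a letter in a-m or A-M,
    // followed by zero or more lowercase letters.
    static boolean matchesFlat(String name) {
        return name.matches(FLAT);
    }

    // True if both regexes give the same result for the name.
    static boolean agree(String name) {
        return name.matches(NESTED) == name.matches(FLAT);
    }

    public static void main(String[] args) {
        PrintStream output = System.out;
        String[] names = { "Newton", "newton", "gauss", "Bernoulli" };
        for (String name : names) {
            output.println(name + ": " + matchesFlat(name) + ", forms agree: " + agree(name));
        }
    }
}
```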
Suppose you wanted to check if a string was an unsigned (no + or -) whole number. Such strings could be defined as:

    any digit one or more times

You could use "[0-9]" to represent any digit, but because matching digits is a common operation, there is a predefined character class "\\d" for digits. The plus '+' metacharacter means one or more times, so "\\d+" means any digit one or more times.

    String regex = "\\d+";
    output.println("12a".matches(regex));
    output.println();
    output.println("1".matches(regex));
    output.println("861435".matches(regex));
    output.println("000000000".matches(regex));

The above code fragment prints:

    false

    true
    true
    true

    import java.io.PrintStream;

    public class Regex7 {
        public static void main(String[] args) {
            PrintStream output = System.out;
            String regex = "\\d+";
            output.println("12a".matches(regex));
            output.println();
            output.println("1".matches(regex));
            output.println("861435".matches(regex));
            output.println("000000000".matches(regex));
        }
    }
A signed whole number has a + or - or nothing in front of the digits. Such strings could be defined as:

    zero or one of [+-] followed by any digit one or more times

The metacharacter '?' means zero or one, so the string "[+-]?" means zero or one character from the set "+, -". The regex matching a signed whole number is "[+-]?\\d+" or "[+-]?[0-9]+".

    String regex = "[+-]?\\d+";
    output.println("1".matches(regex));
    output.println("+861435".matches(regex));
    output.println("-400".matches(regex));

The above code fragment prints:

    true
    true
    true

    import java.io.PrintStream;

    public class Regex8 {
        public static void main(String[] args) {
            PrintStream output = System.out;
            String regex = "[+-]?\\d+";
            output.println("1".matches(regex));
            output.println("+861435".matches(regex));
            output.println("-400".matches(regex));
        }
    }
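The examples above only show strings that match. It can be just as useful to see what the regex rejects, so here is a small sketch (the class name SignedNumberDemo and the helper isSignedWholeNumber are hypothetical) that wraps the same regex in a method and tries some inputs that should fail.

```java
import java.io.PrintStream;

public class SignedNumberDemo {
    // True if s is an optional + or - followed by one or more digits.
    static boolean isSignedWholeNumber(String s) {
        return s.matches("[+-]?\\d+");
    }

    public static void main(String[] args) {
        PrintStream output = System.out;
        // A valid input followed by inputs the regex should reject:
        // two signs, a sign alone, a decimal point, and the empty string.
        String[] inputs = { "-400", "+-4", "+", "4.0", "" };
        for (String s : inputs) {
            output.println("\"" + s + "\": " + isSignedWholeNumber(s));
        }
    }
}
```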
"a+a+" | two or more a 's |
"[^a]" | any character except a (not a) |
"[^0-9]" or "[^\\d]" | any character except a digit (not a digit) |
"[a-mq-z]" or "[a-z&&[^n-p]]" | a through z but not n, o, or p |
".{3,}" | at least 3 characters |
".{3,5}" | at least 3 but no more than 5 characters |
look it up | email addresses |
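Several of the patterns in this table can be checked directly with matches. The sketch below is only an illustration (the class name TableDemo and the helper check are invented). It writes negation inside brackets, as in "[^a]", because a '^' outside brackets anchors the match to the start of the input instead of negating anything.

```java
import java.io.PrintStream;

public class TableDemo {
    // Thin wrapper so each row reads as: input, regex.
    static boolean check(String s, String regex) {
        return s.matches(regex);
    }

    public static void main(String[] args) {
        PrintStream output = System.out;
        output.println(check("aaa", "a+a+"));             // two or more a's
        output.println(check("a", "a+a+"));               // only one a
        output.println(check("b", "[^a]"));               // any character except a
        output.println(check("x", "[^\\d]"));             // not a digit
        output.println(check("7", "[^0-9]"));             // a digit, so no match
        output.println(check("q", "[a-z&&[^n-p]]"));      // a-z minus n, o, p
        output.println(check("o", "[a-z&&[^n-p]]"));      // o is excluded
        output.println(check("abcd", ".{3,5}"));          // 3 to 5 characters
        output.println(check("abcdef", ".{3,5}"));        // too long
        output.println(check("ab", ".{3,}"));             // too short
    }
}
```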