Regular Expressions in Java
Quick Study of Regex:
Common syntax to remember:
Regular Expression | Description |
---|---|
`.` | Matches any single character (by default, not a line terminator) |
`^regex` | regex must match at the beginning of the line |
`regex$` | regex must match at the end of the line |
`[abc]` | Set definition, matches the letter a or b or c |
`[abc][vz]` | Set definition, matches a or b or c followed by either v or z |
`[^abc]` | When "^" appears as the first character inside [], it negates the set. This matches any character except a or b or c |
`[a-d1-7]` | Ranges, matches one letter between a and d or one digit from 1 to 7. It matches a single character, so it does not match the two-character string d1 |
`X\|Z` | Finds X or Z |
`XZ` | Finds X directly followed by Z |
`$` | Checks whether a line end follows |
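The rows above can be tried out with a short sketch like the following. The class name `BasicSyntaxDemo` and the helper `found` are made up for illustration; only `java.util.regex.Pattern` and `String.matches` come from the standard library. Note that `find()` looks for the pattern anywhere in the input, while `String.matches` requires the whole string to match.

```java
import java.util.regex.Pattern;

public class BasicSyntaxDemo {
    // Illustrative helper: true if the regex is found anywhere in the input.
    public static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^This", "This is a test"));  // true: anchored at the start
        System.out.println(found("test$", "This is a test"));  // true: anchored at the end
        System.out.println(found("[abc][vz]", "xxbzxx"));      // true: b followed by z
        System.out.println(found("[^abc]", "abc"));            // false: no character outside the set
        System.out.println("d1".matches("[a-d1-7]"));          // false: one set matches one character
        System.out.println("d1".matches("[a-d1-7]{2}"));       // true: two characters from the set
        System.out.println(found("X|Z", "only a Z here"));     // true: alternation
    }
}
```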
Metacharacters:
Metacharacters are predefined shorthands for common character classes, e.g. use \d instead of [0-9]. Inside a Java string literal the backslash must be doubled, so \d is written "\\d".
Regular Expression | Description |
---|---|
`\d` | Any digit, short for [0-9] |
`\D` | A non-digit, short for [^0-9] |
`\s` | A whitespace character, short for [ \t\n\x0b\r\f] |
`\S` | A non-whitespace character, short for [^\s] |
`\w` | A word character, short for [a-zA-Z_0-9] |
`\W` | A non-word character, short for [^\w] |
`\S+` | One or more non-whitespace characters |
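As a rough illustration of these shorthands in Java code, the sketch below uses three helper methods. The class and method names (`MetacharacterDemo`, `isAllDigits`, `words`, `sanitize`) are invented for this example; `String.matches`, `String.split` and `String.replaceAll` are standard methods that accept a regular expression.

```java
public class MetacharacterDemo {
    // Illustrative helper: true if the whole input consists of digits only.
    public static boolean isAllDigits(String input) {
        return input.matches("\\d+");
    }

    // Illustrative helper: splits the input on runs of whitespace.
    public static String[] words(String input) {
        return input.trim().split("\\s+");
    }

    // Illustrative helper: replaces every non-word character with an underscore.
    public static String sanitize(String input) {
        return input.replaceAll("\\W", "_");
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("2024"));           // true
        System.out.println(isAllDigits("20a4"));           // false
        System.out.println(words("  a \t b\nc ").length);  // 3
        System.out.println(sanitize("a-b c!"));            // a_b_c_
    }
}
```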
Quantifiers:
A quantifier defines how often an element can occur.
Regular Expression | Description | Examples |
---|---|---|
`*` | Occurs zero or more times, short for {0,} | X* - finds zero or more letters X, .* - any character sequence |
`+` | Occurs one or more times, short for {1,} | X+ - finds one or more letters X |
`?` | Occurs zero or one time, short for {0,1} | X? - finds zero or exactly one letter X |
`{X}` | Occurs exactly X times; {} specifies how many times the preceding element repeats | \d{3} - three digits, .{10} - any character sequence of length 10 |
`{X,Y}` | Occurs between X and Y times | \d{1,4} - \d must occur at least once and at most four times |
`*?` | ? after a quantifier makes it a "reluctant quantifier": it tries to find the smallest match | |
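The difference between a greedy quantifier and a reluctant one is easiest to see side by side. The following is a minimal sketch; `QuantifierDemo` and `firstMatch` are hypothetical names, while `Pattern`, `Matcher.find()` and `Matcher.group()` are the standard `java.util.regex` API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Illustrative helper: returns the first match of regex in input, or null if there is none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>one</b><b>two</b>";
        System.out.println(firstMatch("<b>.*</b>", html));       // greedy: <b>one</b><b>two</b>
        System.out.println(firstMatch("<b>.*?</b>", html));      // reluctant: <b>one</b>
        System.out.println(firstMatch("\\d{3}", "id 12345"));    // 123
        System.out.println(firstMatch("\\d{1,4}", "id 12345"));  // 1234
        System.out.println(firstMatch("colou?r", "my color"));   // color
    }
}
```

The greedy `.*` consumes as much as it can and then backs off until the rest of the pattern fits, so it spans both tags; the reluctant `.*?` stops at the first closing tag.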
Grouping and Backreferences:
Parentheses () group parts of a regular expression. In a replacement string, $1 refers to the text matched by the first group, $2 to the second, and so on.
For example, suppose you want to remove all whitespace between a word character and a following period (dot) or comma. The replacement has to keep both the character (group 1) and the punctuation (group 3), so it is "$1$3"; using "$3" alone would also delete the character before the whitespace.

    package com.Nur;

    public class Testing {
        public static final String EXAMPLE_TEST = "This is my small example , full . nochange."
                + "string which I'm going to " + "use for pattern matching.";

        public static void main(String[] args) {
            // Group 1: a word character, group 2: whitespace, group 3: a dot or a comma
            String pattern = "(\\w)(\\s+)([\\.,])";
            // Keep groups 1 and 3, drop the whitespace captured by group 2
            System.out.println(EXAMPLE_TEST.replaceAll(pattern, "$1$3"));
        }
    }

Output: This is my small example, full. nochange.string which I'm going to use for pattern matching.
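Groups are not limited to replacement strings. As a sketch under a few assumptions (the class `GroupDemo` and its methods are invented for illustration), the example below reorders groups with `$1`/`$2`, refers back to a group inside the pattern itself with `\1`, and reads group values directly through the standard `Matcher.group(int)` method.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Illustrative helper: turns "first last" into "last, first" via $1 and $2.
    public static String swapNames(String input) {
        return input.replaceAll("(\\w+)\\s+(\\w+)", "$2, $1");
    }

    // Illustrative helper: collapses an immediately repeated word.
    // \1 inside the pattern must match the same text that group 1 captured.
    public static String dropRepeatedWord(String input) {
        return input.replaceAll("\\b(\\w+)\\s+\\1\\b", "$1");
    }

    public static void main(String[] args) {
        System.out.println(swapNames("Ada Lovelace"));              // Lovelace, Ada
        System.out.println(dropRepeatedWord("this is is a test"));  // this is a test

        // Reading group values directly from a Matcher
        Matcher m = Pattern.compile("(\\d{4})-(\\d{2})").matcher("released 2024-05");
        if (m.find()) {
            System.out.println(m.group(1) + " / " + m.group(2));    // 2024 / 05
        }
    }
}
```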