Java Regular Expressions
What is a Regular Expression?
A regular expression (regex or regexp) is a sequence of characters that defines a search pattern. It is used for matching and manipulating strings, and it provides a concise and flexible way to describe text patterns. Regular expressions are widely used in programming, text processing, and data validation tasks.
Java has no dedicated regular expression type in the language itself; instead, the standard library's java.util.regex package provides the classes for working with regular expressions. The package includes the following classes:
- Pattern class - Defines a pattern (to be used in a search)
- Matcher class - Used to search for the pattern
- PatternSyntaxException class - Indicates a syntax error in a regular expression pattern
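To make the roles of the three classes concrete, here is a minimal sketch. The class name `RegexClassesDemo` and the helper methods `firstMatch` and `isValidRegex` are hypothetical names chosen for illustration; only `Pattern`, `Matcher`, and `PatternSyntaxException` come from the package itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexClassesDemo {

    // Hypothetical helper: returns the first substring of input that the
    // regex matches, or null if there is no match.
    static String firstMatch(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);   // Pattern: the compiled regex
        Matcher matcher = pattern.matcher(input);   // Matcher: searches one input
        return matcher.find() ? matcher.group() : null;
    }

    // Hypothetical helper: reports whether the regex compiles at all.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Thrown when the pattern text itself is malformed
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("[0-9]+", "Order 1234 shipped")); // 1234
        System.out.println(isValidRegex("[0-9]+"));                     // true
        System.out.println(isValidRegex("[0-9"));                       // false (unclosed bracket)
    }
}
```

Because PatternSyntaxException is an unchecked exception, catching it is optional; it is mostly useful when the pattern text comes from user input.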
Example

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class EmailValidator {
        public static void main(String[] args) {
            // Example email addresses
            String email1 = "user@example.com";
            String email2 = "invalid.email";
            String email3 = "another_user@domain";

            // Regular expression for a simple email validation
            String regex = "^[a-zA-Z0-9_]+@[a-zA-Z0-9]+\\.[a-zA-Z]{2,}$";

            // Compile the regular expression
            Pattern pattern = Pattern.compile(regex);

            // Create Matcher objects
            Matcher matcher1 = pattern.matcher(email1);
            Matcher matcher2 = pattern.matcher(email2);
            Matcher matcher3 = pattern.matcher(email3);

            // Perform matching and print results
            System.out.println("Email 1 is valid: " + matcher1.matches());
            System.out.println("Email 2 is valid: " + matcher2.matches());
            System.out.println("Email 3 is valid: " + matcher3.matches());
        }
    }

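The example above calls matches(), which succeeds only when the entire input fits the pattern. To locate addresses embedded in longer text, find() can be called in a loop instead. The following is a sketch under that assumption: the class `EmailFinder` and the method `findEmails` are hypothetical names, and the pattern is the same simplified one as above with the ^ and $ anchors removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailFinder {

    // Same simplified email pattern as the validator, without ^ and $,
    // so it can match anywhere inside a longer string.
    private static final Pattern EMAIL =
            Pattern.compile("[a-zA-Z0-9_]+@[a-zA-Z0-9]+\\.[a-zA-Z]{2,}");

    // Collects every substring of text that matches the pattern.
    static List<String> findEmails(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = EMAIL.matcher(text);
        while (matcher.find()) {          // advances to the next match each call
            found.add(matcher.group());   // the text of the current match
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Contact user@example.com or admin@test.org for help";
        System.out.println(findEmails(text));               // [user@example.com, admin@test.org]
        System.out.println(EMAIL.matcher(text).matches());  // false: the whole string is not an email
    }
}
```

Compiling the pattern once into a constant avoids recompiling it on every call, which is a common choice when the same regex is reused.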
Regular Expression Syntax
The following table lists the regular expression metacharacter syntax available in Java:
Subexpression | Matches |
---|---|
^ | Matches the beginning of the input, or of each line in MULTILINE mode. |
$ | Matches the end of the input, or of each line in MULTILINE mode. |
. | Matches any single character except a line terminator. The DOTALL flag, (?s), allows it to match line terminators as well. |
[...] | Matches any single character in brackets. |
[^...] | Matches any single character not in brackets. |
\A | Matches the beginning of the entire string. |
\z | Matches the end of the entire string. |
\Z | Matches the end of the entire string, or just before a final line terminator if one exists. |
re* | Matches 0 or more occurrences of the preceding expression. |
re+ | Matches 1 or more occurrences of the preceding expression. |
re? | Matches 0 or 1 occurrence of the preceding expression. |
re{n} | Matches exactly n occurrences of the preceding expression. |
re{n,} | Matches n or more occurrences of the preceding expression. |
re{n,m} | Matches at least n and at most m occurrences of the preceding expression. |
a\|b | Matches either a or b. |
(re) | Groups regular expressions and remembers the matched text. |
(?:re) | Groups regular expressions without remembering the matched text. |
(?>re) | Matches the independent pattern without backtracking. |
\w | Matches a word character. Equivalent to [a-zA-Z_0-9]. |
\W | Matches a nonword character. |
\s | Matches a whitespace character. Equivalent to [ \t\n\x0B\f\r]. |
\S | Matches a nonwhitespace character. |
\d | Matches a digit. Equivalent to [0-9]. |
\D | Matches a nondigit. |
\G | Matches the point where the last match finished. |
\n | Back-reference to capture group number n, where n is a digit (for example, \1 refers to the first group). |
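A few rows of the table can be illustrated with a short sketch. The class name `SyntaxTableDemo` and its helper methods are hypothetical names chosen for this example; the flags Pattern.MULTILINE and Pattern.DOTALL and the methods used are part of java.util.regex. Note that each backslash in the table must be doubled inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SyntaxTableDemo {

    // Generic helper: true if regex matches anywhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // \d with re{n,m}: true if the whole input is 2 to 4 digits.
    static boolean isShortNumber(String input) {
        return Pattern.matches("\\d{2,4}", input);
    }

    // (re) with back-reference \1: returns the first lowercase letter that is
    // immediately followed by itself, or null if there is none.
    static String firstDoubledLetter(String input) {
        Matcher m = Pattern.compile("([a-z])\\1").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // ^ in MULTILINE mode: counts the lines that start with '#'.
    static int countCommentLines(String text) {
        Matcher m = Pattern.compile("^#", Pattern.MULTILINE).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // '.' against a newline, with or without the DOTALL flag.
    static boolean dotMatchesNewline(boolean dotAll) {
        int flags = dotAll ? Pattern.DOTALL : 0;
        return Pattern.compile("a.b", flags).matcher("a\nb").matches();
    }

    public static void main(String[] args) {
        System.out.println(isShortNumber("123"));      // true
        System.out.println(isShortNumber("12345"));    // false
        System.out.println(firstDoubledLetter("balloon"));            // l
        System.out.println(countCommentLines("# one\ncode\n# two"));  // 2
        System.out.println(dotMatchesNewline(false));  // false
        System.out.println(dotMatchesNewline(true));   // true
        System.out.println(found("end\\Z", "the end\n")); // true: \Z allows a final line terminator
        System.out.println(found("end\\z", "the end\n")); // false: \z requires the absolute end
    }
}
```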