Regular Expression Options
inline options 有五種
| Option | Description | RegexOptions |
| i | Use case-insensitive matching. | IgnoreCase |
| m | Use multiline mode. ^ and $ match the beginning and end of a line, instead of the beginning and end of a string. | Multiline |
| n | Do not capture unnamed groups. | ExplicitCapture |
| s | Use single-line mode. | Singleline |
| x | Ignore unescaped white space in the regular expression pattern. | IgnorePatternWhitespace |
RegexOptions 是一個列舉,其有十種狀態

一、None
其實就是 regular expression engine 的預設設定,
而 regular expression engine 的預設設定為
not ECMAScript、LeftToRight、not IgnoreCase、
not IgnorePatternWhitespace、not CultureInvariant、not ExplicitCapture、
「.」被識為任一字元,除了「跳行」之外。
二、IgnoreCase
原本 regular expression engine 是 case-sensitive,
設定 RegexOptions.IgnoreCase 可讓 regular expression engine 不區分大小寫。
三、Multiline
使用 Multiline 前
using System;
using System.Text.RegularExpressions;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
string pattern = @"^1{1}";
string str = "123\n123\u000a123";
string result = Regex.Replace(str, pattern, "*");
Console.WriteLine(result);
}
}
}
其結果為

使用 Multiline 後
using System;
using System.Text.RegularExpressions;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
RegexOptions regexOptions = RegexOptions.Multiline;
string pattern = @"^1{1}";
string str = "123\n123\u000a123";
string result = Regex.Replace(str, pattern, "*", regexOptions);
Console.WriteLine(result);
}
}
}
其結果為

說明:
1、當設成 RegexOptions.Multiline 時,「^」與「$」的意思將由一字串的開頭與結尾,
轉變為每行字串的開頭與結尾。
2、RegexOptions.Multiline 與 RegexOptions.Singleline 不是互斥關係。
四、ExplicitCapture
有設定 RegexOptions.ExplicitCapture 時,regular engine 將只取得有做「命名群組」的值,
非「命名群組」將不會被截取。以下示範其差別。
using System;
using System.Text.RegularExpressions;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
string input = "abc-123,def-456,";
string pattern = @"([a-z]{3})-(?<MyName>\d{3}),";
MatchCollection matchCollection = Regex.Matches(input, pattern);
foreach (Match match in matchCollection)
{
for (int groupsCount = 1; groupsCount < match.Groups.Count; groupsCount++)
{
Console.WriteLine(" Group {0}: {1}", groupsCount, match.Groups[groupsCount].Value);
int captureCount = 0;
foreach (Capture capture in match.Groups[groupsCount].Captures)
{
Console.WriteLine(" Capture {0}: {1}", captureCount, capture.Value);
captureCount++;
}
}
Console.WriteLine();
}
}
}
}
其結果為

using System;
using System.Text.RegularExpressions;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
string input = "abc-123,def-456,";
string pattern = @"([a-z]{3})-(?<MyName>\d{3}),";
MatchCollection matchCollection = Regex.Matches(input, pattern, RegexOptions.ExplicitCapture);
foreach (Match match in matchCollection)
{
for (int groupsCount = 1; groupsCount < match.Groups.Count; groupsCount++)
{
Console.WriteLine(" Group {0}: {1}", groupsCount, match.Groups[groupsCount].Value);
int captureCount = 0;
foreach (Capture capture in match.Groups[groupsCount].Captures)
{
Console.WriteLine(" Capture {0}: {1}", captureCount, capture.Value);
captureCount++;
}
}
Console.WriteLine();
}
}
}
}
其結果為

五、Compiled
在 .Net,Regular expression 預設是「直譯式」,
如果你的程式需要被執行多次,有效能考量時,
請設成「編譯式」(Compiled)。
六、Singleline
using System;
using System.Text.RegularExpressions;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
string pattern = @".";
string str = "1\n1\u000a1";
string result = Regex.Replace(str, pattern, "*");
Console.WriteLine(result);
}
}
}
其結果為

說明:
當沒有設定 RegexOptions.Singleline 時,「.」的意思是任一字元,除了「跳行」之外。
using System;
using System.Text.RegularExpressions;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
RegexOptions regexOptions = RegexOptions.Singleline;
string pattern = @".";
string str = "1\n1\u000a1";
string result = Regex.Replace(str, pattern, "*", regexOptions);
Console.WriteLine(result);
}
}
}
其結果為

說明:
當有設定 RegexOptions.Singleline 時,「.」的意思是任一字元,沒有例外。
額外一例
using System;
using System.Text.RegularExpressions;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
RegexOptions regexOptions = RegexOptions.Singleline;
string pattern = @".";
string str = Environment.NewLine;
string result = Regex.Replace(str, pattern, "*", regexOptions);
Console.WriteLine(result);
}
}
}
其執行結果為

說明:
Environment.NewLine 也是「跳行」,當有設定 RegexOptions.Singleline 時,
其執行應該也只會跑出一顆星,為何是兩顆星呢?
原因在於,Environment.NewLine 等於「\r」+「\n」,自然會被識為兩字元囉。
七、IgnorePatternWhitespace
1、「 」、「\s」兩者皆表示為空白,但設定了 RegexOptions.IgnorePatternWhitespace之後,
regular expression engine 只會認得「\s」這 pattern 了。
2、當設定了 RegexOptions.IgnorePatternWhitespace之後,
一個 pattern 的「#」字號後面皆為註解。
using System;
using System.Text.RegularExpressions;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
string input = "I'm comment";
string pattern = "[\\s]#I'm comment";
string result = Regex.Replace(input, pattern, "*", RegexOptions.IgnorePatternWhitespace);
Console.WriteLine(result);
}
}
}
其結果為

3、就算是設定了 RegexOptions.IgnorePatternWhitespace 「 」也不能視而不見
using System;
using System.Text.RegularExpressions;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
string input = " ";
string pattern = "[ ]";
string result = Regex.Replace(input, pattern, "*", RegexOptions.IgnorePatternWhitespace);
Console.WriteLine(result);
}
}
}
其結果為
![]()
4、通常會設定 RegexOptions.IgnorePatternWhitespace 是為了 pattern 的可讀性。
八、RightToLeft
using System;
using System.Text.RegularExpressions;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
string input = "bot tob big ";
string pattern = @"\bb\w+\s";
Match m = Regex.Match(input, pattern, RegexOptions.RightToLeft);
while (m.Success)
{
Console.WriteLine(m.Value);
m = m.NextMatch();
}
}
}
}
其結果為

說明:
當設定了 RegexOptions.RightToLeft 時,表示比對是從字串的右邊比到左邊,
而不是連 pattern 也要從右一個個單元比到左邊。
九、ECMAScript
RegexOptions.ECMAScript 可以和
RegexOptions.IgnoreCase 與 RegexOptions.Multiline 一起設定,
而其餘的 RegexOptions 皆不得和 RegexOptions.ECMAScript 一起設定。
ECMAScript 與 canonical regular expressions 有三個不同
1、character class syntax
ECMAScript 不支援 Unicode Property,而 canonical regular expressions 支援。
2、self-referencing capturing groups
using System;
using System.Text.RegularExpressions;
namespace ConsoleApp1
{
class Program
{
static string pattern;
static void Main(string[] args)
{
string input = "aa aaaa aaaaaa ";
pattern = @"((a+)(\1) ?)+";
// Match input using canonical matching.
AnalyzeMatch(Regex.Match(input, pattern));
// Match input using ECMAScript.
AnalyzeMatch(Regex.Match(input, pattern, RegexOptions.ECMAScript));
}
private static void AnalyzeMatch(Match m)
{
if (m.Success)
{
Console.WriteLine("'{0}' matches {1} at position {2}.",
pattern, m.Value, m.Index);
int grpCtr = 0;
foreach (Group grp in m.Groups)
{
Console.WriteLine(" {0}: '{1}'", grpCtr, grp.Value);
grpCtr++;
int capCtr = 0;
foreach (Capture cap in grp.Captures)
{
Console.WriteLine(" {0}: '{1}'", capCtr, cap.Value);
capCtr++;
}
}
}
else
{
Console.WriteLine("No match found.");
}
Console.WriteLine();
}
// The example displays the following output:
// No match found.
//
// '((a+)(\1) ?)+' matches aa aaaa aaaaaa at position 0.
// 0: 'aa aaaa aaaaaa '
// 0: 'aa aaaa aaaaaa '
// 1: 'aaaaaa '
// 0: 'aa '
// 1: 'aaaa '
// 2: 'aaaaaa '
// 2: 'aa'
// 0: 'aa'
// 1: 'aa'
// 2: 'aa'
// 3: 'aaaa '
// 0: ''
// 1: 'aa '
// 2: 'aaaa '
}
}
3、octal versus backreference interpretation
| Regular expression | Canonical behavior | ECMAScript behavior |
| \0 followed by 0 to 2 octal digits | Interpret as an octal. For example, \044 is always interpreted as an octal value and means "$". | Same behavior. |
| \ followed by a digit from 1 to 9, followed by no additional decimal digits, | Interpret as a backreference. For example, \9 always means backreference 9, even if a ninth capturing group does not exist. If the capturing group does not exist, the regular expression parser throws an ArgumentException. | If a single decimal digit capturing group exists, backreference to that digit. Otherwise, interpret the value as a literal. |
| \ followed by a digit from 1 to 9, followed by additional decimal digits |
Interpret the digits as a decimal value. If that capturing group exists, interpret the expression as a backreference. Otherwise, interpret the leading octal digits up to octal 377; that is, consider only the low 8 bits of the value. Interpret the remaining digits as literals. For example, in the expression \3000, if capturing group 300 exists, interpret as backreference 300; if capturing group 300 does not exist, interpret as octal 300 followed by 0. |
Interpret as a backreference by converting as many digits as possible to a decimal value that can refer to a capture. If no digits can be converted, interpret as an octal by using the leading octal digits up to octal 377; interpret the remaining digits as literals. |
十、CultureInvariant
各國的語言不同,有可能 regex engine 在判斷上也有所差異,
要解決其問題可去設定 RegexOptions.CultureInvariant。