Grouping Constructs
There are roughly ten kinds of grouping constructs.
1. Matched subexpressions
Syntax: (subexpression)
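As a minimal sketch of a plain capturing group, the following pulls the two halves of a key=value pair out by group number. The pattern, the input string, and the names CaptureDemo and GetGroup are illustrative assumptions, not part of the original notes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Returns the text captured by the numbered group, or "" if nothing matched.
    public static string GetGroup(string input, int groupNumber)
    {
        // Two unnamed groups: group 1 is the key, group 2 is the value.
        Match m = Regex.Match(input, @"(\w+)=(\w+)");
        return m.Success ? m.Groups[groupNumber].Value : "";
    }

    public static void Main()
    {
        Console.WriteLine(GetGroup("id=aaa", 1)); // id
        Console.WriteLine(GetGroup("id=aaa", 2)); // aaa
    }
}
```

Group 0 is always the whole match, so the unnamed groups are numbered from 1 in the order of their opening parentheses.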
2. Named matched subexpressions
Syntax: (?<name>subexpression) or (?'name'subexpression)
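The sketch below uses both spellings of a named group in one pattern and reads the captures back by name. The group names key and value, the input, and the class NamedGroupDemo are assumptions chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Returns the text captured by the named group, or "" if nothing matched.
    public static string GetNamed(string input, string groupName)
    {
        // (?<key>...) and (?'value'...) are two spellings of the same construct.
        Match m = Regex.Match(input, @"(?<key>\w+)=(?'value'\w+)");
        return m.Success ? m.Groups[groupName].Value : "";
    }

    public static void Main()
    {
        Console.WriteLine(GetNamed("name=bbb", "key"));   // name
        Console.WriteLine(GetNamed("name=bbb", "value")); // bbb
    }
}
```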
3. Balancing group definitions
Syntax: (?<name1-name2>subexpression) or (?'name1-name2'subexpression)
using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            // myTemp captures the opening quote; the balancing group then pops
            // myTemp and stores the text between the two quotes in myText.
            string pattern = @"(?<myTemp>"")[^""]+(?<myText-myTemp>"")";
            string str = @"<input id=""aaa"" name=""bbb"" type=""ccc"" />";
            MatchCollection matchCollection = Regex.Matches(str, pattern);
            foreach (Match m in matchCollection)
            {
                // Start at 1 to skip group 0 (the whole match).
                for (int i = 1; i < m.Groups.Count; i++)
                {
                    Console.WriteLine(m.Groups[i].Name + ":" + m.Groups[i].Value);
                }
                Console.WriteLine();
            }
        }
    }
}
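Output:
myTemp:
myText:aaa

myTemp:
myText:bbb

myTemp:
myText:ccc

Explanation:
Because myTemp has already been defined, the balancing group captures the text between the myTemp capture and the closing quote and names it myText. The balancing group also removes the myTemp capture, which is why myTemp prints as empty.
Compare this with the following example.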
using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Baseline without any groups: match a quoted string, quotes included.
            string pattern = @"""[^""]+""";
            string str = @"<input id=""aaa"" name=""bbb"" type=""ccc"" />";
            MatchCollection matchCollection = Regex.Matches(str, pattern);
            foreach (Match m in matchCollection)
            {
                Console.WriteLine(m.Value);
            }
        }
    }
}
Output:
"aaa"
"bbb"
"ccc"
4. Noncapturing Groups
Syntax: (?:subexpression)
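To make the difference from an ordinary group concrete, this sketch counts the groups a match exposes with and without (?:...). The patterns, the input, and the names NoncapturingDemo and GroupCount are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NoncapturingDemo
{
    // Number of groups exposed by a match, counting group 0 (the whole match).
    public static int GroupCount(string pattern, string input)
    {
        return Regex.Match(input, pattern).Groups.Count;
    }

    public static void Main()
    {
        // (ab) is a capturing group, so there are groups 0, 1 and 2.
        Console.WriteLine(GroupCount(@"(ab)+(\d)", "ababab7"));   // 3
        // (?:ab) only groups for the + quantifier, so (\d) becomes group 1.
        Console.WriteLine(GroupCount(@"(?:ab)+(\d)", "ababab7")); // 2
    }
}
```

A noncapturing group is a reasonable choice when the parentheses are needed only to apply a quantifier or an alternation and the matched text itself is not needed afterwards.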
5. Group Options
Syntax: (?imnsx-imnsx:subexpression)
The available group options are listed below.
RegexOptions member | Inline character | Effect |
IgnoreCase | i | Use case-insensitive matching. |
Multiline | m | Use multiline mode, where ^ and $ match the beginning and end of each line (instead of the beginning and end of the input string). |
Singleline | s | Use single-line mode, where the period (.) matches every character (instead of every character except \n). |
ExplicitCapture | n | Do not capture unnamed groups. The only valid captures are explicitly named or numbered groups of the form (?<name> subexpression). |
IgnorePatternWhitespace | x | Exclude unescaped white space from the pattern, and enable comments after a number sign (#). |
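using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Only "Dog" is matched case-insensitively; " Dodgers" is still case-sensitive.
            string pattern = @"(?i:Dog) Dodgers";
            // Not used below. It would find no match in str, because the lowercase
            // "dodgers" lies outside the (?i:...) group.
            string pattern2 = @"(?i:Dog) dodgers";
            string str = @"dog Dodgers god";
            MatchCollection matchCollection = Regex.Matches(str, pattern);
            foreach (Match m in matchCollection)
            {
                Console.WriteLine(m.Value);
            }
        }
    }
}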
Output:
dog Dodgers
Explanation:
1. This construct does not capture a group.
2. The options take effect only inside the group's parentheses.
3. (?i:Dog), (?-i:Dog), (?m-i:Dog), (?mx-i:Dog), (?m-ix:Dog), and (?mx-is:Dog) are all valid group options patterns.
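The scoping rule also works in the other direction: an option set for the whole pattern can be switched off inside one group. The sketch below assumes RegexOptions.IgnoreCase is passed to Regex.IsMatch and uses (?-i:...) to make a single word case-sensitive again; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupOptionsDemo
{
    public static bool MatchesLowercaseDog(string input)
    {
        // IgnoreCase applies to the whole pattern, but (?-i:dog) turns it
        // off for "dog" only; " dodgers" stays case-insensitive.
        return Regex.IsMatch(input, @"(?-i:dog) dodgers", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(MatchesLowercaseDog("dog DODGERS")); // True
        Console.WriteLine(MatchesLowercaseDog("DOG dodgers")); // False
    }
}
```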
6. Zero-Width Positive Lookahead Assertions
Syntax: (?=subexpression)
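A positive lookahead requires that the subexpression matches at the current position without consuming it. As a small sketch, the following keeps only the numbers that are immediately followed by "px"; the input string and the names LookaheadDemo and PixelValues are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Returns every run of digits that is immediately followed by "px".
    public static List<string> PixelValues(string input)
    {
        var values = new List<string>();
        // (?=px) checks that "px" comes next but leaves it out of the match.
        foreach (Match m in Regex.Matches(input, @"\d+(?=px)"))
        {
            values.Add(m.Value);
        }
        return values;
    }

    public static void Main()
    {
        // Prints: 100
        Console.WriteLine(string.Join(",", PixelValues("width: 100px; height: 20em")));
    }
}
```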
7. Zero-Width Negative Lookahead Assertions
Syntax: (?!subexpression)
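A negative lookahead succeeds only when the subexpression does not match at the current position. The sketch below returns whole words that are not followed by a colon; the \b anchors are an added assumption so that backtracking cannot settle for part of a word, and the names used are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NegativeLookaheadDemo
{
    // Returns every whole word that is not immediately followed by ':'.
    public static List<string> WordsWithoutColon(string input)
    {
        var words = new List<string>();
        // \b...\b pins the match to a whole word; (?!:) rejects it if ':' is next.
        foreach (Match m in Regex.Matches(input, @"\b\w+\b(?!:)"))
        {
            words.Add(m.Value);
        }
        return words;
    }

    public static void Main()
    {
        // Prints: aaa,bbb
        Console.WriteLine(string.Join(",", WordsWithoutColon("id: aaa name: bbb")));
    }
}
```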
8. Zero-Width Positive Lookbehind Assertions
Syntax: (?<=subexpression)
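A positive lookbehind requires that the subexpression matches the text just before the current position. As a sketch under assumed input, the following extracts only the numbers preceded by a dollar sign; LookbehindDemo and DollarAmounts are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Returns every run of digits that is immediately preceded by '$'.
    public static List<string> DollarAmounts(string input)
    {
        var amounts = new List<string>();
        // (?<=\$) checks for '$' just before the digits without including it.
        foreach (Match m in Regex.Matches(input, @"(?<=\$)\d+"))
        {
            amounts.Add(m.Value);
        }
        return amounts;
    }

    public static void Main()
    {
        // Prints: 25
        Console.WriteLine(string.Join(",", DollarAmounts("cost $25, qty 3")));
    }
}
```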
9. Zero-Width Negative Lookbehind Assertions
Syntax: (?<!subexpression)
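A negative lookbehind succeeds only when the subexpression does not match the text just before the current position. The sketch below returns the numbers that are not preceded by a dollar sign; the \b is an added assumption that stops the match from starting in the middle of a number, and the names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NegativeLookbehindDemo
{
    // Returns every whole number that is not immediately preceded by '$'.
    public static List<string> PlainNumbers(string input)
    {
        var numbers = new List<string>();
        // (?<!\$) rejects a '$' prefix; \b forces the match to start at the
        // first digit, so "5" in "$25" is not picked up on its own.
        foreach (Match m in Regex.Matches(input, @"(?<!\$)\b\d+"))
        {
            numbers.Add(m.Value);
        }
        return numbers;
    }

    public static void Main()
    {
        // Prints: 3
        Console.WriteLine(string.Join(",", PlainNumbers("cost $25, qty 3")));
    }
}
```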
10. Nonbacktracking Subexpressions (Atomic Grouping)
Syntax: (?>subexpression)
using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            // (\w)\1+ matches a repeated character; (\w\b) then needs one more
            // word character that ends at a word boundary.
            string pattern = @"(\w)\1+(\w\b)";
            string str = @"001 2223 4444 55556";
            MatchCollection matchCollection = Regex.Matches(str, pattern);
            foreach (Match m in matchCollection)
            {
                Console.WriteLine(m.Value);
            }
        }
    }
}
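Output:
001
2223
4444
55556
Explanation:
"4444" is matched because the regex engine backtracks to find a way for the pattern to fit. \1+ first consumes "444", which leaves nothing for (\w\b), so it gives back one "4" and the match succeeds with (\w) = "4", \1+ = "44", and (\w\b) = "4".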
using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Same pattern, but (\w)\1+ is wrapped in an atomic group (?>...).
            string pattern = @"(?>(\w)\1+)(\w\b)";
            string str = @"001 2223 4444 55556";
            MatchCollection matchCollection = Regex.Matches(str, pattern);
            foreach (Match m in matchCollection)
            {
                Console.WriteLine(m.Value);
            }
        }
    }
}
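Output:
001
2223
55556
Explanation:
With a nonbacktracking subexpression, the engine does not backtrack into the group to look for another way to fit the pattern. For "4444", (?>(\w)\1+) consumes all four digits and (\w\b) then fails. The group will not give a "4" back, so the attempt simply fails.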
using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Ordinary alternation: if "bc" leads to a dead end, the engine retries with "b".
            string pattern = @"a(bc|b)c";
            string str = @"abcc-abc";
            string result = Regex.Replace(str, pattern, "*");
            Console.WriteLine(result);
        }
    }
}
Output:
*-*
using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Atomic alternation: once "bc" has matched, "b" is never tried.
            string pattern = @"a(?>bc|b)c";
            string str = @"abcc-abc";
            string result = Regex.Replace(str, pattern, "*");
            Console.WriteLine(result);
        }
    }
}
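Output:
*-abc
Explanation:
This pattern still matches "abcc": the group takes "bc" and the trailing c matches the last "c".
Why is "abc" not matched? With a nonbacktracking subexpression, the regex engine does not go back and try the other alternatives once the group has matched. For "abc", the group takes "bc", the trailing c finds nothing left to match, and the engine never retries the group with "b".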