Grouping Constructs

 

There are roughly ten kinds of grouping constructs.

 

1. Matched subexpressions

Syntax: (subexpression)
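As a minimal sketch of a matched subexpression, the following finds a doubled word by capturing a word in group 1 and requiring the same text again through the backreference \1. The class name CaptureGroupExample, the method FindDoubledWord, and the sample sentence are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    public static class CaptureGroupExample
    {
        // Hypothetical helper: returns the first doubled word, or an empty string if there is none.
        public static string FindDoubledWord(string input)
        {
            // (\w+) is capture group 1; \1 must repeat exactly what group 1 captured.
            Match m = Regex.Match(input, @"\b(\w+)\s+\1\b");
            return m.Success ? m.Groups[1].Value : string.Empty;
        }

        public static void Main()
        {
            Console.WriteLine(FindDoubledWord("this is is a test")); // is
        }
    }
}
```

Groups[0] always holds the whole match, so the first parenthesized group is Groups[1].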

 

2. Named matched subexpressions

Syntax: (?<name>subexpression) or (?'name'subexpression)
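The sketch below shows both naming forms side by side and reads the captures back by name. The class NamedGroupExample, the method Reformat, and the date format are assumptions chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    public static class NamedGroupExample
    {
        // Hypothetical helper: rewrites a yyyy-MM-dd date as dd/MM/yyyy.
        // Returns the input unchanged if it does not match.
        public static string Reformat(string isoDate)
        {
            // "year" and "month" use the angle-bracket form, "day" uses the apostrophe form.
            string pattern = @"^(?<year>\d{4})-(?<month>\d{2})-(?'day'\d{2})$";
            Match m = Regex.Match(isoDate, pattern);
            if (!m.Success)
            {
                return isoDate;
            }
            return m.Groups["day"].Value + "/" + m.Groups["month"].Value + "/" + m.Groups["year"].Value;
        }

        public static void Main()
        {
            Console.WriteLine(Reformat("2024-03-15")); // 15/03/2024
        }
    }
}
```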

 

3. Balancing group definitions

Syntax: (?<name1-name2>subexpression) or (?'name1-name2'subexpression)

using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
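            // (?<myTemp>"") records the opening quote as a myTemp capture.
            // (?<myText-myTemp>"") removes that capture and stores the text in between as myText.
            string pattern = @"(?<myTemp>"")[^""]+(?<myText-myTemp>"")";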
            string str = @"<input id=""aaa"" name=""bbb"" type=""ccc"" />";
            MatchCollection matchCollection = Regex.Matches(str, pattern);
            foreach (Match m in matchCollection)
            {
                // Groups[0] is the whole match, so start at 1 to list only the named groups.
                for (int i = 1; i < m.Groups.Count; i++)
                {
                    Console.WriteLine(m.Groups[i].Name + ":" + m.Groups[i].Value);
                }
                Console.WriteLine();
            }
        }
    }
}

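Output:

myTemp:
myText:aaa

myTemp:
myText:bbb

myTemp:
myText:ccc

Explanation:

If a myTemp capture was made earlier in the match, the balancing group captures the text between that myTemp capture and the balancing group itself, and stores it under the name myText.

It also removes the myTemp capture, which is why myTemp prints as empty.

Compare with the following example.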

using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Same match without any groups: a quote, one or more non-quote characters, a quote.
            string pattern = @"""[^""]+""";
            string str = @"<input id=""aaa"" name=""bbb"" type=""ccc"" />";
            MatchCollection matchCollection = Regex.Matches(str, pattern);
            foreach (Match m in matchCollection)
            {
                Console.WriteLine(m.Value);
            }
        }
    }
}

Output:

"aaa"
"bbb"
"ccc"
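Because a balancing group removes the most recent name2 capture, a group can act as a counter. The following is a sketch of one common use, checking that parentheses are balanced. It relies on two things not covered above: the (?<-open>...) form, which removes a capture without storing anything under a new name, and the conditional (?(open)(?!)), which forces a failure if any "open" capture is left over. The class BalancedParenthesesExample, the method IsBalanced, and the group name "open" are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    public static class BalancedParenthesesExample
    {
        // Hypothetical helper: true if every "(" has a matching ")" in the right order.
        public static bool IsBalanced(string input)
        {
            string pattern =
                @"^(?:[^()]" +          // any character that is not a parenthesis
                @"|(?<open>\()" +       // "(" adds one capture to the "open" group
                @"|(?<-open>\)))*" +    // ")" removes one capture; fails if there is none
                @"(?(open)(?!))$";      // fail if an unclosed "(" is still recorded
            return Regex.IsMatch(input, pattern);
        }

        public static void Main()
        {
            Console.WriteLine(IsBalanced("(a(b)c)")); // True
            Console.WriteLine(IsBalanced("(a(b)c"));  // False
            Console.WriteLine(IsBalanced("a)b("));    // False
        }
    }
}
```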

 

4. Noncapturing Groups

Syntax: (?:subexpression)
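A small sketch of the difference a noncapturing group makes: the title alternation is grouped but not captured, so the surname becomes group 1 and the total group count drops by one. The class NoncapturingGroupExample, its methods, and the sample input are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    public static class NoncapturingGroupExample
    {
        // Hypothetical helper: returns the surname after "Mr." or "Ms.", or an empty string.
        public static string GetSurname(string input)
        {
            // (?:Mr|Ms) groups the alternation without creating a capture group.
            Match m = Regex.Match(input, @"(?:Mr|Ms)\. (\w+)");
            return m.Success ? m.Groups[1].Value : string.Empty;
        }

        // Number of groups in the match, including group 0 (the whole match).
        public static int CountGroups(string pattern, string input)
        {
            return Regex.Match(input, pattern).Groups.Count;
        }

        public static void Main()
        {
            Console.WriteLine(GetSurname("Ms. Lee"));                        // Lee
            Console.WriteLine(CountGroups(@"(?:Mr|Ms)\. (\w+)", "Ms. Lee")); // 2
            Console.WriteLine(CountGroups(@"(Mr|Ms)\. (\w+)", "Ms. Lee"));   // 3
        }
    }
}
```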

 

5. Group Options

Syntax: (?imnsx-imnsx:subexpression)

The available group options are listed below:

RegexOptions member | Inline character | Effect
IgnoreCase | i | Use case-insensitive matching.
Multiline | m | Use multiline mode, where ^ and $ match the beginning and end of each line (instead of the beginning and end of the input string).
Singleline | s | Use single-line mode, where the period (.) matches every character (instead of every character except \n).
ExplicitCapture | n | Do not capture unnamed groups. The only valid captures are explicitly named or numbered groups of the form (?<name> subexpression).
IgnorePatternWhitespace | x | Exclude unescaped white space from the pattern, and enable comments after a number sign (#).

 

using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
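            // Only "Dog" is case-insensitive; " Dodgers" must match exactly.
            string pattern = @"(?i:Dog) Dodgers";
            // Not used below: it would find no match, because "dodgers" lies outside the (?i:) group.
            string pattern2 = @"(?i:Dog) dodgers";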
            string str = @"dog Dodgers god";
            MatchCollection matchCollection = Regex.Matches(str, pattern);
            foreach (Match m in matchCollection)
            {
                Console.WriteLine(m.Value);
            }
        }
    }
}

Output:

dog Dodgers

Explanation:

1. This syntax does not capture any group.

2. The group options apply only inside the group.

3. (?i:Dog), (?-i:Dog), (?m-i:Dog), (?mx-i:Dog), (?m-ix:Dog), and (?mx-is:Dog) are all valid group-options patterns.
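The first two points can be checked with a short sketch that reuses the input and both patterns from the listing above. The class GroupOptionsScopeExample and its methods are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    public static class GroupOptionsScopeExample
    {
        private const string Input = "dog Dodgers god";

        // Hypothetical helper: does the pattern match the sample input?
        public static bool Matches(string pattern)
        {
            return Regex.IsMatch(Input, pattern);
        }

        // Number of groups in the match, including group 0 (the whole match).
        public static int CountGroups(string pattern)
        {
            return Regex.Match(Input, pattern).Groups.Count;
        }

        public static void Main()
        {
            // "Dog" is matched case-insensitively, "Dodgers" exactly.
            Console.WriteLine(Matches(@"(?i:Dog) Dodgers"));     // True
            // "dodgers" is outside the group, so it stays case-sensitive.
            Console.WriteLine(Matches(@"(?i:Dog) dodgers"));     // False
            // Only group 0 exists: (?i:...) does not capture.
            Console.WriteLine(CountGroups(@"(?i:Dog) Dodgers")); // 1
        }
    }
}
```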

 

6. Zero-Width Positive Lookahead Assertions

Syntax: (?=subexpression)
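A minimal sketch of a positive lookahead: the number is matched only when "px" follows it, and "px" itself is not part of the match. The class PositiveLookaheadExample, the method FirstPixelValue, and the sample text are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    public static class PositiveLookaheadExample
    {
        // Hypothetical helper: returns the first number directly followed by "px", or an empty string.
        public static string FirstPixelValue(string input)
        {
            // (?=px) checks that "px" comes next without consuming it.
            Match m = Regex.Match(input, @"\d+(?=px)");
            return m.Success ? m.Value : string.Empty;
        }

        public static void Main()
        {
            Console.WriteLine(FirstPixelValue("width: 100px; height: 50em")); // 100
        }
    }
}
```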

 

7. Zero-Width Negative Lookahead Assertions

Syntax: (?!subexpression)
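A minimal sketch of a negative lookahead: a word is matched only if it does not begin with "un". The class NegativeLookaheadExample, the method WordsNotStartingWithUn, and the sample text are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    public static class NegativeLookaheadExample
    {
        // Hypothetical helper: returns the words that do not start with "un", joined by commas.
        public static string WordsNotStartingWithUn(string input)
        {
            var words = new List<string>();
            // (?!un) rejects the position if the next two characters are "un"; it consumes nothing.
            foreach (Match m in Regex.Matches(input, @"\b(?!un)\w+\b"))
            {
                words.Add(m.Value);
            }
            return string.Join(",", words);
        }

        public static void Main()
        {
            Console.WriteLine(WordsNotStartingWithUn("happy unhappy done")); // happy,done
        }
    }
}
```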

 

8. Zero-Width Positive Lookbehind Assertions

Syntax: (?<=subexpression)
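A minimal sketch of a positive lookbehind: digits are matched only when a dollar sign stands immediately before them, and the dollar sign is not part of the match. The class PositiveLookbehindExample, the method FirstDollarAmount, and the sample text are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    public static class PositiveLookbehindExample
    {
        // Hypothetical helper: returns the first number preceded by "$", or an empty string.
        public static string FirstDollarAmount(string input)
        {
            // (?<=\$) checks the character before the current position without consuming it.
            Match m = Regex.Match(input, @"(?<=\$)\d+");
            return m.Success ? m.Value : string.Empty;
        }

        public static void Main()
        {
            Console.WriteLine(FirstDollarAmount("cost $25 or 30 euros")); // 25
        }
    }
}
```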

 

9. Zero-Width Negative Lookbehind Assertions

Syntax: (?<!subexpression)
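A minimal sketch of a negative lookbehind: a number is matched only when no dollar sign stands immediately before it. The \b is there so that the match cannot start in the middle of a number. The class NegativeLookbehindExample, the method FirstNonDollarNumber, and the sample text are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    public static class NegativeLookbehindExample
    {
        // Hypothetical helper: returns the first number not preceded by "$", or an empty string.
        public static string FirstNonDollarNumber(string input)
        {
            // (?<!\$) rejects the position if the previous character is "$".
            Match m = Regex.Match(input, @"(?<!\$)\b\d+");
            return m.Success ? m.Value : string.Empty;
        }

        public static void Main()
        {
            Console.WriteLine(FirstNonDollarNumber("cost $25 or 30 euros")); // 30
        }
    }
}
```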

 

10. Nonbacktracking Subexpressions (Atomic Grouping)

Syntax: (?>subexpression)

using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            // A word character, one or more repeats of it, then one last word character at a word boundary.
            string pattern = @"(\w)\1+(\w\b)";
            string str = @"001 2223 4444 55556";
            MatchCollection matchCollection = Regex.Matches(str, pattern);
            foreach (Match m in matchCollection)
            {
                Console.WriteLine(m.Value);
            }
        }
    }
}

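Output:

001
2223
4444
55556

Explanation:

"4444" is matched because the regex engine backtracks until it finds a way to make the pattern fit.

\1+ first consumes all three remaining 4s, which leaves nothing for (\w\b); the engine then gives one 4 back so that (\w\b) can match it.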

 

using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Same pattern, but the repeated characters sit inside an atomic group (?>...).
            string pattern = @"(?>(\w)\1+)(\w\b)";
            string str = @"001 2223 4444 55556";
            MatchCollection matchCollection = Regex.Matches(str, pattern);
            foreach (Match m in matchCollection)
            {
                Console.WriteLine(m.Value);
            }
        }
    }
}

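Output:

001
2223
55556

Explanation:

A nonbacktracking subexpression does not backtrack to look for another way to fit the pattern.

So when "4444" is tried against (?>(\w)\1+)(\w\b), the atomic group consumes all four 4s, (\w\b) finds nothing left to match, and the attempt simply fails;

the engine does not give characters back to make the pattern fit.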

 

using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Ordinary capturing group: the engine may fall back from "bc" to "b".
            string pattern = @"a(bc|b)c";
            string str = @"abcc-abc";
            string result = Regex.Replace(str, pattern, "*");
            Console.WriteLine(result);
        }
    }
}

Output:

*-*

 

using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Atomic group: once "bc" has matched, the engine never falls back to "b".
            string pattern = @"a(?>bc|b)c";
            string str = @"abcc-abc";
            string result = Regex.Replace(str, pattern, "*");
            Console.WriteLine(result);
        }
    }
}

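Output:

*-abc

Explanation:

The example above matches "abcc": a matches "a", (?>bc|b) matches "bc", and c matches the final "c".

Why is "abc" not matched? With a nonbacktracking subexpression, the regex engine does not go back into the group to try its other alternatives; once the group has matched, that choice is final.

For "abc", (?>bc|b) takes "bc", the trailing c finds nothing left to match, and the engine does not retry the group with the "b" alternative.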
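Since an atomic group keeps the first alternative that succeeds, the order of the alternatives decides the result. The sketch below runs the pattern from the example above next to a variant with the alternatives swapped; the swapped pattern a(?>b|bc)c, the class AtomicAlternationOrderExample, and the method Mask are additions made only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    public static class AtomicAlternationOrderExample
    {
        // Hypothetical helper: replaces every match of the pattern with "*".
        public static string Mask(string pattern, string input)
        {
            return Regex.Replace(input, pattern, "*");
        }

        public static void Main()
        {
            // "bc" is tried first and kept, so "abc" alone can never match.
            Console.WriteLine(Mask(@"a(?>bc|b)c", "abcc-abc")); // *-abc
            // "b" is tried first and kept, so each match is "abc" and the extra "c" is left over.
            Console.WriteLine(Mask(@"a(?>b|bc)c", "abcc-abc")); // *c-*
        }
    }
}
```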