c# - Regex for splitting with a delimiter and a text qualifier

Time: 2013-11-13 17:55:16

Tags: c# regex

I need to split a text file whose values are separated by commas and wrapped in a text qualifier such as ¨|¨.

I tried to use this function:

    public string[] Split(string expression, string delimiter, 
                string qualifier, bool ignoreCase)
    {

        string _Statement = String.Format
            ("{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))",
                            Regex.Escape(delimiter), Regex.Escape(qualifier));

        RegexOptions _Options = RegexOptions.Compiled | RegexOptions.Multiline;
        if (ignoreCase) _Options = _Options | RegexOptions.IgnoreCase;

        Regex _Expression = new Regex(_Statement, _Options);
        return _Expression.Split(expression);
    } 

The text file contains lines like the following:

¨|¨column 1¨|¨,¨|¨column 2¨|¨,¨|¨column 3¨|¨,¨|¨column 4¨|¨

But my regex doesn't work... Any ideas on how to get this working?

Thanks in advance.

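3 answers:

Answer 0 (score: 0)

You can do this without a regular expression: split the string on ¨|¨, then split each item on spaces to get the individual key/value, for example: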

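foreach (var item in str.Split(new[] { "¨|¨" }, StringSplitOptions.RemoveEmptyEntries))
{
    var tokens = item.Split(' ');
    // Skip items without a space (e.g. the bare "," between fields),
    // otherwise tokens[1] would throw IndexOutOfRangeException.
    if (tokens.Length < 2) continue;
    Console.WriteLine(tokens[0]);
    Console.WriteLine(tokens[1]);
}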

Answer 1 (score: 0)

Not sure why you need a Regex for something like this; string.Split can give you the desired output:

string str = "¨|¨column 1¨|¨,¨|¨column 2¨|¨,¨|¨column 3¨|¨,¨|¨column 4¨|¨";
string[] splitArray = str.Split(new[] { "¨|¨,", "¨|¨" }
                                        , StringSplitOptions.RemoveEmptyEntries);

To print the result:

foreach (var item in splitArray)
{
    Console.WriteLine(item);
}

Output:

column 1
column 2
column 3
column 4
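
One caveat with StringSplitOptions.RemoveEmptyEntries is that it also drops fields that are genuinely empty. If that matters, a small regex that captures whatever sits between an opening and a closing qualifier might be an option. The following is only a sketch: QualifiedSplitter and SplitQualified are hypothetical names, and it assumes that every field is wrapped in the qualifier and that the qualifier never occurs inside a field.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QualifiedSplitter
{
    // Hypothetical helper (not from the answer above): returns the text
    // found between each opening/closing pair of qualifiers.
    public static string[] SplitQualified(string line, string qualifier)
    {
        // Escape the qualifier so characters such as '|' are taken literally.
        string q = Regex.Escape(qualifier);

        // Lazily capture everything between two qualifiers; an empty
        // field (qualifier immediately followed by qualifier) is kept.
        string pattern = q + "(.*?)" + q;

        return Regex.Matches(line, pattern, RegexOptions.Singleline)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToArray();
    }
}
```

With the str variable from the snippet above, this would be called as QualifiedSplitter.SplitQualified(str, "¨|¨"). Text outside the qualifiers, such as the commas, is simply ignored, so unqualified fields would be lost.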

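Answer 2 (score: 0)

In .NET, we can do this! :)

I just put this together and felt like sharing.

Here is a fairly complete regex solution for splitting a line of a delimited file: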

    private bool RowMe(string strColumnDelimiter, string strTextQualifier, string strInput, out string[] strSplitOutput, out string strResultMessage)
    {


        string[] retVal = null;
        bool blnResult = false;
        strResultMessage = "";


        //---- We need to escape at least some of the most common
        //              special characters for both delimiter & qualifier ----

        switch (strColumnDelimiter) {

            case "|":
                strColumnDelimiter = "\\|";
                break;

            case "\\":
                strColumnDelimiter = "\\\\";
                break;

        }

        switch (strTextQualifier)
        {

            case "\"":
                strTextQualifier = "\\\"";
                break;

        }


        //---- Let's have our delimited row splitter regex! ----
                                string strPattern = String.Concat(
                                            "^"

                                                ,"(?:"

                                                    ,"("
                                                        , "[^\\S" + strColumnDelimiter + strTextQualifier + "]*"        // allow leading whitespace, not counting our delimiter & qualifier

                                                        ,"(?:"                                                      
                                                            ,"(?:[^" + strColumnDelimiter + strTextQualifier +"]*)"         // any amount of characters not column-delimiter or text-qualifier
                                                            ,"|"
                                                            , "(?:" + strTextQualifier + "(?:(?:[^" + strTextQualifier + "])|(?:" + strTextQualifier + strTextQualifier + "))*" + strTextQualifier + ")"        // any amount of characters not text-qualifier OR doubled-text-qualifier inside leading & trailing text-qualifier (allow even column-delimiter inside text qualifier)
                                                            ,"|"
                                                            ,"(?:(?:[^" + strColumnDelimiter + strTextQualifier + "]{1})(?:[^" + strColumnDelimiter + "]*)(?:[^" + strColumnDelimiter + strTextQualifier + "]{1}))"     // any amount of characters not column-delimiter inside other leading & trailing characters not column-delimiter or text-qualifier (allow text-qualifier inside value if it is not leading or trailing)
                                                        ,")"

                                                        , "[^\\S" + strColumnDelimiter + strTextQualifier + "]*"        // allow trailing whitespace, not counting our delimiter & qualifier
                                                    ,")"

                                                , "){0,1}"

                                                            //-- note how this second section is almost the same as the first but with a leading delimiter...  
                                                            //                  the first column must not have a leading delimiter, and any subsequent ones must
                                                , "(?:"
                                                    ,"(?:"
                                                        , strColumnDelimiter        // << :)
                                                        ,"(?:"

                                                            , "("
                                                                , "[^\\S" + strColumnDelimiter + strTextQualifier + "]*"        // allow leading whitespace, not counting our delimiter & qualifier

                                                                , "(?:"
                                                                    , "(?:[^" + strColumnDelimiter + strTextQualifier + "]*)"           // any amount of characters not column-delimiter or text-qualifier
                                                                    , "|"
                                                                    , "(?:" + strTextQualifier + "(?:(?:[^" + strTextQualifier + "])|(?:" + strTextQualifier + strTextQualifier + "))*" + strTextQualifier + ")"        // any amount of characters not text-qualifier OR doubled-text-qualifier inside leading & trailing text-qualifier (allow even column-delimiter inside text qualifier)
                                                                    , "|"
                                                                    , "(?:(?:[^" + strColumnDelimiter + strTextQualifier + "]{1})(?:[^" + strColumnDelimiter + "]*)(?:[^" + strColumnDelimiter + strTextQualifier + "]{1}))"        // any amount of characters not column-delimiter inside other leading & trailing characters not column-delimiter or text-qualifier (allow text-qualifier inside value if it is not leading or trailing)
                                                                , ")"

                                                                , "[^\\S" + strColumnDelimiter + strTextQualifier + "]*"        // allow trailing whitespace, not counting our delimiter & qualifier
                                                            , ")"

                                                        ,")"
                                                    ,")"
                                                , "){0,}"

                                            ,"$"
                                        );


        //---- And do the regex Match-ing ! ----
        System.Text.RegularExpressions.Regex objRegex = new System.Text.RegularExpressions.Regex(strPattern);
        System.Text.RegularExpressions.MatchCollection objMyMatches = objRegex.Matches(strInput);

        //---- So what did we get? ----
        if (objMyMatches.Count != 1) {
            blnResult = false;
            strResultMessage = "--NO-- no overall match";
        }
        else if (objMyMatches[0].Groups.Count != 3) {
            blnResult = false;
            strResultMessage = "--NO-- pattern not correct";
            throw new ApplicationException("ERROR SPLITTING FLAT FILE ROW!  The hardcoded regular expression appears to be broken.  This should not happen!!!  What's up??");
        }
        else {

            int cnt = (1 + objMyMatches[0].Groups[2].Captures.Count);

            retVal = new string[cnt];

            retVal[0] = objMyMatches[0].Groups[1].Captures[0].Value;

            for (int i = 0; i < objMyMatches[0].Groups[2].Captures.Count; i++) {
                retVal[i+1] = objMyMatches[0].Groups[2].Captures[i].Value;
            }

            blnResult = true;
            strResultMessage = "SUCCESS";
        }


        strSplitOutput = retVal;

        return blnResult;

    }
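
For comparison, the same rules (a delimiter allowed inside a qualified value, a doubled qualifier standing for a literal qualifier) can also be handled without a regex by scanning the line character by character. This is a minimal sketch rather than a drop-in replacement for RowMe: DelimitedLine.Parse is a hypothetical helper, it assumes a single-character delimiter and qualifier, it returns values with the surrounding qualifiers removed, and it neither trims whitespace nor validates the row.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class DelimitedLine
{
    // Hypothetical helper, not part of the answer above.
    public static List<string> Parse(string line, char delimiter, char qualifier)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQualifier = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQualifier)
            {
                if (c != qualifier)
                {
                    current.Append(c);          // delimiters are plain text here
                }
                else if (i + 1 < line.Length && line[i + 1] == qualifier)
                {
                    current.Append(qualifier);  // doubled qualifier -> literal one
                    i++;
                }
                else
                {
                    inQualifier = false;        // closing qualifier
                }
            }
            else if (c == qualifier)
            {
                inQualifier = true;             // opening qualifier
            }
            else if (c == delimiter)
            {
                fields.Add(current.ToString()); // end of the current field
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());         // last field has no trailing delimiter
        return fields;
    }
}
```

A call such as DelimitedLine.Parse(strInput, ',', '"') would take the place of the regex match; an unterminated qualifier is silently accepted here, which a stricter version would probably want to report.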