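I need to split a text file whose values are separated by commas and wrapped in a text qualifier such as ¨|¨.
I am trying to use this function: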
public string[] Split(string expression, string delimiter,
string qualifier, bool ignoreCase)
{
string _Statement = String.Format
("{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))",
Regex.Escape(delimiter), Regex.Escape(qualifier));
RegexOptions _Options = RegexOptions.Compiled | RegexOptions.Multiline;
if (ignoreCase) _Options = _Options | RegexOptions.IgnoreCase;
Regex _Expression = new Regex(_Statement, _Options);
return _Expression.Split(expression);
}
to process a text file containing lines like this:
¨|¨column 1¨|¨,¨|¨column 2¨|¨,¨|¨column 3¨|¨,¨|¨column 4¨|¨
But my regular expression does not work. Any ideas that could help me get this working?
Thanks in advance.
Answer 0 (score: 0)
You can do this without a regular expression: split the string on ¨|¨,
then split each item on spaces to get the individual key/value, for example:
foreach (var item in str.Split(new[] { "¨|¨" }, StringSplitOptions.RemoveEmptyEntries))
{
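var tokens = item.Split(' ');
// Items such as "," (the separator left between two qualifiers) contain no space, so skip them
if (tokens.Length < 2) continue;
Console.WriteLine(tokens[0]);
Console.WriteLine(tokens[1]);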
}
Answer 1 (score: 0)
Not sure why you need Regex for something like this; string.Split
can give you the desired output:
string str = "¨|¨column 1¨|¨,¨|¨column 2¨|¨,¨|¨column 3¨|¨,¨|¨column 4¨|¨";
string[] splitArray = str.Split(new[] { "¨|¨,", "¨|¨" }
, StringSplitOptions.RemoveEmptyEntries);
To print the result:
foreach (var item in splitArray)
{
Console.WriteLine(item);
}
Output:
column 1
column 2
column 3
column 4
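Note that StringSplitOptions.RemoveEmptyEntries also discards fields that are genuinely empty (¨|¨¨|¨), so column positions can shift. The following is only a minimal sketch of one way around that, assuming every field is wrapped in the qualifier and the qualifier never occurs inside a value. It matches the text between qualifier pairs instead of splitting; QualifiedSplitter and ExtractFields are hypothetical names, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QualifiedSplitter
{
    // Hypothetical helper: returns the text found between each pair of qualifiers.
    // Assumes the qualifier itself never appears inside a value.
    public static string[] ExtractFields(string line, string qualifier)
    {
        string q = Regex.Escape(qualifier);
        // Lazy match of everything between an opening and a closing qualifier.
        var pattern = new Regex(q + "(.*?)" + q, RegexOptions.Singleline);
        var fields = new List<string>();
        foreach (Match m in pattern.Matches(line))
        {
            fields.Add(m.Groups[1].Value); // may be empty, which is kept on purpose
        }
        return fields.ToArray();
    }
}
```

Calling QualifiedSplitter.ExtractFields(line, "¨|¨") on a line with an empty second field should return that field as an empty string rather than dropping it, and a comma inside a qualified value stays part of the value.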
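Answer 2 (score: 0)
In .NET, we can do this! :)
I just worked through it and felt like sharing.
Here is a fairly complete regex solution for splitting a row of a delimited file: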
private bool RowMe(string strColumnDelimiter, string strTextQualifier, string strInput, out string[] strSplitOutput, out string strResultMessage)
{
string[] retVal = null;
bool blnResult = false;
strResultMessage = "";
//---- We need to escape at least some of the most common
// special characters for both delimiter & qualifier ----
switch (strColumnDelimiter) {
case "|":
strColumnDelimiter = "\\|";
break;
case "\\":
strColumnDelimiter = "\\\\";
break;
}
switch (strTextQualifier)
{
case "\"":
strTextQualifier = "\\\"";
break;
}
//---- Let's have our delimited row splitter regex! ----
string strPattern = String.Concat(
"^"
,"(?:"
,"("
, "[^\\S" + strColumnDelimiter + strTextQualifier + "]*" // allow leading whitespace, not counting our delimiter & qualifier
,"(?:"
,"(?:[^" + strColumnDelimiter + strTextQualifier +"]*)" // any amount of characters not column-delimiter or text-qualifier
,"|"
, "(?:" + strTextQualifier + "(?:(?:[^" + strTextQualifier + "])|(?:" + strTextQualifier + strTextQualifier + "))*" + strTextQualifier + ")" // any amount of characters not text-qualifier OR doubled-text-qualifier inside leading & trailing text-qualifier (allow even column-delimiter inside text qualifier)
,"|"
,"(?:(?:[^" + strColumnDelimiter + strTextQualifier + "]{1})(?:[^" + strColumnDelimiter + "]*)(?:[^" + strColumnDelimiter + strTextQualifier + "]{1}))" // any amount of characters not column-delimiter inside other leading & trailing characters not column-delimiter or text-qualifier (allow text-qualifier inside value if it is not leading or trailing)
,")"
, "[^\\S" + strColumnDelimiter + strTextQualifier + "]*" // allow trailing whitespace, not counting our delimiter & qualifier
,")"
, "){0,1}"
//-- note how this second section is almost the same as the first but with a leading delimiter...
// the first column must not have a leading delimiter, and any subsequent ones must
, "(?:"
,"(?:"
, strColumnDelimiter // << :)
,"(?:"
, "("
, "[^\\S" + strColumnDelimiter + strTextQualifier + "]*" // allow leading whitespace, not counting our delimiter & qualifier
, "(?:"
, "(?:[^" + strColumnDelimiter + strTextQualifier + "]*)" // any amount of characters not column-delimiter or text-qualifier
, "|"
, "(?:" + strTextQualifier + "(?:(?:[^" + strTextQualifier + "])|(?:" + strTextQualifier + strTextQualifier + "))*" + strTextQualifier + ")" // any amount of characters not text-qualifier OR doubled-text-qualifier inside leading & trailing text-qualifier (allow even column-delimiter inside text qualifier)
, "|"
, "(?:(?:[^" + strColumnDelimiter + strTextQualifier + "]{1})(?:[^" + strColumnDelimiter + "]*)(?:[^" + strColumnDelimiter + strTextQualifier + "]{1}))" // any amount of characters not column-delimiter inside other leading & trailing characters not column-delimiter or text-qualifier (allow text-qualifier inside value if it is not leading or trailing)
, ")"
, "[^\\S" + strColumnDelimiter + strTextQualifier + "]*" // allow trailing whitespace, not counting our delimiter & qualifier
, ")"
,")"
,")"
, "){0,}"
,"$"
);
//---- And do the regex Match-ing ! ----
System.Text.RegularExpressions.Regex objRegex = new System.Text.RegularExpressions.Regex(strPattern);
System.Text.RegularExpressions.MatchCollection objMyMatches = objRegex.Matches(strInput);
//---- So what did we get? ----
if (objMyMatches.Count != 1) {
blnResult = false;
strResultMessage = "--NO-- no overall match";
}
else if (objMyMatches[0].Groups.Count != 3) {
blnResult = false;
strResultMessage = "--NO-- pattern not correct";
throw new ApplicationException("ERROR SPLITTING FLAT FILE ROW! The hardcoded regular expression appears to be broken. This should not happen!!! What's up??");
}
else {
int cnt = (1 + objMyMatches[0].Groups[2].Captures.Count);
retVal = new string[cnt];
retVal[0] = objMyMatches[0].Groups[1].Captures[0].Value;
for (int i = 0; i < objMyMatches[0].Groups[2].Captures.Count; i++) {
retVal[i+1] = objMyMatches[0].Groups[2].Captures[i].Value;
}
blnResult = true;
strResultMessage = "SUCCESS";
}
strSplitOutput = retVal;
return blnResult;
}
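The extraction loop above relies on a .NET feature: when a capturing group sits inside a repeated construct, Group.Captures keeps every iteration, not only the last one. Below is a stripped-down sketch of that mechanism alone. The pattern is deliberately simplified (comma delimiter, no text qualifier) and is not the pattern from this answer; CapturesDemo and SplitSimple are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CapturesDemo
{
    // Hypothetical helper: splits on commas using one overall match,
    // reading the repeated group's Captures collection.
    public static string[] SplitSimple(string input)
    {
        // Group 1: first column. Group 2: each later column, captured once per repetition.
        Match m = Regex.Match(input, "^([^,]*)(?:,([^,]*))*$");
        if (!m.Success) return new string[0];

        CaptureCollection rest = m.Groups[2].Captures;
        var result = new string[1 + rest.Count];
        result[0] = m.Groups[1].Value;
        for (int i = 0; i < rest.Count; i++)
        {
            result[i + 1] = rest[i].Value;
        }
        return result;
    }
}
```

With this sketch, SplitSimple("a,,c") yields three entries, the middle one empty, which mirrors how RowMe builds its output array from Groups[1] and Groups[2].Captures.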