Split a string by commas, ignoring any punctuation (including ',') inside quotes

Time: 2014-08-24 12:06:38

Tags: c# regex string split comma

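How can I split a string (from a textbox) by commas while leaving the parts inside double quotes intact (without removing the quotes), along with any other punctuation they may contain (e.g. '.', ';', '-')?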

E.g., if someone enters the following into the textbox:

apple, orange, "baboons, cows", rainbow, "unicorns, gummy bears"

How can the string above be split into the following strings (e.g., to put them into a list)?

apple

orange

"baboons, cows"

rainbow

"unicorns, gummy bears"

Thanks for your help!

6 answers:

Answer 0: (score: 4)

You can try the following regex, which uses a positive lookahead:

string value = @"apple, orange, ""baboons, cows"", rainbow, ""unicorns, gummy bears""";
string[] lines = Regex.Split(value, @", (?=(?:""[^""]*?(?: [^""]*)*))|, (?=[^"",]+(?:,|$))");

foreach (string line in lines) {
    Console.WriteLine(line);
}

Output:

apple
orange
"baboons, cows"
rainbow
"unicorns, gummy bears"

IDEONE
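
As a point of comparison, here is a minimal sketch of another common lookahead idiom: split on a comma only when the rest of the string contains an even number of double quotes, which means the comma sits outside any quoted section. This is not the regex from the answer above. The class name `EvenQuoteSplit` is hypothetical, and the sketch assumes the quotes in the input are balanced.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class EvenQuoteSplit
{
    // ,\s*                      a comma plus any whitespace that follows it
    // (?=(?:[^"]*"[^"]*")*      lookahead: zero or more complete pairs of quotes...
    //    [^"]*$)                ...then no further quotes up to the end of the string
    private static readonly Regex Separator =
        new Regex(@",\s*(?=(?:[^""]*""[^""]*"")*[^""]*$)");

    public static string[] Split(string value)
    {
        // Only non-capturing groups are used, so Regex.Split returns
        // just the pieces between separators.
        return Separator.Split(value);
    }
}
```

Calling `EvenQuoteSplit.Split` on the sample input from the question should give the five items listed there, with the quotes kept. Because the lookahead rescans the remainder of the string for every comma, this is probably best kept to short inputs such as a textbox value.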

Answer 1: (score: 1)

Try this:

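// `input` is the string to split. \s* skips the whitespace after each comma so that
// a quoted item preceded by ", " is still matched as a single field.
Regex str = new Regex("(?:^|,)\\s*(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);

foreach (Match m in str.Matches(input))
{
    // Group 1 holds the item without the leading comma and whitespace.
    Console.WriteLine(m.Groups[1].Value);
}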

You could also take a look at FileHelpers.

Answer 2: (score: 1)

Like a CSV parser, instead of using a regex you can loop over each character, as follows:

public List<string> ItemStringToList(string inputString)
{
    var itemList    = new List<string>();
    var currentItem = "";
    var quotesOpen  = false;

    for (int i = 0; i < inputString.Length; i++)
    {
        // A quote toggles the "inside quotes" state; the quote character itself is dropped.
        if (inputString[i] == '"')
        {
            quotesOpen = !quotesOpen;
            continue;
        }

        // A comma outside quotes ends the current item.
        if (inputString[i] == ',' && !quotesOpen)
        {
            itemList.Add(currentItem);
            currentItem = "";
            continue;
        }

        // Skip leading spaces at the start of an item.
        if (currentItem == "" && inputString[i] == ' ') continue;
        currentItem += inputString[i];
    }

    if (currentItem != "") itemList.Add(currentItem);

    return itemList;
}

Example test usage:

var test1 = ItemStringToList("one, two, three");
var test2 = ItemStringToList("one, \"two\", three");
var test3 = ItemStringToList("one, \"two, three\"");
var test4 = ItemStringToList("one, \"two, three\", four, \"five six\", seven");
var test5 = ItemStringToList("one, \"two, three\", four, \"five six\", seven");
var test6 = ItemStringToList("one, \"two, three\", four, \"five six, seven\"");
var test7 = ItemStringToList("\"one, two, three\", four, \"five six, seven\"");

If you want faster character concatenation, you can change this to use a StringBuilder.
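
A minimal sketch of what a StringBuilder-based variant of this character loop might look like is shown here. The class name `QuoteAwareSplitter` and the `keepQuotes` parameter are hypothetical additions, not part of the answer; `keepQuotes` defaults to `true` because the question asks for the quotes to be preserved. Unlike the loop above, this version trims whitespace on both ends of each item.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not part of the original answer.
public static class QuoteAwareSplitter
{
    public static List<string> Split(string input, bool keepQuotes = true)
    {
        var items = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in input)
        {
            if (c == '"')
            {
                // Toggle the quoted state; optionally keep the quote character.
                inQuotes = !inQuotes;
                if (keepQuotes) current.Append(c);
                continue;
            }

            if (c == ',' && !inQuotes)
            {
                // A comma outside quotes ends the current item.
                items.Add(current.ToString().Trim());
                current.Clear();
                continue;
            }

            current.Append(c);
        }

        // Add the final item unless it is empty or only whitespace.
        string last = current.ToString().Trim();
        if (last.Length > 0) items.Add(last);

        return items;
    }
}
```

With the sample input from the question, `QuoteAwareSplitter.Split(...)` should return the five items with their quotes intact; passing `keepQuotes: false` drops the quotes, as the loop above does.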

Answer 3: (score: 0)

Try this. It shows several ways to split the strings in an array; if you want to split on spaces, just put a space in (' ').

using System;
using System.Collections.Generic;
using System.Linq;

namespace LINQExperiment1
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] sentence = new string[] { "apple", "orange", "baboons  cows", " rainbow", "unicorns  gummy bears" };

            Console.WriteLine("option 1:");
            Console.WriteLine("---------");
            // option 1: Select returns one string[] per element of the source array
            IEnumerable<string[]> words1 = sentence.Select(w => w.Split(' '));
            // to get each word, we have to use two foreach loops
            foreach (string[] segment in words1)
                foreach (string word in segment)
                    Console.WriteLine(word);

            Console.WriteLine();
            Console.WriteLine("option 2:");
            Console.WriteLine("---------");
            // option 2: SelectMany flattens the per-element arrays into a
            // single sequence of strings (sub-iterates the Select result)
            IEnumerable<string> words2 = sentence.SelectMany(segment => segment.Split(' '));
            // with SelectMany we have every string individually
            foreach (var word in words2)
                Console.WriteLine(word);

            // option 3: identical to option 2 above, written using
            // the query expression syntax (multiple froms)
            IEnumerable<string> words3 = from segment in sentence
                                         from word in segment.Split(' ')
                                         select word;
        }
    }
}

Answer 4: (score: 0)

This was trickier than I expected, and I think it is a good practical question.

Below is the solution I came up with. One thing I don't like about it is having to add the double quotes back; another is the variable names :p

internal class Program
{
    private static void Main(string[] args)
    {

        string searchString =
            @"apple, orange, ""baboons, cows. dogs- hounds"", rainbow, ""unicorns, gummy bears"", abc, defghj";

        char delimeter = ',';
        char excludeSplittingWithin = '"';

        string[] splittedByExcludeSplittingWithin = searchString.Split(excludeSplittingWithin);

        List<string> splittedSearchString = new List<string>();

        for (int i = 0; i < splittedByExcludeSplittingWithin.Length; i++)
        {
            if (i == 0 || splittedByExcludeSplittingWithin[i].StartsWith(delimeter.ToString()))
            {
                string[] splitttedByDelimeter = splittedByExcludeSplittingWithin[i].Split(delimeter);
                for (int j = 0; j < splitttedByDelimeter.Length; j++)
                {
                    splittedSearchString.Add(splitttedByDelimeter[j].Trim());
                }
            }
            else
            {
                splittedSearchString.Add(excludeSplittingWithin + splittedByExcludeSplittingWithin[i] +
                                         excludeSplittingWithin);
            }
        }

        foreach (string s in splittedSearchString)
        {
            if (s.Trim() != string.Empty)
            {
                Console.WriteLine(s);
            }
        }
        Console.ReadKey();
    }
}

Answer 5: (score: 0)

Another regex solution:

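// Requires: using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
private static IEnumerable<string> Parse(string input)
{
  // if used frequently, should be instantiated with Compiled option
  Regex regex = new Regex(@"(?<=^|,\s)(\""(?:[^\""]|\""\"")*\""|[^,\s]*)");

  // MatchCollection is non-generic on older frameworks, so Cast<Match>() is needed before LINQ;
  // Select projects each match to its text so the result really is IEnumerable<string>.
  return regex.Matches(input).Cast<Match>().Where(m => m.Success).Select(m => m.Value);
}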