How to make the csv reader ignore commas inside nested brackets (curly, square, and angle)?

Date: 2015-03-31 18:17:04

Tags: python csv

I have comma-separated values that contain commas inside nested brackets. Specifically, my input will be comma-separated C++11 objects.

For example, here is an input:

std::vector<int>{32, 45, 10}, std::array<std::string, 5>{"a", "bc", "def", "ghij", "whoa, this is, a toughie"}, 8, "foo, bar", {"initializer-list?", "no problem!", "(hopefully...)"}

Here is the output I want:

[
    'std::vector<int>{32, 45, 10}',
    'std::array<std::string, 5>{"a", "bc", "def", "ghij", "whoa, this is, a toughie"}',
    '8',
    'foo, bar',
    '{"initializer-list?", "no problem!", "(hopefully...)"}'
]

But Python's csv module gives me:

[
    'std::vector<int>{32',
    '45',
    '10}',
    'std::array<std::string',
    '5>{"a"',
    '"bc"',
    '"def"',
    '"ghij"',
    '"whoa',
    'this is',
    'a toughie"}',
    '8',
    'foo, bar', # at least this one works :/
    '{"initializer-list?"',
    '"no problem!"',
    '"(hopefully...)"}'
]

How can I customize the csv module to handle these cases?

2 answers:

Answer 0 (score: 1)

You can use a regular expression to split each line and then clean up the pieces afterwards:

import re
a = r'std::vector<int>{32, 45, 10}, std::array<std::string, 5>{"a", "bc", "def", "ghij", "whoa, this is, a toughie"}, 8, "foo, bar", {"initializer-list?", "no problem!", "(hopefully...)"}'

# split on occurrences of "}, s" (raw string, so \s reaches the regex engine unchanged)
results = re.split(r'},\s+s', a)
print(results)

Note: the split removes the } from the end of every piece except the last one, and removes the s from the start of every piece except the first one.
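If you only want to stop the } and the s from being eaten, a lookbehind and a lookahead make the regex check those characters without consuming them. This is a sketch under the same assumption as the answer (fields are separated by "}, s"), so it still only splits at that one spot: 8, "foo, bar" and the initializer list stay together in the second piece.

```python
import re

a = r'std::vector<int>{32, 45, 10}, std::array<std::string, 5>{"a", "bc", "def", "ghij", "whoa, this is, a toughie"}, 8, "foo, bar", {"initializer-list?", "no problem!", "(hopefully...)"}'

# (?<=\}) requires a "}" before the comma and (?=s) requires an "s" after the
# whitespace, but neither is part of the match, so only ", " is removed.
results = re.split(r'(?<=\}),\s+(?=s)', a)
print(results)
```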

Edit:

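I wanted to solve this properly and came up with the following (it assumes the strings contain no stray single characters from the set {, }, ", <, >). You could drop the <, > cases by detecting C++ declarations more specifically.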

a = r'std::vector<int>{32, 45, 10}, std::array<std::string, 5>{"a", "bc", "def", "ghij", "whoa, this is, a toughie"}, 8, "foo, bar", {"initializer-list?", "no problem!", "(hopefully...)"}'

# Characters that open/close one nesting level (quotes are handled separately)
l_braces = {"{", "<"}
r_braces = {"}", ">"}

def split(s):
    brace_count = 0   # current nesting depth
    quote_count = 0   # number of '"' characters seen so far
    breaks = []       # indices of top-level commas

    for i, c in enumerate(s):

        # An opening quote enters a nesting level, a closing quote leaves it
        if c == '"':
            quote_count += 1
            if quote_count % 2 == 1:
                brace_count += 1
            else:
                brace_count -= 1

        if c in l_braces:
            brace_count += 1

        if c in r_braces:
            brace_count -= 1

        # Only commas at depth 0 separate fields
        if c == "," and brace_count == 0:
            breaks.append(i)

    pieces = []

    lag = 0
    for b in breaks:
        pieces.append(s[lag:b].strip())
        lag = b + 1

    # Last field; s[lag:] also works when there is no top-level comma at all
    pieces.append(s[lag:].strip())
    return pieces

print(split(a))

print(split(a)) prints the following:

['std::vector<int>{32, 45, 10}',
 'std::array<std::string, 5>{"a", "bc", "def", "ghij", "whoa, this is, a toughie"}',
 '8',
 '"foo, bar"',
 '{"initializer-list?", "no problem!", "(hopefully...)"}']
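Unlike the output the question asks for, the result above keeps the double quotes around "foo, bar". A minimal post-processing sketch, assuming that any field which both starts and ends with " is a single plain string literal with no escaped quotes (the helper name unquote is hypothetical):

```python
# Output of split(a) from the answer above, copied verbatim
pieces = ['std::vector<int>{32, 45, 10}',
          'std::array<std::string, 5>{"a", "bc", "def", "ghij", "whoa, this is, a toughie"}',
          '8',
          '"foo, bar"',
          '{"initializer-list?", "no problem!", "(hopefully...)"}']


def unquote(field):
    """Remove one pair of surrounding double quotes, if present (no escape handling)."""
    if len(field) >= 2 and field[0] == field[-1] == '"':
        return field[1:-1]
    return field


cleaned = [unquote(p) for p in pieces]
print(cleaned)
```

A field such as "x" + "y" would be mangled by this rule, which is why it is only safe under the stated assumption.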

Answer 1 (score: 0)

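The csv module separates values wherever it finds a comma outside a quoted field. It does not care about any other symbols, including brackets.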

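To split the values the way you want, you have to extend the module's logic so that it detects opening brackets such as "{". Once an opening bracket is found, all commas should be ignored until the matching closing bracket is found.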

That way you get the desired output.
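A minimal sketch of that idea, not code from the answer: rather than patching the csv module itself, it replaces csv.reader for these lines with a small scanner. A stack of expected closing brackets means a } cannot close a < and square brackets and parentheses are covered too, and brackets or commas inside "..." are treated as plain text. The names split_top_level and bracket_aware_reader are hypothetical, and it assumes string literals never contain an escaped ".

```python
import io

# Map each opening bracket to the closer that must match it
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def split_top_level(line, sep=","):
    """Split line on sep, ignoring separators inside brackets or "..." strings."""
    stack = []         # closers still expected, innermost last
    in_quotes = False  # assumes '"' never appears escaped inside a string
    fields, start = [], 0
    for i, c in enumerate(line):
        if c == '"':
            in_quotes = not in_quotes
        elif in_quotes:
            continue                    # brackets and commas in a string are plain text
        elif c in PAIRS:
            stack.append(PAIRS[c])      # opening bracket: one level deeper
        elif stack and c == stack[-1]:
            stack.pop()                 # matching closing bracket: one level up
        elif c == sep and not stack:
            fields.append(line[start:i].strip())  # top-level separator ends a field
            start = i + 1
    fields.append(line[start:].strip())
    return fields


def bracket_aware_reader(lines, sep=","):
    """Yield one list of fields per non-empty line, loosely mimicking csv.reader."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line:
            yield split_top_level(line, sep)


data = io.StringIO(
    'std::vector<int>{32, 45, 10}, f[1, 2], (3, 4)\n'
    '8, "foo, bar", {"a, b", "]"}\n'
)
for row in bracket_aware_reader(data):
    print(row)
```

An unmatched closer (for example > used as "greater than") is simply ignored, but a lone < used as "less than" is pushed and never popped, so it would swallow every comma after it; the answer's point about detecting C++ declarations more specifically applies here as well.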