I have a "CSV" file in which some data fields happen to contain the comma delimiter, as shown in the second row of the sample data below.
"1","stuff","and","things"
"2","black,white","more","stuff"
I cannot change the source data, so I don't see how to use str.split() without also splitting the value "black,white".
I found a way to solve the problem:
Surely this is easy to overcome, so I look forward to learning something new!
Thank you very much for your help.
Answer 0 (score: 2)
>>> import csv, io
>>> data = """"1","stuff","and","things"
... "2","black,white","more","stuff"
... """
>>> reader = csv.reader(io.StringIO(data))
>>> for row in reader:
...     print(row)
...
['1', 'stuff', 'and', 'things']
['2', 'black,white', 'more', 'stuff']
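The same idea can be wrapped in a small helper. This is only a sketch: the name parse_quoted_csv is made up for illustration, and it assumes the input uses standard double-quoted fields, which is what csv.reader handles by default.

```python
import csv
import io


def parse_quoted_csv(stream):
    """Return all rows from a file-like object containing quoted CSV text.

    parse_quoted_csv is a hypothetical helper name, not part of the csv module.
    """
    # csv.reader treats commas inside double-quoted fields as data,
    # so "black,white" stays a single field.
    return list(csv.reader(stream))


sample = '"1","stuff","and","things"\n"2","black,white","more","stuff"\n'
rows = parse_quoted_csv(io.StringIO(sample))
for row in rows:
    print(row)
```

For a real file, the csv documentation recommends opening it with newline='' and passing the file object to the reader in the same way.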
Answer 1 (score: 1)
If your source is not really CSV and you only want to keep quoted strings intact, you can try the shlex module:
import shlex
lex = shlex.shlex('"2","black,white","more","stuff"')
for i in lex:
    print(i)
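The loop above yields the quoted tokens and the commas as separate items. One possible variation, sketched here under the assumption that POSIX-mode quoting rules are acceptable for your data, is to tell the lexer to treat the comma as its only separator. The helper name split_with_shlex is hypothetical.

```python
import shlex


def split_with_shlex(line):
    """Split a line on commas that are outside double quotes.

    split_with_shlex is a hypothetical helper name used for illustration.
    """
    lexer = shlex.shlex(line, posix=True)
    # Treat only the comma as a separator, and split on separators alone
    # instead of on every punctuation character.
    lexer.whitespace = ','
    lexer.whitespace_split = True
    return list(lexer)


fields = split_with_shlex('"2","black,white","more","stuff"')
print(fields)
```

Note that POSIX mode removes the surrounding quotes and also interprets backslash escapes inside them, which may or may not match the source data.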
Answer 2 (score: 0)
The commas outside the strings are always followed by a double quote, so just split on ," instead of on , alone (or even on ","):
>>> x = '"2","black,white","more","stuff"'
>>> x
'"2","black,white","more","stuff"'
>>> x.split(',"')
['"2"', 'black,white"', 'more"', 'stuff"']
>>> [y.strip('"') for y in x.split(',"')]
['2', 'black,white', 'more', 'stuff']
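This trick only works when every field is wrapped in double quotes. A minimal sketch that makes the assumption explicit (split_fully_quoted is a made-up name) strips the outer quotes once and then splits on the three-character separator ",":

```python
def split_fully_quoted(line):
    """Split a line in which EVERY field is wrapped in double quotes.

    split_fully_quoted is a hypothetical helper name. It assumes no field
    contains the sequence "," itself and that quotes are not escaped.
    """
    line = line.strip()
    if len(line) < 2 or not (line.startswith('"') and line.endswith('"')):
        raise ValueError('expected a line of fully quoted fields')
    # Drop the first and last quote, then split on the quote-comma-quote
    # sequence that separates neighbouring fields.
    return line[1:-1].split('","')


parts = split_fully_quoted('"2","black,white","more","stuff"')
print(parts)
```

If some fields can be unquoted, this raises or splits incorrectly, and the csv module is the safer choice.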
Edit: of course, YevgenYampolskiy's suggestion of shlex is another option.
>>> x = '"2","black,white","more","stuff"'
>>> x
'"2","black,white","more","stuff"'
>>> import shlex
>>> y = shlex.shlex(x)
>>> [i.strip('"') for i in y if i != ',']
['2', 'black,white', 'more', 'stuff']