Can Python code use regular expressions to do something like this?
Input:
> https://test.com, 2017-08-14, "This is the title with , and "anything" in it", "This is the paragraph also with , and "anything" in it"
Desired output:
['https://test.com', '2017-08-14', 'This is the title with , and "anything" in it', 'This is the paragraph also with , and "anything" in it']
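Answer 0 (score: 0)
There are several splitting methods you can use.
The vanilla built-in split method takes a delimiter as its argument and does what it says on the tin: it splits the string at every occurrence of the delimiter you specify and returns the pieces as a list.
In your case the delimiter you want is ',', but only where the comma is not inside quotes. In the general case you could do: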
foo = 'https://test.com, 2017-08-14, "This is the title with , and "anything" in it", "This is the paragraph also with , and "anything" in it"'
print(foo.split(','))
# Caveat: this only works if no field contains a ',' itself; every comma becomes a split point, which is not what you want here.
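In this particular case you could also match on, say, ', ' (a comma followed by a space), but this would fail as well, because your input has an element containing title with , and "any, which would be split incorrectly.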
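To see both naive approaches fail, the short sketch below runs them on the sample input. It uses only the built-in str.split and Python 3 print syntax; the variable names by_comma and by_comma_space are illustrative.

```python
foo = ('https://test.com, 2017-08-14, '
       '"This is the title with , and "anything" in it", '
       '"This is the paragraph also with , and "anything" in it"')

# Split on every comma, then on every comma-plus-space.
by_comma = foo.split(',')
by_comma_space = foo.split(', ')

# Both produce 6 pieces instead of the 4 fields we want.
print(len(by_comma), len(by_comma_space))

# The title is cut in two at its embedded comma.
print(by_comma_space[2:4])
```

In both cases the comma inside the title and the one inside the paragraph act as split points, so each quoted field ends up in two pieces.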
In this case we can use shlex and its split method. This split method splits on whitespace, treating a quoted substring as part of a single token.
So, doing:
import shlex
print(shlex.split(foo))
gets us closer to what we want, but not quite there:
>>> ['https://test.com,', '2017-08-14,', 'This is the title with , and anything in it,', 'This is the paragraph also with , and anything in it']
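As you can see, the elements carry annoying trailing commas that we do not want.
Unfortunately, we cannot simply do
print([part[:-1] for part in shlex.split(foo)])
because that would cut off the last 't' of 'it' in the final element, which has no trailing comma. Instead, we can use the built-in string rstrip method to strip any commas from the end of each element: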
print([part.rstrip(',') for part in shlex.split(foo)])
which gives the output:
>>> ['https://test.com', '2017-08-14', 'This is the title with , and anything in it', 'This is the paragraph also with , and anything in it']
Very close to what we want, but not quite right! (The " around 'anything' is missing; shlex swallowed it!)
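The reason the inner quotes vanish is that shlex follows shell-like quoting rules: an unescaped " closes the current quoted section, and adjacent sections are glued into one token. The small sketch below illustrates this on a shortened, made-up fragment of the input, and shows that a backslash-escaped quote would survive. Whether your real input can be escaped this way is an assumption.

```python
import shlex

# Unescaped inner quotes end and restart the quoted section, so they are dropped.
plain = shlex.split('"and "anything" in it"')
print(plain)    # ['and anything in it']

# Backslash-escaped inner quotes are kept as literal characters.
escaped = shlex.split(r'"and \"anything\" in it"')
print(escaped)  # ['and "anything" in it']
```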
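But we are very close, and I will leave that last little tidbit as homework for you; you should try to work towards a solution yourself first, as others have already posted.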
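For readers who want something to compare their own attempt against, here is one possible regular-expression sketch. It is not the only way to do this and rests on an assumption about the data: a quoted field ends only at a " that is followed by a comma or by the end of the string. The names FIELD and split_fields are hypothetical helpers introduced for this example.

```python
import re

foo = ('https://test.com, 2017-08-14, '
       '"This is the title with , and "anything" in it", '
       '"This is the paragraph also with , and "anything" in it"')

# Either a quoted field (closing quote must precede ',' or end of string),
# or a bare field that contains no commas or quotes.
FIELD = re.compile(r'\s*(?:"(.*?)"(?=\s*(?:,|$))|([^,"\s][^,"]*))')

def split_fields(line):
    """Return the fields of line, keeping inner quotes of quoted fields."""
    fields = []
    for match in FIELD.finditer(line):
        quoted, bare = match.group(1), match.group(2)
        fields.append(quoted if quoted is not None else bare.strip())
    return fields

print(split_fields(foo))
```

On the sample input this yields the four desired fields with the quotes around anything intact. It would mis-split a quoted field whose text itself contains a " immediately followed by a comma.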
Resources:
https://www.tutorialspoint.com/python/string_split.htm
https://docs.python.org/2/library/shlex.html
P.S. Hint: also take a look at the csv module.
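As a rough illustration of that hint, the sketch below uses csv.reader with skipinitialspace=True. Note the assumption: the sample line here is a modified version of the input in which the inner quotes are doubled (""), which is the standard CSV way of escaping them. The original input, with unescaped inner quotes, is not well-formed CSV, so the csv module may not produce the desired output on it.

```python
import csv

# Same data as the question, but with inner quotes doubled as CSV expects.
line = ('https://test.com, 2017-08-14, '
        '"This is the title with , and ""anything"" in it"')

# skipinitialspace=True ignores the space that follows each delimiter.
row = next(csv.reader([line], skipinitialspace=True))
print(row)
```

If you control how the data is written, producing it with csv.writer in the first place would give you this escaping for free.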