Split on commas, but not on commas inside quotes?

Time: 2017-05-11 20:52:44

Tags: python regex python-2.7 parsing split

I have an input file whose beginning looks like this:

AdditionalCookout.create!([
  {day_id: 275, cookout_id: 71, description: "Sample text, that, is ,driving , me, crazy"},
  {day_id: 275, cookout_id: 87, description: nil},
  {day_id: 276, cookout_id: 71, description: nil},
  {day_id: 276, cookout_id: 87, description: nil},
  {day_id: 277, cookout_id: 92, description: nil},
  {day_id: 277, cookout_id: 71, description: nil},

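I am trying to parse each line into its own object. However, I can't simply split on commas, because some of the descriptions contain commas.

I tried these two regex lines, taken from StackOverflow posts I could find: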

re.split(r', (?=(?:"[^"]*?(?: [^"]*)*))|, (?=[^",]+(?:,|$))', content[x])

[y.strip() for y in content[x].split(''',(?=(?:[^'"]|'[^']*'|"[^"]*")*$)''')]

However, both of them output:

['{day_id: 275', 'cookout_id: 71, description: "Feeling ambitious? If you really want to exhaust yourself today, consider adding some additional stationary cardio."},']

Turns into:
day_id: 275
cookout_id: 71, description: "Feeling ambitious? If you really want to exhaust yourself today, consider adding some additional stationary cardio.",

Any ideas on how I can fix this so that each line is correctly split into three separate parts rather than just two? Thanks.

1 answer:

Answer 0: (score: 2)

Try parsing it with PyYAML (https://pypi.python.org/pypi/PyYAML). It worked for me on your example, and it lets you avoid the regex headache.

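import yaml
# safe_load builds only plain Python types; recent PyYAML releases require an explicit Loader for yaml.load
yaml.safe_load('{day_id: 275, cookout_id: 71, description: "Sample text, that, is,driving , me, crazy"}')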
{'cookout_id': 71,
 'day_id': 275,
 'description': 'Sample text, that, is,driving , me, crazy'}
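
To apply the same idea to the whole file rather than a single string, something along these lines might work. It is only a sketch: the helper name parse_records is made up for illustration, and it assumes every record sits on its own line in the form `{...},` as in the sample from the question. Lines that do not look like that (the `AdditionalCookout.create!([` header, blank lines) are skipped.

```python
import yaml  # PyYAML, third-party: pip install PyYAML


def parse_records(lines):
    """Parse each `{...},` line as a YAML flow mapping (hypothetical helper)."""
    records = []
    for line in lines:
        # Remove surrounding whitespace and the trailing comma after "}".
        text = line.strip().rstrip(",")
        # Assumption: one record per line; skip the header and anything else.
        if not (text.startswith("{") and text.endswith("}")):
            continue
        records.append(yaml.safe_load(text))
    return records


sample = [
    'AdditionalCookout.create!([',
    '  {day_id: 275, cookout_id: 71, description: "Sample text, that, is ,driving , me, crazy"},',
    '  {day_id: 275, cookout_id: 87, description: nil},',
]
records = parse_records(sample)
for record in records:
    print(record)
```

With a real file, `parse_records(open(path))` should behave the same way, where `path` is whatever your input file is called. One thing to watch for: YAML does not treat `nil` as null, so those descriptions come back as the string 'nil' and would need converting to None yourself if that is what you want.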