I have an input file whose beginning looks like this:
AdditionalCookout.create!([
{day_id: 275, cookout_id: 71, description: "Sample text, that, is ,driving , me, crazy"},
{day_id: 275, cookout_id: 87, description: nil},
{day_id: 276, cookout_id: 71, description: nil},
{day_id: 276, cookout_id: 87, description: nil},
{day_id: 277, cookout_id: 92, description: nil},
{day_id: 277, cookout_id: 71, description: nil},
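I am trying to parse each line into its own object. However, I can't simply split on commas, because some of the descriptions contain commas.
I tried these two regex lines, taken from StackOverflow posts I could find: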
re.split(r', (?=(?:"[^"]*?(?: [^"]*)*))|, (?=[^",]+(?:,|$))', content[x])
and
[y.strip() for y in content[x].split(''',(?=(?:[^'"]|'[^']*'|"[^"]*")*$)''')]
However, both of them output
['{day_id: 275', 'cookout_id: 71, description: "Feeling ambitious? If you really want to exhaust yourself today, consider adding some additional stationary cardio."},']
Turns into:
day_id: 275
cookout_id: 71, description: "Feeling ambitious? If you really want to exhaust yourself today, consider adding some additional stationary cardio.",
Any ideas on how I can fix this so that each line is correctly split into three separate parts instead of just two? Thanks.
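Answer 0 (score: 2)
Try using PyYAML to parse it; it worked for me on your example. https://pypi.python.org/pypi/PyYAML. That way you can avoid the regex headache, because each line is already valid YAML flow-mapping syntax.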
import yaml
# safe_load needs no Loader argument (yaml.load requires one in PyYAML 6+)
yaml.safe_load('{day_id: 275, cookout_id: 71, description: "Sample text, that, is,driving , me, crazy"}')
{'cookout_id': 71,
'day_id': 275,
'description': 'Sample text, that, is,driving , me, crazy'}
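To apply the same idea to the whole file, something along these lines might work. This is only a sketch under a few assumptions: the `SAMPLE` string, its closing `])` line, and the `parse_records` helper are made up for illustration, every record is assumed to sit on a single line, and PyYAML is assumed to be installed. Note that YAML does not recognize Ruby's `nil`, so it comes back as the string `"nil"` and is mapped to `None` by hand here.

```python
import yaml  # PyYAML, a third-party package (assumed to be installed)

# Hypothetical sample standing in for the contents of the input file;
# the closing "])" line is an assumption about how the file ends.
SAMPLE = '''AdditionalCookout.create!([
{day_id: 275, cookout_id: 71, description: "Sample text, that, is ,driving , me, crazy"},
{day_id: 275, cookout_id: 87, description: nil},
])
'''


def parse_records(text):
    """Parse every `{...}` line into a dict and skip the wrapper lines."""
    records = []
    for line in text.splitlines():
        # Drop surrounding whitespace and the trailing comma after each record.
        line = line.strip().rstrip(",")
        if not (line.startswith("{") and line.endswith("}")):
            continue  # e.g. "AdditionalCookout.create!([" or "])"
        record = yaml.safe_load(line)
        # YAML loads Ruby's nil as the plain string "nil"; convert it to None.
        records.append({k: (None if v == "nil" else v) for k, v in record.items()})
    return records


records = parse_records(SAMPLE)
for record in records:
    print(record)
```

Each dict then has the three keys `day_id`, `cookout_id` and `description`, with the commas inside the quoted description left intact. A description whose real text is exactly `nil` would also become `None`, which is a limitation of this simple conversion.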