Regular expression for this dataset

Time: 2015-03-09 07:25:14

Tags: python regex

My dataset has the following format:

product/productId: B000179R3I
product/title: Austin Reed Dartmouth Jacket
product/price: unknown
review/userId: A3Q0VJTUO4EZ56
review/profileName: Jeanmarie Kabala "JP Kabala"
review/helpfulness: 7/7
review/score: 4.0
review/time: 1182816000
review/summary: Periwinkle Dartmouth Blazer
review/text: I own the Austin....[whatever]...
--- and the block above repeats ---

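I tried using a regular expression to create all the columns (str_extract("\\s.*")), but the match always includes the leading space.

One approach is to extract [space]XXXXXXX and then strip the space.

But is there a better way to do this in Python?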

1 Answer:

Answer 0: (score: 0)

In Python, with the dataset text in the string s, you can do this:

>>> for line in s.split('\n'):
        print(line.split(' ',1)[1])


B000179R3I
Austin Reed Dartmouth Jacket
unknown
A3Q0VJTUO4EZ56
Jeanmarie Kabala "JP Kabala"
7/7
4.0
1182816000
Periwinkle Dartmouth Blazer
I own the Austin....[whatever]...
>>> 
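
Note that line.split(' ', 1)[1] raises IndexError on any line that contains no space, such as a blank line between records. A minimal sketch of a more defensive variant, assuming every field line has the form "key: value"; the function name extract_values and the sample string are made up for illustration:

```python
def extract_values(text):
    """Return the value part of every 'key: value' line in text."""
    values = []
    for line in text.splitlines():
        # partition never raises; sep is '' when ': ' is not in the line
        key, sep, value = line.partition(': ')
        if sep:  # skip blank or malformed lines
            values.append(value)
    return values


# Sample built from the lines in the question, plus one blank line
sample = (
    "product/productId: B000179R3I\n"
    "product/title: Austin Reed Dartmouth Jacket\n"
    "\n"
    "review/score: 4.0\n"
)
print(extract_values(sample))
# ['B000179R3I', 'Austin Reed Dartmouth Jacket', '4.0']
```

Splitting on ': ' rather than on a bare space also keeps this independent of whether a key could ever contain a space.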

Or, using a regular expression with a lookbehind (after import re):

>>> for line in s.split('\n'):
        print(re.search(r'(?<=\s).*', line).group())


B000179R3I
Austin Reed Dartmouth Jacket
unknown
A3Q0VJTUO4EZ56
Jeanmarie Kabala "JP Kabala"
7/7
4.0
1182816000
Periwinkle Dartmouth Blazer
I own the Austin....[whatever]...

Or, with a single re.findall over the whole string:

for line in re.findall(r' (.*)', s):
    print(line)
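
Since the goal in the question is to build columns, it may help to keep the key names alongside the values. A rough sketch, assuming each record starts with a product/productId line (as the sample in the question suggests); the names FIELD and parse_records and the second record in the sample are hypothetical:

```python
import re

# One 'key: value' pair per line; MULTILINE lets ^ and $ match at line boundaries
FIELD = re.compile(r'^([^:\s]+): (.*)$', re.MULTILINE)


def parse_records(text, first_key='product/productId'):
    """Group 'key: value' lines into one dict per record.

    Assumption: a new record begins whenever first_key appears.
    """
    records = []
    for key, value in FIELD.findall(text):
        if key == first_key or not records:
            records.append({})
        records[-1][key] = value
    return records


# First record comes from the question; the second one is made up
sample = (
    "product/productId: B000179R3I\n"
    "product/title: Austin Reed Dartmouth Jacket\n"
    "review/score: 4.0\n"
    "product/productId: EXAMPLE0002\n"
    "product/title: Hypothetical Second Product\n"
    "review/score: 5.0\n"
)
records = parse_records(sample)
print(len(records))                 # 2
print(records[0]['product/title'])  # Austin Reed Dartmouth Jacket
```

Each dictionary is one review, so the list can be passed to something like pandas.DataFrame if a table is wanted. Because the pattern requires ": " after the key, separator lines that are not fields are skipped.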