Question

My dataset has the following format:

product/productId: B000179R3I
product/title: Austin Reed Dartmouth Jacket
product/price: unknown
review/userId: A3Q0VJTUO4EZ56
review/profileName: Jeanmarie Kabala "JP Kabala"
review/helpfulness: 7/7
review/score: 4.0
review/time: 1182816000
review/summary: Periwinkle Dartmouth Blazer
review/text: I own the Austin....[whatever]...
--- and so on: the block above repeats for each review ---

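I tried to create all the columns by applying a regular expression (str_extract("\\s.*")), but the leading space is included every time.

One workaround is to extract [space]XXXXXXX and then remove the space.

But is there a better way to do this in Python?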
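Since the goal is to build all the columns, it may help to see the whole parse in one place. Below is a minimal sketch, not part of the original post, assuming each record is a block of `key: value` lines and records are separated by a blank line (the post only shows a placeholder separator, so this is an assumption). `parse_records` and `SAMPLE` are hypothetical names; the sample reuses the fields shown above, without `review/text`.

```python
# Sketch: parse blank-line-separated "key: value" records into columns.
# Assumption: every field sits on one line and records are separated by a blank line.
SAMPLE = """product/productId: B000179R3I
product/title: Austin Reed Dartmouth Jacket
product/price: unknown
review/userId: A3Q0VJTUO4EZ56
review/profileName: Jeanmarie Kabala "JP Kabala"
review/helpfulness: 7/7
review/score: 4.0
review/time: 1182816000
review/summary: Periwinkle Dartmouth Blazer
"""


def parse_records(text):
    """Return a list of dicts, one per record."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():                    # a blank line ends the current record
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(': ')  # split at the first ": " only
        if sep:                                 # ignore lines that are not "key: value"
            current[key] = value
    if current:                                 # keep the last record
        records.append(current)
    return records


rows = parse_records(SAMPLE)
# One list of values per field, using the field names of the first record.
columns = {key: [row.get(key) for row in rows] for key in rows[0]}
print(columns['review/score'])  # ['4.0']
```

Splitting on `': '` with `str.partition` avoids the leading-space problem entirely, because the separator is consumed and only the text after it is kept. If pandas is available, `pandas.DataFrame(rows)` would build the same table directly from the list of dicts.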

Answer 1

In Python, you can do this:

>>> for line in s.split('\n'):   # s holds the raw text of one record
...     print(line.split(' ', 1)[1])   # everything after the first space
...
B000179R3I
Austin Reed Dartmouth Jacket
unknown
A3Q0VJTUO4EZ56
Jeanmarie Kabala "JP Kabala"
7/7
4.0
1182816000
Periwinkle Dartmouth Blazer
I own the Austin....[whatever]...
>>>
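One caveat with `split(' ', 1)[1]`: a line that contains no space (for example an empty line between records, or the trailing empty string that `s.split('\n')` produces when `s` ends with a newline) gives a one-element list, so `[1]` raises `IndexError`. A minimal sketch of a more tolerant variant uses `str.partition`, which always returns a 3-tuple and yields an empty string instead; `value_after_first_space` and the two-line `s` are hypothetical.

```python
def value_after_first_space(line):
    # partition always returns (head, sep, tail); tail is '' if the line has no space
    return line.partition(' ')[2]


s = 'product/price: unknown\nreview/score: 4.0\n'  # note the trailing newline
for line in s.split('\n'):
    value = value_after_first_space(line)
    if value:  # skip lines with nothing after a space, e.g. the final empty string
        print(value)
```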

OR

>>> import re
>>> for line in s.split('\n'):
...     print(re.search(r'(?<=\s).*', line).group())   # match starts right after the first whitespace
...
B000179R3I
Austin Reed Dartmouth Jacket
unknown
A3Q0VJTUO4EZ56
Jeanmarie Kabala "JP Kabala"
7/7
4.0
1182816000
Periwinkle Dartmouth Blazer
I own the Austin....[whatever]...
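This also explains the leading space in the question: in `\s.*` the `\s` is part of the match, so it shows up in the result, while the lookbehind `(?<=\s)` only checks that a whitespace character precedes the match without consuming it. A capturing group is another way to leave the space out. The sketch below compares the three patterns on one sample line. Note that `re.search` returns `None` when nothing matches (e.g. a line without whitespace), so calling `.group()` on it directly would raise `AttributeError`.

```python
import re

line = 'product/title: Austin Reed Dartmouth Jacket'

# \s is consumed by the match, so the result starts with a space
with_space = re.search(r'\s.*', line).group()
# the lookbehind asserts a preceding whitespace without consuming it
lookbehind = re.search(r'(?<=\s).*', line).group()
# the capturing group matches the space but returns only the text after it
captured = re.search(r'\s(.*)', line).group(1)

print(repr(with_space))       # ' Austin Reed Dartmouth Jacket'
print(lookbehind == captured)  # True

# guard against lines where the pattern finds no whitespace
m = re.search(r'(?<=\s).*', 'no-whitespace-here')
print(m.group() if m else None)  # None
```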

OR

import re
for line in re.findall(r' (.*)', s): print(line)   # captures the rest of each line after its first space
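All three approaches above print only the values and discard the field names. If you also want to keep the names as column labels, one possible extension (a sketch, assuming every field is on a single line of the form `key: value`) is to capture both parts with `re.findall` in `re.MULTILINE` mode, so that `^` and `$` anchor at each line. The three-line `s` here is a hypothetical excerpt of the sample.

```python
import re

s = '''product/productId: B000179R3I
product/title: Austin Reed Dartmouth Jacket
review/score: 4.0'''

# Group 1: the field name up to the first ':'; group 2: the rest of the line after ': '.
pairs = re.findall(r'^([^:\n]+): (.*)$', s, flags=re.MULTILINE)
print(pairs[0])  # ('product/productId', 'B000179R3I')

record = dict(pairs)
print(record['review/score'])  # 4.0
```

Because group 1 cannot contain a colon, a value that itself contains `: ` stays intact in group 2.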

Regex for this dataset
