My dataset has the following format:
product/productId: B000179R3I
product/title: Austin Reed Dartmouth Jacket
product/price: unknown
review/userId: A3Q0VJTUO4EZ56
review/profileName: Jeanmarie Kabala "JP Kabala"
review/helpfulness: 7/7
review/score: 4.0
review/time: 1182816000
review/summary: Periwinkle Dartmouth Blazer
review/text: I own the Austin....[whatever]...
--- and the block above repeats for each review ---
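I tried using a regular expression (str_extract("\\s.*")) to build all the columns, but the match always includes the leading space.
One option is to extract [space]XXXXXXX and then strip the space.
Is there a better way to do this in Python?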
Answer 0 (score: 0)
In Python, you could do it like this (s is a string holding the text above):
>>> for line in s.split('\n'):
...     print(line.split(' ', 1)[1])
...
B000179R3I
Austin Reed Dartmouth Jacket
unknown
A3Q0VJTUO4EZ56
Jeanmarie Kabala "JP Kabala"
7/7
4.0
1182816000
Periwinkle Dartmouth Blazer
I own the Austin....[whatever]...
>>>
Or, with a lookbehind so the space itself is not part of the match:
>>> import re
>>> for line in s.split('\n'):
...     print(re.search(r'(?<=\s).*', line).group())
...
B000179R3I
Austin Reed Dartmouth Jacket
unknown
A3Q0VJTUO4EZ56
Jeanmarie Kabala "JP Kabala"
7/7
4.0
1182816000
Periwinkle Dartmouth Blazer
I own the Austin....[whatever]...
Or, with a capture group:
import re
for line in re.findall(r' (.*)', s):
    print(line)
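Since the goal in the question is to build columns, it may help to keep the field name alongside each value. The following is a minimal sketch, not a definitive implementation. It assumes that records are separated by a blank line (the question does not confirm this) and that every field sits on one line. The names SAMPLE, LINE_RE and parse_records are hypothetical, and the second record in SAMPLE is made-up placeholder data.

```python
import re

# Sample text in the question's format. The blank line between records and
# the whole second record are assumptions made for illustration only.
SAMPLE = """\
product/productId: B000179R3I
product/title: Austin Reed Dartmouth Jacket
review/score: 4.0
review/summary: Periwinkle Dartmouth Blazer

product/productId: EXAMPLE0002
product/title: Hypothetical Second Item
review/score: 5.0
review/summary: Note: a value may contain a colon
"""

# Key = everything up to the first colon; \s* consumes the space after it,
# so the captured value never starts with whitespace.
LINE_RE = re.compile(r'^(\S+?):\s*(.*)$')


def parse_records(text):
    """Return a list of dicts, one per blank-line-separated record."""
    records = []
    current = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            key, value = match.groups()
            current[key] = value
        elif not line.strip() and current:
            # A blank line closes the record collected so far.
            records.append(current)
            current = {}
    if current:
        # Flush the last record when the text does not end with a blank line.
        records.append(current)
    return records


if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        print(record["product/productId"], "|", record["product/title"])
```

Unlike line.split(' ', 1)[1], this sketch skips lines without a "key: value" shape instead of raising IndexError. If pandas is available, the resulting list of dicts can be passed to pandas.DataFrame to get one column per field.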