My DataFrame is:
   name    type
0  apple   red fruit with red peel that is edible
1  orange  thick peel that is bitter and used dried sometimes
I want to extract all the text after the word 'peel' from each row and put it in a separate column:
   name    type                                                peel
0  apple   red fruit with red peel that is edible              that is edible
1  orange  thick peel that is bitter and used dried sometimes  that is bitter and used dried sometimes
This is what I am trying:
def get_peel(desc):
    text = desc.split(' ')
    for i,t in enumerate(text):
        if t.lower() == 'peel':
            return text[i:]
        return 'not found'
df['peel'] = df['type'].apply(get_peel)
But the result I get is:
0 not found
1 not found
What am I doing wrong?
Answer 0: (score: 1)
Use str.extract with a regular expression.
For example:
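import pandas as pd
df = pd.DataFrame({"name": ['apple', 'orange'],
                   'type': ['red fruit with red peel that is edible', 'thick peel that is bitter and used dried sometimes']})
# The lookbehind starts the match right after the word "peel"; the group captures the rest of the string
df['peel'] = df['type'].str.extract(r"(?<=\bpeel\b)(.*)$")
print(df['peel'])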
Output:
0 that is edible
1 that is bitter and used dried sometimes
Name: peel, dtype: object
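Two details of this pattern are worth knowing: the captured text keeps the space that follows 'peel', and a row that does not contain the word comes back as NaN. The following is only a sketch of one way to handle both. It assumes a third row (banana) that is not in the question, and it borrows the 'not found' placeholder from the question's function.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["apple", "orange", "banana"],
    "type": [
        "red fruit with red peel that is edible",
        "thick peel that is bitter and used dried sometimes",
        "soft yellow fruit",  # hypothetical row without the word "peel"
    ],
})

# \s* consumes the whitespace after "peel", so the capture has no leading space.
# expand=False makes str.extract return a Series instead of a one-column DataFrame.
# fillna replaces the NaN produced for rows where the pattern does not match.
df["peel"] = (
    df["type"]
    .str.extract(r"\bpeel\b\s*(.*)$", expand=False)
    .fillna("not found")
)
print(df["peel"].tolist())
```

Whether to keep NaN or substitute a placeholder depends on how the column is used later; NaN is usually easier to filter on.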
Answer 1: (score: 1)
Could you please try the following.
Creating df:
import pandas as pd
df = pd.DataFrame({'name':['apple','orange'],
                   'type':['red fruit with red peel that is edible','thick peel that is bitter and used dried sometimes']})
Code to add the new column:
df['peel']=df['type'].replace(regex=True,to_replace=r'.*peel(.*)',value=r'\1')
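Separately from the regex approaches, the function in the question can also be made to work. If its return 'not found' sits inside the for loop, it fires on the first word that is not 'peel', which would explain the output shown; the function also returns a list that still includes 'peel' itself. A minimal sketch of a corrected version, under the assumption that 'peel' should match only as a standalone word:

```python
import pandas as pd


def get_peel(desc):
    """Return the text after the first standalone word 'peel', or 'not found'."""
    words = desc.split()
    for i, word in enumerate(words):
        if word.lower() == "peel":
            # Join the words after "peel" back into a single string
            return " ".join(words[i + 1:])
    # Reached only after every word has been checked without a match
    return "not found"


df = pd.DataFrame({
    "name": ["apple", "orange"],
    "type": [
        "red fruit with red peel that is edible",
        "thick peel that is bitter and used dried sometimes",
    ],
})
df["peel"] = df["type"].apply(get_peel)
print(df["peel"].tolist())
```

For larger frames the vectorized string methods are usually the better fit; this version is mainly useful for seeing where the original logic went wrong.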