How do I extract all text after a matching pattern in pandas?

Time: 2019-11-25 11:02:46

Tags: python-3.x pandas

My DataFrame is:

     name     type
0    apple    red fruit with red peel that is edible
1    orange   thick peel that is bitter and used dried sometimes

I want to extract all the text after "peel" from each row and put it in a separate column:

     name     type                                                 peel
0    apple    red fruit with red peel that is edible               that is edible
1    orange   thick peel that is bitter and used dried sometimes   that is bitter and used dried sometimes

I am trying:

def get_peel(desc):
    text = desc.split(' ')
    for i,t in enumerate(text):
        if t.lower() == 'peel':
            return text[i:]
    return 'not found'

df['peel'] = df['type'].apply(get_peel)

But the result I get is:

0         not found
1         not found

What am I doing wrong?

2 Answers:

Answer 0: (score: 1)

Use str.extract with a regular expression.

For example:

import pandas as pd

df = pd.DataFrame({"name": ['apple', 'orange'], 'type': ['red fruit with red peel that is edible', 'thick peel that is bitter and used dried sometimes']})
# Lookbehind: capture everything after the whole word "peel" (the leading space is kept)
df['peel'] = df['type'].str.extract(r"(?<=\bpeel\b)(.*)$")
print(df['peel'])

Output:

0                              that is edible
1     that is bitter and used dried sometimes
Name: peel, dtype: object
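
Note that the lookbehind keeps the space that follows "peel", and a row without "peel" comes back as NaN. Below is a minimal sketch of a variant that consumes that whitespace inside the pattern and fills in missing matches. The third row ('banana') and the 'not found' placeholder are assumptions added only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["apple", "orange", "banana"],  # 'banana' is a made-up row with no "peel"
    "type": [
        "red fruit with red peel that is edible",
        "thick peel that is bitter and used dried sometimes",
        "soft yellow fruit",
    ],
})

# \bpeel\b matches the whole word; \s* consumes the whitespace after it,
# so the captured group starts at the next word.
# expand=False returns a Series instead of a one-column DataFrame.
df["peel"] = df["type"].str.extract(r"\bpeel\b\s*(.*)$", expand=False)

# Rows with no match are NaN; fill them with a placeholder (an assumed choice).
df["peel"] = df["peel"].fillna("not found")
print(df["peel"].tolist())
```

Whether to keep NaN or a placeholder string depends on how the column is used later; NaN is usually easier to filter with isna().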

Answer 1: (score: 1)

Please try the following.

Creating the df:

import pandas as pd

df = pd.DataFrame({'name':['apple','orange'],
                   'type':['red fruit with red peel that is edible','thick peel that is bitter and used dried sometimes']})

Code to add the new column:

# Replace each whole string with the text captured after "peel"
df['peel'] = df['type'].replace(regex=True, to_replace=r'.*peel(.*)', value=r'\1')
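
Two behaviours of this pattern are worth checking against your own data. First, `.*peel` is greedy, so the capture starts after the last "peel" in the string. Second, a row with no "peel" is left unchanged rather than marked as missing. A small sketch, where the second and third sample strings are hypothetical:

```python
import pandas as pd

s = pd.Series([
    "red fruit with red peel that is edible",
    "peel the fruit, then dry the peel in the sun",  # hypothetical: "peel" appears twice
    "soft yellow fruit",                              # hypothetical: no "peel" at all
])

# Same call as in the answer, applied to a standalone Series.
result = s.replace(regex=True, to_replace=r".*peel(.*)", value=r"\1")
print(result.tolist())
```

The results also keep the space that follows "peel"; chaining .str.strip() removes it if that matters.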