Pandas - extract all content up to certain keywords

Time: 2020-09-17 10:10:45

Tags: pandas string

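I am trying to extract everything from a dataframe column up to the point where a specific word appears. I want to keep all of the text that comes before any of the following words:

high, medium, low

Sample view of the text in the dataframe: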

text
Ticket creation dropped in last 24 hours medium range for cust_a
Calls dropped in last 3 months high range for cust_x

Expected output:

text, new_text
Ticket creation dropped in last 24 hours medium range for cust_a, Ticket creation dropped in last 24 hours
Calls dropped in last 3 months high range for cust_x, Calls dropped in last 3 months

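1 answer:

Answer 0: (score: 2)

IIUC, you need str.replace with a regex.

The idea is to match any of the words in your list, then replace that word and everything after it with an empty string.

We use .* to match anything from the keyword up to the end of the string.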
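To see what the pattern does on its own, here is a minimal sketch using the standard `re` module instead of pandas. The names `keywords`, `samples` and `trimmed` are just for illustration, and `re.escape` is an extra precaution that is not in the answer itself.

```python
import re

# Keywords from the question; escaped in case a keyword ever
# contains regex metacharacters (an extra precaution).
keywords = ["high", "medium", "low"]
pattern = re.compile("(" + "|".join(map(re.escape, keywords)) + ").*")

samples = [
    "Ticket creation dropped in last 24 hours medium range for cust_a",
    "Calls dropped in last 3 months high range for cust_x",
]

# Replacing the match with "" keeps only the text before the first keyword.
trimmed = [pattern.sub("", s) for s in samples]

for t in trimmed:
    print(repr(t))
```

Note that the space just before the keyword is not part of the match, so each result keeps a trailing space.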

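# df is the dataframe from the question, with a 'text' column
words = 'high, medium, low'

# build a regex alternation from the comma-separated keywords
match_words = '|'.join(words.split(', '))
#'high|medium|low'

# drop the first keyword found and everything after it
df['new_text'] = df['text'].str.replace(f"({match_words}).*",'',regex=True)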


print(df['new_text'])

0    Ticket creation dropped in last 24 hours 
1              Calls dropped in last 3 months 
Name: new_text, dtype: object
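
The output above still carries a trailing space, and a bare `low` would also match inside a longer word such as "slow". A possible variant, sketched here as a self-contained example, adds `\s*` and `\b` to the pattern. The dataframe is rebuilt from the question's sample rows, and the word-boundary handling is my assumption about what is wanted, not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({"text": [
    "Ticket creation dropped in last 24 hours medium range for cust_a",
    "Calls dropped in last 3 months high range for cust_x",
]})

words = ["high", "medium", "low"]

# \s* swallows the whitespace before the keyword;
# \b stops "low" from matching inside a word like "slow".
pattern = r"\s*\b(?:" + "|".join(words) + r")\b.*"

df["new_text"] = df["text"].str.replace(pattern, "", regex=True)

print(df["new_text"].tolist())
```

If the keywords can appear in mixed case, passing `case=False` to `str.replace` should make the match case-insensitive.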