Question

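I am trying to extract everything from a DataFrame column up to the point where a specific word appears. I want to keep all of the text that comes before any of the following words:

high, medium, low

A sample view of the text in the DataFrame: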

text
Ticket creation dropped in last 24 hours medium range for cust_a
Calls dropped in last 3 months high range for cust_x

Expected output:

text, new_text
Ticket creation dropped in last 24 hours medium range for cust_a, Ticket creation dropped in last 24 hours
Calls dropped in last 3 months high range for cust_x, Calls dropped in last 3 months

Answer 1

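IIUC, you need str.replace with a regex.

The idea is to match any word from the list, together with everything that follows it, and replace that match with an empty string.

We use .* to match everything after the keyword up to the end of the string.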
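To see what the pattern does before applying it to a whole column, here is a minimal sketch on a single string using Python's built-in re module. The variable names pattern, sample, and truncated are only illustrative, and the sample string is taken from the question.

```python
import re

# Alternation of the keywords, followed by ".*" to swallow the rest of the string.
pattern = r"(high|medium|low).*"

sample = "Calls dropped in last 3 months high range for cust_x"

# Replace the keyword and everything after it with an empty string.
truncated = re.sub(pattern, "", sample)

print(repr(truncated))  # 'Calls dropped in last 3 months '
```

Note that the space in front of the keyword is not part of the match, so it stays at the end of the result.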

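words = 'high, medium, low'
# build a regex alternation from the comma-separated keywords
match_words = '|'.join(words.split(', '))
# match_words is now 'high|medium|low'

# delete the first keyword found and everything after it
df['new_text'] = df['text'].str.replace(f"({match_words}).*", '', regex=True)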


print(df['new_text'])

0    Ticket creation dropped in last 24 hours 
1              Calls dropped in last 3 months 
Name: new_text, dtype: object
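The result above keeps a trailing space before the removed keyword, and the bare alternation would also match a keyword embedded in a longer word (for example "low" inside "lower"). A possible refinement, offered as a sketch rather than as part of the original answer, is to add word boundaries and consume the whitespace before the keyword. The DataFrame below is rebuilt from the question's sample rows, and keywords and pattern are illustrative names.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "text": [
        "Ticket creation dropped in last 24 hours medium range for cust_a",
        "Calls dropped in last 3 months high range for cust_x",
    ]
})

keywords = ["high", "medium", "low"]

# \s* removes the whitespace before the keyword;
# \b...\b makes sure only whole words are matched;
# .* removes everything after the keyword.
pattern = r"\s*\b(?:" + "|".join(keywords) + r")\b.*"

df["new_text"] = df["text"].str.replace(pattern, "", regex=True)

print(df["new_text"].tolist())
# ['Ticket creation dropped in last 24 hours', 'Calls dropped in last 3 months']
```

Rows that contain none of the keywords are left unchanged, because the pattern finds nothing to replace.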
