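I'm trying to extract everything from a data frame column up to the point where a specific word appears. I want to keep all of the text that precedes any of the following words:
high, medium, low
Sample view of the text in the data frame: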
text
Ticket creation dropped in last 24 hours medium range for cust_a
Calls dropped in last 3 months high range for cust_x
Expected output:
text, new_text
Ticket creation dropped in last 24 hours medium range for cust_a, Ticket creation dropped in last 24 hours
Calls dropped in last 3 months high range for cust_x, Calls dropped in last 3 months
Answer 0 (score: 2)
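IIUC, you need str.replace with a regex.
The idea is to match any word from the list, together with everything that follows it, and replace that whole match with an empty string.
We use .* after the keyword to match anything up to the end of the string.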
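words = 'high, medium, low'
# Build a regex alternation from the comma-separated keywords
match_words = '|'.join(words.split(', '))
# 'high|medium|low'
# Remove the first keyword found and everything after it
df['new_text'] = df['text'].str.replace(f"({match_words}).*", '', regex=True)
print(df['new_text'])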
0 Ticket creation dropped in last 24 hours
1 Calls dropped in last 3 months
Name: new_text, dtype: object
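
For reference, here is a self-contained sketch of the same approach that can be run as is. It makes a few assumptions that are not in the original answer: the sample frame is rebuilt from the two rows in the question, the keywords are wrapped in \b word boundaries so that a word such as "slow" would not trigger a match on "low", and a leading \s* is added so the result does not keep the space that sat in front of the keyword. Treat it as one possible variant rather than the only way to do this.

```python
import re

import pandas as pd

# Sample data rebuilt from the question (assumed to match the real frame's layout)
df = pd.DataFrame({
    "text": [
        "Ticket creation dropped in last 24 hours medium range for cust_a",
        "Calls dropped in last 3 months high range for cust_x",
    ]
})

keywords = ["high", "medium", "low"]

# re.escape protects against keywords that contain regex metacharacters
alternation = "|".join(re.escape(k) for k in keywords)

# \s* removes the whitespace before the keyword,
# \b...\b restricts the match to whole words,
# .* consumes the rest of the string
pattern = rf"\s*\b(?:{alternation})\b.*"

df["new_text"] = df["text"].str.replace(pattern, "", regex=True)

print(df["new_text"].tolist())
# ['Ticket creation dropped in last 24 hours', 'Calls dropped in last 3 months']
```

Rows that contain none of the keywords are left unchanged, because str.replace only rewrites the part of the string that the pattern matches.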