Question

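I have two columns in my data frame, "Subject" and "Description". I am trying to clean the "Description" column by splitting it on the text in the "Subject" column, since that text is included in every row of "Description".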

Here is an excerpt of the "Subject" column:

Subject
1     Question about the program   
2  Technical issue with the site

And the "Description" column:

Description \
1  An HTML only email was received and a rough conversion is below. 
Please refer to the Emails related list for the HTML contents of the 
message. Question about the program Hello Hello I was wondering if there 
is going to be a product review coming up soon?

2  An HTML only email was received and a rough conversion is below. 
Please refer to the Emails related list for the HTML contents of the 
message. Technical issue with the site Reviews I received emails stating 
that I need to rewrite two of my reviews

For example, in row 1, I would like to split the "Description" value on "Question about the program" and capture only the text after that string.

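I tried df['Description'] = df.apply(lambda x: x['Description'].split(x['Subject'], 1), axis=1)['Description'] with no luck, and I get the error "TypeError: ('must be str or None, not float')" at indexes where the description does not contain the subject. How can I handle rows that do not contain that exact text while still splitting the ones that do?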

Any help would be much appreciated. Thank you.

I also tried the suggested answer, but received this error: IndexError: ('list index out of range', 'occurred at index 1')

Answer 1

You need to split each string in df['Description'] on the corresponding value in Subject and take the part that follows the split.

df.apply(lambda x: x['Description'].split(x['Subject'])[1], axis=1)

Output:

0     Hello Hello I was wondering if there is going...
1     Reviews I received emails stating that I need...
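
The one-liner above assumes every row has a string Subject that actually occurs in Description. When Subject is NaN (a float), str.split raises the TypeError quoted in the question, and when the subject is absent from the description, split returns a one-element list, so [1] raises the IndexError. The following is a minimal sketch of one way to guard against both cases; the helper name text_after_subject, the Cleaned column, and the sample rows are made up for illustration and are not from the original post.

```python
import pandas as pd


def text_after_subject(row):
    """Return the text that follows Subject in Description, else Description unchanged."""
    description, subject = row["Description"], row["Subject"]
    # Missing cells (None/NaN) are not str, so skip them instead of calling split on them.
    if not isinstance(description, str) or not isinstance(subject, str) or not subject:
        return description
    # partition() returns (before, separator, after); separator is "" when not found.
    _, found, after = description.partition(subject)
    return after.strip() if found else description


# Hypothetical sample data: a match, a row without the subject, and a missing row.
df = pd.DataFrame(
    {
        "Subject": ["Question about the program", "Technical issue with the site", None],
        "Description": [
            "Please refer to the message. Question about the program Hello Hello I was wondering",
            "No subject line in this text",
            float("nan"),
        ],
    }
)
df["Cleaned"] = df.apply(text_after_subject, axis=1)
print(df["Cleaned"])
```

Rows without a match keep their original description here; returning an empty string or NaN instead would be an equally reasonable choice, depending on how unmatched rows should be treated downstream.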
