Question

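I have two data frames. The first one, dataset, has several columns; the one I will use is dataset['text_msg'], which contains text data.

The second data frame, sentences_to_exclude, also contains text data.

The column I will use from it is sentences_to_exclude['sentences'].

What I need to do is check whether the text in the first data frame contains any of the sentences in sentences_to_exclude['sentences'] and, if so, remove the whole sentence.
I tried a function, but it does not work for me. This is the function I used: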

  def remove_words(data):
    words_to_remove = sentences_to_exclude['sentences'].lower().split(" ")
    text_body = dataset['text_msg']
    for word in words_to_remove:
        text_body = text_body.replace(word,'' )
    return text_body

Here is an example of sentences_to_exclude['sentences']:

按需求倒入最佳需求，信息信息具有重要意义

And here is an example from the first data frame, dataset['text_msg']:

倒入最佳特征事件，从头到尾查看信息：-代码事务：-事实/命令客户：-执行和发送消息的消息（附件）描述detaillee de votre需求

I hope my request is clear. Thanks in advance for your help.

Sample data

sentences = ['code transaction', 'Pour un traitement efficace']
text = [ ' i should delete code transaction ', ' i am trying to delete Pour un traitement efficace only from this sentence ' ]

df1 = pd.DataFrame({'Sentences': sentences})
df2 = pd.DataFrame({'Text': text})

Answer 1

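I still don't fully understand your question, but I will try to help. Next time, please provide sample data.

To answer your question, I will build an example data set and show how to remove words or sentences that appear in one data frame from the text in another: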

import pandas as pd

# This is our example data
sentences = ['code transaction', 'Pour un traitement efficace']
text = [ ' i should delete code transaction ', ' i am trying to delete Pour un traitement efficace only from this sentence ' ]

df1 = pd.DataFrame({'Sentences': sentences})
df2 = pd.DataFrame({'Text': text})

# df1

    Sentences
0   code transaction
1   Pour un traitement efficace

# df2
    Text
0   i should delete code transaction
1   i am trying to delete Pour un traitement effi...

Next, we normalize the data so that differences in letter case do not cause mismatches. Here we convert everything to uppercase:

df1['Sentences'] = df1.Sentences.str.upper()
df2['Text'] = df2.Text.str.upper()


    Sentences
0   CODE TRANSACTION
1   POUR UN TRAITEMENT EFFICACE


    Text
0   I SHOULD DELETE CODE TRANSACTION
1   I AM TRYING TO DELETE POUR UN TRAITEMENT EFFI...

Now that both columns are in the same format, we can remove the sentences in one data frame from the text in the other:

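# regex=True is required in recent pandas versions so that '|' is treated as alternation
df2['Text_cleaned'] = df2.Text.str.replace('|'.join(df1.Sentences), '', regex=True)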


    Text                                                Text_cleaned
0   I SHOULD DELETE CODE TRANSACTION                    I SHOULD DELETE
1   I AM TRYING TO DELETE POUR UN TRAITEMENT EFFI...    I AM TRYING TO DELETE ONLY FROM THIS SENTENCE
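
Uppercasing changes the text itself. If the original casing needs to be kept, one possible alternative is to match case-insensitively instead. The following is a minimal sketch, assuming the same kind of example data and using the `case=False` option of `Series.str.replace`; the mixed-case sample rows and the `Text_cleaned_ci` column name are made up for illustration:

```python
import pandas as pd

sentences = ['code transaction', 'Pour un traitement efficace']
text = [' i should delete CODE Transaction ',
        ' i am trying to delete pour un traitement efficace only from this sentence ']

df1 = pd.DataFrame({'Sentences': sentences})
df2 = pd.DataFrame({'Text': text})

# Build one alternation pattern and match it case-insensitively,
# leaving the rest of the text in its original casing.
pattern = '|'.join(df1['Sentences'])
df2['Text_cleaned_ci'] = df2['Text'].str.replace(pattern, '', case=False, regex=True)

print(df2['Text_cleaned_ci'].tolist())
```

Note that the surrounding spaces are left in place, so the cleaned text may contain doubled or trailing spaces.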

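What does '|'.join(df1.Sentences) do?
It returns a single string in which the values are separated by |, which a regular expression treats as "or", so the pattern matches any one of the sentences:

'|'.join(df1.Sentences)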

'CODE TRANSACTION|POUR UN TRAITEMENT EFFICACE'
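
Because the joined string is used as a regular expression, phrases that contain characters such as `(`, `)` or `?` would be read as regex syntax rather than literal text. Here is a hedged sketch of one way to guard against that, using `re.escape` from the standard library and putting longer phrases first so that a shorter phrase does not win over a longer one that starts with it. The phrases and the sample string are invented for illustration:

```python
import re

phrases = ['code transaction', 'code transaction (urgent)', 'price?']

# Escape regex metacharacters and try the longest phrase first.
escaped = [re.escape(p) for p in sorted(phrases, key=len, reverse=True)]
pattern = '|'.join(escaped)

sample = 'please drop code transaction (urgent) and price? here'
cleaned = re.sub(pattern, '', sample)
print(cleaned)
```

The same escaped pattern can be passed to `Series.str.replace(..., regex=True)` in place of the plain joined string.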

I hope this helps and answers your question.
You can now apply the same logic to your own data.
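
Applied to the column names from the question, the same idea might look like the sketch below. It assumes that `dataset` and `sentences_to_exclude` are pandas DataFrames with the columns `text_msg` and `sentences`; the sample rows, the `text_clean` column and the `remove_sentences` helper are hypothetical and only serve as an illustration:

```python
import re
import pandas as pd

# Hypothetical stand-ins for the asker's data frames.
dataset = pd.DataFrame({'text_msg': [
    'Please note the code transaction before sending',
    'Pour un traitement efficace attach the invoice',
]})
sentences_to_exclude = pd.DataFrame({'sentences': [
    'code transaction', 'pour un traitement efficace',
]})

def remove_sentences(texts: pd.Series, to_remove: pd.Series) -> pd.Series:
    """Remove every phrase in `to_remove` from each string in `texts`."""
    # Longest phrases first, with regex metacharacters escaped.
    phrases = sorted(to_remove.dropna().astype(str), key=len, reverse=True)
    pattern = '|'.join(re.escape(p) for p in phrases)
    cleaned = texts.str.replace(pattern, '', case=False, regex=True)
    # Collapse the whitespace left behind by the removed phrases.
    return cleaned.str.replace(r'\s+', ' ', regex=True).str.strip()

dataset['text_clean'] = remove_sentences(dataset['text_msg'],
                                         sentences_to_exclude['sentences'])
print(dataset['text_clean'].tolist())
```

Unlike the function in the question, this works on whole phrases rather than on individual words, and it uses the `.str` accessor so the replacement is applied to every row of the column.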
