Question

I have this DataFrame in pandas (Python):

    text1       text2
0   sunny       This is a sunny day
1   rainy day   No this day is a rainy day

I want to convert it to this:

    text1       text2
0   sunny       This is a day
1   rainy day   No this day is a

So, for each row, I want to remove from text2 the text found in text1 of the same row.

I tried this:

df = df.apply(lambda x: x['text2'].str.replace(x['text1'], ''))

But I get an error:

AttributeError: ("'str' object has no attribute 'str'", 'occurred at index 0')

It may be related to this: https://stackoverflow.com/a/53986135/9024698.

What is the most efficient way to do what I want?

Answer 1

The simplest solution is replace, applied per row with the value from the other column, but it can leave multiple spaces behind:

df['text2'] = df.apply(lambda x: x['text2'].replace(x['text1'], ''), axis=1)
print (df)
       text1              text2
0      sunny     This is a  day
1  rainy day  No this day is a
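
The doubled space in row 0 is left behind because only the word itself is removed. Below is a minimal sketch of one way to tidy that up, written in plain Python so it runs without pandas. The helper name `remove_fragment` is hypothetical, and it assumes that collapsing every run of whitespace in the result to a single space is acceptable:

```python
def remove_fragment(text, fragment):
    """Remove every occurrence of `fragment` from `text`, then normalise whitespace."""
    stripped = text.replace(fragment, "")
    # split() with no arguments splits on any run of whitespace and drops
    # empty strings, so re-joining with one space collapses doubled and
    # leading/trailing spaces.
    return " ".join(stripped.split())


# The two rows from the question as (text1, text2) pairs.
rows = [("sunny", "This is a sunny day"),
        ("rainy day", "No this day is a rainy day")]
cleaned = [remove_fragment(text2, text1) for text1, text2 in rows]
print(cleaned)
```

With the DataFrame above, the same helper could presumably be plugged into the row-wise call, e.g. `df.apply(lambda x: remove_fragment(x['text2'], x['text1']), axis=1)`.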

A solution that splits both columns into words:

df['text2'] = df.apply(lambda x: ' '.join(y for y in x['text2'].split() 
                                          if y not in set(x['text1'].split())), axis=1)
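
Note that this word-based filter drops every occurrence of each word from text1, not just the exact phrase. A small plain-Python sketch of the same logic (the function name `drop_words` is made up for illustration) makes the difference visible on the second row:

```python
def drop_words(text, words_to_drop):
    """Keep only the whitespace-separated words of `text` not in `words_to_drop`."""
    banned = set(words_to_drop.split())  # build the lookup set once per row
    return " ".join(word for word in text.split() if word not in banned)


first = drop_words("This is a sunny day", "sunny")
second = drop_words("No this day is a rainy day", "rainy day")
print(repr(first))
print(repr(second))
```

The second call removes both "day" tokens, so the result differs from the output requested in the question; whether that matters depends on the data.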

If you instead need to remove all values of the other column from every row, it is better to use @Erfan's solution:

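# regex=True is required in pandas >= 2.0, where str.replace defaults to literal matching
df['text2'].str.replace('|'.join(df['text1']), '', regex=True)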
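
The joined string is used as a regular-expression alternation, so values containing characters such as `+` or `.` would be interpreted as regex syntax. The following is a hedged sketch, using only the standard `re` module, of building a safer pattern. The helper `build_pattern` and the extra value `"c++"` are assumptions added for illustration, not part of the original answer:

```python
import re


def build_pattern(fragments):
    """Compile one alternation that matches any of the literal fragments."""
    # Longest first, so a long phrase such as "rainy day" is tried before
    # any shorter alternative that might be a prefix of it.
    ordered = sorted(set(fragments), key=len, reverse=True)
    # re.escape makes each fragment match literally.
    return re.compile("|".join(re.escape(f) for f in ordered))


text1 = ["sunny", "rainy day", "c++"]  # "c++" is a made-up value with metacharacters
pattern = build_pattern(text1)

first = pattern.sub("", "This is a sunny day")
second = pattern.sub("", "I write c++ on a rainy day")
print(repr(first))
print(repr(second))
```

In pandas, the string `pattern.pattern` could then be passed to `df['text2'].str.replace(..., '', regex=True)` in place of the plain `'|'.join(...)`.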

Answer 2

Just use the replace method:

df["text2"]=df["text2"].replace(to_replace=df["text1"],value="",regex=True)

Edit:

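As @jezrael pointed out, this method does not take the surrounding spaces into account (because the regex does not match them). However, you can adjust the regex to remove some of them by adding optional spaces to the pattern, for example: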

df["text2"]=df["text2"].replace(to_replace=df["text1"]+" *",value="",regex=True)
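
To see what the appended `" *"` does: it lets the match also consume any spaces that directly follow the removed text. Here is a minimal sketch with the standard `re` module; the function name is hypothetical, and the `re.escape` call is an added assumption that text1 should be matched literally:

```python
import re


def remove_with_trailing_spaces(text, fragment):
    """Remove `fragment` plus any spaces that directly follow it."""
    pattern = re.escape(fragment) + " *"
    return re.sub(pattern, "", text)


first = remove_with_trailing_spaces("This is a sunny day", "sunny")
second = remove_with_trailing_spaces("No this day is a rainy day", "rainy day")
print(repr(first))
print(repr(second))
```

The space before the removed text is untouched, so a match at the end of the string still leaves a trailing space; calling `.strip()` on the result would remove it.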

Answer 3

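This happens because you are applying the function to columns instead of rows (apply needs axis=1 to work row by row). Also, x['text2'] is already a string, so there is no need to call .str. With these changes, you get: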

print(df.apply(lambda x: x['text2'].replace(x['text1'], ''), axis=1))
# 0       This is a  day
# 1    No this day is a

As you can see, this returns only the text2 column.

Here is an example that returns the whole processed DataFrame:

# Import module
import pandas as pd

df = pd.DataFrame({"text1": ["sunny", "rainy day"],
                   "text2": ["This is a sunny day", "No this day is a rainy day"]})
print(df)
#        text1                       text2
# 0      sunny         This is a sunny day
# 1  rainy day  No this day is a rainy day

# Function to apply
def remove_word(row):
    row['text2'] = row.text2.replace(row['text1'], '')
    return row

# Apply the function on each row (axis = 1)
df = df.apply(remove_word, axis=1)
print(df)
#        text1              text2
# 0      sunny     This is a  day
# 1  rainy day  No this day is a

Remove strings from one column based on the strings in another column
