I have a Pandas dataset that looks like this: (image: dataset of words and their features)
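I want to replace the "x" in the "Gender" column under the following condition: if the "Words" column contains a word from a list (for example "Mädchen"), then "Neutral" should be written into the "Gender" column of the row just before that word (the row whose "Words" value is a number).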
For example, this:
Gender Words
x 10.
x Mädchen
should become:
Gender Words
Neutral 10.
x Mädchen
I tried np.where like this:
Food2_case["Gender"]= np.where(Food2_case.Words.isin(["Mädchen"]), (dropped_data.Words.str.contains('\d',regex= True) == 'A'), "x")
But I got this error:
ValueError: operands could not be broadcast together with shapes (8000,) (275988,) ()
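A likely cause of this error: np.where broadcasts its three arguments against each other. In the attempt above, the condition is built from Food2_case (apparently 8000 rows), the second argument from dropped_data (apparently 275988 rows), and the third is the scalar "x", so the shapes cannot be combined. Note also that the second argument of np.where is the value to use where the condition is True, not a second condition, and comparing the boolean result of str.contains to 'A' is never True. The minimal reproduction below uses small, made-up lengths; the helper name try_where is hypothetical.

```python
import numpy as np


def try_where(n_cond, n_values):
    """Call np.where with a condition and a value array of the given lengths.

    Returns None on success, or the ValueError message if broadcasting fails.
    """
    cond = np.zeros(n_cond, dtype=bool)        # stands in for the Food2_case condition
    values = np.full(n_values, "Neutral")      # stands in for an array from dropped_data
    try:
        np.where(cond, values, "x")
        return None
    except ValueError as exc:
        return str(exc)


print(try_where(3, 5))  # lengths differ: broadcasting error
print(try_where(3, 3))  # same length: works, prints None
```

In short, every array passed to np.where has to come from the same DataFrame (or at least have the same length).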
Answer 0 (score: 0)
import pandas as pd

# Create dataset
data = pd.DataFrame([[0, 0, 0], [10, "Madchen", 5]]).T
data.columns = ["Gender", "Words"]
# Shift the column of interest up by one row, so each row holds the word that follows it
data.loc[:, "iswordin"] = data.Words.shift(-1)
# Mark every row whose following word is in the list
data.loc[data.iswordin.isin(["Madchen", "Girl", "boy", "..."]), "Gender"] = "Neutral"
# Now you can drop the "iswordin" column, which is no longer useful
data = data.drop(columns="iswordin")
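The same shift idea can also be written in a single np.where call, which is what the question was attempting, and it can additionally check that the preceding row really holds a number, as the question requires. This is a sketch under some assumptions: a single DataFrame with "Gender" and "Words" columns, a hypothetical word list NEUTRAL_WORDS, a hypothetical helper mark_neutral, and numbers written as digits optionally followed by a period (like "10.").

```python
import numpy as np
import pandas as pd

# Hypothetical list of words whose preceding row should be marked "Neutral"
NEUTRAL_WORDS = ["Mädchen", "Kind"]


def mark_neutral(df: pd.DataFrame, words=NEUTRAL_WORDS) -> pd.DataFrame:
    out = df.copy()
    # True where the NEXT row's word is in the list (last row gets NaN -> False)
    next_is_listed = out["Words"].shift(-1).isin(words)
    # Assumption: a "number" is digits optionally followed by a period, e.g. "10."
    is_number = out["Words"].astype(str).str.fullmatch(r"\d+\.?")
    # All three arguments come from the same DataFrame, so they broadcast fine
    out["Gender"] = np.where(next_is_listed & is_number, "Neutral", out["Gender"])
    return out


df = pd.DataFrame({"Gender": ["x", "x", "x", "x"],
                   "Words": ["10.", "Mädchen", "Haus", "7"]})
result = mark_neutral(df)
print(result)
```

Only the "10." row is changed, because it is numeric and directly followed by "Mädchen"; the "7" row stays "x" because it is the last row.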