Question

I have seen many nearly identical questions, but I still haven't found the right answer.

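My df has a column ['Naam'] containing the names of various shops. I want to classify them by giving the grocery stores the label 'Supermarkt' in a new column df['Type'].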

I started with this:

df['Type'] = df['Naam'].str.contains('Albert')

This gives a True/False Series.

After that, I did this:

df['Type'] = df['Type'].replace({True: 'Supermarkt'})

That works, but it isn't very smart..... After I wrote another str.contains line for another shop, every value in ['Type'] obviously turned into a bool again....
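A minimal sketch of why the second line undoes the first, assuming a small made-up DataFrame (the shop names and the second pattern 'Jumbo' are hypothetical; only the column name 'Naam' comes from the question): every `df['Type'] = ...` assignment replaces the entire column, so the labels from the earlier pass are lost.

```python
import pandas as pd

# Hypothetical example data; only the column name 'Naam' is from the question
df = pd.DataFrame({"Naam": ["Albert Heijn", "Jumbo", "HEMA"]})

# First pass: boolean Series, then True -> 'Supermarkt' (False stays False)
df["Type"] = df["Naam"].str.contains("Albert")
df["Type"] = df["Type"].replace({True: "Supermarkt"})
first_pass = df["Type"].tolist()

# Second pass for another shop: this assignment replaces the whole column,
# so the 'Supermarkt' label from the first pass is gone again
df["Type"] = df["Naam"].str.contains("Jumbo")
second_pass = df["Type"].tolist()

print(first_pass)
print(second_pass)
```

After the second pass the column holds only the booleans for 'Jumbo'; the earlier 'Supermarkt' value for 'Albert Heijn' has been overwritten.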

Then I did this:

df['Type'] = (df['Naam'].str.contains('Albert'), 'Supermarkt')

My idea was that I could reuse this code over and over, each time with a different part of the string.

But.....

df['Type'] = (df['Naam'].str.contains('Albert'), 'Supermarkt')

gives the error:

Length of values does not match length of index. I think I understand what this means, but I can't figure out why the first str.contains() gives a complete Series while this one gives an error....
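The error most likely comes from the right-hand side: `(df['Naam'].str.contains('Albert'), 'Supermarkt')` is a plain Python tuple with two elements (a Series and a string), so pandas treats it as a list-like of length 2 and compares that with the number of rows. The sketch below, using a hypothetical three-row DataFrame, shows that the boolean Series alone has one value per row, while assigning the tuple raises `ValueError`.

```python
import pandas as pd

# Hypothetical three-row example
df = pd.DataFrame({"Naam": ["Albert Heijn", "Jumbo", "HEMA"]})

mask = df["Naam"].str.contains("Albert")
print(len(mask))   # one value per row: 3

value = (mask, "Supermarkt")
print(len(value))  # the tuple itself has only 2 elements

try:
    df["Type"] = value  # 2 values for 3 rows -> length check fails
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```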

So my question is: is there a way to change df['Type'] = (df['Naam'].str.contains('Albert'), 'Supermarkt') so that True becomes 'Supermarkt' and all False values either stay as they are or are replaced by something else?

Thanks in advance. Regards, Jan

Answer 1

# create a selection
boolean_indexer = df['Naam'].str.contains('Albert')

# create your new column; unselected rows get NaN if 'Type' did not exist yet
df.loc[boolean_indexer, 'Type'] = 'Supermarkt'
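To reuse this `.loc` pattern for several shops, one option is to loop over a mapping from search string to label. This is a sketch under assumptions: the shop names, the `patterns` dict and the default label 'Anders' are hypothetical. `regex=False` makes each pattern a literal substring, `na=False` treats missing names as non-matches, and because `.loc` only overwrites the selected rows, labels set by earlier patterns survive.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"Naam": ["Albert Heijn", "Jumbo", "HEMA", "Kruidvat"]})

# Hypothetical mapping: substring to search for -> label for column 'Type'
patterns = {
    "Albert": "Supermarkt",
    "Jumbo": "Supermarkt",
    "Kruidvat": "Drogist",
}

# Start with a default label so unmatched rows are not left as NaN
df["Type"] = "Anders"

for needle, label in patterns.items():
    # Boolean mask for this pattern (literal match, missing names -> False)
    mask = df["Naam"].str.contains(needle, regex=False, na=False)
    # Only the selected rows are overwritten; other rows keep their label
    df.loc[mask, "Type"] = label

print(df)
```

If a name matches more than one pattern, the last matching entry in the dict wins, because later assignments overwrite earlier ones.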