Question

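I am trying to create a subset of an existing variable (col1) in the df below. My new variable (col2) should contain "a" only where col1 contains "a". All remaining values should be labeled "Other". Please help.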

col1
a
b
c
a
b
c
a

col2
a
Other
Other
a
Other
Other
a

Answer 1

Use numpy.where:

import numpy as np

df['col2'] = np.where(df['col1'] == 'a', 'a', 'Other')
# alternative
# df['col2'] = df['col1'].where(df['col1'] == 'a', 'Other')
print(df)
  col1   col2
0    a      a
1    b  Other
2    c  Other
3    a      a
4    b  Other
5    c  Other
6    a      a
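
For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. It assumes the frame is built from the seven col1 values given in the question; the name `alt` is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame, assumed from the col1 values listed in the question.
df = pd.DataFrame({'col1': ['a', 'b', 'c', 'a', 'b', 'c', 'a']})

# np.where picks 'a' where the condition holds and 'Other' everywhere else.
df['col2'] = np.where(df['col1'] == 'a', 'a', 'Other')

# Series.where keeps values where the condition holds and replaces the rest.
alt = df['col1'].where(df['col1'] == 'a', 'Other')

print(df)
```

Both variants should give the same labels; Series.where keeps the original index, which can matter if the result is later aligned with other data.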

Answer 2

Method 1: np.where

This is the most direct method:

df['col2'] = np.where(df['col1'] == 'a', 'a', 'Other')

Method 2: pd.DataFrame.loc

df['col2'] = 'Other'
df.loc[df['col1'] == 'a', 'col2'] = 'a'

Method 3: pd.Series.map

df['col2'] = df['col1'].map({'a': 'a'}).fillna('Other')
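
The map approach works because Series.map with a dict yields NaN for any value missing from the dict, and fillna then replaces those entries. A minimal sketch of the two steps, assuming the sample data from the question (the name `mapped` is illustrative only):

```python
import pandas as pd

# Sample frame, assumed from the col1 values listed in the question.
df = pd.DataFrame({'col1': ['a', 'b', 'c', 'a', 'b', 'c', 'a']})

# Values absent from the dict ('b' and 'c') become NaN after map.
mapped = df['col1'].map({'a': 'a'})

# fillna replaces those NaN entries with the fallback label.
df['col2'] = mapped.fillna('Other')
```

Note that this would also overwrite any NaN already present in col1 with 'Other', which may or may not be what is wanted.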

Most of these methods can be optimized by working on the NumPy array representation, extracted via df['col1'].values.
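
As a rough illustration of that remark, the comparison can be run on the underlying array instead of the Series, which skips some pandas overhead. This is only a sketch, assuming the sample data from the question; whether it is measurably faster depends on the data size and the pandas version, and `arr` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Sample frame, assumed from the col1 values listed in the question.
df = pd.DataFrame({'col1': ['a', 'b', 'c', 'a', 'b', 'c', 'a']})

# Extract the array representation once, then compare on it directly.
arr = df['col1'].values
df['col2'] = np.where(arr == 'a', 'a', 'Other')
```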

Answer 3

Without any extra libraries, since the question is tagged with neither pandas nor numpy:

You can use a list comprehension with if and else:

col1 = ['a', 'b', 'c', 'a', 'b', 'c', 'a']
col2 = [x if x == 'a' else 'Other' for x in col1]
