Question

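I'm new to Python and pandas. I'm trying to add a new column (group) to a DataFrame, with values based on part of the string in another column (user). Users are coded like this: AA1, AA2, BB1, BB2, and so on. What I want is for the group column to hold the value 'AA' for all AA users. After looking around for a way to do this, I came up with the following line: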

df['group'] = ['AA' if x x.startswith('AA') else 'other' for x in df['user']]

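Well, it doesn't work: 1) I get "invalid syntax" and "line too long" errors. 2) However, if I replace x.startswith('AA') with x == 'AA1', it does work, so is the problem in the startswith part? 3) I don't know how to add a 'BB' if x.startswith('BB') case on the same line. Or should I write one line per user category? Thank you very much.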

Answer 1

I think you can use numpy.where with str.startswith or str.contains:

import pandas as pd
import numpy as np

df = pd.DataFrame({'user':['AA1','AA2','BB1','BB2']})
print (df)
  user
0  AA1
1  AA2
2  BB1
3  BB2

df['group'] = np.where(df.user.str.startswith('AA'), 'AA', 'other')
df['group1'] = np.where(df.user.str.contains('AA'), 'AA', 'other')
# if you need to extract the first 2 characters of each user
df['g1'] = df.user.str[:2]
print (df)
  user  group group1  g1
0  AA1     AA     AA  AA
1  AA2     AA     AA  AA
2  BB1  other  other  BB
3  BB2  other  other  BB

For extracting substrings, see "indexing with str" in the pandas documentation.
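The question also asks how to handle 'BB' in the same step. One possible approach, not part of the original answer, is numpy.select, which takes a list of conditions and a matching list of values. The sketch below assumes an extra 'CC1' user purely to illustrate the fallback value:

```python
import numpy as np
import pandas as pd

# Sample data; 'CC1' is an assumed extra row to show the default branch.
df = pd.DataFrame({'user': ['AA1', 'AA2', 'BB1', 'BB2', 'CC1']})

# One boolean mask per prefix, paired by position with the value to assign.
conditions = [
    df['user'].str.startswith('AA'),
    df['user'].str.startswith('BB'),
]
choices = ['AA', 'BB']

# Rows matching none of the conditions receive the default value.
df['group'] = np.select(conditions, choices, default='other')
print(df)
```

If every user code starts with its two-letter group, df.user.str[:2] from the answer above is simpler; numpy.select is mainly useful when unmatched prefixes should collapse into a single fallback such as 'other'.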

Answer 2

df['group'] = ['AA' if x.startswith('AA') else 'other' for x in df['user']]

You just had an extra x before x.startswith('AA').
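To cover 'BB' as well while staying with a list comprehension, one option is to move the prefix check into a small helper. This is only a sketch: the function name prefix_group, its parameters, and the sample data are assumptions made for illustration.

```python
import pandas as pd

def prefix_group(user, prefixes=('AA', 'BB'), default='other'):
    """Return the first prefix that `user` starts with, or `default`."""
    for prefix in prefixes:
        if user.startswith(prefix):
            return prefix
    return default

# Hypothetical sample data; 'CC3' matches no prefix and falls back to 'other'.
df = pd.DataFrame({'user': ['AA1', 'BB2', 'CC3']})
df['group'] = [prefix_group(x) for x in df['user']]
print(df)
```

This keeps the comprehension short and avoids chaining several if/else expressions on one line, which is also what tends to trigger the "line too long" warning.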

How to add a column in pandas based on a partial string in another column
