Filtering a pandas DataFrame using a list

Time: 2018-01-12 06:56:43

Tags: python pandas pandas-groupby

I am trying to filter using a list of user_id values and a mask. Here is the input for two user_ids:

import numpy as np
import pandas as pd

data = np.array([['user_id','comment','label'],
            [100,'First comment',0],
            [101,'Buy viagra',1],
            [100,'Buy viagra two',1],
            [101,'Third comment',0],
            [100,'Third comment two',0],
            [101,'Buy drugs',1],
            [100,'Buy drugs two',1],
            [101,'Buy icecream',1],
            [100,'Buy icecream two',1],
            [101,'Buy something',1],
            [100,'Buy something two',1]])

The desired output is:

0      100      First comment     0
1      101         Buy viagra     1
2      100     Buy viagra two     1
3      101      Third comment     0
4      100  Third comment two     0
5      101          Buy drugs     1
6      100      Buy drugs two     1
7      101       Buy icecream     1
8      100   Buy icecream two     1

When I pass a list of user_ids, I get incorrect output.

m = df.user_id.isin([100,101]) & df.label.eq('1')
i = df[m].head(3)
j = df[~m]
df = pd.concat([i, j]).sort_index()
print (df)

However, if I pass only a single user_id, I get the correct output. Can you tell me what is going wrong? Thanks.

m = df.user_id.eq('101') & df.label.eq('1')

1 Answer:

Answer 0: (score: 4)

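The problem is that the values in the user_id column are strings (np.array stores a single dtype, so mixing integers with text turns every element into a string), so you need ['100','101'] instead of [100, 101]:

# Build the frame from the array: the first row is the header, the rest are records
df = pd.DataFrame(data[1:], columns=data[0])

# Compare against strings, because every value in the array is a string
m = df.user_id.isin(['100','101']) & df.label.eq('1')
i = df[m]
print(i)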
   user_id            comment label
1      101         Buy viagra     1
2      100     Buy viagra two     1
5      101          Buy drugs     1
6      100      Buy drugs two     1
7      101       Buy icecream     1
8      100   Buy icecream two     1
9      101      Buy something     1
10     100  Buy something two     1
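
The output above lists every label-1 row, whereas the desired output in the question appears to keep at most three label-1 rows per user and all remaining rows. A minimal sketch of one way to get there, assuming that reading of the question is right (the cap of 3 and the name `limit` are assumptions, not part of the original answer):

```python
import numpy as np
import pandas as pd

data = np.array([['user_id', 'comment', 'label'],
                 [100, 'First comment', 0],
                 [101, 'Buy viagra', 1],
                 [100, 'Buy viagra two', 1],
                 [101, 'Third comment', 0],
                 [100, 'Third comment two', 0],
                 [101, 'Buy drugs', 1],
                 [100, 'Buy drugs two', 1],
                 [101, 'Buy icecream', 1],
                 [100, 'Buy icecream two', 1],
                 [101, 'Buy something', 1],
                 [100, 'Buy something two', 1]])
df = pd.DataFrame(data[1:], columns=data[0])

limit = 3  # assumed cap on label-1 rows kept per user

# Values are strings here, so compare against strings
m = df.user_id.isin(['100', '101']) & df.label.eq('1')

# groupby(...).head(limit) keeps the first `limit` masked rows of each user
kept = df[m].groupby('user_id').head(limit)

# Add back the rows that did not match the mask and restore the original order
result = pd.concat([kept, df[~m]]).sort_index()
print(result)
```

Calling `head(3)` directly on `df[m]`, as in the question, takes the first three masked rows overall rather than three per user, which is why grouping by `user_id` first may be closer to what was intended.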

You can check the type of each value in a column with:

print(df.user_id.apply(type))

0     <class 'str'>
1     <class 'str'>
2     <class 'str'>
3     <class 'str'>
4     <class 'str'>
5     <class 'str'>
6     <class 'str'>
7     <class 'str'>
8     <class 'str'>
9     <class 'str'>
10    <class 'str'>
Name: user_id, dtype: object
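
Once the check confirms the column holds strings, another option is to convert it so that integer comparisons work as written in the question. A sketch, assuming every value in `user_id` and `label` parses as an integer (the shortened sample data is only for illustration):

```python
import numpy as np
import pandas as pd

data = np.array([['user_id', 'comment', 'label'],
                 [100, 'First comment', 0],
                 [101, 'Buy viagra', 1],
                 [100, 'Buy viagra two', 1],
                 [101, 'Third comment', 0]])
df = pd.DataFrame(data[1:], columns=data[0])

# Cast the string columns to integers (raises ValueError on non-numeric text)
df = df.astype({'user_id': int, 'label': int})

# The integer list from the question now matches
m = df.user_id.isin([100, 101]) & df.label.eq(1)
print(df[m])
```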

To check all columns (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):

print(df.applymap(type))

          user_id        comment          label
0   <class 'str'>  <class 'str'>  <class 'str'>
1   <class 'str'>  <class 'str'>  <class 'str'>
2   <class 'str'>  <class 'str'>  <class 'str'>
3   <class 'str'>  <class 'str'>  <class 'str'>
4   <class 'str'>  <class 'str'>  <class 'str'>
5   <class 'str'>  <class 'str'>  <class 'str'>
6   <class 'str'>  <class 'str'>  <class 'str'>
7   <class 'str'>  <class 'str'>  <class 'str'>
8   <class 'str'>  <class 'str'>  <class 'str'>
9   <class 'str'>  <class 'str'>  <class 'str'>
10  <class 'str'>  <class 'str'>  <class 'str'>
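
One way to avoid the string conversion in the first place may be to skip the intermediate NumPy array and pass plain Python lists to `pd.DataFrame`, which lets pandas infer a dtype per column. A sketch with a shortened sample (the names `rows` and `columns` are illustrative, not from the original post):

```python
import pandas as pd

columns = ['user_id', 'comment', 'label']
rows = [[100, 'First comment', 0],
        [101, 'Buy viagra', 1],
        [100, 'Buy viagra two', 1],
        [101, 'Third comment', 0]]

# Each column gets its own dtype: integers stay integers
df = pd.DataFrame(rows, columns=columns)
print(df.dtypes)

# Integer comparisons work without any conversion step
m = df.user_id.isin([100, 101]) & df.label.eq(1)
print(df[m])
```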