I am trying to filter by user_id using a list and a mask. Here is the input for two user_ids:
import numpy as np
import pandas as pd

data = np.array([['user_id','comment','label'],
[100,'First comment',0],
[101,'Buy viagra',1],
[100,'Buy viagra two',1],
[101,'Third comment',0],
[100,'Third comment two',0],
[101,'Buy drugs',1],
[100,'Buy drugs two',1],
[101,'Buy icecream',1],
[100,'Buy icecream two',1],
[101,'Buy something',1],
[100,'Buy something two',1]])
The desired output is:
0 100 First comment 0
1 101 Buy viagra 1
2 100 Buy viagra two 1
3 101 Third comment 0
4 100 Third comment two 0
5 101 Buy drugs 1
6 100 Buy drugs two 1
7 101 Buy icecream 1
8 100 Buy icecream two 1
When I pass a list of user_ids, the output I get is incorrect.
m = df.user_id.isin([100,101]) & df.label.eq('1')
i = df[m].head(3)
j = df[~m]
df = pd.concat([i, j]).sort_index()
print (df)
However, if I pass only a single user_id, I get the correct output. Can you tell me what is going wrong? Thanks.
m = df.user_id.eq('101') & df.label.eq('1')
Answer 0 (score: 4)
The problem is that the values in the user_id column are strings, so you need ['100','101'] instead of [100, 101]:
df = pd.DataFrame(data[1:], columns=data[0])
m = df.user_id.isin(['100','101']) & df.label.eq('1')
i = df[m]
print (i)
user_id comment label
1 101 Buy viagra 1
2 100 Buy viagra two 1
5 101 Buy drugs 1
6 100 Buy drugs two 1
7 101 Buy icecream 1
8 100 Buy icecream two 1
9 101 Buy something 1
10 100 Buy something two 1
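The filter above keeps all eight rows with label '1', while the desired output in the question stops at index 8. A minimal sketch that reproduces it, assuming the goal is to keep only the first three label-'1' rows per user_id (the question does not state this rule explicitly, so the groupby step is an assumption, and the name out is illustrative):

```python
import numpy as np
import pandas as pd

data = np.array([['user_id', 'comment', 'label'],
                 [100, 'First comment', 0],
                 [101, 'Buy viagra', 1],
                 [100, 'Buy viagra two', 1],
                 [101, 'Third comment', 0],
                 [100, 'Third comment two', 0],
                 [101, 'Buy drugs', 1],
                 [100, 'Buy drugs two', 1],
                 [101, 'Buy icecream', 1],
                 [100, 'Buy icecream two', 1],
                 [101, 'Buy something', 1],
                 [100, 'Buy something two', 1]])
df = pd.DataFrame(data[1:], columns=data[0])

# Compare against strings, because every value in the frame is a str.
m = df.user_id.isin(['100', '101']) & df.label.eq('1')

# Assumption: keep at most three matching rows for each user_id.
i = df[m].groupby('user_id').head(3)
# Rows that do not match the mask are kept unchanged.
j = df[~m]

out = pd.concat([i, j]).sort_index()
print(out)
```

A plain head(3) on df[m] takes the first three matching rows overall, not three per user, which is why the grouped version is used here.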
You can check the type of each value in a column with:
print (df.user_id.apply(type))
0 <class 'str'>
1 <class 'str'>
2 <class 'str'>
3 <class 'str'>
4 <class 'str'>
5 <class 'str'>
6 <class 'str'>
7 <class 'str'>
8 <class 'str'>
9 <class 'str'>
10 <class 'str'>
Name: user_id, dtype: object
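The str types come from the way the input was built: a NumPy array holds a single dtype, so mixing integers and strings in np.array converts everything to strings before pandas ever sees it. A small sketch of that behaviour (the variable name row is only for illustration):

```python
import numpy as np

# One row from the question: two integers and one string.
row = np.array([100, 'First comment', 0])

# 'U' is NumPy's kind code for unicode strings.
print(row.dtype.kind)

# The integer 100 is now stored as the string '100'.
print(type(row[0]))
print(row[0] == '100')
```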
If you need to check all columns:
print (df.applymap(type))  # in pandas >= 2.1 applymap is deprecated; use df.map(type)
user_id comment label
0 <class 'str'> <class 'str'> <class 'str'>
1 <class 'str'> <class 'str'> <class 'str'>
2 <class 'str'> <class 'str'> <class 'str'>
3 <class 'str'> <class 'str'> <class 'str'>
4 <class 'str'> <class 'str'> <class 'str'>
5 <class 'str'> <class 'str'> <class 'str'>
6 <class 'str'> <class 'str'> <class 'str'>
7 <class 'str'> <class 'str'> <class 'str'>
8 <class 'str'> <class 'str'> <class 'str'>
9 <class 'str'> <class 'str'> <class 'str'>
10 <class 'str'> <class 'str'> <class 'str'>
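An alternative to comparing against strings is to convert the numeric columns back to integers, after which the original [100, 101] list and an integer label work as written in the question. A sketch under that assumption, using only the first four rows of the input to keep it short:

```python
import numpy as np
import pandas as pd

data = np.array([['user_id', 'comment', 'label'],
                 [100, 'First comment', 0],
                 [101, 'Buy viagra', 1],
                 [100, 'Buy viagra two', 1],
                 [101, 'Third comment', 0]])
df = pd.DataFrame(data[1:], columns=data[0])

# Convert the two numeric columns from str to int; comment stays as text.
df = df.astype({'user_id': int, 'label': int})
print(df.dtypes)

# Integer comparisons now match.
m = df.user_id.isin([100, 101]) & df.label.eq(1)
print(df[m])
```

Building the DataFrame directly from a list of rows instead of a NumPy array would also avoid the conversion to strings in the first place.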