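I'm trying to select the subset of a pandas DataFrame that meets a particular condition: in this case, the rows where the value in a given column is an element of an external list. I was surprised to find that this doesn't work, since other conditional statements with .loc are very straightforward. How can I do this?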
MWE:
import pandas as pd
import numpy as np
test_dict = {'first': [0,1,0,0,1,0], 'second': [1,2,3,4,5,6]}
test_df = pd.DataFrame(test_dict)
arr1 = [-1,-4,2,-9,8,7,-5,5,-8,0]
arr2 = [2,5]
# Both lines raise ValueError: "The truth value of a Series is ambiguous"
new_df1 = test_df.loc[test_df.second in arr1]
new_df2 = test_df.loc[test_df.second in arr2]
print(new_df1)
print(new_df2)
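A minimal sketch of why the attempt above fails, assuming Python 3 and a recent pandas: Python's `in` makes the list compare each of its elements with the whole Series. Each comparison such as `2 == test_df.second` is itself a boolean Series, and Python then needs a single True/False from it, which pandas refuses to guess. The variable name `message` is only for illustration.

```python
import pandas as pd

test_df = pd.DataFrame({'first': [0, 1, 0, 0, 1, 0],
                        'second': [1, 2, 3, 4, 5, 6]})
arr2 = [2, 5]

# `x in some_list` makes the list compare each of its elements with x.
# Here x is a Series, so each comparison returns a boolean Series, and
# Python then asks for its single truth value, which pandas refuses.
try:
    test_df.second in arr2
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```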
Answer (score: 2)
Is Series.isin() what you are looking for?
In [55]: new_df1 = test_df.loc[test_df.second.isin(arr1)]
In [56]: new_df2 = test_df.loc[test_df.second.isin(arr2)]
In [57]: new_df1
Out[57]:
   first  second
1      1       2
4      1       5
In [58]: new_df2
Out[58]:
   first  second
1      1       2
4      1       5
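For reference, here is a self-contained Python 3 sketch of the same approach, assuming a reasonably recent pandas. isin() returns a boolean Series with one entry per row, and .loc keeps the rows where it is True. The `~` line (rows whose value is not in the list) is an extra illustration, not part of the original answer.

```python
import pandas as pd

test_df = pd.DataFrame({'first': [0, 1, 0, 0, 1, 0],
                        'second': [1, 2, 3, 4, 5, 6]})
arr1 = [-1, -4, 2, -9, 8, 7, -5, 5, -8, 0]
arr2 = [2, 5]

# isin() tests every element against the list and returns a boolean Series
mask = test_df['second'].isin(arr1)
print(mask.tolist())  # [False, True, False, False, True, False]

# .loc keeps only the rows where the mask is True
new_df1 = test_df.loc[mask]
new_df2 = test_df.loc[test_df['second'].isin(arr2)]
print(new_df1)
print(new_df2)

# ~ negates the mask: rows whose value is NOT in the list
print(test_df.loc[~mask])
```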
You can also use an SQL-like style with DataFrame.query():
In [60]: test_df.query("second in @arr1")
Out[60]:
   first  second
1      1       2
4      1       5
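A small sketch of the query() variant, assuming Python 3 and a recent pandas: the `@` prefix makes the expression look up the Python variable `arr1` in the calling scope, and the expression language also accepts `not in`. The variable names `in_list` and `not_in_list` are illustrative only; the check against isin() shows the two approaches select the same rows here.

```python
import pandas as pd

test_df = pd.DataFrame({'first': [0, 1, 0, 0, 1, 0],
                        'second': [1, 2, 3, 4, 5, 6]})
arr1 = [-1, -4, 2, -9, 8, 7, -5, 5, -8, 0]

# '@arr1' tells query() to use the local Python variable arr1
in_list = test_df.query("second in @arr1")
not_in_list = test_df.query("second not in @arr1")

# Same rows as filtering with the isin() mask
print(in_list.equals(test_df.loc[test_df['second'].isin(arr1)]))
print(not_in_list.index.tolist())
```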