Suppose my DataFrame is constructed as follows:
import pandas
import numpy
column_names = ["name", "age", "score"]
names = numpy.random.choice(["Jorge", "Xavier", "Joaquin", "Juan", "Jose"], 50)
ages = numpy.random.randint(0, 100, 50)
scores = numpy.random.rand(50)
df = pandas.DataFrame.from_dict(dict(zip(column_names, [names, ages, scores])))
The first 10 rows of this DataFrame look like this:
age name score
0 15 Jorge 0.031380
1 44 Juan 0.373199
2 84 Xavier 0.999065
3 55 Juan 0.159873
4 55 Joaquin 0.211931
5 33 Juan 0.484350
6 22 Xavier 0.510276
7 86 Joaquin 0.490013
8 2 Jose 0.185086
9 51 Juan 0.979015
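I would like to select the rows where the element of the name column is a member of {"Xavier", "Joaquin"}. My instinct was to try something like df.iloc[df["name"] in {"Xavier", "Joaquin"}, :], but that does not work. How can I do this?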
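I know I can handle this particular example with df.loc[numpy.logical_or(df["name"] == "Xavier", df["name"] == "Joaquin"), :], but that is beside the point. This is only a simplified version of my real problem: my DataFrame has 2,340,923 rows, my set of names, names, has 3,624 elements, and I want to select the rows whose name is a member of names.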
Answer 0 (score: 5)
I think you need isin:
print (df.loc[df["name"].isin(["Xavier", "Joaquin"]), :])
age name score
1 66 Joaquin 0.767056
2 17 Joaquin 0.721369
7 53 Joaquin 0.209415
10 9 Xavier 0.394815
13 20 Joaquin 0.276596
14 17 Xavier 0.810725
15 76 Xavier 0.918273
17 91 Joaquin 0.974723
18 39 Xavier 0.869607
21 3 Xavier 0.200578
22 34 Joaquin 0.938018
23 90 Xavier 0.664387
26 51 Xavier 0.946753
28 49 Xavier 0.859911
30 22 Joaquin 0.602381
34 7 Xavier 0.759837
35 96 Joaquin 0.790691
39 13 Joaquin 0.599557
40 10 Xavier 0.563933
41 69 Xavier 0.983787
43 58 Xavier 0.542903
44 8 Joaquin 0.307106
45 77 Joaquin 0.330278
46 55 Joaquin 0.980077
47 12 Xavier 0.177509
49 15 Joaquin 0.590958
It also works with a set:
names = set(["Xavier", "Joaquin"])
print (df.loc[df["name"].isin(names), :])
age name score
1 66 Joaquin 0.767056
2 17 Joaquin 0.721369
7 53 Joaquin 0.209415
10 9 Xavier 0.394815
13 20 Joaquin 0.276596
14 17 Xavier 0.810725
15 76 Xavier 0.918273
17 91 Joaquin 0.974723
18 39 Xavier 0.869607
21 3 Xavier 0.200578
22 34 Joaquin 0.938018
23 90 Xavier 0.664387
26 51 Xavier 0.946753
28 49 Xavier 0.859911
30 22 Joaquin 0.602381
34 7 Xavier 0.759837
35 96 Joaquin 0.790691
39 13 Joaquin 0.599557
40 10 Xavier 0.563933
41 69 Xavier 0.983787
43 58 Xavier 0.542903
44 8 Joaquin 0.307106
45 77 Joaquin 0.330278
46 55 Joaquin 0.980077
47 12 Xavier 0.177509
49 15 Joaquin 0.590958
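As a quick check, the sketch below rebuilds a small DataFrame like the one in the question and compares the isin mask with the explicit numpy.logical_or version. The fixed seed and the variable names rng, pool, mask, selected, rest and explicit are assumptions made for illustration; they are not part of the original question or answer.

```python
import numpy
import pandas

# Assumption: a fixed seed so the example is reproducible.
rng = numpy.random.RandomState(0)
pool = ["Jorge", "Xavier", "Joaquin", "Juan", "Jose"]
df = pandas.DataFrame({
    "name": rng.choice(pool, 50),
    "age": rng.randint(0, 100, 50),
    "score": rng.rand(50),
})

names = {"Xavier", "Joaquin"}

# Boolean mask: True where the row's name is a member of names.
mask = df["name"].isin(names)
selected = df.loc[mask, :]

# Negating the mask with ~ selects the rows whose name is NOT in names.
rest = df.loc[~mask, :]

# The explicit two-value comparison from the question, for comparison.
explicit = numpy.logical_or(df["name"] == "Xavier", df["name"] == "Joaquin")

print(selected.head())
print(len(selected), len(rest))
```

The same mask expression should carry over unchanged to the large case in the question, since isin takes the whole collection of names at once instead of one comparison per name.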