This question is part 2 of my previous one.
For example, I have a DataFrame like this:
import pandas as pd

# Each cell holds a Python list of three integers (range replaces Python 2's xrange)
df = pd.DataFrame({
    'A': [[e for e in range(x+1, x+4)] for x in range(0, 15, 3)],
    'B': [[e*10 for e in range(x+1, x+4)] for x in range(0, 15, 3)],
    'C': [[e*100 for e in range(x+1, x+4)] for x in range(0, 15, 3)]
})
              A                B                   C
0     [1, 2, 3]     [10, 20, 30]     [100, 200, 300]
1     [4, 5, 6]     [40, 50, 60]     [400, 500, 600]
2     [7, 8, 9]     [70, 80, 90]     [700, 800, 900]
3  [10, 11, 12]  [100, 110, 120]  [1000, 1100, 1200]
4  [13, 14, 15]  [130, 140, 150]  [1300, 1400, 1500]
I need to get the rows whose list in column 'A' contains 10.
Right now I am using:
f = lambda x: 10 in x
mask = df['A'].apply(f)
df[mask]
My question is:
Answer 0 (score: 1)
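You are much better off constructing a multi-index frame. It will be much faster, because the underlying data are then stored as native types (hint: run df.dtypes on your frame; the columns will be object dtype).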
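The dtype hint can be checked directly. The following is a minimal sketch, assuming a recent pandas on Python 3; the names `lists` and `scalars` are illustrative and not from the original post.

```python
import pandas as pd

# List-valued column, as in the question: each cell is a Python list.
lists = pd.DataFrame({
    'A': [[e for e in range(x + 1, x + 4)] for x in range(0, 15, 3)],
})

# One scalar per cell, which is how a multi-index frame stores the same data.
scalars = pd.DataFrame([[e for e in range(x + 1, x + 4)] for x in range(0, 15, 3)])

print(lists.dtypes)    # column 'A' has object dtype
print(scalars.dtypes)  # every column has an integer dtype
```

Object columns hold references to arbitrary Python objects, so operations on them fall back to per-element Python calls, whereas integer columns can be compared in vectorized form.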
In [3]: A = pd.DataFrame([[e for e in range(x+1, x+4)] for x in range(0, 15, 3)])
In [4]: B = pd.DataFrame([[e*10 for e in range(x+1, x+4)] for x in range(0, 15, 3)])
In [5]: C = pd.DataFrame([[e*100 for e in range(x+1, x+4)] for x in range(0, 15, 3)])
# this creates a 2-level hierarchy
In [9]: df = pd.concat([A,B,C],keys=['A','B','C'],axis=1)
Out[8]:
    A            B               C
    0   1   2    0    1    2     0     1     2
0   1   2   3   10   20   30   100   200   300
1   4   5   6   40   50   60   400   500   600
2   7   8   9   70   80   90   700   800   900
3  10  11  12  100  110  120  1000  1100  1200
4  13  14  15  130  140  150  1300  1400  1500
# select out A
In [14]: df['A']
Out[14]:
    0   1   2
0   1   2   3
1   4   5   6
2   7   8   9
3  10  11  12
4  13  14  15
# this is a boolean array
In [11]: df['A']>10
Out[11]:
       0      1      2
0  False  False  False
1  False  False  False
2  False  False  False
3  False   True   True
4   True   True   True
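To get back to the original question (rows where 'A' contains 10), a boolean frame like the one above can be reduced to one value per row. This is a sketch of one way to do it, not necessarily what the answer's author had in mind; it rebuilds the frame so that it runs on its own, and `mask` and `result` are illustrative names.

```python
import pandas as pd

# Rebuild the multi-index frame from the answer.
A = pd.DataFrame([[e for e in range(x + 1, x + 4)] for x in range(0, 15, 3)])
B = A * 10
C = A * 100
df = pd.concat([A, B, C], keys=['A', 'B', 'C'], axis=1)

# Compare every 'A' sub-column with 10, then collapse across the columns:
# a row is kept if any of its 'A' values equals 10.
mask = (df['A'] == 10).any(axis=1)
result = df[mask]
print(result)
```

The comparison and the `any(axis=1)` reduction both run in vectorized form, which replaces the per-row `apply` with a lambda used in the question.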
Select a specific slice (.loc is used here because .ix was removed in pandas 1.0):
In [26]: df.loc[:,('A',1)]
Out[26]:
0     2
1     5
2     8
3    11
4    14
Name: (A, 1), dtype: int64
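If the frame already exists with list-valued cells, it can be converted to the multi-index layout instead of being rebuilt from scratch. The sketch below assumes every list in a column has the same length; `nested` and `wide` are hypothetical names, and the data is a shortened version of the question's frame.

```python
import pandas as pd

# A small frame with list-valued cells, shaped like the one in the question.
nested = pd.DataFrame({
    'A': [[1, 2, 3], [4, 5, 6]],
    'B': [[10, 20, 30], [40, 50, 60]],
})

# Expand each column of lists into its own integer frame, then concatenate
# side by side; the dict keys become the outer level of the column index.
wide = pd.concat(
    {col: pd.DataFrame(nested[col].tolist(), index=nested.index)
     for col in nested.columns},
    axis=1,
)
print(wide)
```

After the conversion, selections such as `wide['A']` or `wide.loc[:, ('A', 1)]` work as in the session above.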