Question

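I am trying to filter a DataFrame by the values of another DataFrame, but I can't get it to work because the DataFrame used as the filter has a different size from the DataFrame being filtered. I think I need to align the two DataFrames somehow using set_index, but I may be wrong.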

import pandas as pd
df1 = pd.DataFrame({'a': [1, 1, 2, 3, 3, 4], 'b': [5, 3, 6, 2, 6, 4]})
df2 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [3, 5, 6, 3]})
dfa = df1.set_index('a')
>>> dfa
   b
a   
1  5
1  3
2  6
3  2
3  6
4  4

dfb = df2.set_index('a')

>>> dfa[dfa['b'] <= dfb['b']]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/ops.py", line 699, in wrapper
    raise ValueError('Series lengths must match to compare')
ValueError: Series lengths must match to compare

The expected DataFrame is pd.DataFrame({'a': [1, 3, 3], 'b': [3, 2, 6]})

(all <a, b> rows of df1 whose a value matches an a value in df2 and whose b value is <= the corresponding b value in df2).

Update

The more naive way doesn't work either...

>>> df1[(df1['a'] == df2['a']) & (df1['b'] <= df2['b'])]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/ops.py", line 699, in wrapper
    raise ValueError('Series lengths must match to compare')
ValueError: Series lengths must match to compare


Answer 1

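You can use reindex_like to give your second DataFrame the same size as df1, and then, in addition to your attempt, use the isin method instead of comparing df1['a'] with df2['a']. Note that the comparison on b is then made row by row on the index label rather than by matching a, so the output below differs from the expected result in the question: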

df3 = df2.reindex_like(df1)

In [93]: df1[(df3['a'].isin(df1['a'])) & (df1['b'] <= df3['b'])]
Out[93]:
   a  b
1  1  3
2  2  6
3  3  2
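
To see what reindex_like does in this answer, here is a small sketch that inspects df3 before filtering. It assumes only the df1 and df2 from the question; the name mask is introduced purely for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 1, 2, 3, 3, 4], 'b': [5, 3, 6, 2, 6, 4]})
df2 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [3, 5, 6, 3]})

# reindex_like gives df2 the same row labels (0..5) and columns as df1.
# Labels 4 and 5 do not exist in df2, so those rows are filled with NaN.
df3 = df2.reindex_like(df1)
print(df3)

# Both Series now share the same index, so the comparison runs without error,
# but it pairs rows by index label, not by looking up the matching value of a.
mask = df3['a'].isin(df1['a']) & (df1['b'] <= df3['b'])
print(df1[mask])
```

Because rows are paired by position, row 2 of df1 (a=2, b=6) is compared against row 2 of df2 (a=3, b=6), which is why it shows up in the output.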

Answer 2

Here's one way:

>>> df1[df1.b <= df1.a.map(dfb.b)]
   a  b
1  1  3
3  3  2
4  3  6

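It is easier to use df1 than dfa here, because you need map, which is not available on an index (only on a Series). If you absolutely need to use dfa rather than df1, you will have to change the second part of the comparison to dfa.reset_index().a.map(dfb.b).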
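
For readers who want to run the map-based filter end to end, here is a minimal self-contained sketch. The names lookup, limit and result are not from the answer; they are introduced here only to spell out the intermediate steps.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 1, 2, 3, 3, 4], 'b': [5, 3, 6, 2, 6, 4]})
df2 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [3, 5, 6, 3]})

# Series of df2's b values keyed by a (a has to be unique in df2 for this lookup)
lookup = df2.set_index('a')['b']

# For every row of df1, fetch the df2 b value that belongs to the same a
limit = df1['a'].map(lookup)

# limit has the same index as df1, so the comparison is aligned row by row
result = df1[df1['b'] <= limit]
print(result)
```

The key point is that map turns the short df2 into a Series as long as df1, which removes the length mismatch that caused the original error.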

Answer 3

I think the simplest way is to use merge to add the df2 column to df1.

>>> df2['c'] = df2['b']
>>> merged = pd.merge(df1, df2, how='left', on=['a'])
>>> merged
   a  b_x  b_y  c
0  1    5    3  3
1  1    3    3  3
2  2    6    5  5
3  3    2    6  6
4  3    6    6  6
5  4    4    3  3

Then just do merged[merged['b_x'] <= merged['c']].
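
A possible variation on the merge approach, sketched under the assumption that only df1's original columns are wanted in the result. The suffix '_limit' and the names merged and filtered are illustrative choices, not part of the answer; with suffixes the extra c column is not needed.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 1, 2, 3, 3, 4], 'b': [5, 3, 6, 2, 6, 4]})
df2 = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [3, 5, 6, 3]})

# Left-join df2 onto df1 by a; df1's b keeps its name, df2's b becomes b_limit
merged = df1.merge(df2, on='a', how='left', suffixes=('', '_limit'))

# Filter on the joined column and keep only the columns that df1 started with
filtered = merged.loc[merged['b'] <= merged['b_limit'], ['a', 'b']]
print(filtered)
```

Since a is unique in df2, the left merge keeps one row per row of df1 in the original order, so the filtered rows line up with those of df1.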

Comparing DataFrames of different lengths
