Compare whether a value in one column lies between two values in another column (python pandas)

Posted: 2016-07-07 03:31:52

Tags: python pandas

I have two DataFrames:

import pandas as pd

A = pd.DataFrame({"value": [3, 7, 5, 18, 23, 27, 21, 29]})

B = pd.DataFrame({"low": [1, 6, 11, 16, 21, 26], "high": [5, 10, 15, 20, 25, 30], "name": ["one", "two", "three", "four", "five", "six"]})

I want to check whether each "value" in A lies between "low" and "high" in B, and if so, copy the matching "name" from B into A.

The output should look like this:

A = pd.DataFrame({"value":[3, 7, 5 ,18,23,27,21,29], "name":["one","two","one","four","five", "six", "five", "six"]})

My function, which uses iterrows, is:

def func1(row):
    x = row['value']
    # Return the name of the first range in B with low <= x <= high
    for index, value in B.iterrows():
        if value['low'] <= x <= value['high']:
            return value['name']
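The snippet only defines `func1`. A minimal sketch of one way to apply it, assuming the aim is to fill a new `name` column in A, is `DataFrame.apply` with `axis=1`, which hands each row of A to the function. The renamed loop variable and the explicit `return None` for values outside every range are additions, not part of the original function.

```python
import pandas as pd

A = pd.DataFrame({"value": [3, 7, 5, 18, 23, 27, 21, 29]})
B = pd.DataFrame({"low": [1, 6, 11, 16, 21, 26],
                  "high": [5, 10, 15, 20, 25, 30],
                  "name": ["one", "two", "three", "four", "five", "six"]})

def func1(row):
    x = row['value']
    # Scan B row by row; return the name of the first range containing x.
    for _, b_row in B.iterrows():
        if b_row['low'] <= x <= b_row['high']:
            return b_row['name']
    # Reached only when x falls in none of B's ranges.
    return None

# axis=1 passes each row of A (as a Series) to func1.
A['name'] = A.apply(func1, axis=1)
print(A)
```

This still loops over B once per row of A, so it makes the function usable rather than faster.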

but it doesn't do what I want yet.

Thanks,

2 answers:

Answer 0 (score: 4)

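You can use a list comprehension to iterate over the values in A, then use loc to look up the corresponding name. le means "less than or equal to"; ge means "greater than or equal to".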
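To make the le/ge point concrete: `Series.le` and `Series.ge` are the method forms of `<=` and `>=`, and combining the two masks with `&` gives one boolean per row of B. A small check, using v = 18 as an arbitrary example value:

```python
import pandas as pd

B = pd.DataFrame({"low": [1, 6, 11, 16, 21, 26],
                  "high": [5, 10, 15, 20, 25, 30],
                  "name": ["one", "two", "three", "four", "five", "six"]})

v = 18
# B['low'].le(v) is element-wise B['low'] <= v; B['high'].ge(v) is B['high'] >= v.
mask_method = B['low'].le(v) & B['high'].ge(v)
mask_operator = (B['low'] <= v) & (B['high'] >= v)

print(mask_method.tolist())               # [False, False, False, True, False, False]
print(mask_method.equals(mask_operator))  # True
```

Only the row for the range [16, 20] is True, which is what the boolean indexing in this answer relies on.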

For example, in the first row v = 3. With simple boolean indexing:

>>> B[(B['low'].le(v)) & (B['high'].ge(v))]
   high  low name
0     5    1  one

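Assuming DataFrame B has no overlapping ranges, you get back exactly one row, as above. Then use loc to select the name column, as shown below. Because loc returns the name as a one-element Series, you need to take its first and only scalar value (e.g., with iat).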

A['name'] = [B.loc[(B['low'].le(v)) & (B['high'].ge(v)), 'name'].iat[0] 
             for v in A['value']]

>>> A
   value  name
0      3   one
1      7   two
2      5   one
3     18  four
4     23  five
5     27   six
6     21  five
7     29   six
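One caveat with the `.iat[0]` step: if a value falls in none of B's ranges (for example 40, or a non-integer such as 5.5 in the gap between 5 and 6), the filtered Series is empty and `.iat[0]` raises an IndexError; if ranges overlap, it silently takes the first match. A hedged sketch with a hypothetical helper, `lookup_name`, that returns a default instead of raising:

```python
import pandas as pd

B = pd.DataFrame({"low": [1, 6, 11, 16, 21, 26],
                  "high": [5, 10, 15, 20, 25, 30],
                  "name": ["one", "two", "three", "four", "five", "six"]})

def lookup_name(v, default=None):
    """Hypothetical helper: name of the first B range containing v, else default."""
    matches = B.loc[B['low'].le(v) & B['high'].ge(v), 'name']
    # An empty Series has no position 0, so .iat[0] would raise IndexError here.
    return matches.iat[0] if len(matches) else default

print(lookup_name(3))    # one
print(lookup_name(5.5))  # None (falls between 5 and 6)
print(lookup_name(40))   # None (above every range)
```

With this helper the list comprehension becomes `[lookup_name(v) for v in A['value']]`.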

Answer 1 (score: 1)

I believe you're looking for something like this:


In [1]: import pandas as pd

In [2]: A = pd.DataFrame({"value":[3, 7, 5 ,18,23,27,21,29]})

In [3]: B = pd.DataFrame({"low":[1, 6, 11 ,16,21,26], "high":[5,10,15,20,25,30], "name":["one","two","three","four","five", "six"]})

In [4]: A
Out[4]:
   value
0      3
1      7
2      5
3     18
4     23
5     27
6     21
7     29

In [5]: B
Out[5]:
   high  low   name
0     5    1    one
1    10    6    two
2    15   11  three
3    20   16   four
4    25   21   five
5    30   26    six

In [6]: def func1(x):
   ...:     for row in B.itertuples():
   ...:         if row.low <= x <= row.high:
   ...:             return row.name
   ...:

In [7]: A.value.map(func1)
Out[7]:
0     one
1     two
2     one
3    four
4    five
5     six
6    five
7     six
Name: value, dtype: object

In [8]: A['name'] = A['value'].map(func1)

In [9]: A
Out[9]:
   value  name
0      3   one
1      7   two
2      5   one
3     18  four
4     23  five
5     27   six
6     21  five
7     29   six

I used itertuples because it should be a little faster than iterrows, but in general this approach isn't efficient. It is one solution, but there may well be better ones.

Edit to add:

A quick-and-dirty test suggests Alexander's approach is faster. I wonder how it scales.
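On the scaling question, a rough way to compare the two answers on larger inputs is a single `timeit` run of each. This is a sketch under assumptions: the data (200 contiguous ranges of width 10 and 500 random values) and the wrapper names `by_loc` and `by_itertuples` are hypothetical, the wrappers just reuse each answer's code, and no timings are quoted because they depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical inputs: ranges [0, 9], [10, 19], ..., non-overlapping and contiguous.
n_ranges = 200
low = np.arange(n_ranges) * 10
B = pd.DataFrame({"low": low, "high": low + 9,
                  "name": [f"r{i}" for i in range(n_ranges)]})
rng = np.random.default_rng(0)
A = pd.DataFrame({"value": rng.integers(0, n_ranges * 10, size=500)})

def by_loc(A, B):
    # Answer 0: boolean mask over all of B, then loc + iat, once per value.
    return [B.loc[B['low'].le(v) & B['high'].ge(v), 'name'].iat[0]
            for v in A['value']]

def by_itertuples(A, B):
    # Answer 1: scan B with itertuples per value, stopping at the first match.
    def func1(x):
        for row in B.itertuples():
            if row.low <= x <= row.high:
                return row.name
    return A['value'].map(func1).tolist()

for fn in (by_loc, by_itertuples):
    seconds = timeit.timeit(lambda: fn(A, B), number=1)
    print(f"{fn.__name__}: {seconds:.3f} s")
```

Raising `n_ranges` and the sample size shows how each grows; both scan B once per value of A (the itertuples version can stop early), so their cost grows roughly with len(A) × len(B).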