I have two DataFrames as follows:
A = pd.DataFrame({"value":[3, 7, 5 ,18,23,27,21,29]})
B = pd.DataFrame({"low":[1, 6, 11 ,16,21,26], "high":[5,10,15,20,25,30], "name":["one","two","three","four","five", "six"]})
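I want to check whether each "value" in A falls between "low" and "high" in B and, if it does, copy the "name" column from B to A.
The output should look like this: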
A = pd.DataFrame({"value":[3, 7, 5 ,18,23,27,21,29], "name":["one","two","one","four","five", "six", "five", "six"]})
My function uses iterrows, as follows:
def func1(row):
    x = row['value']
    for index, value in B.iterrows():
        if (value['low'] <= x) & (x <= value['high']):
            return value['name']
But it does not do what I want.
Thanks.
Answer 0 (score: 4)
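You can use a list comprehension to iterate over the values in A, and then use loc to get the matching mapped value. le means less than or equal to, and ge means greater than or equal to.
For example, the first row has v = 3. Using simple boolean indexing: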
>>> B[(B['low'].le(v)) & (B['high'].ge(v))]
   high  low name
0     5    1  one
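Assuming DataFrame B has no overlapping ranges, you get back a single row, as above. Then use loc to select the name column, as shown below. Because each selection is returned as a Series, you need to extract its first and only scalar value (for example, with iat).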
A['name'] = [B.loc[(B['low'].le(v)) & (B['high'].ge(v)), 'name'].iat[0]
             for v in A['value']]
>>> A
   value  name
0      3   one
1      7   two
2      5   one
3     18  four
4     23  five
5     27   six
6     21  five
7     29   six
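One caveat with the comprehension above: iat[0] raises an IndexError when a value falls outside every range in B. The following is a minimal sketch of the same lookup with a guard for that case. The helper name lookup_name and its default parameter are hypothetical and not part of the original answer; it still assumes the ranges in B do not overlap.

```python
import pandas as pd

A = pd.DataFrame({"value": [3, 7, 5, 18, 23, 27, 21, 29]})
B = pd.DataFrame({
    "low": [1, 6, 11, 16, 21, 26],
    "high": [5, 10, 15, 20, 25, 30],
    "name": ["one", "two", "three", "four", "five", "six"],
})


def lookup_name(v, ranges=B, default=None):
    """Hypothetical helper: name of the first range containing v, else default."""
    # Same boolean mask as in the answer: low <= v <= high.
    mask = ranges["low"].le(v) & ranges["high"].ge(v)
    matches = ranges.loc[mask, "name"]
    # An empty selection means v lies outside every range in B.
    return matches.iat[0] if len(matches) else default


A["name"] = [lookup_name(v) for v in A["value"]]
```

With the sample data every value matches a range, so the result is the same as above; a value such as 100 would yield the default instead of raising.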
Answer 1 (score: 1)
I believe you are looking for something like this:
In [1]: import pandas as pd
In [2]: A = pd.DataFrame({"value":[3, 7, 5 ,18,23,27,21,29]})
In [3]: B = pd.DataFrame({"low":[1, 6, 11 ,16,21,26], "high":[5,10,15,20,25,30], "name":["one","two","three","four","five", "six"]})
In [4]: A
Out[4]:
   value
0      3
1      7
2      5
3     18
4     23
5     27
6     21
7     29
In [5]: B
Out[5]:
   high  low   name
0     5    1    one
1    10    6    two
2    15   11  three
3    20   16   four
4    25   21   five
5    30   26    six
In [6]: def func1(x):
   ...:     for row in B.itertuples():
   ...:         if row.low <= x <= row.high:
   ...:             return row.name
   ...:
In [7]: A.value.map(func1)
Out[7]:
0     one
1     two
2     one
3    four
4    five
5     six
6    five
7     six
Name: value, dtype: object
In [8]: A['name'] = A['value'].map(func1)
In [9]: A
Out[9]:
   value  name
0      3   one
1      7   two
2      5   one
3     18  four
4     23  five
5     27   six
6     21  five
7     29   six
I used itertuples because it should be a bit faster than iterrows, but in general this approach is not efficient. It is one solution, but there may be better ones.
A quick-and-dirty test suggests that Alexander's approach is faster. I wonder how it scales.
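On the question of scaling, one option that avoids a Python-level loop over B for every value is pandas' IntervalIndex. This is a sketch of a different technique from either answer above, not something the original posters proposed, and no timings were measured for it here. It assumes the ranges in B do not overlap, which get_indexer requires.

```python
import pandas as pd

A = pd.DataFrame({"value": [3, 7, 5, 18, 23, 27, 21, 29]})
B = pd.DataFrame({
    "low": [1, 6, 11, 16, 21, 26],
    "high": [5, 10, 15, 20, 25, 30],
    "name": ["one", "two", "three", "four", "five", "six"],
})

# Closed on both sides so the test is low <= value <= high, as in the question.
intervals = pd.IntervalIndex.from_arrays(B["low"], B["high"], closed="both")

# Position of the interval containing each value, or -1 when none contains it.
positions = intervals.get_indexer(A["value"].to_numpy())

# Look up the names by position, then blank out the entries that had no match.
names = B["name"].to_numpy()
A["name"] = pd.Series(names[positions], index=A.index).where(positions >= 0)
```

Values that match no range end up as NaN in A["name"] rather than raising an error.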