Python equivalent of R code

Time: 2016-12-24 02:57:22

Tags: python pandas

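I posted a question along the same lines yesterday. This is a slightly modified version of it (previous question here).

I have 2 data frames as follows:

data1 looks like this: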

id          address       
1          11123451
2          78947591

data2 looks like this:

lowerbound_address   upperbound_address    place
78392888                 89000000            X
10000000                 20000000            Y

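I want to create another column in data1 called "place" that holds the place each id falls in, that is, the place whose address range contains the id's address. Many ids will come from the same place. Some ids will have no match.

The addresses here are float values.

What I am actually looking for is the Python equivalent of the R code below. This is easy to write in R, but I am not sure how to code it in Python. Can someone help me with this?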

data_place = rep(NA, nrow(data1))
for (i in 1:nrow(data1)) {
  tmp = as.character(data2[data1$address[i] >= data2$lowerbound_address & data1$address[i] <= data2$upperbound_address, "place"])
  if (length(tmp) == 1) { data_place[i] = tmp }
}

data1$place = data_place

2 answers:

Answer 0: (score: 2)

Something like this would work.

import pandas as pd
import numpy as np

# The below section is only used to import data

from io import StringIO

data = """      
id          address       
1          11123451
2          78947591
3          50000000
"""

data2 = """
lowerbound_address   upperbound_address    place
78392888                 89000000            X
10000000                 20000000            Y
"""

# The above section is only used to import data

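df = pd.read_csv(StringIO(data), sep=r'\s+')
df2 = pd.read_csv(StringIO(data2), sep=r'\s+')

# object dtype so the column can hold both NaN and the place strings
df['new'] = pd.Series(np.nan, index=df.index, dtype='object')

# use .loc rather than chained indexing so the assignment reaches df itself
df.loc[(df['address'] > df2['lowerbound_address'][0]) & (df['address'] < df2['upperbound_address'][0]), 'new'] = 'X'
df.loc[(df['address'] > df2['lowerbound_address'][1]) & (df['address'] < df2['upperbound_address'][1]), 'new'] = 'Y'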

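Besides pandas, we also use numpy, for np.nan.

All I did was create a new column and assign NaN to it. I then created two conditions that assign 'X' or 'Y' based on the lower and upper bounds in the second data frame (the last two lines).

Final result: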

   id   address  new
0   1  11123451    Y
1   2  78947591    X
2   3  50000000  NaN
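
The two hard-coded assignment lines above only cover two ranges. As a rough sketch of how the same idea might generalize, the loop below makes one pass per row of df2. It assumes the same sample data as above, built inline rather than parsed from text, and it uses inclusive bounds to match the R code in the question. Note one difference from the R loop: if ranges overlapped, the last matching range would win here, whereas the R code leaves NA when more than one range matches.

```python
import numpy as np
import pandas as pd

# Sample data from this answer, constructed inline (assumption: integer addresses).
df = pd.DataFrame({"id": [1, 2, 3],
                   "address": [11123451, 78947591, 50000000]})
df2 = pd.DataFrame({"lowerbound_address": [78392888, 10000000],
                    "upperbound_address": [89000000, 20000000],
                    "place": ["X", "Y"]})

# Object dtype so the column can hold both NaN and strings.
df["new"] = pd.Series(np.nan, index=df.index, dtype="object")

# One vectorized assignment per range in df2; between() is inclusive on both ends.
for row in df2.itertuples(index=False):
    in_range = df["address"].between(row.lowerbound_address, row.upperbound_address)
    df.loc[in_range, "new"] = row.place

print(df)
```

This still loops, but over the ranges rather than over the addresses, so it should stay reasonable when data2 is small and data1 is large.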

Answer 1: (score: 0)

Do a merge_asof, then replace place with NaN wherever the address falls above the matched upper bound.

data1.sort_values('address', inplace = True)
data2.sort_values('lowerbound_address', inplace=True)
data3 = pd.merge_asof(data1, data2, left_on='address', right_on='lowerbound_address')
data3['place'] = data3['place'].where(data3.address <= data3.upperbound_address)
data3 = data3.drop(['lowerbound_address', 'upperbound_address'], axis=1)

Output

   id   address place
0   1  11123451     Y
1   3  50000000   NaN
2   2  78947591     X
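
The snippet above assumes data1 and data2 already exist, and its output includes an id 3 row that the question's data1 does not have. A self-contained sketch that should reproduce that output is given below. It assumes the extra row (id 3, address 50000000) is borrowed from the other answer's sample data, and it assumes the ranges in data2 do not overlap, since merge_asof only looks at the nearest lower bound at or below each address.

```python
import pandas as pd

# Question's data plus an assumed third row with no matching range.
data1 = pd.DataFrame({"id": [1, 2, 3],
                      "address": [11123451, 78947591, 50000000]})
data2 = pd.DataFrame({"lowerbound_address": [78392888, 10000000],
                      "upperbound_address": [89000000, 20000000],
                      "place": ["X", "Y"]})

# merge_asof requires both frames to be sorted on their join keys.
data1 = data1.sort_values("address")
data2 = data2.sort_values("lowerbound_address")

# For each address, pick the range with the largest lower bound <= address.
data3 = pd.merge_asof(data1, data2,
                      left_on="address", right_on="lowerbound_address")

# Keep the place only if the address is also within the upper bound.
data3["place"] = data3["place"].where(data3["address"] <= data3["upperbound_address"])
data3 = data3.drop(columns=["lowerbound_address", "upperbound_address"])

print(data3)
```

The rows come back ordered by address rather than by id; sorting data3 by id afterwards would restore the original order if that matters.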