Pandas DataFrame: inserting calculated rows

Time: 2014-04-07 14:23:57

Tags: pandas

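A simplified version of the problem is shown in the example below.

Essentially, I want to insert new rows between existing rows, where each new row is calculated from the values in the two rows on either side of it.

In my example, you can see that each inserted row holds the midpoint of the rows before and after it.

My actual goal is to use a function that calculates the midpoint between two latitude/longitude pairs and insert that value. I think this simplified example will show the technique needed. If I get an answer, I will include the full working code for the lat/lon example.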

import pandas as pd
import numpy as np

def midpoint(x,y):
    return (x+y)/2

# we start with this
pd.DataFrame(np.arange(2,10).reshape((4,2)),columns=['A','B'])

   A  B
0  2  3
1  4  5
2  6  7
3  8  9

# we want to get to this
pd.DataFrame(np.array([2,3,3,4,4,5,5,6,6,7,7,8,8,9]).reshape((7,2)),columns=['A','B'])

   A  B
0  2  3
1  3  4
2  4  5
3  5  6
4  6  7
5  7  8
6  8  9
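For the simplified case, the target frame can also be built without any merge or sort, by writing the original rows and the midpoints into alternating slots of a NumPy array. This is a minimal sketch assuming all columns are numeric and the frame has at least one row; interleave_midpoints is a hypothetical helper name, not a pandas API.

```python
import numpy as np
import pandas as pd


def interleave_midpoints(df):
    """Hypothetical helper: return a new DataFrame with the mean of each
    pair of consecutive rows inserted between them (numeric columns only)."""
    values = df.to_numpy(dtype=float)
    n, k = values.shape
    out = np.empty((2 * n - 1, k))
    out[0::2] = values                          # original rows at even positions
    out[1::2] = (values[:-1] + values[1:]) / 2  # midpoints at odd positions
    return pd.DataFrame(out, columns=df.columns)


df = pd.DataFrame(np.arange(2, 10).reshape((4, 2)), columns=['A', 'B'])
result = interleave_midpoints(df)
print(result)
```

Note that the result is float-typed even when the input is integer, because midpoints are generally not whole numbers.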

OK, here is the example with latitudes and longitudes:

gp = pd.DataFrame(np.array([[25.7,-87.7],[26.3,-88.6],[27.2,-89.2],[28.2,-89.6]]),columns=['Latitude','Longitude'] )

   Latitude  Longitude
0      25.7      -87.7
1      26.3      -88.6
2      27.2      -89.2
3      28.2      -89.6

x = gp[['Latitude','Longitude']]
y = gp[['Latitude','Longitude']].shift(periods=-1)
foo = pd.merge(x, y, suffixes=['1','2'], left_index=True, right_index=True)
# trim the last row, as it has NaNs
bar = foo[['Latitude1','Longitude1','Latitude2','Longitude2']][:-1]
# calculate the midpoints with the lat/lon midpoint(cords) function defined below, then stitch them back onto the main data
bar = bar.apply(midpoint, axis=1)
fogazzi = np.vstack((gp[['Latitude','Longitude']].values, bar[['MidPointLatitude','MidPointLongitude']].values))
gp = pd.DataFrame(fogazzi, columns=['Latitude','Longitude']).sort_values(by=['Latitude','Longitude'])

    Latitude  Longitude
0  25.700000 -87.700000
4  26.000696 -88.148851
1  26.300000 -88.600000
5  26.750316 -88.898812
2  27.200000 -89.200000
6  27.700144 -89.399084
3  28.200000 -89.600000
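The pipeline above restores the row order by sorting on Latitude, which only works because latitude happens to increase along this track. A sketch that keeps the original order regardless: each new row is labelled i + 0.5 so that sort_index places it between rows i and i + 1. The names insert_between and geo_midpoint are hypothetical; geo_midpoint reuses the formula from the question's midpoint(cords) function, without the range asserts.

```python
import math

import pandas as pd


def geo_midpoint(lat1, lon1, lat2, lon2):
    """Great-circle midpoint in degrees (same formula as midpoint(cords))."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlon = lon2 - lon1
    dx = math.cos(lat2) * math.cos(dlon)
    dy = math.cos(lat2) * math.sin(dlon)
    lat3 = math.atan2(math.sin(lat1) + math.sin(lat2),
                      math.hypot(math.cos(lat1) + dx, dy))
    lon3 = lon1 + math.atan2(dy, math.cos(lat1) + dx)
    return math.degrees(lat3), math.degrees(lon3)


def insert_between(df, func):
    """Hypothetical helper: insert func(*row_i, *row_i_plus_1) between each
    pair of consecutive rows, keeping the original row order."""
    df = df.reset_index(drop=True)  # ensure integer labels 0..n-1
    pairs = zip(df.iloc[:-1].itertuples(index=False),
                df.iloc[1:].itertuples(index=False))
    new_rows = [func(*a, *b) for a, b in pairs]
    # Label each new row i + 0.5 so it sorts between rows i and i + 1.
    mids = pd.DataFrame(new_rows, columns=df.columns,
                        index=df.index[:-1] + 0.5)
    return pd.concat([df, mids]).sort_index().reset_index(drop=True)


gp = pd.DataFrame([[25.7, -87.7], [26.3, -88.6], [27.2, -89.2], [28.2, -89.6]],
                  columns=['Latitude', 'Longitude'])
result = insert_between(gp, geo_midpoint)
print(result)
```

Because func receives the values of both rows positionally, the same helper works with any pairwise calculation, not just midpoints.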

The lat/lon midpoint function used by bar.apply above:

import math
import pandas as pd

def midpoint(cords):
    # cords = (lat1, lon1, lat2, lon2), all in degrees
    lat1, lon1, lat2, lon2 = cords
    assert -90 <= lat1 <= 90
    assert -90 <= lat2 <= 90
    assert -180 <= lon1 <= 180
    assert -180 <= lon2 <= 180
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlon = lon2 - lon1
    dx = math.cos(lat2) * math.cos(dlon)
    dy = math.cos(lat2) * math.sin(dlon)
    lat3 = math.atan2(math.sin(lat1) + math.sin(lat2),
                      math.sqrt((math.cos(lat1) + dx) * (math.cos(lat1) + dx) + dy * dy))
    lon3 = lon1 + math.atan2(dy, math.cos(lat1) + dx)
    return pd.Series({'MidPointLatitude': math.degrees(lat3), 'MidPointLongitude': math.degrees(lon3)})
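Because midpoint(cords) handles one row at a time, bar.apply(midpoint, axis=1) makes one Python call per pair of points. A possible vectorized alternative, sketched under the assumption that NumPy's element-wise trig functions are acceptable: the same formula applied to whole arrays at once. geo_midpoint_vec is a hypothetical name, and the range asserts are omitted.

```python
import numpy as np
import pandas as pd


def geo_midpoint_vec(lat1, lon1, lat2, lon2):
    """Vectorized great-circle midpoint; arguments are array-likes in degrees.
    Same formula as midpoint(cords), without the range asserts."""
    lat1, lon1, lat2, lon2 = (np.radians(np.asarray(v, dtype=float))
                              for v in (lat1, lon1, lat2, lon2))
    dlon = lon2 - lon1
    dx = np.cos(lat2) * np.cos(dlon)
    dy = np.cos(lat2) * np.sin(dlon)
    lat3 = np.arctan2(np.sin(lat1) + np.sin(lat2), np.hypot(np.cos(lat1) + dx, dy))
    lon3 = lon1 + np.arctan2(dy, np.cos(lat1) + dx)
    return np.degrees(lat3), np.degrees(lon3)


gp = pd.DataFrame([[25.7, -87.7], [26.3, -88.6], [27.2, -89.2], [28.2, -89.6]],
                  columns=['Latitude', 'Longitude'])
lat = gp['Latitude'].to_numpy()
lon = gp['Longitude'].to_numpy()
# Pair row i with row i + 1 in a single call instead of a per-row apply.
mid_lat, mid_lon = geo_midpoint_vec(lat[:-1], lon[:-1], lat[1:], lon[1:])
print(mid_lat, mid_lon)
```

The resulting arrays can then be stacked with the original coordinates and put back in order, as the question does with np.vstack.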

2 answers:

Answer 0 (score: 0)

You can use a merge like this:

In [54]:

df = pd.DataFrame(np.arange(2,10).reshape((4,2)),columns=['A','B'])
df
Out[54]:
   A  B
0  2  3
1  4  5
2  6  7
3  8  9

[4 rows x 2 columns]
In [53]:

(df + df.shift(periods=-1))/2
Out[53]:
    A   B
0   3   4
1   5   6
2   7   8
3 NaN NaN

[4 rows x 2 columns]
In [59]:

combined = df.merge((df + df.shift(periods=-1))/2, how='outer')
combined.sort_values(by=['A'], inplace=True)
In [60]:

combined
Out[60]:
    A   B
0   2   3
4   3   4
1   4   5
5   5   6
2   6   7
6   7   8
3   8   9
7 NaN NaN

[8 rows x 2 columns]
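The session above leaves a trailing all-NaN row, because the last row has no successor to average with, and it originally used DataFrame.sort, which was deprecated and later removed from pandas. A sketch of the same merge approach with dropna and sort_values, assuming (as in the example) that column A increases monotonically and that no midpoint row is identical to an existing row; merge joins on values, so a coincident row would be matched rather than added.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(2, 10).reshape((4, 2)), columns=['A', 'B'])

# Midpoints of consecutive rows; the last row has no successor, so drop its NaNs.
mids = ((df + df.shift(periods=-1)) / 2).dropna()

# Outer merge on all shared columns, as in the answer, then sort on A.
# Casting df to float keeps the key dtypes identical on both sides.
combined = (df.astype(float)
              .merge(mids, how='outer')
              .sort_values(by='A')
              .reset_index(drop=True))
print(combined)
```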

Answer 1 (score: 0)

Suppose our index is set up slightly differently:

df = pd.DataFrame(np.arange(2,10).reshape((4,2)), index=range(0, 8, 2), columns=['A','B'])

Then:

res = pd.DataFrame(index=range(len(df) * 2 - 1)).join(df)
res.interpolate()
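Answer 1 depends on df having been built with even index labels. A sketch of the same reindex-and-interpolate idea for an ordinary 0..n-1 index, relabelling the existing rows onto even labels first (df itself is left unchanged). interpolate() with its default linear method fills each gap with the arithmetic mean of its neighbours, so for lat/lon data this gives a planar average rather than the great-circle midpoint computed by the question's midpoint(cords) function.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(2, 10).reshape((4, 2)), columns=['A', 'B'])

# Put the existing rows on even labels (0, 2, 4, 6) in a new frame.
spread = df.set_axis(np.arange(len(df)) * 2, axis=0)

# Reindex onto 0..2n-2 (odd labels become NaN rows), then fill them linearly.
res = spread.reindex(np.arange(2 * len(df) - 1)).interpolate()
print(res)
```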