I have two pandas DataFrames, shown below.
Data Frame 1:
Section Chainage Frame
R125R002 10.133 1
R125R002 10.138 2
R125R002 10.143 3
R125R002 10.148 4
R125R002 10.153 5
Data Frame 2:
Section Chainage 1 2 3 4 5 6 7 8
R125R002 10.133 0 0 1 0 0 0 0 0
R125R002 10.134 0 0 1 0 0 0 0 0
R125R002 10.135 0 0 1 0 0 0 0 0
R125R002 10.136 0 0 1 0 0 0 0 0
R125R002 10.137 0 0 1 0 0 0 0 0
R125R002 10.138 0 0 1 0 0 0 0 0
R125R002 10.139 0 0 1 0 0 0 0 0
R125R002 10.14 0 0 1 0 0 0 0 0
R125R002 10.141 0 0 1 0 0 0 0 0
R125R002 10.142 0 0 1 0 0 0 0 0
R125R002 10.143 0 0 1 0 0 0 0 0
R125R002 10.144 0 0 1 0 0 0 0 0
R125R002 10.145 0 0 1 0 0 0 0 0
R125R002 10.146 0 0 1 0 0 0 0 0
R125R002 10.147 0 0 1 0 0 0 0 0
R125R002 10.148 0 0 1 0 0 0 0 0
R125R002 10.149 0 0 1 0 0 0 0 0
R125R002 10.15 0 0 1 0 0 0 0 0
R125R002 10.151 0 0 1 0 0 0 0 0
R125R002 10.152 0 0 1 0 0 0 0 0
R125R002 10.153 0 0 1 0 0 0 0 0
Required output DataFrame:
Section Chainage Frame 1 2 3 4 5 6 7 8
R125R002 10.133 1 0 0 1 0 0 0 0 0
R125R002 10.138 2 0 0 1 0 0 0 0 0
R125R002 10.143 3 0 0 1 0 0 0 0 0
R125R002 10.148 4 0 0 1 0 0 0 0 0
R125R002 10.153 5 0 0 1 0 0 0 0 0
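Data Frame 2 is in 1 m intervals, while Data Frame 1 is in 5 m increments. I want to merge Data Frame 2 into Data Frame 1 and apply a group by: sum for column 1, max for column 2, and mean for columns 3 to 8.
In SQL, I would join the two frames on Section, apply a between condition on Chainage in the join, and then add a GROUP BY.
Is there a way to achieve this in pandas?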
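The SQL approach described in the question (join on Section, filter with a between condition on Chainage, then group) can be sketched fairly directly in pandas. This is a minimal sketch, not the answer's method; it assumes each Data Frame 1 row covers the half-open range from its own Chainage up to (but not including) the next Chainage in the same Section, with the last row left open-ended. The names `bounds`, `End`, `agg_spec` and `result` are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question (Chainage in km; 1 m = 0.001 km).
df1 = pd.DataFrame({
    'Section': 'R125R002',
    'Chainage': np.arange(10133, 10154, 5) / 1000,
    'Frame': np.arange(1, 6),
})
df2 = pd.DataFrame({
    'Section': 'R125R002',
    'Chainage': np.arange(10133, 10154) / 1000,
})
for c in range(1, 9):
    df2[str(c)] = 1 if c == 3 else 0  # only column 3 is set, as in the sample

# Assumption: each df1 row covers [Chainage, next Chainage) within its Section;
# the last row of each Section is open-ended.
bounds = df1.assign(
    End=df1.groupby('Section')['Chainage'].shift(-1).fillna(np.inf))

# SQL-style join on Section, then a "between" filter on Chainage.
joined = bounds.merge(df2, on='Section', suffixes=('', '_2'))
in_range = (joined['Chainage_2'] >= joined['Chainage']) & (joined['Chainage_2'] < joined['End'])
joined = joined[in_range]

# sum for column 1, max for column 2, mean for columns 3 to 8
agg_spec = {'1': 'sum', '2': 'max'}
agg_spec.update({str(c): 'mean' for c in range(3, 9)})

result = joined.groupby(['Section', 'Chainage', 'Frame'], as_index=False).agg(agg_spec)
print(result)
```

The upper bound is half-open so that a boundary chainage such as 10.138 is counted in only one frame (SQL's BETWEEN is inclusive at both ends). The join produces rows(df1) × rows(df2) per Section before filtering, so this suits modest data sizes.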
Answer (score: 1)
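You can first aggregate every 5 rows with groupby, passing a dictionary that defines the aggregation function for each column: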
import numpy as np

d = {'Section':'first','Chainage':'first','1':'sum','2':'max', '8':'mean'}
df22 = df2.groupby([np.arange(len(df2.index)) // 5], as_index=False).agg(d)
print (df22)
Section Chainage 1 2 8
0 R125R002 10.133 0 0 0
1 R125R002 10.138 0 0 0
2 R125R002 10.143 0 0 0
3 R125R002 10.148 0 0 0
4 R125R002 10.153 0 0 0
Detail:
print (np.arange(len(df2.index)) // 5)
[0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4]
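Counting the rows per label makes the grouping explicit. This small check assumes df2 has exactly 21 rows in strict 1 m steps, as in the question; it shows that the last group holds only the single row at 10.153, so the last frame is aggregated from one row rather than five.

```python
import numpy as np

n_rows = 21          # rows in df2: 10.133 to 10.153 in 1 m steps
rows_per_frame = 5   # df1 step (5 m) divided by df2 step (1 m)

labels = np.arange(n_rows) // rows_per_frame
print(labels)
print(np.bincount(labels))  # rows per group: [5 5 5 5 1]
```

Because the labels depend only on row position, this approach assumes df2 is sorted and has no missing chainages.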
Then merge the aggregated frame into df1:
df = df1.merge(df22, on=['Section','Chainage'])
print (df)
Section Chainage Frame 1 2 8
0 R125R002 10.133 1 0 0 0
1 R125R002 10.138 2 0 0 0
2 R125R002 10.143 3 0 0 0
3 R125R002 10.148 4 0 0 0
4 R125R002 10.153 5 0 0 0
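If df2 could have gaps or uneven spacing, grouping by row position would misalign the frames. One possible alternative, sketched here as an assumption rather than part of the answer above, is `pd.merge_asof`: it tags each df2 row with the last df1 chainage at or below it (within the same Section), and the tagged rows are then aggregated directly, applying the full spec from the question (mean for columns 3 to 8). The names `starts`, `tagged` and `agg_spec` are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question (Chainage in km).
df1 = pd.DataFrame({
    'Section': 'R125R002',
    'Chainage': np.arange(10133, 10154, 5) / 1000,
    'Frame': np.arange(1, 6),
})
df2 = pd.DataFrame({
    'Section': 'R125R002',
    'Chainage': np.arange(10133, 10154) / 1000,
})
for c in range(1, 9):
    df2[str(c)] = 1 if c == 3 else 0

# Tag each df2 row with the nearest df1 row at or below its chainage
# (direction='backward'), matching only within the same Section.
# merge_asof requires both frames to be sorted on the asof key.
starts = df1.rename(columns={'Chainage': 'Start'}).sort_values('Start')
tagged = pd.merge_asof(df2.sort_values('Chainage'), starts,
                       left_on='Chainage', right_on='Start',
                       by='Section', direction='backward')

# sum for column 1, max for column 2, mean for columns 3 to 8
agg_spec = {'1': 'sum', '2': 'max'}
agg_spec.update({str(c): 'mean' for c in range(3, 9)})

result = (tagged.groupby(['Section', 'Start', 'Frame'], as_index=False)
          .agg(agg_spec)
          .rename(columns={'Start': 'Chainage'}))
print(result)
```

Unlike a cross join, `merge_asof` matches each df2 row to at most one df1 row, so memory stays proportional to the size of df2. Any df2 row lying before the first df1 chainage of its Section gets no Frame and is dropped by the groupby.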