Question

I have 2 pandas DataFrames, as shown below.

Data Frame 1:  

Section Chainage    Frame  
R125R002    10.133  1  
R125R002    10.138  2  
R125R002    10.143  3  
R125R002    10.148  4  
R125R002    10.153  5  

Data Frame 2:

Section Chainage    1   2   3   4   5   6   7   8   
R125R002    10.133  0   0   1   0   0   0   0   0     
R125R002    10.134  0   0   1   0   0   0   0   0     
R125R002    10.135  0   0   1   0   0   0   0   0     
R125R002    10.136  0   0   1   0   0   0   0   0     
R125R002    10.137  0   0   1   0   0   0   0   0     
R125R002    10.138  0   0   1   0   0   0   0   0     
R125R002    10.139  0   0   1   0   0   0   0   0     
R125R002    10.14   0   0   1   0   0   0   0   0     
R125R002    10.141  0   0   1   0   0   0   0   0     
R125R002    10.142  0   0   1   0   0   0   0   0     
R125R002    10.143  0   0   1   0   0   0   0   0     
R125R002    10.144  0   0   1   0   0   0   0   0     
R125R002    10.145  0   0   1   0   0   0   0   0     
R125R002    10.146  0   0   1   0   0   0   0   0     
R125R002    10.147  0   0   1   0   0   0   0   0     
R125R002    10.148  0   0   1   0   0   0   0   0     
R125R002    10.149  0   0   1   0   0   0   0   0     
R125R002    10.15   0   0   1   0   0   0   0   0     
R125R002    10.151  0   0   1   0   0   0   0   0     
R125R002    10.152  0   0   1   0   0   0   0   0     
R125R002    10.153  0   0   1   0   0   0   0   0

Required output DataFrame:

Section Chainage Frame  1   2   3   4   5   6   7   8   
R125R002    10.133  1   0   0   1   0   0   0   0   0     
R125R002    10.138  2   0   0   1   0   0   0   0   0     
R125R002    10.143  3   0   0   1   0   0   0   0   0     
R125R002    10.148  4   0   0   1   0   0   0   0   0     
R125R002    10.153  5   0   0   1   0   0   0   0   0

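Data Frame 2 is spaced at 1 m intervals, whereas Data Frame 1 is spaced at 5 m intervals. I want to merge Data Frame 2 into Data Frame 1 and apply a group by. The aggregation should be sum for column 1, max for column 2, and mean for columns 3 to 8.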

In SQL, I would join the 2 frames on Section, apply a BETWEEN condition in the join, and then add a GROUP BY.
Is there a way to achieve this in pandas?

Answer 1

You can first aggregate every 5 rows, using a dictionary that maps each column to its aggregation function:

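import numpy as np
# Aggregation per column; Section and Chainage keep the first value of each block
d = {'Section':'first','Chainage':'first','1':'sum','2':'max', '8':'mean'}
# Integer division of the row position by 5 labels each block of 5 consecutive rows
df22 = df2.groupby([np.arange(len(df2.index)) // 5], as_index=False).agg(d)
print(df22)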
    Section  Chainage  1  2  8
0  R125R002    10.133  0  0  0
1  R125R002    10.138  0  0  0
2  R125R002    10.143  0  0  0
3  R125R002    10.148  0  0  0
4  R125R002    10.153  0  0  0

Detail:

print (np.arange(len(df2.index)) // 5)
[0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4]
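The grouping step above can be reproduced end to end with a minimal sketch like the one below. It rebuilds Data Frame 2 from the question's sample values; the string column names `"1"` to `"8"`, the `buckets` and `agg_spec` names, and the extra `"3": "mean"` entry are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild Data Frame 2 from the question: 21 rows at 0.001 (1 m) spacing.
chainage = np.round(10.133 + 0.001 * np.arange(21), 3)
df2 = pd.DataFrame({"Section": "R125R002", "Chainage": chainage})
for col in "12345678":
    # Only column "3" holds 1 in the sample data; the rest are 0.
    df2[col] = 1 if col == "3" else 0

# Positional bucket labels: rows 0-4 -> 0, rows 5-9 -> 1, and so on.
buckets = np.arange(len(df2)) // 5

# Only some of the eight columns are listed, as in the answer (assumed spec).
agg_spec = {"Section": "first", "Chainage": "first",
            "1": "sum", "2": "max", "3": "mean", "8": "mean"}
df22 = df2.groupby(buckets).agg(agg_spec).reset_index(drop=True)
print(df22)
```

Note that this labelling is purely positional: it relies on Data Frame 2 being sorted and having exactly 5 rows per interval with no gaps.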

Then merge the aggregated frame with the first DataFrame:

df = df1.merge(df22, on=['Section','Chainage'])
print (df)
    Section  Chainage  Frame  1  2  8
0  R125R002    10.133      1  0  0  0
1  R125R002    10.138      2  0  0  0
2  R125R002    10.143      3  0  0  0
3  R125R002    10.148      4  0  0  0
4  R125R002    10.153      5  0  0  0
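If the 1 m rows are not guaranteed to come in exact blocks of 5, a closer analogue of the SQL BETWEEN join is `pd.merge_asof`. The sketch below is an alternative to the approach above, not the answer's method. It assumes that each frame covers the chainages from its own value up to (but not including) the next frame's value; `FrameChainage`, `tagged`, and `agg_spec` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df1 = pd.DataFrame({
    "Section": "R125R002",
    "Chainage": [10.133, 10.138, 10.143, 10.148, 10.153],
    "Frame": [1, 2, 3, 4, 5],
})
chainage = np.round(10.133 + 0.001 * np.arange(21), 3)
df2 = pd.DataFrame({"Section": "R125R002", "Chainage": chainage})
for col in "12345678":
    df2[col] = 1 if col == "3" else 0

# Tag each 1 m row with the latest frame at or before its chainage.
# merge_asof requires both inputs to be sorted by their key column.
tagged = pd.merge_asof(
    df2.sort_values("Chainage"),
    df1.sort_values("Chainage").rename(columns={"Chainage": "FrameChainage"}),
    left_on="Chainage", right_on="FrameChainage",
    by="Section", direction="backward",
)

# Aggregation as described in the question: sum, max, then mean for 3 to 8.
agg_spec = {"1": "sum", "2": "max"}
agg_spec.update({c: "mean" for c in "345678"})
out = (tagged.groupby(["Section", "FrameChainage", "Frame"], as_index=False)
             .agg(agg_spec)
             .rename(columns={"FrameChainage": "Chainage"}))
print(out)
```

Because the match is by value rather than by row position, this also sidesteps an exact equality merge on floating-point `Chainage` keys in the aggregated frame.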
