I have two pandas DataFrames. Both are time series that share one column (let's call it batchNumber), and both are structured like this:
[time = index, batchNumber, valueColumn, other fields]
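The first DataFrame has one row per batchNumber. The second has many rows per batchNumber, one row per time step.
I would like to combine the two frames (I'm deliberately avoiding the word "join" here, because I'm not sure it is the right technique) into a single structure:
[time1, batchNumber1 = batchNumber2, value1, [(time2[0], value2[0]), (time2[1], value2[1]), (time2[2], value2[2]), ..., (time2[N], value2[N])], other fields1]
So, a kind of "DataFrame within a DataFrame". Is that possible? How would one do it?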
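Whether it fits depends on the downstream processing, but pandas' built-in way of expressing "many time steps under one batch" is a hierarchical (MultiIndex) index rather than a DataFrame stored inside a cell. Here is a minimal sketch, using made-up values in place of the real df2 and assuming the columns are named batchNumber, timestamp and value:

```python
import pandas as pd

# Made-up stand-in for df2: several time steps per batchNumber
df2 = pd.DataFrame({
    "batchNumber": [1, 1, 1, 2],
    "timestamp": pd.to_datetime([
        "2019-03-15 00:00:05.000", "2019-03-15 00:00:05.002",
        "2019-03-15 00:00:05.004", "2019-03-15 00:00:10.000",
    ]),
    "value": [512.55, 512.70, 513.10, 513.00],
})

# Two-level index: outer level = batch, inner level = time step
nested = df2.set_index(["batchNumber", "timestamp"]).sort_index()

# Selecting one outer key returns that batch's sub-table, indexed by timestamp
batch1 = nested.loc[1]
print(batch1)
```

The per-batch fields from df1 can then be looked up by the same batchNumber key, so the "sub-table" never has to live inside a cell.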
The data in df1 looks like this, with columns [value, batchNumber, timestamp, otherID]:
[[3.06130831419325e-05 1552608005236178640 '2019-03-15T00:00:05.236178688Z' 1552608005236178640]
[3.1214206203101214e-05 1552608010657198640 '2019-03-15T00:00:10.657198592Z' 1552608010657198640]
[2.9220824990100192e-05 1552608016078218640 '2019-03-15T00:00:16.078218752Z' 1552608016078218640]
[3.1036932744355974e-05 1552608021499238640 '2019-03-15T00:00:21.499238656Z' 1552608021499238640]
[2.9085449547509336e-05 1552608026920258640 '2019-03-15T00:00:26.92025856Z' 1552608026920258640]
...
The data in df2 looks like this, with columns [value, batchNumber, timestamp, otherID]:
[[512.5499877929688 1552608005236178640 '2019-03-15T00:00:05.236178688Z' 1552608005236178640]
[512.7000122070312 1552608005236178640 '2019-03-15T00:00:05.236180736Z' 1552608005236178640]
[513.0999755859375 1552608005236178640 '2019-03-15T00:00:05.236182528Z' 1552608005236178640]
[513.0 1552608005236178640 '2019-03-15T00:00:05.236184576Z' 1552608005236178640]
[513.5 1552608005236178640 '2019-03-15T00:00:05.236186624Z' 1552608005236178640]
[512.8499755859375 1552608005236178640 '2019-03-15T00:00:05.236188672Z' 1552608005236178640]
[513.3499755859375 1552608005236178640 '2019-03-15T00:00:05.23619072Z' 1552608005236178640]
[512.9500122070312 1552608005236178640 '2019-03-15T00:00:05.236192768Z' 1552608005236178640]
[513.2000122070312 1552608005236178640 '2019-03-15T00:00:05.23619456Z' 1552608005236178640]
[513.2000122070312 1552608005236178640 '2019-03-15T00:00:05.236196608Z' 1552608005236178640]
[512.8499755859375 1552608005236178640 '2019-03-15T00:00:05.23619...
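As you can see, each batchNumber value occurs once in df1, while the same batchNumber occurs many times in df2. This creates a 1:n relationship between the rows of df1 and the rows of df2.
I want all rows of df2 whose batchNumber matches a df1 row to be attached to that df1 row as a sub-table/list/DataFrame in a new cell. Symbolically: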
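One possible answer to the "join or not" question is a plain 1:n merge. It repeats each df1 row once per matching df2 row and produces a long table rather than a nested one. Here is a sketch with toy data. The suffixes _batch and _sample are illustrative choices, not names from the question:

```python
import pandas as pd

# Toy stand-ins (made-up values); only batchNumber is shared
df1 = pd.DataFrame({"batchNumber": [1, 2], "value": [0.5, 0.7]})
df2 = pd.DataFrame({"batchNumber": [1, 1, 1, 2],
                    "value": [10.0, 11.0, 12.0, 20.0]})

# validate="one_to_many" makes pandas check that batchNumber is unique in df1
long_df = df1.merge(df2, on="batchNumber", how="left",
                    suffixes=("_batch", "_sample"),
                    validate="one_to_many")
print(long_df)
```

If df1 ever contained a repeated batchNumber, the `validate` check would raise `pandas.errors.MergeError` instead of silently multiplying rows.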
df_new.row = df1.row + df2.subset(batchNumber(df2) == batchNumber(df1))
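The symbolic line above could be sketched as a groupby over df2 followed by a dictionary lookup from df1. This is one possible approach, not the only one. The toy values and the new column name samples are hypothetical:

```python
import pandas as pd

# Made-up stand-ins for df1 (one row per batch) and df2 (many rows per batch)
df1 = pd.DataFrame({
    "batchNumber": [1, 2],
    "timestamp": pd.to_datetime(["2019-03-15 00:00:05", "2019-03-15 00:00:10"]),
    "value": [3.06e-05, 3.12e-05],
})
df2 = pd.DataFrame({
    "batchNumber": [1, 1, 1, 2],
    "timestamp": pd.to_datetime([
        "2019-03-15 00:00:05.000", "2019-03-15 00:00:05.002",
        "2019-03-15 00:00:05.004", "2019-03-15 00:00:10.000",
    ]),
    "value": [512.55, 512.70, 513.10, 513.00],
})

# One list of (timestamp, value) tuples per batchNumber, in df2's row order
samples = {
    batch: list(zip(group["timestamp"], group["value"]))
    for batch, group in df2.groupby("batchNumber")
}

# Attach each batch's list to the matching df1 row;
# a batch with no rows in df2 would get NaN here
df_new = df1.copy()
df_new["samples"] = df_new["batchNumber"].map(samples)
print(df_new)
```

The samples column has object dtype, so pandas cannot vectorize over the nested values. Keeping df2 in long form and grouping on demand is often easier to compute with. The list column mainly helps when each batch has to be passed around as a single record.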