是否有更好的方法根据每行中包含的启动和停止条件对一系列浮动进行上采样?这是我开始的样本:
borehole top bottom lithology
0 AP-2 94.6 95.1 dolomite
1 AP-2 95.1 96.7 limestone
2 AP-2 96.7 97.0 dolomite
3 AP-2 97.0 97.5 limestone
4 AP-2 97.5 97.8 limestone
5 AP-3 87.4 87.7 limestone
6 AP-3 87.7 88.1 limestone
7 AP-3 88.1 88.5 dolomite
8 AP-3 88.5 89.1 limestone
对于每一行,我想使用选定的增量在顶部和底部值之间添加行,并向前填充钻孔和岩性值。数据帧中的增量是常量。这是我想要的填充数据帧,增量为0.1。 (这仅适用于上述数据帧中的前两行:)
borehole Top Lith
0 AP-2 94.6 dolomite
1 AP-2 94.7 dolomite
2 AP-2 94.8 dolomite
3 AP-2 94.9 dolomite
4 AP-2 95.0 dolomite
5 AP-2 95.1 limestone
6 AP-2 95.2 limestone
7 AP-2 95.3 limestone
8 AP-2 95.4 limestone
9 AP-2 95.5 limestone
10 AP-2 95.6 limestone
11 AP-2 95.7 limestone
12 AP-2 95.8 limestone
13 AP-2 95.9 limestone
14 AP-2 96.0 limestone
15 AP-2 96.1 limestone
16 AP-2 96.2 limestone
17 AP-2 96.3 limestone
18 AP-2 96.4 limestone
19 AP-2 96.5 limestone
20 AP-2 96.6 limestone
21 AP-2 96.7 limestone
这里是我使用的代码并且它有效,但是当我在熊猫中制作一个循环时,我想知道我是否遗漏了一些明显的东西。 pd.DataFrame.resample()非常诱人,但我无法弄清楚如何使其与非时间数据一起使用。
import pandas as pd
import numpy as np
liths = pd.DataFrame(
{'borehole': {0: 'AP-2',
1: 'AP-2',
2: 'AP-2',
3: 'AP-2',
4: 'AP-2',
5: 'AP-3',
6: 'AP-3',
7: 'AP-3',
8: 'AP-3'},
'bottom': {0: 95.099999999999994,
1: 96.700000000000003,
2: 97.0,
3: 97.5,
4: 97.799999999999997,
5: 87.700000000000003,
6: 88.099999999999994,
7: 88.5,
8: 89.099999999999994},
'lithology': {0: 'dolomite',
1: 'limestone',
2: 'dolomite',
3: 'limestone',
4: 'limestone',
5: 'limestone',
6: 'limestone',
7: 'dolomite',
8: 'limestone'},
'top': {0: 94.599999999999994,
1: 95.099999999999994,
2: 96.700000000000003,
3: 97.0,
4: 97.5,
5: 87.400000000000006,
6: 87.700000000000003,
7: 88.099999999999994,
8: 88.5}}
)
filled = []
increment = 0.1
for row in liths.itertuples():
start = row.top
end = row.bottom
for i in np.arange(start, end, increment):
filled.append([row.borehole, i, row.lithology])
filled = pd.DataFrame(filled, columns=['borehole', 'Top', 'Lith']); filled
答案 0 :(得分:2)
我从每一行构建一个新的数据框并将它们全部连接起来
def expand(r):
a = np.arange(r.top, r.bottom, .1)
n = len(a)
return pd.DataFrame(dict(
borehole=[r.borehole] * n,
lithology=[r.lithology] * n,
top=a
))
pd.concat([expand(r) for r in df.itertuples()], ignore_index=True)
borehole lithology top
0 AP-2 dolomite 94.6
1 AP-2 dolomite 94.7
2 AP-2 dolomite 94.8
3 AP-2 dolomite 94.9
4 AP-2 dolomite 95.0
5 AP-2 limestone 95.1
6 AP-2 limestone 95.2
7 AP-2 limestone 95.3
8 AP-2 limestone 95.4
9 AP-2 limestone 95.5
10 AP-2 limestone 95.6
11 AP-2 limestone 95.7
12 AP-2 limestone 95.8
13 AP-2 limestone 95.9
14 AP-2 limestone 96.0
15 AP-2 limestone 96.1
16 AP-2 limestone 96.2
17 AP-2 limestone 96.3
18 AP-2 limestone 96.4
19 AP-2 limestone 96.5
20 AP-2 limestone 96.6
21 AP-2 limestone 96.7
22 AP-2 dolomite 96.7
23 AP-2 dolomite 96.8
24 AP-2 dolomite 96.9
25 AP-2 limestone 97.0
26 AP-2 limestone 97.1
27 AP-2 limestone 97.2
28 AP-2 limestone 97.3
29 AP-2 limestone 97.4
30 AP-2 limestone 97.5
31 AP-2 limestone 97.6
32 AP-2 limestone 97.7
33 AP-3 limestone 87.4
34 AP-3 limestone 87.5
35 AP-3 limestone 87.6
36 AP-3 limestone 87.7
37 AP-3 limestone 87.8
38 AP-3 limestone 87.9
39 AP-3 limestone 88.0
40 AP-3 dolomite 88.1
41 AP-3 dolomite 88.2
42 AP-3 dolomite 88.3
43 AP-3 dolomite 88.4
44 AP-3 dolomite 88.5
45 AP-3 limestone 88.5
46 AP-3 limestone 88.6
47 AP-3 limestone 88.7
48 AP-3 limestone 88.8
49 AP-3 limestone 88.9
50 AP-3 limestone 89.0