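Question 1: I have several different IDs, and for each ID I want to cut the Item vs. Value
curve at its minimum Value.
Basically, I want to keep the values up to the point where the curve reaches its minimum and filter out the rest.
Question 2: Can I then extrapolate in Python by fitting the chopped curve?
Please help me find a fast solution, because I have a large dataset; a numpy
solution would be good.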
ID Item Value
30702556 40 1
30702556 41 1
30702556 42 1
30702556 43 1
30702556 44 1.000408
30702556 45 1.006702067
30702556 46 1
30702556 47 1
30702556 48 1
30702556 49 1.000157628
30702556 50 1.001172713
30702556 51 1.009517935
30702556 52 1
30702556 53 1.000502562
30702556 54 1.001030023
30702556 55 1
30702556 56 1.000444755
30702556 57 1.000199956
30702556 58 1
30702556 59 1
30702556 60 1.00032533
30702556 61 0.996561721
30702556 62 0.994058276
30702556 63 0.994029863
30702556 64 0.995741839
30702556 65 0.996079035
30702556 66 0.992283214
30702556 67 0.992360022
30702556 68 0.991403573
30702556 69 0.989097475
30702556 70 0.989217641
30702556 71 0.988622481
30702556 72 0.987000163
30702556 73 0.984607074
30702556 74 0.983260544
30702556 75 0.983233331
30702556 76 0.976835524
30702556 77 0.976070994
30702556 78 0.975937075
30702556 79 0.968117537
30702556 80 0.967753864
30702556 81 0.963275228
30702556 82 0.960392687
30702556 83 0.953357783
30702556 84 0.941583499
30702556 85 0.937935151
30702556 86 0.92811891
30702556 87 0.924914786
30702556 88 0.912813207
30702556 89 0.892052451
30702556 90 0.875778411
30702556 91 0.876931504
30702556 92 0.847877617
30702556 93 0.834768706
30702556 94 0.841510584
30702556 95 0.798555032
30702556 96 0.781663978
30702556 97 0.731056793
30702556 98 0.71332851
30702556 99 0.808900212
30702556 100 0.822300396
30702556 101 0.920676291
30702556 102 0.911704187
30702556 103 1
30702556 104 1
30702556 105 1
30702556 106 1
30702556 107 1
30702556 108 1
30702556 109 1
30702556 110 1
30702556 111 1
30702556 112 1
30702556 113 1
30702556 114 1
30702556 115 1
30702556 116 1
30702556 117 1
30702556 118 1
30702556 119 1
30703716 40 1
30703716 41 1
30703716 42 1
30703716 43 1
30703716 44 1.000408
30703716 45 1.006702067
30703716 46 1
30703716 47 1
30703716 48 1
30703716 49 1.000157628
30703716 50 1.001172713
30703716 51 1.009517935
30703716 52 1
30703716 53 1.000502562
30703716 54 1.001030023
30703716 55 1
30703716 56 1.000444755
30703716 57 1.000199956
30703716 58 1
30703716 59 1
30703716 60 1.00032533
30703716 61 0.996561721
30703716 62 0.994058276
30703716 63 0.994029863
30703716 64 0.995741839
30703716 65 0.996079035
30703716 66 0.992283214
30703716 67 0.992360022
30703716 68 0.991403573
30703716 69 0.989097475
30703716 70 0.989217641
30703716 71 0.988622481
30703716 72 0.987000163
30703716 73 0.984607074
30703716 74 0.983260544
30703716 75 0.983233331
30703716 76 0.976835524
30703716 77 0.976070994
30703716 78 0.975937075
30703716 79 0.968117537
30703716 80 0.967753864
30703716 81 0.963275228
30703716 82 0.960392687
30703716 83 0.953357783
30703716 84 0.941583499
30703716 85 0.937935151
30703716 86 0.92811891
30703716 87 0.924914786
30703716 88 0.912813207
30703716 89 0.892052451
30703716 90 0.875778411
30703716 91 0.876931504
30703716 92 0.847877617
30703716 93 0.834768706
30703716 94 0.841510584
30703716 95 0.798555032
30703716 96 0.781663978
30703716 97 0.731056793
30703716 98 0.71332851
30703716 99 0.808900212
30703716 100 0.822300396
30703716 101 0.920676291
30703716 102 0.911704187
30703716 103 1
30703716 104 1
30703716 105 1
30703716 106 1
30703716 107 1
30703716 108 1
30703716 109 1
30703716 110 1
30703716 111 1
30703716 112 1
30703716 113 1
30703716 114 1
30703716 115 1
30703716 116 1
30703716 117 1
30703716 118 1
30703716 119 1
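For question 2, one possible approach is a least-squares polynomial fit to each chopped curve, evaluated at Items beyond the fitted range. This is a minimal sketch that assumes a low-order polynomial is an acceptable model. The degree (2), the helper name fit_and_extrapolate and the toy curve are illustrative assumptions, and polynomial extrapolation can diverge quickly outside the data.

```python
import numpy as np
from numpy.polynomial import Polynomial


def fit_and_extrapolate(item, value, new_item, deg=2):
    """Hypothetical helper: fit a degree-`deg` polynomial to one chopped
    curve (least squares) and evaluate it at `new_item`, which may lie
    outside the fitted Item range."""
    # Polynomial.fit maps Item onto [-1, 1] internally, which keeps the
    # fit well conditioned for Item values in the 40-120 range.
    poly = Polynomial.fit(np.asarray(item, dtype=float),
                          np.asarray(value, dtype=float), deg)
    return poly(np.asarray(new_item, dtype=float))


# Toy chopped curve that is exactly quadratic, so the fit can be checked.
item = np.arange(40, 99)
value = 1.0 - 1e-4 * (item - 40) ** 2
print(fit_and_extrapolate(item, value, [100, 110]))  # approximately [0.64 0.51]
```

For the real data, the helper would be called once per ID on the chopped rows, e.g. inside a loop over `chopped.groupby('ID')`.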
Answer 0 (score: 2)
Use .loc[:df.Value.idxmin()] on each ID group (a label-based slice, so the row holding the minimum is included):
df.groupby('ID', group_keys=False).apply(lambda g: g.loc[:g.Value.idxmin()])
ID Item Value
0 30702556 40 1.000000
1 30702556 41 1.000000
2 30702556 42 1.000000
3 30702556 43 1.000000
4 30702556 44 1.000408
5 30702556 45 1.006702
6 30702556 46 1.000000
7 30702556 47 1.000000
8 30702556 48 1.000000
9 30702556 49 1.000158
10 30702556 50 1.001173
11 30702556 51 1.009518
12 30702556 52 1.000000
13 30702556 53 1.000503
14 30702556 54 1.001030
15 30702556 55 1.000000
16 30702556 56 1.000445
17 30702556 57 1.000200
18 30702556 58 1.000000
19 30702556 59 1.000000
20 30702556 60 1.000325
21 30702556 61 0.996562
22 30702556 62 0.994058
23 30702556 63 0.994030
24 30702556 64 0.995742
25 30702556 65 0.996079
26 30702556 66 0.992283
27 30702556 67 0.992360
28 30702556 68 0.991404
29 30702556 69 0.989097
.. ... ... ...
109 30703716 69 0.989097
110 30703716 70 0.989218
111 30703716 71 0.988622
112 30703716 72 0.987000
113 30703716 73 0.984607
114 30703716 74 0.983261
115 30703716 75 0.983233
116 30703716 76 0.976836
117 30703716 77 0.976071
118 30703716 78 0.975937
119 30703716 79 0.968118
120 30703716 80 0.967754
121 30703716 81 0.963275
122 30703716 82 0.960393
123 30703716 83 0.953358
124 30703716 84 0.941583
125 30703716 85 0.937935
126 30703716 86 0.928119
127 30703716 87 0.924915
128 30703716 88 0.912813
129 30703716 89 0.892052
130 30703716 90 0.875778
131 30703716 91 0.876932
132 30703716 92 0.847878
133 30703716 93 0.834769
134 30703716 94 0.841511
135 30703716 95 0.798555
136 30703716 96 0.781664
137 30703716 97 0.731057
138 30703716 98 0.713329
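If groupby(...).apply turns out to be slow on a large frame, the same cut can be expressed as a single boolean mask with no per-group Python lambda. This is a minimal sketch assuming each ID's rows are contiguous and sorted by Item, and that the frame has a unique, increasing index such as the default RangeIndex. Under those assumptions it keeps the same rows as the .loc slice above. The helper name chop_at_min and the toy frame are hypothetical.

```python
import pandas as pd


def chop_at_min(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: for each ID, keep the rows up to and including
    the first row where Value reaches its minimum.

    Assumes each ID's rows are contiguous, sorted by Item, and that the
    index is unique and increasing (e.g. a RangeIndex).
    """
    # Index label of the first minimum Value per ID, mapped back onto every row.
    min_label = df['ID'].map(df.groupby('ID')['Value'].idxmin()).to_numpy()
    # A row survives if its label is at or before its group's minimum label.
    keep = df.index.to_numpy() <= min_label
    return df[keep]


# Toy frame standing in for the real data.
df = pd.DataFrame({
    'ID':    [1, 1, 1, 1, 2, 2, 2],
    'Item':  [40, 41, 42, 43, 40, 41, 42],
    'Value': [1.0, 0.9, 0.8, 1.0, 1.0, 0.7, 0.9],
})
print(chop_at_min(df))  # keeps index labels 0, 1, 2, 4, 5
```

The mask is computed with vectorized operations (one groupby reduction plus a NumPy comparison), which is usually what matters for speed on large inputs.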
Answer 1 (score: 1)
If I understand correctly (this returns only the row with the minimum Value for each ID):
df.loc[df.groupby('ID')['Value'].idxmin()]
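The rows returned by this answer can also be used to chop the curves as asked in question 1. Map each ID to the Item at which its minimum occurs, then keep every row at or before that Item. This is a minimal sketch assuming Item increases along each curve. Unlike the .loc slice, it does not depend on row order or on the index. The toy frame and the names cut_item and chopped are illustrative.

```python
import pandas as pd

# Toy frame standing in for the real data.
df = pd.DataFrame({
    'ID':    [1, 1, 1, 1, 2, 2, 2],
    'Item':  [40, 41, 42, 43, 40, 41, 42],
    'Value': [1.0, 0.9, 0.8, 1.0, 1.0, 0.7, 0.9],
})

# One row per ID: the row holding that ID's (first) minimum Value.
min_rows = df.loc[df.groupby('ID')['Value'].idxmin()]
# Item at which each ID reaches its minimum, indexed by ID.
cut_item = min_rows.set_index('ID')['Item']
# Keep rows whose Item does not exceed their own ID's cut-off Item.
chopped = df[df['Item'] <= df['ID'].map(cut_item)]
print(chopped)
```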