Question

I have the original DataFrame:


For missing T values, the value is the same as in the previous row.

The output should be:

ID    T    value
1     0    1
1     4    3
2     0    0
2     4    1
2     7    3

I tried a loop, but it takes a long time.

Is there a way to solve this for a large DataFrame?

Thanks!

Answer 1

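For this solution, T must contain unique integer values within each group.

Use groupby with a custom function: reindex each group over its full range of T, then replace the resulting NaNs in the value column by forward filling with ffill: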

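import numpy as np

# Reindex each ID group over its full integer range of T, then forward-fill the gaps.
# Note the double brackets: selecting columns with a bare tuple fails in recent pandas.
df1 = (df.groupby('ID')[['T', 'value']]
        .apply(lambda x: x.set_index('T').reindex(np.arange(x['T'].min(), x['T'].max() + 1)))
        .ffill()
        .astype(int)
        .reset_index())
print(df1)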
    ID  T  value
0    1  0      1
1    1  1      1
2    1  2      1
3    1  3      1
4    1  4      3
5    2  0      0
6    2  1      0
7    2  2      0
8    2  3      0
9    2  4      1
10   2  5      1
11   2  6      1
12   2  7      3
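Since the question is about large DataFrames, here is a minimal sketch of an alternative that avoids groupby.apply. This is not the method from the answer above: it builds the full (ID, T) grid with NumPy and left-merges the original rows onto it. The sample rows are the five rows from the question; the names bounds, full and out are made up for illustration. It assumes, like the answer, that T is an integer and unique within each ID.

```python
import numpy as np
import pandas as pd

# Sample data: the five rows shown in the question.
df = pd.DataFrame({"ID": [1, 1, 2, 2, 2],
                   "T": [0, 4, 0, 4, 7],
                   "value": [1, 3, 0, 1, 3]})

# Smallest and largest T per ID, and how many rows each ID needs.
bounds = df.groupby("ID")["T"].agg(["min", "max"])
lengths = (bounds["max"] - bounds["min"] + 1).to_numpy()

# Repeat each ID once per T step, then compute the T values:
# group minimum plus the position of the row inside its group.
ids = np.repeat(bounds.index.to_numpy(), lengths)
starts = np.repeat(bounds["min"].to_numpy(), lengths)
group_offset = np.repeat(np.cumsum(lengths) - lengths, lengths)
t = starts + (np.arange(lengths.sum()) - group_offset)

# Left merge keeps the grid order; T steps missing from df get NaN.
full = pd.DataFrame({"ID": ids, "T": t})
out = full.merge(df, on=["ID", "T"], how="left")

# The first row of every ID always exists in df (it is the group minimum),
# so a plain forward fill never carries a value from one ID into the next.
out["value"] = out["value"].ffill().astype(int)
print(out)
```

Whether this is actually faster than groupby.apply depends on the number of groups and the size of the gaps, so it is worth timing both on real data.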

If you get the error:

ValueError: cannot reindex from a duplicate axis

it means that some T values are duplicated within a group, for example:

print (df)
   ID  T  value
0   1  0      1
1   1  4      3
2   2  0      0
3   2  4      1 <- T=4 is duplicated in group 2
4   2  4      3 <- T=4 is duplicated in group 2
5   2  7      3

The solution is to first aggregate the values so that T is unique within each group, for example with sum:

df = df.groupby(['ID', 'T'], as_index=False)['value'].sum()
print (df)
   ID  T  value
0   1  0      1
1   1  4      3
2   2  0      0
3   2  4      4
4   2  7      3
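Before aggregating, it can help to see which rows are duplicated and to decide which aggregation fits the data. The sketch below uses the example rows above; sum is what the answer uses, while last is shown only as one possible alternative and is an assumption, not something the answer recommends. The names dup_mask, summed and last are made up for illustration.

```python
import pandas as pd

# The example with a duplicated T=4 in group 2.
df = pd.DataFrame({"ID": [1, 1, 2, 2, 2, 2],
                   "T": [0, 4, 0, 4, 4, 7],
                   "value": [1, 3, 0, 1, 3, 3]})

# Mark every row whose (ID, T) pair occurs more than once.
dup_mask = df.duplicated(subset=["ID", "T"], keep=False)
print(df[dup_mask])

# Option used in the answer: add the duplicated values together.
summed = df.groupby(["ID", "T"], as_index=False)["value"].sum()

# Hypothetical alternative: keep the last value seen for each (ID, T).
last = df.groupby(["ID", "T"], as_index=False)["value"].last()

print(summed)
print(last)
```

Either result has a unique T per ID, so the reindex step no longer raises the ValueError.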
