I have a data frame like the following:
n Date Area Rank
12 2007-03-02 Other 4.276250
24 2007-03-02 Other 4.512632
3 2007-03-02 Other 3.513571
36 2007-03-02 Other 4.514000
48 2007-03-02 Other 4.55000
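I want to resample n so that it also takes the values in between the existing ones, and then, once I have those values, interpolate the Rank field. If n were a datetime or similar object, I could resample. How can I do this with floats or integers?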
The output should look like this (dummy numbers for Rank, just an example):
n Date Area Rank
3 2007-03-02 Other 3.513571
4 2007-03-02 Other 3.513675
5 2007-03-02 Other 3.524819
6 2007-03-02 Other 3.613427
7 2007-03-02 Other 3.685635
....
....
Answer 0 (score: 1)
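# The upper bound needs +1 because range() excludes its stop value.
df = (df.set_index('n')
        .reindex(range(df.n.min(), df.n.max() + 1))
        .interpolate()
        .reset_index())
# Date and Area are not numeric, so forward-fill them instead.
df[['Date','Area']] = df[['Date','Area']].ffill()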
Output:
n Date Area Rank
0 3 2007-03-02 Other 3.513571
1 4 2007-03-02 Other 3.598313
2 5 2007-03-02 Other 3.683055
3 6 2007-03-02 Other 3.767797
4 7 2007-03-02 Other 3.852539
5 8 2007-03-02 Other 3.937282
6 9 2007-03-02 Other 4.022024
7 10 2007-03-02 Other 4.106766
8 11 2007-03-02 Other 4.191508
9 12 2007-03-02 Other 4.276250
10 13 2007-03-02 Other 4.295948
11 14 2007-03-02 Other 4.315647
...
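Depending on the column types, there may be a way to interpolate with a different method per column, so that you would not need a separate ffill() for the non-float columns. I played around with apply() a bit, but couldn't get it to work.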
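One way the per-dtype idea might be sketched is with select_dtypes() rather than apply(). This is only a minimal sketch under the assumption that numeric columns should be linearly interpolated and everything else forward-filled. The names full, numeric_cols, other_cols and result are made up for illustration, and the sample frame is rebuilt from the question's data.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "n": [12, 24, 3, 36, 48],
    "Date": ["2007-03-02"] * 5,
    "Area": ["Other"] * 5,
    "Rank": [4.276250, 4.512632, 3.513571, 4.514000, 4.55000],
})

# Reindex on every integer n from min to max (inclusive);
# rows that did not exist before are filled with NaN.
full = (df.set_index("n")
          .reindex(range(df["n"].min(), df["n"].max() + 1))
          .rename_axis("n"))

# Assumption for this sketch: numeric columns are interpolated linearly,
# all remaining columns are forward-filled.
numeric_cols = full.select_dtypes(include="number").columns
other_cols = full.columns.difference(numeric_cols)

full[numeric_cols] = full[numeric_cols].interpolate()
full[other_cols] = full[other_cols].ffill()

result = full.reset_index()
print(result.head())
```

This keeps interpolate() away from the non-numeric columns entirely, and the column lists are derived from the dtypes rather than typed out by hand.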