Question

Consider a Pandas DataFrame df with a sorted numeric index (representing, for example, time or distance) that may contain duplicate values:

     a    b
  0  4.0  1.0
1.5  5.5  2.5
1.5  5.5  2.5
  2  6.0  3.0
4.5  8.5  5.5

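I want to create a column c that takes its values from column a, shifted along the index by a fixed amount, wherever the shifted index value matches a value in the original index. Shifted index values that do not match the original index should still be taken into account when filling in the original index values that received no assignment, for example by linear interpolation.

Example:

With 0.5 as an example index shift, column c would be constructed from column a over the index values 0, 0.5, 1.5, 2, 2.5, 4.5 and 5, giving the following intermediate result, in which the values still to be filled in are marked (i):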

      c
  0  NaN(i)
0.5  4.0
1.5  4.75(i)
  2  5.5
2.5  6.0
4.5  7.25(i)
  5  8.5

The final result should be indexed by the original index used in df:

     a    b    c
  0  4.0  1.0  NaN(i)
1.5  5.5  2.5  4.75(i)
1.5  5.5  2.5  4.75(i)
  2  6.0  3.0  5.5
4.5  8.5  5.5  7.25(i)

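One open question is how to pick the value for duplicate index entries: in this example a single value was chosen, but taking the mean might be a better approach.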
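To make the duplicate-handling choice concrete, here is a minimal sketch of both options, assuming the example frame above is rebuilt by hand. The `demo` series is hypothetical data added only to show a case where the two options differ; in the question's frame the duplicated rows are identical, so both give the same result.

```python
import pandas as pd

# Example frame from the question (sorted float index with a duplicated label).
df = pd.DataFrame(
    {"a": [4.0, 5.5, 5.5, 6.0, 8.5], "b": [1.0, 2.5, 2.5, 3.0, 5.5]},
    index=[0.0, 1.5, 1.5, 2.0, 4.5],
)

# Option 1: keep the first value seen for each index label.
a_first = df["a"].groupby(level=0).first()

# Option 2: average all values that share an index label.
a_mean = df["a"].groupby(level=0).mean()

# Hypothetical series where the duplicated label carries different values.
demo = pd.Series([5.0, 6.0, 7.0], index=[1.5, 1.5, 2.0])
demo_first = demo.groupby(level=0).first()  # value at 1.5 is 5.0
demo_mean = demo.groupby(level=0).mean()    # value at 1.5 is 5.5
```

Either way the result has one value per index label, which is what the shifting and interpolation step needs.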

Answer 1

I think this is what you are trying to achieve:

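import numpy as np

# define the shift value
index_shift = 0.5
# find the labels common to the index before and after the shift
ind_intersect = df.index.intersection(df.index + index_shift)
# create the new column
df["c"] = np.nan
# collapse duplicate labels in column a (keeps the first value per label)
a_unique = df["a"].groupby(level=0).first()
# rows of df whose label can receive a shifted value
mask = df.index.isin(ind_intersect)
# transfer values from column a to column c by position, not by label alignment
df.loc[mask, "c"] = a_unique.reindex(df.index[mask] - index_shift).to_numpy()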

You can of course fill the new column with a value other than NumPy's NaN.

Answer 2

This is my current approach, which takes one of the duplicate index values into account when constructing the new column.
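A minimal sketch of an approach along these lines, not necessarily the answerer's own code. The function name `shifted_column` and its parameters are hypothetical. It assumes that shifted labels match original labels by exact float equality, and it resolves duplicates by keeping the first value per label.

```python
import numpy as np
import pandas as pd


def shifted_column(df, source="a", shift=0.5, method="linear"):
    """Values of `source` shifted along the index by `shift`, interpolated
    and aligned to df's original (possibly duplicated) index."""
    # One value per index label; duplicates are resolved by taking the first.
    src = df[source].groupby(level=0).first()
    # Move the values to their shifted index positions.
    shifted = pd.Series(src.to_numpy(), index=src.index + shift)
    # Add the original labels as NaN placeholders; union() returns them sorted.
    full_index = shifted.index.union(df.index.unique())
    filled = shifted.reindex(full_index).interpolate(method=method)
    # Look the values up at the original labels; duplicates repeat the value.
    return filled.reindex(df.index).to_numpy()


# Example frame from the question.
df = pd.DataFrame(
    {"a": [4.0, 5.5, 5.5, 6.0, 8.5], "b": [1.0, 2.5, 2.5, 3.0, 5.5]},
    index=[0.0, 1.5, 1.5, 2.0, 4.5],
)
df["c"] = shifted_column(df, source="a", shift=0.5)
```

With `method="linear"`, pandas treats the rows as equally spaced, which reproduces the 4.75 and 7.25 shown in the question. Passing `method="index"` instead weights by the actual index distances and gives different interpolated values. The first row stays NaN because interpolation only fills forward by default and there is no earlier value.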


In a DataFrame with a sorted numeric index containing duplicates, create a shifted version of an existing column and interpolate missing values
