Question

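I already asked this question, but my description was not precise enough. Clever people on this forum proposed a solution, but I forgot (sorry) to specify that zeros in the relevant columns must be kept.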

Hello, I have a dataframe like the following:

              2014  2015  2016  2017  2018  2019  

         2014   10    20    30    40    0      5
         2015   0     0    200    0    100     0       
         2016   0     0    200   140    35    10       
         2017   0     0     0     20     0    12

I need to get a result like this:

    yearStart  yearStart+1  yearStart+2  yearStart+3  yearStart+4  
0      10          20            30          40          0
1      0          200             0          100         0       
2     200         140            35          10          0
3      20          0             12           0          0

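The idea is to select, in each row, the columns between two dates, index and index + delta, where delta is a parameter (4 in this example), and to put them into a dataframe.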

Using iterrows() takes too much time.

I tried

 df1 = df.apply(lambda x: pd.Series(x[x.keys()>=x.index],1)).fillna(0).astype(int)

but it does not work:

TypeError: ('Index(...) must be called with a collection of some kind,
1 was passed', 'occurred at index 2014')

Thanks

Answer 1

One way is

In [1010]: def yearmove(x):
      ...:     idx = x.index.astype(int)
      ...:     idx = idx - x.name
      ...:     mask = idx >= 0
      ...:     idx = 'yearStart' + idx.astype(str)
      ...:     return pd.Series(x.values[mask], index=idx[mask])
      ...:

In [1011]: df.apply(yearmove, 1).fillna(0).astype(int)
Out[1011]:
      yearStart0  yearStart1  yearStart2  yearStart3  yearStart4  yearStart5
2014          10          20          30          40           0           5
2015           0         200           0         100           0           0
2016         200         140          35          10           0           0
2017          20           0          12           0           0           0
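The output above keeps every offset up to the last column, whereas the question asks for a window limited by delta. A possible vectorized variant using NumPy indexing is sketched below. It is not part of the original answer: the helper name `shift_to_start` is hypothetical, and the sketch assumes integer year labels, sorted consecutive columns, and that every row's year appears among the columns.

```python
import numpy as np
import pandas as pd

# Sample data from the question; integer row and column labels are assumed.
df = pd.DataFrame(
    [[10, 20, 30, 40, 0, 5],
     [0, 0, 200, 0, 100, 0],
     [0, 0, 200, 140, 35, 10],
     [0, 0, 0, 20, 0, 12]],
    index=[2014, 2015, 2016, 2017],
    columns=[2014, 2015, 2016, 2017, 2018, 2019],
)


def shift_to_start(df, delta):
    """Hypothetical helper: align each row so its own year is offset 0."""
    values = df.to_numpy()
    cols = df.columns.to_numpy().astype(int)
    rows = df.index.to_numpy().astype(int)
    # Position of each row's start year among the columns
    # (assumes sorted columns that contain every start year).
    start = np.searchsorted(cols, rows)
    # Column positions to read per row: start, start+1, ..., start+delta.
    pos = start[:, None] + np.arange(delta + 1)
    valid = pos < len(cols)
    # Clip so the lookup stays in range, then zero the positions
    # that ran past the last column.
    picked = np.take_along_axis(values, np.clip(pos, 0, len(cols) - 1), axis=1)
    picked = np.where(valid, picked, 0)
    names = ["yearStart"] + [f"yearStart+{k}" for k in range(1, delta + 1)]
    return pd.DataFrame(picked, index=df.index, columns=names)


result = shift_to_start(df, delta=4)
print(result)
```

Zeros that are already in the data are read like any other value, so they are kept; only positions beyond the last column are filled with 0. Because no Python function is called per row, this should scale better than apply or iterrows() on large frames, though that has not been measured here.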
