How do I update the original DataFrame after computing on a subset (slice)?

Time: 2018-08-13 02:30:17

Tags: python python-3.x pandas dataframe

Consider the following example:


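import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': ['one', 'one', 'one', 'one', 'two', 'two', 'two', 'three', 'four'],
    'b': ['x', 'y', 'x', 'y', 'x', 'y', 'x', 'x', 'x'],
    'c': np.random.randn(9),
})

# Placeholder column, to be overwritten with the computed sums
df['sum_c_3'] = 99.99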

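I have to perform many operations on this frame. As one example, for each row I compute the sum of c over that row and the next two rows of the same group in a, and store the result in the new column sum_c_3. The starting frame looks like this: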

>>> df
       a  b         c  sum_c_3
0    one  x  1.296379    99.99
1    one  y  0.201266    99.99
2    one  x  0.953963    99.99
3    one  y  0.322922    99.99
4    two  x  0.887728    99.99
5    two  y -0.154389    99.99
6    two  x -2.390790    99.99
7  three  x -1.218706    99.99
8   four  x -0.043964    99.99
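
To make the target computation concrete, here is a minimal sketch on a small hypothetical Series (the values 1 to 4 are made up for illustration and are not taken from the frame above). Each position gets the sum of its own value and the next two, and the windows near the end simply shrink.

```python
import pandas as pd

# Hypothetical values, chosen so the sums are easy to check by hand.
s = pd.Series([1.0, 2.0, 3.0, 4.0])

# For each position i, sum the values at positions i, i+1 and i+2.
# Slicing past the end is safe: iloc returns only the rows that exist.
forward_sum = pd.Series(
    [s.iloc[i:i + 3].sum() for i in range(len(s))],
    index=s.index,
)

print(forward_sum.tolist())  # [6.0, 9.0, 7.0, 4.0]
```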

The loop I use:

for w in ['one','two','three','four']:
    x = df.loc[df['a']==w]
    size = x.iloc[:]['a'].count()
    print("Records %s: %s" %(w,size))
    target_column = x.columns.get_loc('c')
    for i in range(0,size):
        idx = x.index
        acum = x.iloc[i:i+3,target_column].sum()
        x.loc[x.loc[idx,'sum_c_3'].index[i],'sum_c_3'] = acum
    print (x) 

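Finally, my question: how do I update the original DataFrame?

Can the slice update it automatically, or should I use the index to update the Series (slice)?

The original stays unchanged, with no updates; only the slices printed by the loop carry the new values: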

Records one: 4
     a  b         c   sum_c_3
0  one  x  1.296379  2.451607
1  one  y  0.201266  1.478151
2  one  x  0.953963  1.276885
3  one  y  0.322922  0.322922
Records two: 3
     a  b         c   sum_c_3
4  two  x  0.887728 -1.657452
5  two  y -0.154389 -2.545180
6  two  x -2.390790 -2.390790
Records three: 1
       a  b         c   sum_c_3
7  three  x -1.218706 -1.218706
Records four: 1
      a  b         c   sum_c_3
8  four  x -0.043964 -0.043964
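
The unchanged original can be reproduced in isolation. A minimal sketch, assuming a small hypothetical frame (the names demo and part are illustrative only): selecting rows with a boolean mask returns a new object, so assignments made to it do not reach the frame it was taken from.

```python
import pandas as pd

# Hypothetical frame standing in for df above.
demo = pd.DataFrame({
    'a': ['one', 'one', 'two'],
    'sum_c_3': [99.99, 99.99, 99.99],
})

# Boolean-mask selection returns a copy of the matching rows.
# Older pandas versions may emit SettingWithCopyWarning on the assignment.
part = demo.loc[demo['a'] == 'one']
part.loc[part.index[0], 'sum_c_3'] = 1.5

print(part['sum_c_3'].tolist())  # the slice changed
print(demo['sum_c_3'].tolist())  # the original did not
```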

1 answer:

Answer 0 (score: 1)

Add a call to update at the end of the outer for loop:

for w in ['one', 'two', 'three', 'four']:
    # Explicit copy of the rows for this group
    x = df.loc[df['a'] == w].copy()
    size = x['a'].count()
    print("Records %s: %s" % (w, size))
    target_column = x.columns.get_loc('c')
    for i in range(size):
        # Sum of c over row i and the next two rows of the group
        acum = x.iloc[i:i+3, target_column].sum()
        x.loc[x.index[i], 'sum_c_3'] = acum
    print(x)
    df.update(x)  # the added line: write the slice back into df, aligned on index

df
Out[979]: 
       a  b         c   sum_c_3
0    one  x  0.127171  0.210872
1    one  y -0.576157  1.212010
2    one  x  0.659859  1.788168
3    one  y  1.128309  1.128309
4    two  x  0.333521 -0.846657
5    two  y  0.753613 -1.180178
6    two  x -1.933791 -1.933791
7  three  x  0.549009  0.549009
8   four  x  0.895742  0.895742
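
On the question of whether to write back through the index: a minimal sketch, assuming a small hypothetical frame (make_demo, computed_slice and the values are illustrative and not part of the answer), suggests that both routes give the same result. DataFrame.update aligns on index and column labels and overwrites with the non-NA values of the other frame, while an index-based assignment with .loc touches only the chosen column.

```python
import pandas as pd

def make_demo():
    # Hypothetical frame; values chosen for easy checking.
    return pd.DataFrame({
        'a': ['one', 'one', 'two'],
        'c': [1.0, 2.0, 3.0],
        'sum_c_3': [99.99, 99.99, 99.99],
    })

def computed_slice(frame):
    # Work on an explicit copy of the rows where a == 'one'.
    part = frame.loc[frame['a'] == 'one'].copy()
    part['sum_c_3'] = [3.0, 2.0]  # pretend these were computed in a loop
    return part

# Route 1: DataFrame.update modifies the frame in place, aligned on labels.
via_update = make_demo()
via_update.update(computed_slice(via_update))

# Route 2: assign a single column through the slice's index.
via_loc = make_demo()
part = computed_slice(via_loc)
via_loc.loc[part.index, 'sum_c_3'] = part['sum_c_3']

print(via_update['sum_c_3'].tolist())
print(via_loc['sum_c_3'].tolist())
```

Route 2 leaves every other column alone, which may matter when the slice carries columns that should not be copied back.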