How do I set values on the original DataFrame in an optimized way?

Time: 2016-02-02 07:39:46

Tags: python dataframe grouping

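I am a beginner in data science with Python. I have clickstream data, and I want to set a column value to 1 for the item that was clicked first in each session. I am making some mistake here. This is my dataset: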

   Sid                    Tstamp     Itemid  Category
0    1  2014-04-07T10:54:09.868Z  214536500         0
1    1  2014-04-07T10:54:46.998Z  214536506         0
2    1  2014-04-07T10:57:00.306Z  214577561         0
3    2  2014-04-07T13:56:37.614Z  214662742         0
4    2  2014-04-07T13:57:19.373Z  214662742         0
5    2  2014-04-07T13:58:37.446Z  214825110         0
6    2  2014-04-07T13:59:50.710Z  214757390         0
7    2  2014-04-07T14:00:38.247Z  214757407         0
8    2  2014-04-07T14:02:36.889Z  214551617         0
9    3  2014-04-02T13:17:46.940Z  214716935         0

This is my code:

def firstclicked(d):
    return 1

k['first_item']=k.groupby('Sid').first().Itemid.apply(firstclicked)

1 Answer:

Answer 0: (score: 1)

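You can do this by calling reset_index so that the original index becomes a regular column, taking the first row of each Sid group, and then using those index values to select rows in the original DataFrame and assign to a new column: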

first_vals = df.reset_index().groupby('Sid').first()

In [168]: first_vals
Out[168]:
     index                    Tstamp     Itemid  Category
Sid
1        0  2014-04-07T10:54:09.868Z  214536500         0
2        3  2014-04-07T13:56:37.614Z  214662742         0
3        9  2014-04-02T13:17:46.940Z  214716935         0

In [169]: first_vals['index']
Out[169]:
Sid
1    0
2    3
3    9
Name: index, dtype: int64


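# Start with 0 everywhere, then flag the first row of each session.
# (.ix was removed in pandas 1.0; .loc does the same label-based selection.)
df['new'] = 0
df.loc[first_vals['index'], 'new'] = 1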


In [172]: df
Out[172]:
   Sid                    Tstamp     Itemid  Category  new
0    1  2014-04-07T10:54:09.868Z  214536500         0    1
1    1  2014-04-07T10:54:46.998Z  214536506         0    0
2    1  2014-04-07T10:57:00.306Z  214577561         0    0
3    2  2014-04-07T13:56:37.614Z  214662742         0    1
4    2  2014-04-07T13:57:19.373Z  214662742         0    0
5    2  2014-04-07T13:58:37.446Z  214825110         0    0
6    2  2014-04-07T13:59:50.710Z  214757390         0    0
7    2  2014-04-07T14:00:38.247Z  214757407         0    0
8    2  2014-04-07T14:02:36.889Z  214551617         0    0
9    3  2014-04-02T13:17:46.940Z  214716935         0    1
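
As a possible alternative to the reset_index approach above, the same flag can be computed without an intermediate frame. The following is only a sketch: it rebuilds the sample data from the question (the column name `new_cumcount` is made up for the comparison) and assumes the rows are already in click order within each session. If they are not, sorting by Sid and Tstamp first with sort_values would be needed.

```python
import pandas as pd

# Sample data copied from the question (Category column omitted for brevity).
df = pd.DataFrame({
    "Sid": [1, 1, 1, 2, 2, 2, 2, 2, 2, 3],
    "Tstamp": [
        "2014-04-07T10:54:09.868Z", "2014-04-07T10:54:46.998Z",
        "2014-04-07T10:57:00.306Z", "2014-04-07T13:56:37.614Z",
        "2014-04-07T13:57:19.373Z", "2014-04-07T13:58:37.446Z",
        "2014-04-07T13:59:50.710Z", "2014-04-07T14:00:38.247Z",
        "2014-04-07T14:02:36.889Z", "2014-04-02T13:17:46.940Z",
    ],
    "Itemid": [
        214536500, 214536506, 214577561, 214662742, 214662742,
        214825110, 214757390, 214757407, 214551617, 214716935,
    ],
})

# duplicated() is True for every repeat of a Sid after its first appearance,
# so negating it flags the first row of each session.
df["new"] = (~df["Sid"].duplicated()).astype(int)

# Equivalent formulation: cumcount() numbers the rows inside each group
# starting at 0, so position 0 is the first click of the session.
df["new_cumcount"] = (df.groupby("Sid").cumcount() == 0).astype(int)

print(df[["Sid", "Itemid", "new", "new_cumcount"]])
```

Both variables are computed in a single vectorized pass, which may matter on a large clickstream; whether it is actually faster than the reset_index version on a given dataset would need to be measured.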