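I am a beginner in data science with Python. I have some clickstream data, and I want to set a column value to 1 if the item is the first one clicked in its session. I am making a mistake somewhere. This is my dataset: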
   Sid                    Tstamp     Itemid  Category
0    1  2014-04-07T10:54:09.868Z  214536500         0
1    1  2014-04-07T10:54:46.998Z  214536506         0
2    1  2014-04-07T10:57:00.306Z  214577561         0
3    2  2014-04-07T13:56:37.614Z  214662742         0
4    2  2014-04-07T13:57:19.373Z  214662742         0
5    2  2014-04-07T13:58:37.446Z  214825110         0
6    2  2014-04-07T13:59:50.710Z  214757390         0
7    2  2014-04-07T14:00:38.247Z  214757407         0
8    2  2014-04-07T14:02:36.889Z  214551617         0
9    3  2014-04-02T13:17:46.940Z  214716935         0
This is my code:
def firstclicked(d):
    return 1
k['first_item']=k.groupby('Sid').first().Itemid.apply(firstclicked)
Answer 0 (score: 1)
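You can achieve this by calling reset_index on df so that the original row index becomes a column, grouping by Sid and taking the first row of each group, then using those saved index values to select rows of the original dataframe and assign to a new column: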
first_vals = df.reset_index().groupby('Sid').first()
In [168]: first_vals
Out[168]:
     index                    Tstamp     Itemid  Category
Sid
1        0  2014-04-07T10:54:09.868Z  214536500         0
2        3  2014-04-07T13:56:37.614Z  214662742         0
3        9  2014-04-02T13:17:46.940Z  214716935         0
In [169]: first_vals['index']
Out[169]:
Sid
1 0
2 3
3 9
Name: index, dtype: int64
df['new'] = 0
# .loc selects by index label; the older .ix indexer is no longer available in current pandas
df.loc[first_vals['index'], 'new'] = 1
In [172]: df
Out[172]:
   Sid                    Tstamp     Itemid  Category  new
0    1  2014-04-07T10:54:09.868Z  214536500         0    1
1    1  2014-04-07T10:54:46.998Z  214536506         0    0
2    1  2014-04-07T10:57:00.306Z  214577561         0    0
3    2  2014-04-07T13:56:37.614Z  214662742         0    1
4    2  2014-04-07T13:57:19.373Z  214662742         0    0
5    2  2014-04-07T13:58:37.446Z  214825110         0    0
6    2  2014-04-07T13:59:50.710Z  214757390         0    0
7    2  2014-04-07T14:00:38.247Z  214757407         0    0
8    2  2014-04-07T14:02:36.889Z  214551617         0    0
9    3  2014-04-02T13:17:46.940Z  214716935         0    1
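
As an alternative, here is a minimal self-contained sketch that produces the same flag without the intermediate first_vals frame. It is not part of the answer above. It assumes the rows are already ordered by Tstamp within each Sid, as they are in the question's sample, and it rebuilds that sample by hand so the snippet can run on its own. The names flag_cumcount and flag_duplicated are only illustrative.

```python
import pandas as pd

# Sample data re-typed from the question so the example is runnable on its own.
df = pd.DataFrame({
    "Sid": [1, 1, 1, 2, 2, 2, 2, 2, 2, 3],
    "Tstamp": [
        "2014-04-07T10:54:09.868Z", "2014-04-07T10:54:46.998Z",
        "2014-04-07T10:57:00.306Z", "2014-04-07T13:56:37.614Z",
        "2014-04-07T13:57:19.373Z", "2014-04-07T13:58:37.446Z",
        "2014-04-07T13:59:50.710Z", "2014-04-07T14:00:38.247Z",
        "2014-04-07T14:02:36.889Z", "2014-04-02T13:17:46.940Z",
    ],
    "Itemid": [
        214536500, 214536506, 214577561, 214662742, 214662742,
        214825110, 214757390, 214757407, 214551617, 214716935,
    ],
    "Category": [0] * 10,
})

# cumcount() numbers the rows inside each Sid group starting from 0,
# in their current order, so position 0 is the first click of the session.
flag_cumcount = (df.groupby("Sid").cumcount() == 0).astype(int)

# Equivalent idea: a row is the first of its session if its Sid
# has not appeared in any earlier row.
flag_duplicated = (~df["Sid"].duplicated()).astype(int)

df["new"] = flag_cumcount
print(df[["Sid", "Itemid", "new"]])
```

If the rows might not be in time order, sorting with df.sort_values(["Sid", "Tstamp"]) before computing the flag would be one way to make the assumption hold.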