I have a DataFrame. I want to create a unique ID number for each person, and a week-number column based on the date (the dates are weekly).
import pandas as pd
df = pd.DataFrame({ 'name':['one','one','two','two','two','three','four'],
'date':['2019-05-01','2019-05-08','2019-05-01','2019-05-08','2019-05-15','2019-05-01','2019-05-15'],
"a":range(7)})
df['date'] = pd.to_datetime(df['date'],yearfirst=True)
df = df.sort_values(['name','date'])
print(df)
Here is the data:
    name       date  a
6   four 2019-05-15  6
0    one 2019-05-01  0
1    one 2019-05-08  1
5  three 2019-05-01  5
2    two 2019-05-01  2
3    two 2019-05-08  3
4    two 2019-05-15  4
The expected result is:
    name       date  a  id  week
6   four 2019-05-15  6   1     3
0    one 2019-05-01  0   2     1
1    one 2019-05-08  1   2     2
5  three 2019-05-01  5   3     1
2    two 2019-05-01  2   4     1
3    two 2019-05-08  3   4     2
4    two 2019-05-15  4   4     3
How can I get the "id" and "week" columns? Thanks!
Answer 0 (score: 1)
As @cs95 mentioned, use GroupBy.ngroup for the ID. For the week, divide the day of the month by 7 and round up with numpy.ceil:
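import numpy as np

# Number the name groups in sorted order, starting from 1
df["Id"] = df.groupby("name").ngroup() + 1
# Week within the month: days 1-7 -> 1, days 8-14 -> 2, and so on
df['week'] = np.ceil(df.date.dt.day / 7).astype(int)
print (df)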
    name       date  a  Id  week
6   four 2019-05-15  6   1     3
0    one 2019-05-01  0   2     1
1    one 2019-05-08  1   2     2
5  three 2019-05-01  5   3     1
2    two 2019-05-01  2   4     1
3    two 2019-05-08  3   4     2
4    two 2019-05-15  4   4     3
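Note that dividing the day of the month by 7 gives the week within the month, so it matches the expected output here only because every date falls in May. If the dates could span several months, one possible alternative (an assumption about what "week" should mean, not part of the original answer) is to count whole weeks elapsed since the earliest date in the frame. A minimal sketch on made-up dates:

```python
import numpy as np
import pandas as pd

# Hypothetical sample that crosses a month boundary
df = pd.DataFrame({
    "name": ["one", "one", "two"],
    "date": pd.to_datetime(["2019-05-29", "2019-06-05", "2019-05-29"]),
})

# Week within the month, as in the answer above: restarts at each new month
df["week_of_month"] = np.ceil(df["date"].dt.day / 7).astype(int)

# Whole weeks elapsed since the earliest date, counted from 1
df["week"] = (df["date"] - df["date"].min()).dt.days // 7 + 1

print(df)
```

Here week_of_month jumps from 5 back to 1 across the month boundary, while week goes from 1 to 2.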
Alternatively:
df["Id"] = df.groupby("name").ngroup() + 1
df['week'] = df.groupby("date").ngroup() + 1
print (df)
    name       date  a  Id  week
6   four 2019-05-15  6   1     3
0    one 2019-05-01  0   2     1
1    one 2019-05-08  1   2     2
5  three 2019-05-01  5   3     1
2    two 2019-05-01  2   4     1
3    two 2019-05-08  3   4     2
4    two 2019-05-15  4   4     3
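One thing to keep in mind with this variant: ngroup on the date numbers the distinct dates in sorted order, so it equals the week number only when every week actually appears in the data. A small sketch on a hypothetical frame where the 2019-05-08 week is missing:

```python
import pandas as pd

# Hypothetical sample with no row for the week of 2019-05-08
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-05-01", "2019-05-15", "2019-05-15"]),
})

# Each distinct date gets the next consecutive number, regardless of the gap
df["week"] = df.groupby("date").ngroup() + 1

print(df)
```

The rows dated 2019-05-15 are numbered 2 here, not 3.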
Answer 1 (score: 1)
I use cumsum to get df['id'], and groupby on df.date to get df['week']:
df['id'] = df.name.ne(df.name.shift()).cumsum()
df['week'] = df.date.groupby(df.date).ngroup() + 1
Out[408]:
    name       date  a  id  week
6   four 2019-05-15  6   1     3
0    one 2019-05-01  0   2     1
1    one 2019-05-08  1   2     2
5  three 2019-05-01  5   3     1
2    two 2019-05-01  2   4     1
3    two 2019-05-08  3   4     2
4    two 2019-05-15  4   4     3
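The cumsum approach starts a new id every time the name changes from one row to the next, so it depends on the frame being sorted by name, as it is in the question. If that cannot be guaranteed, pd.factorize is one option that does not depend on row order (a suggestion, not part of the original answer). A minimal sketch on a deliberately unsorted, made-up frame:

```python
import pandas as pd

# Hypothetical sample that is NOT sorted by name
df = pd.DataFrame({"name": ["two", "one", "two", "four"]})

# Change-point counting: the second "two" gets a fresh id because
# the row before it has a different name
df["id_cumsum"] = df["name"].ne(df["name"].shift()).cumsum()

# factorize assigns one code per distinct name; sort=True numbers
# the names in sorted order, like groupby(...).ngroup()
codes, uniques = pd.factorize(df["name"], sort=True)
df["id_factorize"] = codes + 1

print(df)
```

With factorize, both "two" rows share the same id even though they are not adjacent.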