I have a dataset like this:
year artist track time date.entered wk1 wk2
2000 Pac Baby 4:22 2000-02-26 87 82
2000 Geher The 3:15 2000-09-02 91 87
2000 three_DoorsDown Kryptonite 3:53 2000-04-08 81 70
2000 ATeens Dancing_Queen 3:44 2000-07-08 97 97
2000 Aaliyah I_Dont_Wanna 4:15 2000-01-29 84 62
2000 Aaliyah Try_Again 4:03 2000-03-18 59 53
2000 Yolanda Open_My_Heart 5:30 2000-08-26 76 76
The output I want looks like this:
year artist track time date week rank
0 2000 Pac Baby 4:22 2000-02-26 1 87
1 2000 Pac Baby 4:22 2000-03-04 2 82
6 2000 ATeens Dancing_Queen 3:44 2000-07-08 1 97
7 2000 ATeens Dancing_Queen 3:44 2000-07-15 2 97
8 2000 Aaliyah I_Dont_Wanna 4:15 2000-01-29 1 84
Basically, I am tidying the given Billboard data. Without pandas method chaining, I can do this easily:
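import pandas as pd

# Read the wide-format table copied from above
df = pd.read_clipboard()
# Reshape wk1, wk2, ... into one row per (track, week)
df1 = (pd.wide_to_long(df, 'wk', i=df.columns.values[:5], j='week')
         .reset_index()
         .rename(columns={'date.entered': 'date', 'wk': 'rank'}))
# Shift the entry date forward by (week - 1) weeks
df1['date'] = pd.to_datetime(df1['date']) + pd.to_timedelta((df1['week'] - 1) * 7, 'd')
df1 = df1.sort_values(by=['track', 'date'])
print(df1.head())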
Question
Is there a way to chain the df1['date'] = pd.to_datetime(...) part, so that the whole operation fits into a single chain?
Answer 0 (score: 1)
Use assign:
df1 = (pd.wide_to_long(df, 'wk', i=df.columns.values[:5], j='week')
         .reset_index()
         .rename(columns={'date.entered': 'date', 'wk': 'rank'})
         .assign(date=lambda x: pd.to_datetime(x['date']) +
                                pd.to_timedelta((x['week'] - 1) * 7, 'd'))
         .sort_values(by=['track', 'date'])
      )
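The lambda matters here: when assign receives a callable, it calls it with the DataFrame as it exists at that point in the chain, so x['date'] and x['week'] refer to the already reshaped and renamed columns rather than to the original df. Below is a minimal self-contained sketch of the same chain. It assumes a two-row inline frame built from the sample data in place of pd.read_clipboard(), and the names id_cols and tidy are only illustrative.

```python
import pandas as pd

# Two rows of the sample data, built inline instead of read from the clipboard
df = pd.DataFrame({
    "year": [2000, 2000],
    "artist": ["Pac", "ATeens"],
    "track": ["Baby", "Dancing_Queen"],
    "time": ["4:22", "3:44"],
    "date.entered": ["2000-02-26", "2000-07-08"],
    "wk1": [87, 97],
    "wk2": [82, 97],
})

# Identifier columns that stay fixed while the wk* columns are unpivoted
id_cols = ["year", "artist", "track", "time", "date.entered"]

tidy = (
    pd.wide_to_long(df, stubnames="wk", i=id_cols, j="week")
    .reset_index()
    .rename(columns={"date.entered": "date", "wk": "rank"})
    # x is the intermediate frame, so 'date' and 'week' already exist here
    .assign(date=lambda x: pd.to_datetime(x["date"])
            + pd.to_timedelta((x["week"] - 1) * 7, unit="D"))
    .sort_values(by=["track", "date"])
)

print(tidy)
```

If the date expression grows beyond a single line, moving it into a named function and passing that to assign (or to pipe) keeps the chain readable.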