I have a pandas DataFrame that looks like this:
hotel_id date length_of_stay clicks
A 2019-01-01 3 7
B 2019-01-06 2 11
C 2019-01-03 1 4
I want the result to be:
hotel_id date clicks
A 2019-01-01 7
A 2019-01-02 7
A 2019-01-03 7
B 2019-01-06 11
B 2019-01-07 11
C 2019-01-03 4
So for each night of someone's hotel stay, we can see how many clicks that hotel got...
I can't think of an elegant way to do this... can anyone help?
Answer 0 (score: 3)
Repeat each row length_of_stay times with np.repeat, then rebuild the dates per hotel_id with pd.date_range:
m = pd.DataFrame(np.repeat(df.values, df.length_of_stay, axis=0), columns=df.columns)
m['date'] = m.groupby('hotel_id')['date'].transform(lambda x: pd.date_range(start=x.iloc[0], periods=len(x)))
Or:
newdf = pd.DataFrame(np.repeat(df.values, df.length_of_stay, axis=0), columns=df.columns)
newdf['date'] = [i for day, n in zip(df.date, df.length_of_stay)
                 for i in pd.date_range(start=day, periods=n)]
Full example:
import pandas as pd
import numpy as np
from io import StringIO  # replaces pd.compat.StringIO, which newer pandas no longer provides

data = '''\
hotel_id date length_of_stay clicks
A 2019-01-01 3 7
B 2019-01-06 2 11
C 2019-01-03 1 4'''

fileobj = StringIO(data)
df = pd.read_csv(fileobj, parse_dates=['date'], sep=r'\s+')

m = pd.DataFrame(np.repeat(df.values, df.length_of_stay, axis=0), columns=df.columns)
m['date'] = m.groupby('hotel_id')['date'].transform(lambda x: pd.date_range(start=x.iloc[0], periods=len(x)))
print(m)
hotel_id date length_of_stay clicks
0 A 2019-01-01 3 7
1 A 2019-01-02 3 7
2 A 2019-01-03 3 7
3 B 2019-01-06 2 11
4 B 2019-01-07 2 11
5 C 2019-01-03 1 4
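Not part of the answer above, just a rough equivalent sketch: the same repeat-and-regenerate idea can be written with index.repeat plus a per-row cumcount offset (assuming the default integer index, as in the example above):

out = df.loc[df.index.repeat(df.length_of_stay)].copy()  # one row per night
# offset each date by its position within the original booking (0, 1, 2, ...)
offsets = out.groupby(level=0).cumcount()
out['date'] = out['date'] + pd.to_timedelta(offsets, unit='D')
out = out.reset_index(drop=True)

This keeps the original column dtypes, whereas going through np.repeat(df.values, ...) casts everything to object.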
Answer 1 (score: 2)
Here is another solution, leaning on the "ugly" df.iterrows():
newdf = pd.concat(pd.DataFrame({
'hotel_id': row['hotel_id'],
'date': pd.date_range(start=row['date'], periods=row['length_of_stay']),
'length_of_stay': row['length_of_stay'],
'clicks': row['clicks']
}) for ind, row in df.iterrows())
Full example:
import pandas as pd
from io import StringIO  # pd.compat.StringIO is no longer available in current pandas

data = '''\
hotel_id date length_of_stay clicks
A 2019-01-01 3 7
B 2019-01-06 2 11
C 2019-01-03 1 4'''

fileobj = StringIO(data)
df = pd.read_csv(fileobj, parse_dates=['date'], sep=r'\s+')
newdf = pd.concat(pd.DataFrame({
'hotel_id': row['hotel_id'],
'date': pd.date_range(start=row['date'], periods=row['length_of_stay']),
'length_of_stay': row['length_of_stay'],
'clicks': row['clicks']
}) for ind, row in df.iterrows())
Returns:
clicks date hotel_id length_of_stay
0 7 2019-01-01 A 3
1 7 2019-01-02 A 3
2 7 2019-01-03 A 3
0 11 2019-01-06 B 2
1 11 2019-01-07 B 2
0 4 2019-01-03 C 1
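If iterrows() feels too slow, a rough alternative sketch (my own, assuming pandas >= 0.25 where DataFrame.explode exists) builds the nightly dates as lists and explodes them into one row per night:

tmp = df.copy()
# one list of nightly dates per booking
tmp['date'] = [list(pd.date_range(start=d, periods=n))
               for d, n in zip(tmp['date'], tmp['length_of_stay'])]
# explode gives each list element its own row, duplicating the other columns
newdf = tmp.explode('date').reset_index(drop=True)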