我想使用列(groupby
No
来保存我的数据,并将列date1
和results
的每个结果保存在不同的列中。
以下是具有相应预期输出的输入示例:
我添加了更多数据。并且有很多数据。
答案 0 :(得分:2)
这是一种实现方法:
from datetime import datetime
df = pd.DataFrame({'No.' : ['s1', 's2', 's2'], 'date_1' : [datetime.now() for x in range(3)],
'results' : [1.2, 9.73, 3.71]})
# Use groupby to get the lists of dates and result
result = df.groupby('No.')[['date_1', 'results']].agg({'date_1' : list, 'results' : list})
# if you are running a pandas version <0.24.2 uncomment the following line and comment the one above
#result = df.groupby('No.')[['date_1', 'results']].agg({'date_1' : lambda x: list(x), 'results' : lambda x: list(x)})
# Look at the number of columns we will have to create
len_max = np.max([len(x) for x in result['results']])
# Create all the required columns
for i in range(1,len_max):
result['date__{}'.format(i+1)] = [x[i] if len(x)>i else 0 for x in result['date_1']]
result['results_{}'.format(i+1)] = [x[i] if len(x)>i else 0 for x in result['results']]
# Modify the first two columns that still contain the lists of the groupby
result['date_1'] = [x[0] for x in result['date_1']]
result['results'] = [x[0] for x in result['results']]
输出:
date_1 results date__2 results_2
No.
s1 2019-07-29 08:00:45.878494 1.20 0 0.00
s2 2019-07-29 08:00:45.878499 9.73 2019-07-29 08:00:45.878500 3.71
答案 1 :(得分:1)
基于vlemaistre的答案-您可以以更紧凑的方式做到这一点:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
df = pd.DataFrame({'No.' : ['s1', 's2', 's2'], 'date' : [datetime.now()+timedelta(days=x) for x in range(3)],
'results' : [1.2, 9.73, 3.71]})
joint_df = df.groupby('No.')[['date', 'results']].agg(lambda x: list(x))
result = pd.DataFrame(index=joint_df.index)
for column in df.columns.difference({'No.'}):
result = result.join(pd.DataFrame.from_records(
list(joint_df[column]), index=joint_df.index).rename(lambda x: column+str(x+1), axis=1), how='outer')
输出为:
date1 date2 results1 results2
No.
s1 2019-07-29 12:58:28.627950 NaT 1.20 NaN
s2 2019-07-30 12:58:28.627957 2019-07-31 12:58:28.627960 9.73 3.71