Regression with pandas.
Sample data:
from datetime import datetime
import pandas as pd
data = {'date': ['2014-05-01', '2014-05-02', '2014-05-03', '2014-05-04', '2014-05-05', '2014-05-06', '2014-05-07', '2014-05-08', '2014-05-09', '2014-05-10','2014-05-11', '2014-05-12', '2014-05-13', '2014-05-14', '2014-05-15', '2014-05-16', '2014-05-17', '2014-05-18', '2014-05-19', '2014-05-20'],
'height_in_cm': [134, 135, 135, 137, 138, 140, 140, 141, 142, 143, 143, 144, 145, 146, 147, 148, 149, 150, 150, 151], 'participant_id': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]}
df = pd.DataFrame(data, columns = ['date', 'height_in_cm', 'participant_id'])
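So we have multiple participants whose height is measured every day throughout the year. A sub-part of the study is to find the height growth in different months of the year. For that, we need to take the height on the first day of each month and combine it with the measurements from the following 3 weeks, counted from that start date. The output for the data shown above should therefore look like the following. How can this kind of merge be done in pandas? Any clues?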
data_required = {'ini_date': ['2014-05-01','2014-05-01'],
'height_in_cm': [134, 134], 'participant_id': [1,1], 'future_date': ['2014-05-08','2014-05-15'],'future_height': [141, 147], 'week': [2, 3]}
new_df = pd.DataFrame(data_required, columns = ['ini_date', 'height_in_cm', 'participant_id','future_date','future_height', 'week'])
Answer 0 (score: 1)
An initial starting point is to convert the dates to datetime and change the frequency to weekly.
df = df.set_index(pd.to_datetime(df.date))
df = df.asfreq('W-THU') #This corresponds to your first day of the week
df['Week'] = df.index.isocalendar().week - df.index[0].week #DatetimeIndex.week was removed in pandas 2.0; isocalendar().week replaces it
df = pd.DataFrame(df.iloc[0]).transpose().reset_index().merge(df.iloc[1:], on='participant_id', suffixes=('', '_future'))
del df['index']; del df['Week'] #Removing redundant columns
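<strong>Update</strong>
The catch with your question is that you use the first day of the month as the starting point and then shift it by 7D / 14D, so the resulting timestamps do not fall on a regular period frequency.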
import pandas as pd
df.date = pd.to_datetime(df.date)
df['y-m'] = df.date.dt.strftime('%Y-%m') #To make sure you track growth month to month
MonthStart = pd.date_range('2014-01-01', freq='MS', periods=100) #Generation for the first day of the month
mask = df.date.isin(MonthStart.shift(1, '7D')) | df.date.isin(MonthStart.shift(2, '7D'))
df[df.date.isin(MonthStart)].merge(df[mask], on=['participant_id', 'y-m'], suffixes=('', '_future')).drop('y-m', axis=1)
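<strong>Explanation</strong>
MonthStart holds the first day of each month. The mask keeps the rows dated exactly 7 or 14 days after a month start, and merging on participant_id and y-m pairs each month-start row with the follow-up rows of the same participant and month.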
<strong>Output</strong>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>date</th>
<th>height_in_cm</th>
<th>participant_id</th>
<th>date_future</th>
<th>height_in_cm_future</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>2014-05-01</td>
<td>134</td>
<td>1</td>
<td>2014-05-08</td>
<td>141</td>
</tr>
<tr>
<th>1</th>
<td>2014-05-01</td>
<td>134</td>
<td>1</td>
<td>2014-05-15</td>
<td>147</td>
</tr>
</tbody>
</table>
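The output above does not include the week column that the question asked for. A minimal sketch of one way to add it is shown here, assuming the same three columns as the sample data. The helper name month_start_followups and its offsets_in_days parameter are hypothetical and not part of the answer; the week number is assumed to be days // 7 + 1, which matches the 2 and 3 in the requested output.

```python
import pandas as pd

def month_start_followups(df, offsets_in_days=(7, 14)):
    """Pair each first-of-month measurement with later measurements.

    Hypothetical helper, not taken from the answer above. Assumes the
    columns 'date', 'height_in_cm' and 'participant_id'.
    """
    df = df.copy()
    df["date"] = pd.to_datetime(df["date"])
    # Baseline rows: measurements taken on the first day of a month.
    base = df[df["date"].dt.is_month_start].rename(columns={"date": "ini_date"})
    # The same measurements, renamed so they can act as the follow-up side.
    future = df.rename(columns={"date": "future_date",
                                "height_in_cm": "future_height"})
    pieces = []
    for days in offsets_in_days:
        # Date on which the follow-up measurement is expected.
        shifted = base.assign(future_date=base["ini_date"] + pd.Timedelta(days=days))
        merged = shifted.merge(future, on=["participant_id", "future_date"])
        # Assumption: 7 days later counts as week 2, 14 days later as week 3.
        merged["week"] = days // 7 + 1
        pieces.append(merged)
    return pd.concat(pieces, ignore_index=True)

# Same values as the sample data in the question.
sample = pd.DataFrame({
    "date": pd.date_range("2014-05-01", periods=20, freq="D").strftime("%Y-%m-%d"),
    "height_in_cm": [134, 135, 135, 137, 138, 140, 140, 141, 142, 143,
                     143, 144, 145, 146, 147, 148, 149, 150, 150, 151],
    "participant_id": [1] * 20,
})
result = month_start_followups(sample)
print(result)
```

Because the merge key includes participant_id, the same call should also handle several participants in one frame; a month without a measurement on the expected follow-up date simply produces no row for that offset.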