I have a pandas DataFrame containing month-based data, as shown below:
df
id Month val
g1 Jan 1
g1 Feb 5
g1 Mar 61
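What I want is the following:
I want to convert the DataFrame to a weekly structure, replacing (or not) the Month column with all the weeks that can occur in that month, assuming 4 weeks per month. The output should look like this: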
new_df
id week val
g1 1 1
g1 2 1
g1 3 1
g1 4 1
g1 5 5
g1 6 5
g1 7 5
g1 8 5
g1 9 61
g1 10 61
g1 11 61
g1 12 61
I tried using the following function and applying it to the pandas DataFrame, but it does not work:
Sample code:
def myfun(mon):
    if mon == 'Jan':
        wk = list(range(1,5))
    elif mon == 'Feb':
        wk = list(range(5,9))
    else:
        wk = list(range(9,13))
    return wk
df['week'] = df.apply(lambda row: myfun(row['Month']), axis=1)
del df['Month']
The output I get is below, which is not what I want:
id val week
g1 1 [1, 2, 3, 4]
g1 5 [5, 6, 7, 8]
g1 61 [9, 10, 11, 12]
Is there a neat way to achieve this?
Any help would be greatly appreciated. Thanks.
Answer 0 (score: 1)
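We can use DataFrame.groupby and DataFrame.reindex with range(4) to extend each month to four rows. On the result we forward-fill (ffill) to replace the NaN values that reindexing introduces.
After that, we convert Month to a month number with pandas.to_datetime so that we can sort by month.
Finally, we create the Week column by taking the index and adding 1, and drop the Month column: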
import pandas as pd

# Extend each month to 4 rows (one per week); the added rows are NaN until forward-filled
df_new = pd.concat([
    d.reset_index(drop=True).reindex(range(4))
    for n, d in df.groupby('Month')
], ignore_index=True).ffill()  # fillna(method='ffill') is deprecated in recent pandas
# Convert the month abbreviation to a month number
df_new["Month"] = pd.to_datetime(df_new.Month, format='%b', errors='coerce').dt.month
# Now we can sort by month
df_new.sort_values('Month', inplace=True)
# Create the Week column from the row position
df_new['Week'] = df_new.reset_index(drop=True).index + 1
# Drop the Month column since we don't need it anymore
df_new.drop('Month', axis=1, inplace=True)
df_new.reset_index(drop=True, inplace=True)
Which yields:
print(df_new)
id val Week
0 g1 1.0 1
1 g1 1.0 2
2 g1 1.0 3
3 g1 1.0 4
4 g1 5.0 5
5 g1 5.0 6
6 g1 5.0 7
7 g1 5.0 8
8 g1 61.0 9
9 g1 61.0 10
10 g1 61.0 11
11 g1 61.0 12
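A more compact variant of the same idea, as a rough sketch: repeat each row four times and compute the week number arithmetically, which also keeps val as an integer instead of the float seen above. The names MONTHS, WEEKS_PER_MONTH, month_num, offset and out are just illustrative, and the 4-weeks-per-month rule is the assumption taken from the question.

```python
import pandas as pd

df = pd.DataFrame({"id": ["g1", "g1", "g1"],
                   "Month": ["Jan", "Feb", "Mar"],
                   "val": [1, 5, 61]})

WEEKS_PER_MONTH = 4  # assumption from the question: every month has 4 weeks
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Month abbreviation -> month number (Jan=1, Feb=2, ...)
month_num = df["Month"].map({m: i for i, m in enumerate(MONTHS, start=1)})

# Repeat every row 4 times; each copy keeps the index label of its source row
out = df.loc[df.index.repeat(WEEKS_PER_MONTH)].copy()

# Position of each copy within its source row: 0, 1, 2, 3
offset = out.groupby(level=0).cumcount().to_numpy()

# First week of the month plus the offset within the month
first_week = (month_num.loc[out.index].to_numpy() - 1) * WEEKS_PER_MONTH + 1
out["week"] = first_week + offset

out = out.drop(columns="Month").reset_index(drop=True)[["id", "week", "val"]]
print(out)
```

Because the week number comes from the month itself rather than from the row position, this should also behave sensibly when the frame holds several ids, though I have only checked it against the sample data.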
Answer 1 (score: 1)
Try this:
import pandas as pd

# Month abbreviation -> month number; the keys must match the values in df['Month']
month = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6, 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
rows = []  # collect the new rows here
for index, row in df.iterrows():  # for each row in df
    week = (month[row['Month']] - 1) * 4 + 1  # first week of this month
    for i in range(4):  # four weeks per month
        # DataFrame.append was removed in pandas 2.0, so build a list of dicts instead
        rows.append({'id': row['id'], 'week': week, 'val': row['val']})
        week += 1  # move on to the next week
new_df = pd.DataFrame(rows, columns=['id', 'week', 'val'])  # create the new dataframe
print(new_df)
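As a side note, the attempt in the question was already close: apply left a list of weeks in each cell, and DataFrame.explode (available since pandas 0.25) can turn each list element into its own row. Below is a minimal sketch under the same 4-weeks-per-month assumption; weeks_for and MONTHS are hypothetical helpers standing in for the question's myfun.

```python
import pandas as pd

df = pd.DataFrame({"id": ["g1", "g1", "g1"],
                   "Month": ["Jan", "Feb", "Mar"],
                   "val": [1, 5, 61]})

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def weeks_for(mon):
    """Return the 4 week numbers assigned to a month abbreviation."""
    start = MONTHS.index(mon) * 4 + 1
    return list(range(start, start + 4))

new_df = (
    df.assign(week=df["Month"].map(weeks_for))  # one list of weeks per row
      .explode("week")                          # one row per list element
      .drop(columns="Month")
      .reset_index(drop=True)
)
new_df["week"] = new_df["week"].astype(int)     # explode leaves an object column
new_df = new_df[["id", "week", "val"]]
print(new_df)
```

This avoids the explicit Python loop, which may matter on larger frames.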