Create a new pandas DataFrame from an existing one

Date: 2019-04-11 20:38:16

Tags: python pandas datetime grouping

I have a pandas DataFrame that contains month-based data, as shown below:

  df 

   id Month  val
   g1   Jan    1
   g1   Feb    5
   g1   Mar   61

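What I want is the following:

I want to convert the DataFrame to a weekly structure in which every row is expanded into all the weeks that fall in its month, with the Month column replaced (or kept). Assuming four weeks per month, the output should look like this: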

   new_df 

     id  week  val
     g1     1    1
     g1     2    1
     g1     3    1
     g1     4    1
     g1     5    5
     g1     6    5
     g1     7    5
     g1     8    5
     g1     9   61
     g1    10   61
     g1    11   61
     g1    12   61

I tried the following function and applied it to the pandas DataFrame, but it does not work:

SAMPLE CODE

   def myfun(mon):
       if mon == 'Jan':
           wk = list(range(1, 5))
       elif mon == 'Feb':
           wk = list(range(5, 9))
       else:
           wk = list(range(9, 13))
       return wk

   df['week'] = df.apply(lambda row: myfun(row['Month']), axis=1)
   del df['Month']

The output I get is shown below, which is not what I want:

       id    val         week
       g1    1     [1, 2, 3, 4]
       g1    5     [5, 6, 7, 8]
       g1    61  [9, 10, 11, 12]

Is there a neat way to achieve this?

Any help would be much appreciated. Thanks.

2 answers:

Answer 0: (score: 1)

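We can use DataFrame.groupby together with DataFrame.reindex and range(4) to extend each month to four rows. The rows added by reindex contain NaN, which we replace by forward filling (ffill).

After that, we convert Month to a month number with pandas.to_datetime so that we can sort by month.

Finally, we create the Week column by taking the row position and adding 1, and we drop the Month column: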

import pandas as pd

# Extend each month's group to 4 rows (one per week) and forward-fill the new rows
df_new = pd.concat([
    d.reset_index(drop=True).reindex(range(4))
    for n, d in df.groupby('Month')
], ignore_index=True).ffill()  # fillna(method='ffill') is deprecated in recent pandas

# Convert the month abbreviation to its month number
df_new["Month"] = pd.to_datetime(df_new.Month, format='%b', errors='coerce').dt.month

# Now we can sort by month
df_new.sort_values('Month', inplace=True)

# Create the Week column from the row position
df_new['Week'] = df_new.reset_index(drop=True).index + 1

# Drop the Month column since we don't need it anymore
df_new.drop('Month', axis=1, inplace=True)
df_new.reset_index(drop=True, inplace=True)

Which yields:

print(df_new)
    id   val  Week
0   g1   1.0     1
1   g1   1.0     2
2   g1   1.0     3
3   g1   1.0     4
4   g1   5.0     5
5   g1   5.0     6
6   g1   5.0     7
7   g1   5.0     8
8   g1  61.0     9
9   g1  61.0    10
10  g1  61.0    11
11  g1  61.0    12
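
The same idea (four rows per month, numbered consecutively) can also be sketched without concat and ffill, which keeps val as an integer. This is only an illustration under the question's assumption of four weeks per month; the names WEEKS_PER_MONTH, month_num, offset and out are made up for this sketch, and the sample frame is rebuilt from the question so that the snippet runs on its own:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"id": ["g1", "g1", "g1"],
                   "Month": ["Jan", "Feb", "Mar"],
                   "val": [1, 5, 61]})

WEEKS_PER_MONTH = 4  # assumption from the question: every month has 4 weeks

# Month number (Jan -> 1, Feb -> 2, ...) parsed from the three-letter abbreviation
month_num = pd.to_datetime(df["Month"], format="%b").dt.month

# Repeat every row 4 times; repeated rows keep their original index label
out = df.loc[df.index.repeat(WEEKS_PER_MONTH)].copy()

# Position of each repeated row within its original row: 0, 1, 2, 3
offset = out.groupby(level=0).cumcount()

# First week of the month plus the offset within the month
first_week = (month_num.loc[out.index].to_numpy() - 1) * WEEKS_PER_MONTH + 1
out["week"] = first_week + offset.to_numpy()

out = out.drop(columns="Month").reset_index(drop=True)[["id", "week", "val"]]
print(out)
```

Because the week number is computed from the month number rather than from the row position, this sketch does not depend on the rows being sorted by month.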

Answer 1: (score: 1)

Try this:

import pandas as pd

# Map the three-letter month abbreviations used in df to month numbers
month = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
         'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
rows = []  # collect the new rows here; DataFrame.append was removed in pandas 2.0
for index, row in df.iterrows():  # for each row in df
    week_num = (month[row['Month']] - 1) * 4 + 1  # first week of this month
    for i in range(4):  # four weeks per month
        rows.append({'id': row['id'], 'week': week_num, 'val': row['val']})
        week_num += 1  # move on to the next week
new_df = pd.DataFrame(rows, columns=['id', 'week', 'val'])  # build the new DataFrame
print(new_df)
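
The list-valued week column from the question can also be flattened directly. In the question, apply stores the whole list returned by myfun in a single cell, which is why each row shows [1, 2, 3, 4] and so on. A minimal sketch, assuming pandas 0.25 or later (where DataFrame.explode is available) and rebuilding the sample frame from the question so that the snippet runs on its own:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"id": ["g1", "g1", "g1"],
                   "Month": ["Jan", "Feb", "Mar"],
                   "val": [1, 5, 61]})

def myfun(mon):
    """Return the list of week numbers for a month, as in the question."""
    if mon == 'Jan':
        return list(range(1, 5))
    elif mon == 'Feb':
        return list(range(5, 9))
    else:
        return list(range(9, 13))

# Each cell of the new column holds a list of four week numbers
df["week"] = df["Month"].apply(myfun)

# explode gives every list element its own row, repeating id and val
new_df = (df.drop(columns="Month")
            .explode("week")
            .reset_index(drop=True))

# explode leaves the column with object dtype, so cast it back to int
new_df["week"] = new_df["week"].astype(int)
new_df = new_df[["id", "week", "val"]]
print(new_df)
```

This keeps the asker's function unchanged; only the flattening step is new.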