Question

I have a DataFrame month_data that looks like this:

     DATE_dh       TAVG  temp_Celsius
0     195201  29.478261     -1.400966
1     195202  24.800000     -4.000000
2     195203  13.807692    -10.106838
3     195204  39.607143      4.226190
4     195205  44.666667      7.037037
5     195206  56.500000     13.611111
6     195207  61.214286     16.230159
7     195208  57.483871     14.157706
8     195209  47.230769      8.461538
...
334   197911  34.500000      1.388889
335   197912  25.129032     -3.817204

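I am trying to calculate the average temperature of each calendar month across all the years, so that I end up with 12 rows (the mean temperature for January, February, and so on). The calculation itself is clear to me, but I don't know how to select 195201, 195301, 195401 and so on, up to 198001, from this DataFrame.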

I created DATE_dh with a DataFrameGroupBy, so I now have monthly data instead of the original daily data.

# Specify the time of the first month (as text)
time1 = '195201'

# Select the group
group1 = grouped.get_group(time1)

# Create an empty DataFrame for the aggregated values
monthly_data = pd.DataFrame()

# The columns that we want to aggregate
mean_cols = ['TAVG']

# Iterate over the groups
for key, group in grouped:
   # Aggregate the data
   mean_values = group[mean_cols].mean()

   # Add the ´key´ (i.e. the date information) into the aggregated values
   mean_values['DATE_dh'] = key

   # Append the aggregated values to the DataFrame
   # (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
   monthly_data = pd.concat([monthly_data, mean_values.to_frame().T], ignore_index=True)

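I should probably continue along these lines, but the key is the problem: the data I want to select is no longer all the rows for 195201, but the rows for 195201, 195301, ...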

Answer 1

You can use this as the group key:

df['groupkey']=df.DATE_dh.astype(str).str[-2:]
#df.DATE_dh.astype(str).str[-2:]
Out[216]: 
0    01
1    02
2    03
3    04
4    05
5    06
6    07
7    08
8    09
Name: DATE_dh, dtype: object
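
To get from that key to one mean per calendar month, the key can be handed to groupby. A minimal sketch: the values below are made up for illustration (they are not the asker's data), and the name monthly_means is an assumption of this example.

```python
import pandas as pd

# Made-up sample: two months (January, February) over three years
df = pd.DataFrame({
    "DATE_dh": [195201, 195202, 195301, 195302, 195401, 195402],
    "TAVG": [30.0, 24.0, 32.0, 26.0, 34.0, 28.0],
})

# Last two characters of YYYYMM are the month, e.g. 195201 -> "01"
df["groupkey"] = df["DATE_dh"].astype(str).str[-2:]

# One mean per calendar month, pooled across all years
monthly_means = df.groupby("groupkey")["TAVG"].mean()
print(monthly_means)
```

With the full data set this should yield 12 rows, indexed "01" through "12".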

Answer 2

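Since all the date values share the same format, you can create a new column holding the month and then group by that column. Assuming the DataFrame is named df and the temperature column is named temp, I would do: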

df['month'] = df['DATE_dh'].astype(str).str[-2:]
# Adds a new column to the DataFrame by taking the last 2 characters of the date (the month)
mean_monthly = df[['temp', 'month']].groupby('month').mean()
# Groups by the month value and calculates the mean

I think this should solve the problem, but feel free to ask if you need clarification.
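
If only one calendar month is needed (for instance every January, i.e. 195201, 195301, ...), a boolean mask may be enough without any groupby. A sketch with made-up values, assuming DATE_dh always holds six-digit YYYYMM values that can be converted to integers; the names januaries and january_mean are hypothetical.

```python
import pandas as pd

# Made-up sample values, not the asker's data
df = pd.DataFrame({
    "DATE_dh": ["195201", "195202", "195301", "195302"],
    "temp": [-1.0, -4.0, -3.0, -6.0],
})

# YYYYMM modulo 100 leaves the month as an integer (195201 -> 1)
month = df["DATE_dh"].astype(int) % 100

# Keep only the January rows of every year, then average them
januaries = df[month == 1]
january_mean = januaries["temp"].mean()
print(januaries)
print(january_mean)
```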

Pandas: how to select rows of data conditionally (DataFrameGroupBy)

2 answers: