I want to calculate the mean of a pandas column for specific months

Time: 2021-01-10 12:53:00

Tags: python pandas

Let df, with the columns year, month, alltime, daytime, region, temp and precipitation, contain data for several years and regions:

year  month alltime daytime   region        temp            precipitation
2000    1   True    False   saint louis 21.3105241935484  0.03
2000    1   False   True    saint louis 22.7246627565982  0.025
2000    1   False   False   saint louis 20.0136559139785  0.012
2000    2   True    False   saint louis 22.1646408045977  0.013
2000    2   False   True    saint louis 23.557868338558   0.07
2000    2   False   False   saint louis 20.8678927203065  0.012
2000    3   True    False   saint louis 22.9311155913978  0.031
2000    3   False   True    saint louis 24.9204398826979  0.016
2000    3   False   False   saint louis 21.011541218638   0.0121
2000    4   True    False   saint louis 22.5921805555556  0.019
2000    4   False   True    saint louis 24.3710303030303  0.054
2000    4   False   False   saint louis 20.8877777777778  0.043
2000    5   True    False   saint louis 21.4352016129032  0.032
2000    5   False   True    saint louis 22.8382404692082  0.023

I want to get a new df that has year and region as columns and holds the mean temperature and precipitation for specific months only (months 6, 7 and 8):

year  region            temp      precipitation
2000  saint louis     22.123      321.23
2000  diff region     24.643      673.12
2001  saint louis     21.433      134.27

The code I tried returned the mean over all 12 months instead of only those three.

2 answers:

Answer 0 (score: 1)

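What you are missing is a subset of your data that contains only the months you need. You can build a new dataframe restricted to the months you want to include, and then use groupby.agg on that subset: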

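# Use integers if the month column is numeric, as in the sample; use strings only if it holds strings.
months = [6, 7, 8]
# Keep only the rows for the wanted months (named summer to avoid clashing with the temp column).
summer = df[df['month'].isin(months)]
# Group the filtered frame, not the original df, otherwise all 12 months are averaged.
res = (summer.groupby(['region','year']).agg({'temp':'mean','precipitation':'mean'})).reset_index()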

which gives you:

        region  year       temp  precipitation
0  saint louis  2000  22.259055       0.028007
1  saint louis  2001  22.838240       0.023000

FYI, I added some extra data to the sample, because the data you posted covers only 1 year and 1 region.
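A self-contained sketch of this filter-then-group approach, in case it helps to run it end to end. The frame below is invented for illustration (it is not the question's data), and the name summer is just a hypothetical label for the filtered subset. Months 5 and 9 carry deliberately extreme values so it is easy to see that they are excluded from the means.

```python
import pandas as pd

# Hypothetical sample: one region, two years, months 5 to 9.
df = pd.DataFrame({
    "year": [2000] * 5 + [2001] * 5,
    "month": [5, 6, 7, 8, 9] * 2,
    "region": ["saint louis"] * 10,
    "temp": [10.0, 20.0, 22.0, 24.0, 50.0, 12.0, 21.0, 23.0, 25.0, 60.0],
    "precipitation": [0.5, 0.01, 0.02, 0.03, 0.9, 0.6, 0.02, 0.03, 0.04, 0.8],
})

# Keep only the rows whose month is 6, 7 or 8.
months = [6, 7, 8]
summer = df[df["month"].isin(months)]

# Average temp and precipitation per region and year on the filtered rows.
res = (
    summer.groupby(["region", "year"])
    .agg({"temp": "mean", "precipitation": "mean"})
    .reset_index()
)
print(res)
```

With this invented data, the 2000 row averages only the values 20, 22 and 24, so the outliers in months 5 and 9 do not affect the result.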

Answer 1 (score: 1)

Use Boolean indexing to filter your dataframe to the desired MONTH range, then group by YEAR and REGION and aggregate the mean of your categories, e.g. TEMP and PRECIP:

import pandas as pd

#fake data generation
import numpy as np
np.random.seed(1234)
n=30
df = pd.DataFrame({"YEAR": np.random.choice([2000, 2001, 2003], n), 
                    "MONTH": np.random.randint(4, 10, n), 
                    "REGION": np.random.choice(["A", "B", "D"], n),
                    "TEMP": 20 + 10 * np.random.random(n), 
                    "PRECIP": 200 + 100 * np.random.random(n),
                    "OTHER": np.random.randint(1, 100, n)})
weather = df.sort_values(["YEAR", "MONTH",  "REGION"]).reset_index(drop=True)
#print(df)


new_df = weather[(6 <= weather["MONTH"]) & (weather["MONTH"] <= 8)].groupby(["YEAR", "REGION"])[["TEMP", "PRECIP"]].mean()
print(new_df)
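A possible variant of the same idea, sketched on a small invented frame (the values are hypothetical, chosen so the means are easy to check by hand). It swaps the two comparisons for Series.between, which is inclusive on both ends by default, and passes as_index=False so that YEAR and REGION come back as ordinary columns, as the question asked, rather than as an index.

```python
import pandas as pd

# Hypothetical data; months 5 and 9 fall outside the wanted range.
weather = pd.DataFrame({
    "YEAR": [2000, 2000, 2000, 2000, 2001, 2001],
    "MONTH": [5, 6, 8, 7, 6, 9],
    "REGION": ["A", "A", "A", "B", "A", "A"],
    "TEMP": [10.0, 20.0, 24.0, 30.0, 26.0, 99.0],
    "PRECIP": [100.0, 200.0, 220.0, 250.0, 240.0, 999.0],
})

# Boolean mask: True for rows with 6 <= MONTH <= 8.
in_summer = weather["MONTH"].between(6, 8)

# Filter first, then group; as_index=False keeps the group keys as columns.
new_df = (
    weather[in_summer]
    .groupby(["YEAR", "REGION"], as_index=False)[["TEMP", "PRECIP"]]
    .mean()
)
print(new_df)
```

The result has one row per (YEAR, REGION) pair that has at least one row in months 6 to 8; pairs with no data in that range simply do not appear.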