Site Parameter Date (LST) Year Month Day Hour Value Unit Duration QC Name
1 Beijing PM2.5 2017-01-01 00:00:00 2017 1 1 0 505 µg/m_ 1 Hr Valid
2 Beijing PM2.5 2017-01-01 01:00:00 2017 1 1 1 485 µg/m_ 1 Hr Valid
3 Beijing PM2.5 2017-01-01 02:00:00 2017 1 1 2 466 µg/m_ 1 Hr Valid
4 Beijing PM2.5 2017-01-01 03:00:00 2017 1 1 3 435 µg/m_ 1 Hr Valid
5 Beijing PM2.5 2017-01-01 04:00:00 2017 1 1 4 405 µg/m_ 1 Hr Valid
6 Beijing PM2.5 2017-01-01 05:00:00 2017 1 1 5 402 µg/m_ 1 Hr Valid
7 Beijing PM2.5 2017-01-01 06:00:00 2017 1 1 6 407 µg/m_ 1 Hr Valid
8 Beijing PM2.5 2017-01-01 07:00:00 2017 1 1 7 435 µg/m_ 1 Hr Valid
9 Beijing PM2.5 2017-01-01 08:00:00 2017 1 1 8 472 µg/m_ 1 Hr Valid
10 Beijing PM2.5 2017-01-01 09:00:00 2017 1 1 9 465 µg/m_ 1 Hr Valid
11 Beijing PM2.5 2017-01-01 10:00:00 2017 1 1 10 473 µg/m_ 1 Hr Valid
12 Beijing PM2.5 2017-01-01 11:00:00 2017 1 1 11 456 µg/m_ 1 Hr Valid
13 Beijing PM2.5 2017-01-01 12:00:00 2017 1 1 12 474 µg/m_ 1 Hr Valid
14 Beijing PM2.5 2017-01-01 13:00:00 2017 1 1 13 510 µg/m_ 1 Hr Valid
15 Beijing PM2.5 2017-01-01 14:00:00 2017 1 1 14 596 µg/m_ 1 Hr Valid
16 Beijing PM2.5 2017-01-01 15:00:00 2017 1 1 15 580 µg/m_ 1 Hr Valid
17 Beijing PM2.5 2017-01-01 16:00:00 2017 1 1 16 556 µg/m_ 1 Hr Valid
18 Beijing PM2.5 2017-01-01 17:00:00 2017 1 1 17 522 µg/m_ 1 Hr Valid
19 Beijing PM2.5 2017-01-01 18:00:00 2017 1 1 18 495 µg/m_ 1 Hr Valid
20 Beijing PM2.5 2017-01-01 19:00:00 2017 1 1 19 500 µg/m_ 1 Hr Valid
21 Beijing PM2.5 2017-01-01 20:00:00 2017 1 1 20 484 µg/m_ 1 Hr Valid
22 Beijing PM2.5 2017-01-01 21:00:00 2017 1 1 21 452 µg/m_ 1 Hr Valid
23 Beijing PM2.5 2017-01-01 22:00:00 2017 1 1 22 427 µg/m_ 1 Hr Valid
24 Beijing PM2.5 2017-01-01 23:00:00 2017 1 1 23 444 µg/m_ 1 Hr Valid
25 Beijing PM2.5 2017-01-02 00:00:00 2017 1 2 0 428 µg/m_ 1 Hr Valid
26 Beijing PM2.5 2017-01-02 01:00:00 2017 1 2 1 466 µg/m_ 1 Hr Valid
27 Beijing PM2.5 2017-01-02 02:00:00 2017 1 2 2 452 µg/m_ 1 Hr Valid
28 Beijing PM2.5 2017-01-02 03:00:00 2017 1 2 3 442 µg/m_ 1 Hr Valid
29 Beijing PM2.5 2017-01-02 04:00:00 2017 1 2 4 390 µg/m_ 1 Hr Valid
30 Beijing PM2.5 2017-01-02 05:00:00 2017 1 2 5 317 µg/m_ 1 Hr Valid
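How can I create a new DataFrame from the (truncated) one shown above that keeps all of the displayed columns but, instead of the hourly values, shows the average for each day?
Answer 0: (score: 1)
You can try: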
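import pandas as pd
# Make sure the timestamp column is datetime64 so the .dt accessor works
df['Date (LST)'] = pd.to_datetime(df['Date (LST)'])
df['Dates'] = df['Date (LST)'].dt.date
# Broadcast each day's mean of the measured values back onto every hourly row
df['daily_average'] = df.groupby(['Dates'])['Value'].transform('mean')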
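Note that transform returns one value per hourly row, so the frame keeps its original length. If the goal is a new DataFrame with a single row per day, an aggregation is closer to what the question asks for. A minimal sketch of the difference, assuming a hypothetical four-row sample in the same shape as the question's data:

```python
import pandas as pd

# Hypothetical four-row sample shaped like the question's data
df = pd.DataFrame({
    "Date (LST)": pd.to_datetime([
        "2017-01-01 00:00:00", "2017-01-01 01:00:00",
        "2017-01-02 00:00:00", "2017-01-02 01:00:00",
    ]),
    "Value": [505, 485, 428, 466],
})

# Calendar date of each timestamp, used as the grouping key
day_key = df["Date (LST)"].dt.date

# transform: same length as df, each row carries its day's mean
per_row = df.groupby(day_key)["Value"].transform("mean")

# aggregation: one row per calendar day, returned as a new DataFrame
daily = (
    df.groupby(day_key)["Value"]
    .mean()
    .rename_axis("Dates")
    .reset_index()
)
print(daily)
```

Here per_row has four entries while daily has two rows, one for each date.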
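Answer 1: (score: 1)
This is a very basic split-apply-combine problem. However, since this is environmental data, there are a few nuances I can help you with.
Presumably your full dataset has multiple parameters measured at multiple sites, so you will need to group by those as well. Since your dates are already parsed into their components, we can use those to get daily values.
As someone who works with this kind of environmental data every day, I recommend that you always group by units too. Although the units are consistent in this dataset, you will eventually run into a dataset with inconsistent units. Getting into the habit of including units in your groups helps you catch those errors.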
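One way to surface mixed units before averaging is to count the distinct units reported for each site/parameter pair. This is only a sketch: the rows and the unit strings below are made up for illustration and do not come from the question's dataset.

```python
import pandas as pd

# Hypothetical records: the second Beijing PM2.5 row uses a different unit
df = pd.DataFrame({
    "Site": ["Beijing", "Beijing", "Shanghai"],
    "Parameter": ["PM2.5", "PM2.5", "PM2.5"],
    "Unit": ["ug/m3", "mg/m3", "ug/m3"],
    "Value": [505.0, 0.485, 60.0],
})

# Count the distinct units reported for each site/parameter pair
units_per_group = df.groupby(["Site", "Parameter"])["Unit"].nunique()

# Any pair with more than one unit needs conversion before averaging
mixed = units_per_group[units_per_group > 1]
print(mixed)
```

Grouping by Unit directly, as in the rest of this answer, has a similar effect: mixed units show up as extra rows in the result instead of being silently averaged together.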
Let's read in your data:
from io import StringIO
import pandas
datafile = StringIO("""\
Site Parameter "Date (LST)" Year Month Day Hour Value Unit Duration QC Name
Beijing PM2.5 "2017-01-01 00:00:00" 2017 1 1 0 505 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 01:00:00" 2017 1 1 1 485 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 02:00:00" 2017 1 1 2 466 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 03:00:00" 2017 1 1 3 435 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 04:00:00" 2017 1 1 4 405 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 05:00:00" 2017 1 1 5 402 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 06:00:00" 2017 1 1 6 407 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 07:00:00" 2017 1 1 7 435 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 08:00:00" 2017 1 1 8 472 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 09:00:00" 2017 1 1 9 465 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 10:00:00" 2017 1 1 10 473 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 11:00:00" 2017 1 1 11 456 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 12:00:00" 2017 1 1 12 474 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 13:00:00" 2017 1 1 13 510 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 14:00:00" 2017 1 1 14 596 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 15:00:00" 2017 1 1 15 580 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 16:00:00" 2017 1 1 16 556 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 17:00:00" 2017 1 1 17 522 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 18:00:00" 2017 1 1 18 495 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 19:00:00" 2017 1 1 19 500 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 20:00:00" 2017 1 1 20 484 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 21:00:00" 2017 1 1 21 452 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 22:00:00" 2017 1 1 22 427 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-01 23:00:00" 2017 1 1 23 444 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-02 00:00:00" 2017 1 2 0 428 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-02 01:00:00" 2017 1 2 1 466 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-02 02:00:00" 2017 1 2 2 452 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-02 03:00:00" 2017 1 2 3 442 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-02 04:00:00" 2017 1 2 4 390 µg/m_ 1 Hr Valid
Beijing PM2.5 "2017-01-02 05:00:00" 2017 1 2 5 317 µg/m_ 1 Hr Valid
""")
# Split on runs of whitespace; the raw string avoids an invalid-escape warning for \s
df = pandas.read_csv(datafile, sep=r'\s+', parse_dates=['Date (LST)'])
Then group by all of the columns that define a site-parameter-unit-day, select the "Value" column, and take the mean.
group_cols = ['Site', 'Parameter', 'Unit', 'Year', 'Month', 'Day']
df.groupby(by=group_cols)['Value'].mean()
That gives:
Site     Parameter  Unit   Year  Month  Day
Beijing  PM2.5      µg/m_  2017  1      1      476.916667
                                        2      415.833333
Including site, parameter, and units in the group-by statement means the simple statement above scales to datasets with any number of sites and parameters.
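The result above is a Series with a MultiIndex. Since the question asks for a new DataFrame, one possible follow-up is to reset the index and, while aggregating, record how many hourly readings went into each mean (the second day in the sample has only six). A rough sketch on a hypothetical three-row sample; the names daily_mean and n_hours are illustrative choices, not part of the original data:

```python
import pandas as pd

# Hypothetical three-row sample with the same grouping columns as above
df = pd.DataFrame({
    "Site": ["Beijing"] * 3,
    "Parameter": ["PM2.5"] * 3,
    "Unit": ["ug/m3"] * 3,
    "Year": [2017] * 3,
    "Month": [1] * 3,
    "Day": [1, 1, 2],
    "Value": [505, 485, 428],
})

group_cols = ["Site", "Parameter", "Unit", "Year", "Month", "Day"]

# Named aggregation: the daily mean plus the number of readings behind it,
# then reset_index() turns the group keys back into ordinary columns
daily = (
    df.groupby(by=group_cols)["Value"]
    .agg(daily_mean="mean", n_hours="count")
    .reset_index()
)
print(daily)
```

Keeping the count alongside the mean makes it easier to spot days whose average rests on an incomplete set of hours.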
Answer 2: (score: -1)
I believe you are looking for pandas.DataFrame.mean().
Example usage:
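import pandas as pd
# Mean of the hourly values for the hard-coded day (Day == 1) only
day_mean = df.loc[df["Day"] == 1, "Value"].mean()
# The row is wrapped in an outer list so pandas reads it as one row of ten fields
data = [["Beijing", "PM2.5", "2017-01-01", "2017", "1", "1", day_mean, 'ug/m_', '1 Day', 'Valid']]
averages = pd.DataFrame(data, columns=["Site", "Parameter", "Date", "Year", "Month", "Day", "Value", "Unit", "Duration", "QC Name"])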
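Keep in mind that I hard-coded the values based on how you are getting the information; there may be a better way to import the headers and values. But this should show how to use df.mean().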