Pandas DataFrame: group by hour and compute the mean of a specific column

Time: 2020-11-03 02:19:11

Tags: python pandas dataframe pandas-groupby mean

Consider the following DataFrame:

        Year  Month  Day  Hour     1     2     4     5     6     7  Solar
    0   2019     01   01    00  3856  6074  2123  3634  2219  2449     29
    1   2019     01   01    00  3856  6072  2038  3443  2376  2644     29
    2   2019     01   01    00  3862  6074  1916  3341  2734  2522     29
    3   2019     01   01    00  3815  6074  1882  3135  2880  2556     29
    4   2019     01   01    00  3751  6073  1855  3055  2940  2651     30
    5   2019     01   01    00  3763  6071  1844  2978  2907  2628     29
    6   2019     01   01    01  3808  6072  1842  2898  2868  2557     29
    7   2019     01   01    01  3799  6074  1743  3559  2838  1844     29
    8   2019     01   01    01  3810  6073  1688  3305  2766  1958     29
    9   2019     01   01    01  3798  6075  1696  3142  2645  2048     30
    10  2019     01   01    01  3740  6072  1678  3096  2598  2056     29

To get the mean of "Solar" (the 11th column) for each hour (the "Hour" column), I tried:

1.

    df['Solar_Mean'] = df.groupby(['Hour'])['Solar'].mean()

which only gives nan for "Solar_Mean":

        Solar_Mean
    0   nan
    1   nan
    2   nan
    3   nan
    4   nan
    5   nan
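
A likely cause of the nan values (an inference; the question does not show the column dtypes): groupby(...).mean() returns one value per hour, indexed by the Hour labels, and assigning that Series to a column aligns it on the DataFrame's row index. If Hour holds zero-padded strings such as '00', as the printout suggests, no label matches any row and every row gets NaN. The sketch below uses a small hypothetical frame with invented values; Series.map is one way to broadcast the per-hour means back onto the rows.

```python
import pandas as pd

# Hypothetical sample: Hour stored as zero-padded strings (an assumption).
df = pd.DataFrame({
    "Hour": ["00", "00", "01", "01"],
    "Solar": [29, 30, 29, 31],
})

hourly = df.groupby(["Hour"])["Solar"].mean()
print(hourly.index.tolist())  # ['00', '01']: indexed by hour label, not by row

# Column assignment aligns on the row index (0..3), which shares no labels
# with '00'/'01', so every row receives NaN.
df["Solar_Mean"] = hourly
print(df["Solar_Mean"].isna().all())  # True

# Mapping through the Hour column looks up each row's hour in `hourly`.
df["Solar_Mean"] = df["Hour"].map(hourly)
print(df["Solar_Mean"].tolist())  # [29.5, 29.5, 30.0, 30.0]
```

Note that this only fixes the alignment; it still groups on Hour alone.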

2.

    df['Solar_Mean'] = df.groupby(['Hour'])['Solar'].transform('mean')

which gives:

        Solar_Mean
    0   272.4290164663996
    1   272.4290164663996
    2   272.4290164663996
    3   272.4290164663996
    4   272.4290164663996
    5   272.4290164663996
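
A small two-day sample makes the behaviour of the second approach visible: with Hour as the only key, rows from different days that share the same hour fall into one group, and transform('mean') writes that pooled mean back to every row. The data and integer date columns below are hypothetical.

```python
import pandas as pd

# Hypothetical two-day sample: hour 0 on 1 and 2 January (values invented).
df = pd.DataFrame({
    "Year":  [2019, 2019, 2019, 2019],
    "Month": [1, 1, 1, 1],
    "Day":   [1, 1, 2, 2],
    "Hour":  [0, 0, 0, 0],
    "Solar": [29, 30, 500, 510],
})

# transform returns one value per row, aligned to df's index, but the group
# is "hour 0 on any day", so both days are averaged together.
df["Solar_Mean"] = df.groupby(["Hour"])["Solar"].transform("mean")
print(df["Solar_Mean"].tolist())  # [267.25, 267.25, 267.25, 267.25]
```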

Using np.mean gives the same result as the second approach:

    df['Solar_Mean'] = df.groupby(['Hour'])['Solar'].transform(np.mean)
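
transform accepts either the name of a groupby method or a callable, which is why both spellings produce identical numbers. A quick check on a hypothetical frame follows; recent pandas releases (2.1 and later) may emit a FutureWarning for NumPy callables such as np.mean, so the string 'mean' is the safer spelling.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with two hours (values invented).
df = pd.DataFrame({"Hour": [0, 0, 1, 1], "Solar": [29, 30, 29, 31]})

by_name = df.groupby(["Hour"])["Solar"].transform("mean")
by_func = df.groupby(["Hour"])["Solar"].transform(np.mean)  # may warn on pandas >= 2.1

print(by_name.equals(by_func))  # True
print(by_name.tolist())         # [29.5, 29.5, 30.0, 30.0]
```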

Since there are 6 files (rows) per hour, taking the sum of the first 6 Solar values and dividing by 6 gives (29 + 29 + 29 + 29 + 30 + 29) / 6 = 175 / 6, about 29.17, which should be the correct value. What am I missing here?

1 answer:

Answer 0 (score: 1)

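I hadn't taken Year, Month and Day into account in the groupby. Grouping by Hour alone averages that hour over every day in the data, not just over one day. It should be:

    df['Solar_Mean'] = df.groupby(['Year', 'Month', 'Day', 'Hour'])['Solar'].transform('mean')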
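
A sketch of the answer's fix on a hypothetical sample covering two days (values invented, integer date columns assumed). transform('mean') keeps one value per original row; if one row per (day, hour) is wanted instead, a plain aggregation with as_index=False gives that.

```python
import pandas as pd

# Hypothetical sample: hours 0 and 1 on two different days (values invented).
df = pd.DataFrame({
    "Year":  [2019] * 6,
    "Month": [1] * 6,
    "Day":   [1, 1, 1, 2, 2, 2],
    "Hour":  [0, 0, 1, 0, 0, 1],
    "Solar": [29, 30, 29, 500, 510, 480],
})

# Group on the full date plus hour so each (day, hour) is its own group.
keys = ["Year", "Month", "Day", "Hour"]
df["Solar_Mean"] = df.groupby(keys)["Solar"].transform("mean")
print(df["Solar_Mean"].tolist())  # [29.5, 29.5, 29.0, 505.0, 505.0, 480.0]

# Alternative: one row per (day, hour) instead of one value per original row.
hourly = df.groupby(keys, as_index=False)["Solar"].mean()
print(hourly["Solar"].tolist())  # [29.5, 29.0, 505.0, 480.0]
```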