Pandas pivot_table

Time: 2018-09-06 20:50:47

Tags: python pandas scipy pivot-table

I have 10-minute wind direction and speed data in a Pandas dataframe. It looks like this:

      year  month  day  hour  minutes  direction  speed        filename
0   1999.0      1    1     0        0       84.0    7.1  mlrf1c1999.txt
1   1999.0      1    1     0       10       75.0    7.5  mlrf1c1999.txt
2   1999.0      1    1     0       20       79.0    7.2  mlrf1c1999.txt
3   1999.0      1    1     0       30       77.0    7.2  mlrf1c1999.txt
4   1999.0      1    1     0       40       76.0    6.7  mlrf1c1999.txt
5   1999.0      1    1     0       50       76.0    7.5  mlrf1c1999.txt
6   1999.0      1    1     1        0       81.0    6.9  mlrf1c1999.txt
7   1999.0      1    1     1       10       75.0    7.3  mlrf1c1999.txt
8   1999.0      1    1     1       20       77.0    7.4  mlrf1c1999.txt
9   1999.0      1    1     1       30       73.0    6.9  mlrf1c1999.txt
10  1999.0      1    1     1       40       78.0    6.5  mlrf1c1999.txt
11  1999.0      1    1     1       50       75.0    7.3  mlrf1c1999.txt
...
1147812  1997.0     12   31    21        0      261.0    6.0  mlrf1c1997.txt
1147813  1997.0     12   31    21       10      260.0    5.9  mlrf1c1997.txt
1147814  1997.0     12   31    21       20      262.0    5.5  mlrf1c1997.txt
1147815  1997.0     12   31    21       30      279.0    6.5  mlrf1c1997.txt
1147816  1997.0     12   31    21       40      283.0    7.3  mlrf1c1997.txt
1147817  1997.0     12   31    21       50      282.0    7.2  mlrf1c1997.txt
1147818  1997.0     12   31    22        0      277.0    6.9  mlrf1c1997.txt
1147819  1997.0     12   31    22       10      283.0    7.6  mlrf1c1997.txt
1147820  1997.0     12   31    22       20      283.0    7.2  mlrf1c1997.txt
1147821  1997.0     12   31    22       30      290.0    7.5  mlrf1c1997.txt
1147822  1997.0     12   31    22       40      289.0    7.2  mlrf1c1997.txt
1147823  1997.0     12   31    22       50      292.0    7.6  mlrf1c1997.txt
1147824  1997.0     12   31    23        0      296.0    7.7  mlrf1c1997.txt

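I am trying to use a pivot table to examine the data so that I can get the average direction and speed for each hour. I need to apply SciPy's circmean function to the direction data. This requires specifying the high and low arguments for the dataset. When I try this, I get TypeError: 'numpy.float64' object is not callable.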

df.pivot_table(values = ['direction'], index = ['day', 'hour'], aggfunc = circmean(df.direction, high=df.direction.max(), low=df.direction.min()))

df.pivot_table(values = ['direction'], index = ['day', 'hour'], aggfunc = circmean(df.direction, high=360, low=0))

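As I understand it, circmean needs the high and low arguments to produce accurate output. When I use np.mean to average the wind speed readings, I have no trouble: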

df.pivot_table(values = ['speed'], index = ['day', 'hour'], aggfunc = np.mean)

Which yields:

             speed
day hour          
1   0     6.085055
    1     6.144919
    2     6.253006
    3     6.315291
    4     6.305656
    5     6.241176
    6     6.205701

I can also apply the circmean function without arguments, like this:

df.pivot_table(values = ['direction'], index = ['day', 'hour'], aggfunc = circmean)

When I do this, I get results I cannot interpret (i.e., they are not on a 0 to 360 degree scale):

          direction
day hour           
1   0      2.992024
    1      3.414254
    2      1.620715
    3      0.463309
    4      6.206874
    5      1.451950
    6      4.319550

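Is there a way to apply a function together with its arguments in the aggfunc parameter of pivot_table? If not, does anyone have a suggestion for how I can get the circular means I need from the dataframe?

1 Answer:

Answer 0 (score: 0)

Here is some code that reproduces your problem: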

import io
import pandas as pd
from scipy.stats import circmean

doc = """      year  month  day  hour  minutes  direction  speed        filename
0   1999.0      1    1     0        0       84.0    7.1  mlrf1c1999.txt
1   1999.0      1    1     0       10       75.0    7.5  mlrf1c1999.txt
2   1999.0      1    1     0       20       79.0    7.2  mlrf1c1999.txt
3   1999.0      1    1     0       30       77.0    7.2  mlrf1c1999.txt
4   1999.0      1    1     0       40       76.0    6.7  mlrf1c1999.txt
5   1999.0      1    1     0       50       76.0    7.5  mlrf1c1999.txt
6   1999.0      1    1     1        0       81.0    6.9  mlrf1c1999.txt
7   1999.0      1    1     1       10       75.0    7.3  mlrf1c1999.txt
8   1999.0      1    1     1       20       77.0    7.4  mlrf1c1999.txt
9   1999.0      1    1     1       30       73.0    6.9  mlrf1c1999.txt
10  1999.0      1    1     1       40       78.0    6.5  mlrf1c1999.txt
11  1999.0      1    1     1       50       75.0    7.3  mlrf1c1999.txt
1147812  1997.0     12   31    21        0      261.0    6.0  mlrf1c1997.txt
1147813  1997.0     12   31    21       10      260.0    5.9  mlrf1c1997.txt
1147814  1997.0     12   31    21       20      262.0    5.5  mlrf1c1997.txt
1147815  1997.0     12   31    21       30      279.0    6.5  mlrf1c1997.txt
1147816  1997.0     12   31    21       40      283.0    7.3  mlrf1c1997.txt
1147817  1997.0     12   31    21       50      282.0    7.2  mlrf1c1997.txt
1147818  1997.0     12   31    22        0      277.0    6.9  mlrf1c1997.txt
1147819  1997.0     12   31    22       10      283.0    7.6  mlrf1c1997.txt
1147820  1997.0     12   31    22       20      283.0    7.2  mlrf1c1997.txt
1147821  1997.0     12   31    22       30      290.0    7.5  mlrf1c1997.txt
1147822  1997.0     12   31    22       40      289.0    7.2  mlrf1c1997.txt
1147823  1997.0     12   31    22       50      292.0    7.6  mlrf1c1997.txt
1147824  1997.0     12   31    23        0      296.0    7.7  mlrf1c1997.txt"""    

# Raw string so that '\s' is not treated as an invalid escape sequence
df = pd.read_csv(io.StringIO(doc), sep=r'\s+')

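Grumpy note: in a better question, the code above would have been part of the question itself; reproducing it took some unnecessary effort and time. For details, see https://stackoverflow.com/help/mcve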

# Now you need a function accepting one argument to pass as `aggfunc`

def avg(x):
    # x is a pd.Series holding the direction values of one (day, hour) group
    return circmean(x, high=x.max(), low=x.min())

# just to learn how it works with 'mean'
df2 = df.pivot_table(values='direction', index=['day', 'hour'], aggfunc = 'mean')

# now putting the desired function
df3 = df.pivot_table(values='direction', index=['day', 'hour'], aggfunc = avg)
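The difference between passing a function and passing the result of a function call can be sketched as follows. This is a minimal illustration under assumptions: the small frame and the helper name `circmean_deg` are made up for the example, and the fixed bounds `high=360, low=0` are one plausible choice for compass directions rather than the bounds used in `avg` above.

```python
import pandas as pd
from scipy.stats import circmean

# Made-up illustrative values, not the asker's data.
df = pd.DataFrame({
    "day": [1, 1, 1, 1],
    "hour": [0, 0, 1, 1],
    "direction": [350.0, 10.0, 90.0, 100.0],
    "speed": [7.0, 8.0, 6.0, 6.5],
})

# Calling circmean here runs it immediately on the whole column and yields a
# single float; a float is not callable, so it cannot serve as aggfunc.
eager = circmean(df["direction"].to_numpy(), high=360, low=0)
print(type(eager), callable(eager))

def circmean_deg(x):
    # Hypothetical helper: x holds the direction values of one (day, hour) group.
    return circmean(x.to_numpy(), high=360, low=0)

# A dict lets each column use its own aggregation function.
hourly = df.pivot_table(
    values=["direction", "speed"],
    index=["day", "hour"],
    aggfunc={"direction": circmean_deg, "speed": "mean"},
)
print(hourly)
```

Wrapping the call in a named function (or a lambda) defers it until pandas hands over each group, which is why this form does not hit the TypeError from the question.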

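This raises a warning, but I expect you know how to deal with it (perhaps you want to convert between degrees and radians inside avg):

RuntimeWarning: invalid value encountered in true_divide
  ang = (samples - low)*2*pi / (high - low)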
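The warning appears when a group's maximum and minimum coincide, for example the (31, 23) group in the sample, which has a single row, so `high - low` is zero. Using the group's own extremes as bounds also maps the largest and smallest value to the same point on the circle. One way to do the degree and radian conversion mentioned above might look like the sketch below; the name `avg_deg` and the sample values are assumptions for illustration, and it relies on circmean's default range of 0 to 2*pi.

```python
import numpy as np
import pandas as pd
from scipy.stats import circmean

def avg_deg(x):
    # Convert degrees to radians, average on the circle using circmean's
    # default range [0, 2*pi], then convert the result back to degrees.
    radians = np.deg2rad(x.to_numpy())
    return np.rad2deg(circmean(radians))

# Made-up values; the (31, 23) group has a single row, like the sample data.
df = pd.DataFrame({
    "day": [31, 31, 31],
    "hour": [22, 22, 23],
    "direction": [290.0, 292.0, 296.0],
})

by_hour = df.pivot_table(values="direction", index=["day", "hour"], aggfunc=avg_deg)
print(by_hour)
```

Because the bounds no longer depend on the data in each group, a single-row group simply returns its own value instead of dividing by zero.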

Hope this helps.