Pandas DatetimeIndex TypeError

Time: 2020-09-19 10:27:55

Tags: python pandas dataframe datetime

I'm trying to do what is described here: Pandas resampling with custom volume weighted aggregation, but I'm running into a TypeError with my index.

I have data like this:

                       Dates       P   Q
0 2020-09-07 01:20:24.738686  7175.0  21
1 2020-09-07 01:45:27.540590  7150.0   7
2 2020-09-07 03:48:49.120607  7125.0   4
3 2020-09-07 04:45:50.972042  7125.0   6
4 2020-09-07 05:36:23.139612  7125.0   2

I checked the types with print(df.dtypes), which returns:

Dates    datetime64[ns]
P               float64
Q                 int64
dtype: object

I then set the index to the dates with

df = df.set_index(pd.DatetimeIndex(df['Dates']))

and dropped the Dates column to make the frame easier to read:

df = df.drop(['Dates'], axis=1)

This gives me

                                 P   Q
Dates                                 
2020-09-07 01:20:24.738686  7175.0  21
2020-09-07 01:45:27.540590  7150.0   7
2020-09-07 03:48:49.120607  7125.0   4
2020-09-07 04:45:50.972042  7125.0   6
2020-09-07 05:36:23.139612  7125.0   2

Then I try to resample:

def vwap(data):
    price = data.P
    quantity = data.Q
    top = sum(price * quantity)
    bottom = sum(quantity)
    return top / bottom

df2 = df.resample("5h",axis=1).apply(vwap)

This results in the error

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'

Looking at other Stack Overflow entries with similar titles, their problem was mostly that the datetime only looked like a datetime but was not actually stored as one. That is not the case here, since we saw above that the Dates column has type datetime64[ns].

Also, if I run print(df.index.dtype), I get:

datetime64[ns]

Any suggestions? Happy to clarify anything or provide more code if that helps.

1 answer:

Answer 0 (score: 3)

Removing the axis=1 argument and using pd.Grouper does it:

df.groupby(pd.Grouper(freq="5h")).apply(vwap)
Dates
2020-09-07 00:00:00    7157.236842
2020-09-07 05:00:00    7125.000000
dtype: float64
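
As for why the original call failed: axis=1 asks resample to bin along the column labels, and those labels (P and Q) form a plain Index rather than a DatetimeIndex, which appears to be what the error message is pointing at. Below is a minimal sketch that rebuilds the sample data from the question, checks both axes, and reruns the grouping. The names rows_are_datetimes and columns_are_datetimes are my own, and vwap is rewritten slightly to use .sum().

```python
import pandas as pd

# Rebuild the sample frame from the question, indexed by Dates.
df = pd.DataFrame(
    {
        "Dates": pd.to_datetime(
            [
                "2020-09-07 01:20:24.738686",
                "2020-09-07 01:45:27.540590",
                "2020-09-07 03:48:49.120607",
                "2020-09-07 04:45:50.972042",
                "2020-09-07 05:36:23.139612",
            ]
        ),
        "P": [7175.0, 7150.0, 7125.0, 7125.0, 7125.0],
        "Q": [21, 7, 4, 6, 2],
    }
).set_index("Dates")

# The row index holds timestamps, so time-based grouping along rows is valid.
rows_are_datetimes = isinstance(df.index, pd.DatetimeIndex)

# The column labels ("P", "Q") are a plain Index; axis=1 would try to
# bin along these labels instead of along the timestamps.
columns_are_datetimes = isinstance(df.columns, pd.DatetimeIndex)

print(rows_are_datetimes, columns_are_datetimes)


def vwap(data):
    # Volume-weighted average price of the rows in one time bin.
    return (data.P * data.Q).sum() / data.Q.sum()


# Group the rows into 5-hour bins and compute one VWAP per bin.
result = df.groupby(pd.Grouper(freq="5h")).apply(vwap)
print(result)
```

On the sample data this should give the same two bins as the output above.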

If you want a DataFrame with an informative column name, use reset_index:

df.groupby(pd.Grouper(freq="5h")).apply(vwap).reset_index(name="vwap")
                Dates         vwap
0 2020-09-07 00:00:00  7157.236842
1 2020-09-07 05:00:00  7125.000000
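
If apply turns out to be slow on a larger frame, one alternative (not part of the answer above, just a sketch) is to resample the price-times-quantity product and the quantity separately, then divide the two sums. The names notional, volume and vwap_by_bin are my own choices.

```python
import pandas as pd

# Same sample data as in the question, with Dates already set as the index.
df = pd.DataFrame(
    {
        "P": [7175.0, 7150.0, 7125.0, 7125.0, 7125.0],
        "Q": [21, 7, 4, 6, 2],
    },
    index=pd.DatetimeIndex(
        [
            "2020-09-07 01:20:24.738686",
            "2020-09-07 01:45:27.540590",
            "2020-09-07 03:48:49.120607",
            "2020-09-07 04:45:50.972042",
            "2020-09-07 05:36:23.139612",
        ],
        name="Dates",
    ),
)

# Sum of price * quantity per 5-hour bin (the numerator of the VWAP).
notional = (df["P"] * df["Q"]).resample("5h").sum()

# Total quantity per 5-hour bin (the denominator of the VWAP).
volume = df["Q"].resample("5h").sum()

# Element-wise division aligns on the shared bin index.
vwap_by_bin = (notional / volume).rename("vwap")
print(vwap_by_bin)
```

This avoids calling a Python function once per bin. A bin that contains no rows has zero volume, so its value should come out as NaN rather than a price.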