Finding the mean of values at certain row numbers in a pandas DataFrame

Time: 2020-06-26 14:36:59

Tags: python pandas numpy dataframe

I have a DataFrame like this.

    Date        Daily Risk Score
0   2020-06-26  6.0
1   2020-06-27  6.0
2   2020-06-28  6.0
3   2020-06-29  6.0
4   2020-06-30  6.0
5   2020-07-01  6.0
6   2020-07-02  6.0
7   2020-07-03  6.0
8   2020-07-04  6.0
9   2020-07-05  6.0
10  2020-07-06  6.0
11  2020-07-07  6.0
12  2020-07-08  6.0
13  2020-07-09  6.0
14  2020-06-26  6.0
15  2020-06-27  6.0
16  2020-06-28  6.0
17  2020-06-29  6.0
18  2020-06-30  6.0
19  2020-07-01  6.0
20  2020-07-02  6.0
21  2020-07-03  6.0
22  2020-07-04  6.0
23  2020-07-05  6.0
24  2020-07-06  6.0
25  2020-07-07  6.0
26  2020-07-08  6.0
27  2020-07-09  6.0

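I want to take the mean of all rows that share the same date across the whole DataFrame (more than 50k entries). How can I go through each date and then create a column at the end that lists the 14 values (one mean per day)?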

The expected output is:

    Date        Daily Risk Score  Mean
0   2020-06-26  6.0               a
1   2020-06-27  6.0               b
2   2020-06-28  6.0               c
3   2020-06-29  6.0               ...
4   2020-06-30  6.0
5   2020-07-01  6.0
6   2020-07-02  6.0
7   2020-07-03  6.0
8   2020-07-04  6.0
9   2020-07-05  6.0
10  2020-07-06  6.0
11  2020-07-07  6.0
12  2020-07-08  6.0
13  2020-07-09  6.0
14  2020-06-26  6.0
15  2020-06-27  6.0
16  2020-06-28  6.0
17  2020-06-29  6.0
18  2020-06-30  6.0
19  2020-07-01  6.0
20  2020-07-02  6.0
21  2020-07-03  6.0
22  2020-07-04  6.0
23  2020-07-05  6.0
24  2020-07-06  6.0
25  2020-07-07  6.0
26  2020-07-08  6.0
27  2020-07-09  6.0

Here a is the mean of all daily risk scores for 6-26, b is the mean for 6-27, and so on.
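The per-date mean described above can be sketched with a group-wise transform. This is only a minimal sketch: the four-row sample frame and its scores are made up for illustration, and it assumes the column names `Date` and `Daily Risk Score` from the table above.

```python
import pandas as pd

# Hypothetical sample data: two dates, each appearing twice.
df = pd.DataFrame({
    "Date": ["2020-06-26", "2020-06-27", "2020-06-26", "2020-06-27"],
    "Daily Risk Score": [6.0, 4.0, 8.0, 2.0],
})

# transform("mean") returns one value per original row, aligned to the
# original index, so every row gets the mean of its own date.
df["Mean"] = df.groupby("Date")["Daily Risk Score"].transform("mean")

# If a compact summary is preferred, this gives one row per date instead.
per_date = df.groupby("Date")["Daily Risk Score"].mean()

print(df)
print(per_date)
```

With `transform` the mean is repeated on every row of a given date; `per_date` holds each mean only once (14 values for the 14 dates in the question).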

3 answers:

Answer 0 (score: 2)

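Here is a NumPy-based approach that uses view_as_windows with a step size of 3 to get a strided view of the column values. With this approach, rows for which no complete window exists get no output (NaN).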

import numpy as np
from skimage.util import view_as_windows

a = df['Value'].to_numpy()
# strided view of a with a step size of 3
w = view_as_windows(a, len(a)//3, step=3)
# missing values not present in strided view (incomplete window)
missing = a[w.size:]
prev_means = w.mean(0)
# construct new array with missing values and means of w
# if no missing values, the mean is kept
prev_means[:len(missing)] = a[w.size:]
means = np.vstack([w, prev_means]).mean(0)
# new df column
new_col = np.full(len(a), np.nan)
new_col[:len(means)] = means
df['means'] = new_col

print(df)

    Value     means
0       1  3.000000 # (1+4+2+5)/4
1       2  4.000000 # (2+5+3+6)/4
2       3  2.666667 # (3+1+4)/3
3       4       NaN
4       5       NaN
5       1       NaN
6       2       NaN
7       3       NaN
8       4       NaN
9       5       NaN
10      6       NaN

Answer 1 (score: 1)

You can use np.r_ together with np.nanmean.


Details

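import numpy as np

def mean_window(arr, s):
    l = len(arr)
    # number of NaNs needed to pad arr up to a multiple of s
    fill_values = (s - l%s) if l%s else 0
    # reshape into rows of length s and average each column, ignoring the NaN padding
    return np.nanmean(np.r_[arr,[np.nan]*fill_values].reshape(-1,s),axis=0)

mean_window(df.Value.to_numpy(), 3)
# array([3.        , 4.        , 2.66666667])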
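To make the padding step easier to follow, the sketch below spells out the intermediate matrix. It assumes the `Value` column holds the eleven numbers shown in the printed output of Answer 0. The names `pad`, `padded`, and `col_means` are illustrative only.

```python
import numpy as np

# Values taken from the printed 'Value' column in Answer 0.
arr = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6])
s = 3

# Number of NaNs needed so that the length becomes a multiple of s.
pad = (-len(arr)) % s

# Each row is one consecutive chunk of s values; the last row is NaN-padded.
padded = np.r_[arr, [np.nan] * pad].reshape(-1, s)
print(padded)

# Column j now holds every value whose position is j modulo s,
# and nanmean skips the padding when averaging.
col_means = np.nanmean(padded, axis=0)
print(col_means)
```

Because the padding is NaN rather than 0, the last column is averaged over three values instead of four.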

Answer 2 (score: 0)

df[::3]['Value'].mean()  

This gets you the value you want, but you also want to assign it to a column. What should the result look like?
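One possible way to put such means in a column is to group rows by their position modulo 3 and broadcast each group's mean back to its rows. This is a sketch under assumptions, not necessarily the layout the asker wants: the sample values are copied from the printed output of Answer 0, and `pos` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Values taken from the printed 'Value' column in Answer 0.
df = pd.DataFrame({"Value": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6]})

# Position of each row within its block of 3 (0, 1, 2, 0, 1, 2, ...).
pos = np.arange(len(df)) % 3

# Every row receives the mean of all rows that share its position.
df["means"] = df.groupby(pos)["Value"].transform("mean")

print(df)
```

Here the value in row 0 equals `df[::3]['Value'].mean()`, and unlike the NaN-filled column in Answer 0, every row carries the mean of its group.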