Custom function to compute the mean absolute deviation

Time: 2020-09-21 15:26:25

Tags: python numpy multidimensional-array numpy-ndarray

I have a 4D numpy array similar to the following:

>>> import numpy as np
>>> from functools import partial

>>> X = np.random.rand(20, 1, 10, 4)

>>> X.shape
(20, 1, 10, 4)

I compute the following statistics: mean, std, median, p25, p75

>>> percentiles = tuple(partial(np.percentile, q=q) for q in (25, 75))
>>> stat_functions = (np.mean, np.std, np.median) + percentiles

>>> stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

So that:

>>> stats.shape
(20, 1, 5, 4)

>>> stats[0]
array([[[0.55187202, 0.55892688, 0.45816177, 0.6378181 ],
        [0.31028278, 0.32109677, 0.17319351, 0.13341651],
        [0.57112019, 0.60587194, 0.45490572, 0.59787335],
        [0.30857011, 0.30367621, 0.28899686, 0.55742753],
        [0.80678815, 0.82014851, 0.61295181, 0.70529412]]])

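I would also like to include the MAD (mean absolute deviation) among these statistics, so I defined the following function, since NumPy does not provide one.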

def mad(data):
    mean = np.mean(data)
    f = lambda x: abs(x - mean)
    vf = np.vectorize(f)
    return (np.add.reduce(vf(data))) / len(data)

But I am having trouble getting this function to work. First, I tried:

>>> stat_functions = (np.mean, np.std, np.median, mad) + percentiles
>>> stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-33-fa6d972f0fce> in <module>()
----> 1 stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

<ipython-input-33-fa6d972f0fce> in <listcomp>(.0)
----> 1 stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

TypeError: mad() got an unexpected keyword argument 'axis'

Then I changed the definition of mad to:

def mad(data, axis=None):
    ...

which was followed by this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-c74d9e3d057b> in <module>()
----> 1 stats = np.concatenate([f(X, axis=2, keepdims=True) for f in my_func], axis=2)

<ipython-input-35-c74d9e3d057b> in <listcomp>(.0)
----> 1 stats = np.concatenate([f(X, axis=2, keepdims=True) for f in my_func], axis=2)

TypeError: mad() got an unexpected keyword argument 'keepdims'

Doing the same for keepdims:

def mad(data, axis=None, keepdims=None):
    ...

gets me to:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-c74d9e3d057b> in <module>()
----> 1 stats = np.concatenate([f(X, axis=2, keepdims=True) for f in my_func], axis=2)

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 4 dimension(s) and the array at index 3 has 3 dimension(s)

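I know this is a dimension problem, but I am not sure how to solve it in this case.

EDIT:

Following the answer given, after using the answer's mad function I got a strange result, shown below: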

stat_functions = (np.mean, np.std, np.median,mad) + percentiles

stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

stats.shape
(20, 1, 15, 4)

The expected output shape is (20, 1, 6, 4), because I am adding one more statistic along the third dimension: (np.mean, np.std, np.median, mad) + percentiles
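One way to arrive at 15 rows, assuming the mad in use at this point took the absolute deviations but never reduced them along the axis: such a function returns the full length 10 on axis 2, and 3 + 10 + 2 = 15. The sketch below only illustrates that assumption; mad_unreduced is a hypothetical stand-in, not a function from the post.

```python
import numpy as np
from functools import partial

X = np.random.rand(20, 1, 10, 4)

# Hypothetical stand-in: absolute deviations are scaled but never summed,
# so the output keeps all 10 entries along `axis`.
def mad_unreduced(data, axis=-1, keepdims=True):
    return np.abs(data - data.mean(axis, keepdims=True)) / data.shape[axis]

percentiles = tuple(partial(np.percentile, q=q) for q in (25, 75))
stat_functions = (np.mean, np.std, np.median, mad_unreduced) + percentiles

parts = [func(X, axis=2, keepdims=True) for func in stat_functions]
lengths = [p.shape[2] for p in parts]   # length of each piece along axis 2
stats = np.concatenate(parts, axis=2)

print(lengths)       # [1, 1, 1, 10, 1, 1]
print(stats.shape)   # (20, 1, 15, 4)
```

Printing the per-function length along axis 2, as done with lengths here, is a quick way to see which statistic is not being reduced.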

EDIT-2

Using this function from the answer:

def mad(data, axis=-1, keepdims=True):
    return np.abs(data - data.mean(axis, keepdims=True)).mean(axis)

Then:

stat_functions = (np.mean, np.std, np.median, mad) + percentiles

stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

and then I run into this:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-fa6d972f0fce> in <module>()
----> 1 stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 4 dimension(s) and the array at index 3 has 3 dimension(s)

1 answer:

Answer 0 (score: 0)

The first thing I notice in your code is that np.vectorize does not actually vectorize the function (see the note in NumPy's doc). You can use np.abs instead of vf, and your function will be vectorized.
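A small sketch of that point: np.vectorize calls the Python lambda once per element, while np.abs works on the whole array at once, and both give the same values. The array size and variable names here are illustrative only.

```python
import numpy as np

data = np.random.rand(1000)
mean = data.mean()

# np.vectorize is a convenience wrapper around a Python-level loop.
vf = np.vectorize(lambda x: abs(x - mean))
looped = vf(data)

# np.abs is a ufunc and operates on the whole array in compiled code.
vectorized = np.abs(data - mean)

print(np.allclose(looped, vectorized))  # True
```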

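That said, your function can be written as:

def mad(data):
    return np.abs(data - data.mean(0)).sum(0) / len(data)

Now, note that this mad function, like your original one, accepts only a single positional argument. The error you got is because you tried to pass axis=2 to mad:

[func(X, axis=2, keepdims=True) for func in stat_functions]

To fix this, build the function with optional arguments:

def mad(data, axis=-1, keepdims=True):
    return np.abs(data - data.mean(axis, keepdims=True)).sum(axis, keepdims=keepdims) / data.shape[axis]

Or, it makes more sense to use mean(axis) than sum(axis) / data.shape[axis].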
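A minimal sketch of a mad that forwards keepdims to the final reduction, so that its result keeps the reduced axis with length 1 and can be concatenated with the other statistics. The defaults axis=None and keepdims=False are an assumption chosen to mirror np.mean; they do not come from the answer above.

```python
import numpy as np
from functools import partial

def mad(data, axis=None, keepdims=False):
    """Mean absolute deviation around the mean along `axis`."""
    data = np.asarray(data)
    # Always keep dims here so the mean broadcasts against `data`.
    center = data.mean(axis=axis, keepdims=True)
    # Honor the caller's keepdims only in the final reduction.
    return np.abs(data - center).mean(axis=axis, keepdims=keepdims)

X = np.random.rand(20, 1, 10, 4)
percentiles = tuple(partial(np.percentile, q=q) for q in (25, 75))
stat_functions = (np.mean, np.std, np.median, mad) + percentiles

stats = np.concatenate(
    [func(X, axis=2, keepdims=True) for func in stat_functions], axis=2
)
print(stats.shape)  # (20, 1, 6, 4)
```

With keepdims=True every function returns an array of shape (20, 1, 1, 4), which is what np.concatenate needs along axis 2.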