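dplyr mutate with a custom function returns a matrix instead of a vector

Asked: 2018-01-29 20:44:14

Tags: r dplyr

I am using a custom function inside dplyr's mutate and getting an unexpected result: a matrix instead of a vector. I want to take one or more columns and transform them with a custom function.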

# Dummy data:
library(tidyverse)
set.seed(101)
dummy_data <- tibble(mfc = 1:15, value1 = runif(15), value2 = runif(15))

Expected output:

> dummy_data %>% mutate_at(vars(value1,value2),funs(trans = mutate_transform(.)))

    # A tibble: 15 x 5
     mfc     value1     value2 trans_value1 trans_value2
   <int>      <dbl>      <dbl>        <dbl>        <dbl>
 1     1 0.37219838 0.59031973  -0.68738175    0.2211214
 2     2 0.04382482 0.82043609  -2.01936309    0.9903636
 3     3 0.70968402 0.22411848   0.68156092   -1.0030308
 4     4 0.65769040 0.41166683   0.47065924   -0.3760867
 5     5 0.24985572 0.03861056  -1.18364013   -1.6231541
 6     6 0.30005483 0.70071155  -0.98001753    0.5901436
 7     7 0.58486663 0.95683746   0.17526426    1.4463316
 8     8 0.33346714 0.21335200  -0.84448721   -1.0390214
 9     9 0.62201196 0.66106150   0.32593685    0.4575998
10    10 0.54582855 0.92331888   0.01691417    1.3342843
11    11 0.87979573 0.79571976   1.37158488    0.9077409
12    12 0.70687474 0.07121255   0.67016565   -1.5141709
13    13 0.73197259 0.38940777   0.77197005   -0.4504952
14    14 0.93163443 0.40645122   1.58185814   -0.3935216
15    15 0.45512059 0.65935508  -0.35102444    0.4518955

What I get:

   # A tibble: 15 x 5
     mfc     value1     value2           value1_trans           value2_trans
   <int>      <dbl>      <dbl>           <data.frame>           <data.frame>
 1     1 0.37219838 0.59031973 <data.frame [15 x 15]> <data.frame [15 x 15]>
 2     2 0.04382482 0.82043609 <data.frame [15 x 15]> <data.frame [15 x 15]>
 3     3 0.70968402 0.22411848 <data.frame [15 x 15]> <data.frame [15 x 15]>
 4     4 0.65769040 0.41166683 <data.frame [15 x 15]> <data.frame [15 x 15]>
 5     5 0.24985572 0.03861056 <data.frame [15 x 15]> <data.frame [15 x 15]>
 6     6 0.30005483 0.70071155 <data.frame [15 x 15]> <data.frame [15 x 15]>
 7     7 0.58486663 0.95683746 <data.frame [15 x 15]> <data.frame [15 x 15]>
 8     8 0.33346714 0.21335200 <data.frame [15 x 15]> <data.frame [15 x 15]>
 9     9 0.62201196 0.66106150 <data.frame [15 x 15]> <data.frame [15 x 15]>
10    10 0.54582855 0.92331888 <data.frame [15 x 15]> <data.frame [15 x 15]>
11    11 0.87979573 0.79571976 <data.frame [15 x 15]> <data.frame [15 x 15]>
12    12 0.70687474 0.07121255 <data.frame [15 x 15]> <data.frame [15 x 15]>
13    13 0.73197259 0.38940777 <data.frame [15 x 15]> <data.frame [15 x 15]>
14    14 0.93163443 0.40645122 <data.frame [15 x 15]> <data.frame [15 x 15]>
15    15 0.45512059 0.65935508 <data.frame [15 x 15]> <data.frame [15 x 15]>

This is my custom function:

mutate_transform <- function(x){
  require(caret)
  trans <- preProcess(data.frame(x), c("BoxCox", "center", "scale"))
  data_trans <- data.frame(trans = predict(trans, data.frame(x)))
  return(data_trans)
}

Am I using mutate incorrectly, or should I change my custom function mutate_transform?

2 answers:

Answer 0 (score: 1)

Your custom function should return a plain vector, not a data.frame.
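
One way to follow this advice, as a minimal sketch rather than the answerer's own code: keep the same caret preprocessing as in the question, but extract the single transformed column from the data.frame that predict() returns. The column name x and the use of [[1]] are choices made here for illustration.

```r
library(caret)

# Sketch: same preprocessing steps as the question's function, but the
# result is returned as a plain numeric vector instead of a data.frame.
mutate_transform <- function(x) {
  df <- data.frame(x = x)
  trans <- preProcess(df, method = c("BoxCox", "center", "scale"))
  # predict() returns a one-column data.frame; [[1]] pulls out that column
  predict(trans, df)[[1]]
}

set.seed(101)
out <- mutate_transform(runif(15))
is.numeric(out)  # check that the result is an ordinary numeric vector
length(out)      # one value per input element
```

With a definition like this, the mutate_at call from the question should store ordinary double columns, because each function call now yields a vector of the same length as its input.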


Answer 1 (score: 0)

Why not call lapply over the columns:

dummy_data[, c("trans_value1", "trans_value2")] <- lapply(dummy_data[, 2:3], mutate_transform)
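
The lapply approach can be sketched end to end as follows. This is an illustration under stated assumptions: mutate_transform_vec is a hypothetical vector-returning variant of the question's function, and a base data.frame stands in for the tibble so the block does not depend on tidyverse. If the function returned a data.frame, as the original does, each new column would itself hold a data.frame.

```r
library(caret)

# Hypothetical vector-returning variant of the question's function.
mutate_transform_vec <- function(x) {
  df <- data.frame(x = x)
  trans <- preProcess(df, method = c("BoxCox", "center", "scale"))
  predict(trans, df)[[1]]
}

set.seed(101)
# Base data.frame used here in place of the question's tibble (assumption).
dummy_data <- data.frame(mfc = 1:15, value1 = runif(15), value2 = runif(15))

# Apply the function to each selected column and bind the results
# back under new "trans_" names.
cols <- c("value1", "value2")
dummy_data[paste0("trans_", cols)] <- lapply(dummy_data[cols], mutate_transform_vec)
str(dummy_data)
```

Selecting columns by name instead of by position (2:3) keeps the assignment correct if the column order changes.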