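I'm using a custom function inside dplyr's mutate and getting an unexpected result: each new column holds a matrix-like data frame instead of a vector. I want to take one or more columns and transform them with a custom function.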
# Dummy data:
library(tidyverse)
set.seed(101)
dummy_data <- tibble(mfc = 1:15,value1 = runif(15),value2 = runif(15))
Desired output:
> dummy_data %>% mutate_at(vars(value1,value2),funs(trans = mutate_transform(.)))
# A tibble: 15 x 5
mfc value1 value2 trans_value1 trans_value2
<int> <dbl> <dbl> <dbl> <dbl>
1 1 0.37219838 0.59031973 -0.68738175 0.2211214
2 2 0.04382482 0.82043609 -2.01936309 0.9903636
3 3 0.70968402 0.22411848 0.68156092 -1.0030308
4 4 0.65769040 0.41166683 0.47065924 -0.3760867
5 5 0.24985572 0.03861056 -1.18364013 -1.6231541
6 6 0.30005483 0.70071155 -0.98001753 0.5901436
7 7 0.58486663 0.95683746 0.17526426 1.4463316
8 8 0.33346714 0.21335200 -0.84448721 -1.0390214
9 9 0.62201196 0.66106150 0.32593685 0.4575998
10 10 0.54582855 0.92331888 0.01691417 1.3342843
11 11 0.87979573 0.79571976 1.37158488 0.9077409
12 12 0.70687474 0.07121255 0.67016565 -1.5141709
13 13 0.73197259 0.38940777 0.77197005 -0.4504952
14 14 0.93163443 0.40645122 1.58185814 -0.3935216
15 15 0.45512059 0.65935508 -0.35102444 0.4518955
What I get instead:
# A tibble: 15 x 5
mfc value1 value2 value1_trans value2_trans
<int> <dbl> <dbl> <data.frame> <data.frame>
1 1 0.37219838 0.59031973 <data.frame [15 x 15]> <data.frame [15 x 15]>
2 2 0.04382482 0.82043609 <data.frame [15 x 15]> <data.frame [15 x 15]>
3 3 0.70968402 0.22411848 <data.frame [15 x 15]> <data.frame [15 x 15]>
4 4 0.65769040 0.41166683 <data.frame [15 x 15]> <data.frame [15 x 15]>
5 5 0.24985572 0.03861056 <data.frame [15 x 15]> <data.frame [15 x 15]>
6 6 0.30005483 0.70071155 <data.frame [15 x 15]> <data.frame [15 x 15]>
7 7 0.58486663 0.95683746 <data.frame [15 x 15]> <data.frame [15 x 15]>
8 8 0.33346714 0.21335200 <data.frame [15 x 15]> <data.frame [15 x 15]>
9 9 0.62201196 0.66106150 <data.frame [15 x 15]> <data.frame [15 x 15]>
10 10 0.54582855 0.92331888 <data.frame [15 x 15]> <data.frame [15 x 15]>
11 11 0.87979573 0.79571976 <data.frame [15 x 15]> <data.frame [15 x 15]>
12 12 0.70687474 0.07121255 <data.frame [15 x 15]> <data.frame [15 x 15]>
13 13 0.73197259 0.38940777 <data.frame [15 x 15]> <data.frame [15 x 15]>
14 14 0.93163443 0.40645122 <data.frame [15 x 15]> <data.frame [15 x 15]>
15 15 0.45512059 0.65935508 <data.frame [15 x 15]> <data.frame [15 x 15]>
Here is my custom function:
mutate_transform <- function(x){
require(caret)
trans <- preProcess(data.frame(x), c("BoxCox", "center", "scale"))
data_trans <- data.frame(trans = predict(trans, data.frame(x)))
return(data_trans)
}
Am I using mutate wrong, or should I change the custom function mutate_transform?
Answer 0 (score: 1)
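Your custom function should return a plain vector, not a data.frame. mutate stores whatever the function returns as the new column, so returning the one-column data.frame data_trans gives you a data-frame column instead of a numeric one. For example, extract the single column from the result of predict() instead of wrapping it in data.frame().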
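A minimal sketch of that fix, assuming caret is installed and dplyr 1.0 or later is available (for across()). The function keeps the question's BoxCox/center/scale pipeline but returns predict(...)[[1]], the transformed numeric column, instead of a data.frame. Replacing the superseded mutate_at() and the deprecated funs() with across() and .names = "trans_{.col}" is a substitution made here, not part of the original answer.

```r
library(dplyr)
library(caret)

set.seed(101)
dummy_data <- tibble(mfc = 1:15, value1 = runif(15), value2 = runif(15))

# Same Box-Cox + center + scale pipeline as in the question, but the single
# column is pulled out of predict()'s data frame with [[1]], so the function
# returns a plain numeric vector that mutate() can store as an ordinary column.
mutate_transform <- function(x) {
  trans <- preProcess(data.frame(x), method = c("BoxCox", "center", "scale"))
  predict(trans, data.frame(x))[[1]]
}

# across() applies the function to each selected column; .names controls the
# output names, giving trans_value1 and trans_value2 next to the originals.
out <- dummy_data %>%
  mutate(across(c(value1, value2), mutate_transform, .names = "trans_{.col}"))

print(out)
```

Because the function now returns a vector, the same definition also works unchanged with the question's mutate_at() call; only the column names differ (value1_trans instead of trans_value1).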
Answer 1 (score: 0)
Why not just call lapply over the columns:
dummy_data[, c("trans_value1", "trans_value2")] <- lapply(dummy_data[, 2:3], mutate_transform)
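If mutate_transform still returns a one-column data.frame, the lapply route can extract the column at the call site instead. The sketch below keeps the question's function unchanged and wraps it in an anonymous function that applies [[1]]; that wrapper is an illustrative choice, not taken from the answer. It assumes caret is installed and uses a plain data.frame with base subassignment, so no tidyverse packages are needed.

```r
library(caret)

set.seed(101)
dummy_data <- data.frame(mfc = 1:15, value1 = runif(15), value2 = runif(15))

# The question's original function, unchanged: it returns a one-column data frame.
mutate_transform <- function(x) {
  trans <- preProcess(data.frame(x), c("BoxCox", "center", "scale"))
  data.frame(trans = predict(trans, data.frame(x)))
}

# lapply() runs the function on each selected column; [[1]] pulls the single
# column out of each returned data frame, so every list element is a vector
# that base [<- can add as a new numeric column.
dummy_data[c("trans_value1", "trans_value2")] <- lapply(
  dummy_data[c("value1", "value2")],
  function(col) mutate_transform(col)[[1]]
)

str(dummy_data)
```

Selecting the columns by name rather than by position (2:3) keeps the call correct if the column order changes.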