When creating data summaries with dplyr, I often find myself computing confidence intervals (using CI from Rmisc):
summary <- data %>%
group_by(group1, group2) %>%
summarize(
var1.mean = CI(var1, ci=0.95)['mean'],
var1.lower = CI(var1, ci=0.95)['lower'],
var1.upper = CI(var1, ci=0.95)['upper'],
var2.mean = CI(var2, ci=0.95)['mean'],
var2.lower = CI(var2, ci=0.95)['lower'],
var2.upper = CI(var2, ci=0.95)['upper'],
var3.mean = CI(var3, ci=0.95)['mean'],
var3.lower = CI(var3, ci=0.95)['lower'],
var3.upper = CI(var3, ci=0.95)['upper'],
var4 = sum(var4)
)
This is painful, verbose, and inefficient. Ideally, I would like to write something like this:
summary <- data %>%
group_by(group1, group2) %>%
summarize(
var1 = CI(var1, ci=0.95),
var2 = CI(var2, ci=0.95),
var3 = CI(var3, ci=0.95),
var4 = sum(var4)
)
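With the code above, since CI returns a vector with the elements "lower", "upper" and "mean", I would hope to get a data frame with the following columns: "group1", "group2", "var1.lower", "var1.mean", "var1.upper", "var2.lower", ..., "var3.upper", "var4". Any idea how to achieve this? Is there a way to "flatten" columns in R? Something like do.call, but applied to the remaining arguments, the way rest/spread arguments work in JS or Python?
There may be something to do with quasiquotation, but that is starting to go beyond my R skills.
I used to use this gist with plyr, but it no longer works with dplyr, and rather than recoding it yet again, I am hoping there is a better way than a hack.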
Answer 0 (score: 3)
We can use tidyr::unnest if we first format the output as a data.frame.
Data
library(Rmisc)
library(dplyr)
library(tidyr)
set.seed(1)
data <- data.frame(group1 = sample(c("A","B"),10,T),
group2 = sample(c("A","B"),10,T),
var1 = sample(10),
var2 = sample(10),
var3 = sample(10),
var4 = sample(10))
General solution
data %>% group_by(group1, group2) %>%
dplyr::summarize(var1 = list(data.frame(t(CI(var1, ci=0.95)))),
var2 = list(data.frame(t(CI(var2, ci=0.95)))),
var3 = list(data.frame(t(CI(var3, ci=0.95)))),
var4 = sum(var4)) %>%
unnest (var1,var2,var3,.sep=".")
Result
# A tibble: 4 x 12
# Groups: group1 [2]
# group1 group2 var4 var1.upper var1.mean var1.lower var2.upper var2.mean var2.lower var3.upper var3.mean var3.lower
# <fctr> <fctr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 A A 13 56.824819 6.0 -44.824819 11.85310 5.500000 -0.8531024 26.55931 7.500000 -11.559307
# 2 A B 11 38.265512 6.5 -25.265512 50.97172 6.500000 -37.9717166 25.55931 6.500000 -12.559307
# 3 B A 11 12.956686 4.0 -4.956686 13.65205 5.666667 -2.3187188 15.07146 5.666667 -3.738127
# 4 B B 20 8.484138 6.0 3.515862 14.70619 4.666667 -5.3728564 11.31872 3.333333 -4.652052
Or using a custom CI function (same output)
CI2 <- function(x,ci=0.95) list(data.frame(t(CI(x, ci))))
data %>% group_by(group1, group2) %>%
dplyr::summarize(var1 = CI2(var1, ci=0.95),
var2 = CI2(var2, ci=0.95),
var3 = CI2(var3, ci=0.95),
var4 = sum(var4)) %>%
unnest (var1,var2,var3,.sep=".")
Or using a converter function (same output)
This can be used with any other function that returns a vector.
vec2rowdf <- function(v) list(data.frame(t(v))) # creates a 1 row data.frame from a vector, wrapped in a list
data %>% group_by(group1, group2) %>%
dplyr::summarize(var1 = CI(var1, ci=0.95) %>% vec2rowdf,
var2 = CI(var2, ci=0.95) %>% vec2rowdf,
var3 = CI(var3, ci=0.95) %>% vec2rowdf,
var4 = sum(var4)) %>%
unnest (var1,var2,var3,.sep=".")
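To see why the list(data.frame(t(...))) wrapping works, here is a minimal base R sketch. The function ci_vec is a hypothetical stand-in for Rmisc::CI, assumed here to return a named vector with the elements upper, mean and lower from a t-based interval. It is not the package's own code.

```r
# Hypothetical stand-in for Rmisc::CI (assumption: t-based interval around the mean).
ci_vec <- function(x, ci = 0.95) {
  m <- mean(x)
  err <- qt(1 - (1 - ci) / 2, df = length(x) - 1) * sd(x) / sqrt(length(x))
  c(upper = m + err, mean = m, lower = m - err)
}

v <- ci_vec(c(2, 4, 6, 8))  # named numeric vector of length 3
m <- t(v)                   # 1 x 3 matrix; the names become column names
row <- data.frame(m)        # one-row data frame with columns upper, mean, lower
cell <- list(row)           # wrapped in a list so it fits into a single summary cell

names(row)
nrow(row)
```

Because each cell holds a one-row data frame, unnesting it adds three columns per variable instead of three rows.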
Answer 1 (score: 1)
"Flattening" is handled by unnest (from tidyr). You can create lists when summarizing, and then use unnest so that each upper/mean/lower triplet appears as 3 rows:
res <- data %>%
group_by(group1, group2) %>%
summarize_at(vars(var1, var2, var3), funs(list(CI(., ci=0.95)))) %>%
unnest(var1, var2, var3, .id = 'name')
Then you need to reshape your tibble with gather and spread to get the desired output:
res %>%
group_by(group1, group2) %>%
mutate(q = c('upper', 'mean', 'lower')) %>%
ungroup %>%
gather(var, val, var1, var2, var3) %>%
mutate(var = paste(var, q, sep = '.')) %>%
select(-q) %>%
spread(var, val)
It is easy to change this to use other functions in summarize_* (applying different functions to different columns requires changing from summarize_at to summarize and explicitly writing out the expression for each of var1, var2, var3).
Answer 2 (score: 1)
Here is another possible approach: after summarizing var1 through var4, use unlist row by row:
library(dplyr)
summary <- dat %>%
group_by(group1, group2) %>%
summarize(
var1 = list(Rmisc::CI(var1)),
var2 = list(Rmisc::CI(var2)),
var3 = list(Rmisc::CI(var3)),
var4 = sum(var4)) %>%
ungroup()
summary <- t(apply(summary, 1, unlist)) %>% # unlist by row (this results in a character matrix)
as.data.frame(stringsAsFactors = F) %>% # convert matrix back to data frame
mutate_at(vars(var1.upper:var4), # convert appropriate columns back to numeric
as.numeric)
> str(summary)
'data.frame': 9 obs. of 12 variables:
$ group1 : chr "A" "A" "A" "B" ...
$ group2 : chr "d" "e" "f" "d" ...
$ var1.upper: num 5.77 6.25 5.94 6.49 6.26 ...
$ var1.mean : num 4.55 4.8 4.66 5.27 4.94 ...
$ var1.lower: num 3.32 3.35 3.38 4.04 3.62 ...
$ var2.upper: num 20.5 20.3 20.3 20.1 20.3 ...
$ var2.mean : num 20.2 19.9 19.9 19.7 19.9 ...
$ var2.lower: num 19.8 19.5 19.5 19.3 19.5 ...
$ var3.upper: num 140 155 120 113 141 ...
$ var3.mean : num 113.9 125.3 94.7 86.3 111.6 ...
$ var3.lower: num 88.1 95.6 69.9 59.8 82.7 ...
$ var4 : num 240042 205052 215986 231008 229010 ...
Sample data used:
set.seed(123)
n = 2000
dat <- data.frame(
group1 = sample(LETTERS[1:3], n, replace = T),
group2 = sample(letters[4:6], n, replace = T),
var1 = rnorm(n, mean = 5, sd = 10),
var2 = rnorm(n, mean = 20, sd = 3),
var3 = rnorm(n, mean = 100, sd = 200),
var4 = rnorm(n, mean = 1000, sd = 5)
)
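For comparison, the same kind of flattening can be sketched in base R alone, which avoids the detour through a character matrix. This is only an illustration: ci_vec is a hypothetical stand-in for Rmisc::CI (assumed to return upper, mean and lower), and the small data frame is made up for the example.

```r
# Hypothetical stand-in for Rmisc::CI (assumption: t-based interval around the mean).
ci_vec <- function(x, ci = 0.95) {
  m <- mean(x)
  err <- qt(1 - (1 - ci) / 2, df = length(x) - 1) * sd(x) / sqrt(length(x))
  c(upper = m + err, mean = m, lower = m - err)
}

# Small deterministic example data (made up for illustration).
dat <- data.frame(
  group1 = rep(c("A", "B"), each = 10),
  group2 = rep(c("x", "y"), times = 10),
  var1 = 1:20,
  var2 = (1:20)^2,
  var3 = sqrt(1:20),
  var4 = rep(1, 20)
)

# Each of var1..var3 comes back as a 3-column matrix column (upper, mean, lower).
agg <- aggregate(cbind(var1, var2, var3) ~ group1 + group2, data = dat, FUN = ci_vec)

# data.frame() expands matrix columns into var1.upper, var1.mean, var1.lower, ...
flat <- do.call(data.frame, agg)

# var4 only needs a sum, so aggregate it separately and merge on the group keys.
tot <- aggregate(var4 ~ group1 + group2, data = dat, FUN = sum)
flat <- merge(flat, tot, by = c("group1", "group2"))

names(flat)
```

The do.call(data.frame, ...) step is what does the "flattening" here: it passes every column of the aggregated result as a separate argument, and data.frame splits matrix arguments into one column per matrix column.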
Answer 3 (score: 1)
Here are some more concise solutions using tidyverse tools.
set.seed(1)
data <- data.frame(group1 = sample(c("A","B"),10,T),
group2 = sample(c("A","B"),10,T),
var1 = sample(10),
var2 = sample(10),
var3 = sample(10),
var4 = sample(10))
library(dplyr, warn.conflicts = F)
Using summarise_at, by creating a list of functions and applying it to a subset of the columns
summarise_fun <- funs(
mean = Rmisc::CI(., ci=0.95)['mean'],
lower = Rmisc::CI(., ci=0.95)['lower'],
upper = Rmisc::CI(., ci=0.95)['upper'])
summary_CI <- data %>%
group_by(group1, group2) %>%
summarize_at(vars(num_range("var", 1:3)), summarise_fun)
summary_CI
#> # A tibble: 4 x 11
#> # Groups: group1 [?]
#> group1 group2 var1_mean var2_mean var3_mean var1_lower var2_lower
#> <fctr> <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A A 6.0 5.500000 7.500000 -44.824819 -0.8531024
#> 2 A B 6.5 6.500000 6.500000 -25.265512 -37.9717166
#> 3 B A 4.0 5.666667 5.666667 -4.956686 -2.3187188
#> 4 B B 6.0 4.666667 3.333333 3.515862 -5.3728564
#> # ... with 4 more variables: var3_lower <dbl>, var1_upper <dbl>,
#> # var2_upper <dbl>, var3_upper <dbl>
You can then join this with a summary of var4:
inner_join(summary_CI,
data %>%
group_by(group1, group2) %>%
summarize_at("var4", sum),
by = c("group1", "group2"))
#> # A tibble: 4 x 12
#> # Groups: group1 [?]
#> group1 group2 var1_mean var2_mean var3_mean var1_lower var2_lower
#> <fctr> <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A A 6.0 5.500000 7.500000 -44.824819 -0.8531024
#> 2 A B 6.5 6.500000 6.500000 -25.265512 -37.9717166
#> 3 B A 4.0 5.666667 5.666667 -4.956686 -2.3187188
#> 4 B B 6.0 4.666667 3.333333 3.515862 -5.3728564
#> # ... with 5 more variables: var3_lower <dbl>, var1_upper <dbl>,
#> # var2_upper <dbl>, var3_upper <dbl>, var4 <int>
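Using purrr
Basically, this does in one block what solution 1 did in separate steps. It uses pmap to apply summarise_at with the arguments taken from the lists of .vars and .funs. The result is a list of data frames, which you can combine with reduce and inner_join.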
library(purrr)
lst(.vars = lst(vars(num_range("var", 1:3)), "var4"),
.fun = lst(summarise_fun, sum)) %>%
pmap(~ data %>%
group_by(group1, group2) %>%
summarise_at(.x, .y)) %>%
reduce(inner_join, by = c("group1", "group2"))
#> # A tibble: 4 x 12
#> # Groups: group1 [?]
#> group1 group2 var1_mean var2_mean var3_mean var1_lower var2_lower
#> <fctr> <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A A 6.0 5.500000 7.500000 -44.824819 -0.8531024
#> 2 A B 6.5 6.500000 6.500000 -25.265512 -37.9717166
#> 3 B A 4.0 5.666667 5.666667 -4.956686 -2.3187188
#> 4 B B 6.0 4.666667 3.333333 3.515862 -5.3728564
#> # ... with 5 more variables: var3_lower <dbl>, var1_upper <dbl>,
#> # var2_upper <dbl>, var3_upper <dbl>, var4 <int>