Question

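This is my first Stack Overflow question.

I'm trying to use dplyr to process my dataset and output a summary of the data grouped by a categorical variable (inj_length_cat3). In fact, I generate this variable on the fly (from inj_length) using mutate(). I'd also like to output the same summary without that grouping. The only way I've worked out to do this is to run the analysis twice, once with the grouping and once without, and then combine the output. Ugh.

I'm sure there is a more elegant solution than this, and it's bugging me. I'm wondering whether anyone can help.

Thanks!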

library(dplyr)

# Example data: 20 random records
df <- data.frame(
  year       = sample(c(2005, 2006), 20, replace = TRUE),
  inj_length = sample(1:10, 20, replace = TRUE),
  hiv_status = sample(0:1, 20, replace = TRUE)
)

# Analysis 1: summary by year and injecting-length category
tmp <- df %>%
  mutate(inj_length_cat3 = cut(inj_length, breaks = c(0, 3, 100),
                               labels = c('<3 years', '>3 years'))) %>%
  group_by(year, inj_length_cat3) %>%
  summarise(
    r = sum(hiv_status, na.rm = TRUE),
    n = length(hiv_status),
    p = prop.test(r, n)$estimate,
    cilow = prop.test(r, n)$conf.int[1],
    cihigh = prop.test(r, n)$conf.int[2]
  ) %>%
  filter(inj_length_cat3 %in% c('<3 years', '>3 years'))

# Analysis 2: the same summary by year only
tmp_all <- df %>%
  group_by(year) %>%
  summarise(
    r = sum(hiv_status, na.rm = TRUE),
    n = length(hiv_status),
    p = prop.test(r, n)$estimate,
    cilow = prop.test(r, n)$conf.int[1],
    cihigh = prop.test(r, n)$conf.int[2]
  )

# Label the overall rows as 'All' and combine the two results
tmp_all$inj_length_cat3 <- as.factor('All')
tmp <- merge(tmp_all, tmp, all = TRUE)

Answer 1

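I'm not sure you'll find this any more elegant, but you can get a solution if you first create a data frame that contains all the data twice: once so that you get the subgroups, and once so that you get the overall summary: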

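df1 <- rbind(df, df)  # stack two copies of the data
# The extra (100, Inf] interval is only there to create the 'All' factor level
df1$inj_length_cat3 <- cut(df1$inj_length, breaks = c(0, 3, 100, Inf),
                           labels = c('<3 years', '>3 years', 'All'))
# Relabel every row of the second copy as 'All'
df1$inj_length_cat3[-(1:nrow(df))] <- "All"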

Now you only need to run the first analysis, without the mutate():

tmp <- df1 %>%
  group_by(year, inj_length_cat3) %>%
  summarise(
    r = sum(hiv_status, na.rm = TRUE),
    n = length(hiv_status),
    p = prop.test(r, n)$estimate,
    cilow = prop.test(r, n)$conf.int[1],
    cihigh = prop.test(r, n)$conf.int[2]
  ) %>%
  filter(inj_length_cat3 %in% c('<3 years', '>3 years', 'All'))
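For what it's worth, the same "stack the data twice" idea can probably be written as one pipeline with dplyr's bind_rows(). This is only a sketch, not necessarily what the answer's author had in mind. It assumes dplyr 1.0.0 or later (for the .groups argument), the set.seed() call is my own addition so the random example is reproducible, and by_cat is just an illustrative name. The category is stored as character here so the "All" label can be assigned without touching factor levels.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only to make the example reproducible
df <- data.frame(
  year       = sample(c(2005, 2006), 20, replace = TRUE),
  inj_length = sample(1:10, 20, replace = TRUE),
  hiv_status = sample(0:1, 20, replace = TRUE)
)

# Categorise once; keep the label as character so "All" can be added freely
by_cat <- df %>%
  mutate(inj_length_cat3 = as.character(
    cut(inj_length, breaks = c(0, 3, 100), labels = c("<3 years", ">3 years"))
  ))

# Stack the categorised data on top of a copy labelled "All",
# then run the summary from the question once over both
tmp <- bind_rows(by_cat, mutate(by_cat, inj_length_cat3 = "All")) %>%
  group_by(year, inj_length_cat3) %>%
  summarise(
    r = sum(hiv_status, na.rm = TRUE),
    n = length(hiv_status),
    p = prop.test(r, n)$estimate,
    cilow = prop.test(r, n)$conf.int[1],
    cihigh = prop.test(r, n)$conf.int[2],
    .groups = "drop"
  )
```

With groups this small, prop.test() may warn that the chi-squared approximation is unreliable. That also happens with the original code and isn't specific to this variant.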

How to analyse grouped and ungrouped data in a single analysis with dplyr
