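I am very comfortable with R, but my data needs have reached the point where I have to learn iterative loops with multiple conditions. I have seen examples that use the various forms of *apply(), as well as colSums() and rowSums(), to perform the kind of data transformation I need, but I would like to make these tasks more efficient, perhaps with nested or iterative loops. In addition, the existing suggestions do not account for the data loss caused by ignoring or removing "NA" items, and I need to be able to retain this information.
My general data format is as follows: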
group <- c("A", "B", "C", "A", "C" [...])
individual <- c("1", "2", "3", "4", "5" [...])
choice1 <- c("1", "0", "1", "1", "NA")
choice2 <- c("1", "NA", "1", "0", "NA")
[...]
choice10 <- c("1", "0", "1", "1", "NA")
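I need to tally counts for each of the three options (1 == yes; 0 == no; NA == opted out of choosing), both across choices and across groups, and then convert these to percentages. The biggest difficulty I have run into with the approaches so far, such as *apply() or summing across rows/columns, is that my "NA" values (opt-outs) are either ignored or prevent me from properly accounting for the percentage of choice values across groups. Any specific advice or demonstration on how to ignore OR retain the "opt-out"/NAs within a loop structure would be much appreciated.
The output would look something like the following:
yes.count_bychoice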
no.count_bychoice
optout.count_bychoice
percentyes_bychoice_bygroup
percentno_bychoice_bygroup
percentout_bychoice_bygroup
Answer 0: (score: 1)
First things first: build a data.frame, like this:
d <- data.frame(group=group, individual=individual, choice1=choice1 ...)
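One detail worth checking before going further: the question's vectors store the opt-out as the string "NA", which is.na() does not treat as missing. A minimal sketch of a conversion step, assuming the choice columns are character vectors as in the question (`to_choice` and `d_small` are hypothetical names, and the vectors are cut to the five values shown):

```r
# Sample vectors from the question, limited to the five values shown there
group      <- c("A", "B", "C", "A", "C")
individual <- c("1", "2", "3", "4", "5")
choice1    <- c("1", "0", "1", "1", "NA")
choice2    <- c("1", "NA", "1", "0", "NA")

# Hypothetical helper: replace the string "NA" with a real NA, then convert to numeric
to_choice <- function(x) {
  x[x == "NA"] <- NA
  as.numeric(x)
}

d_small <- data.frame(group = group, individual = individual,
                      choice1 = to_choice(choice1),
                      choice2 = to_choice(choice2))

# Number of real missing values (opt-outs) per choice column
colSums(is.na(d_small[, c("choice1", "choice2")]))
```

After this step the opt-outs are real missing values, so functions built on is.na() can count them instead of silently treating them as a third string value.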
I'll use this as an example:
d <- data.frame(group=sample(LETTERS[1:4],20,TRUE), individual=1:20,
choice1=sample(c(0,1,NA),20,TRUE), choice2=sample(c(0,1,NA),20,TRUE))
which gives me
> head(d)
group individual choice1 choice2
1 D 1 1 NA
2 A 2 NA NA
3 C 3 1 1
4 A 4 1 NA
5 B 5 0 NA
6 B 6 1 1
We will use the following function:
f <- function(x) c(yes=sum(x==1,na.rm=TRUE),no=sum(x==0,na.rm=TRUE),optout=sum(is.na(x)))
for the counts, and
g <- function(x) f(x)/length(x)
for the percentages.
For the global counts you can use:
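To see what the two helpers return, here is a small sketch on a made-up response vector (the vector `x` is an assumption for illustration, not data from the question):

```r
# Counting helper from the answer: yes (1), no (0) and opt-out (NA)
f <- function(x) c(yes    = sum(x == 1, na.rm = TRUE),
                   no     = sum(x == 0, na.rm = TRUE),
                   optout = sum(is.na(x)))

# Proportion helper: divides by the full length, so opt-outs stay in the denominator
g <- function(x) f(x) / length(x)

x <- c(1, 0, NA, 1)  # made-up responses: two yes, one no, one opt-out

f(x)  # yes = 2, no = 1, optout = 1
g(x)  # yes = 0.50, no = 0.25, optout = 0.25
```

Because g() divides by length(x) rather than by the number of non-missing values, the three proportions always sum to 1 and the opt-outs are retained as their own category.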
counts <- apply(d[,-(1:2)], 2, FUN=f)
Result:
> counts
choice1 choice2
yes 11 8
no 4 2
optout 5 10
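Since the question asks about loop structures, the same global counts can also be built with an explicit for loop. This is only a sketch of an equivalent, not a faster method; the seed value and the names `choice_cols` and `counts_loop` are assumptions for illustration:

```r
set.seed(1)  # arbitrary seed, only so the random example is repeatable

d <- data.frame(group = sample(LETTERS[1:4], 20, TRUE), individual = 1:20,
                choice1 = sample(c(0, 1, NA), 20, TRUE),
                choice2 = sample(c(0, 1, NA), 20, TRUE))

f <- function(x) c(yes    = sum(x == 1, na.rm = TRUE),
                   no     = sum(x == 0, na.rm = TRUE),
                   optout = sum(is.na(x)))

# Every column except the two identifier columns is a choice column
choice_cols <- setdiff(names(d), c("group", "individual"))

# Pre-allocate a 3 x (number of choices) matrix, then fill one column per iteration
counts_loop <- matrix(NA_integer_, nrow = 3, ncol = length(choice_cols),
                      dimnames = list(c("yes", "no", "optout"), choice_cols))
for (col in choice_cols) {
  counts_loop[, col] <- f(d[[col]])
}

counts_loop
```

Each column of `counts_loop` sums to the number of rows in `d`, which is a quick way to confirm that no opt-outs were dropped along the way.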
Swap in the other function to get percentages:
> apply(d[,-(1:2)], 2, FUN=g)
choice1 choice2
yes 0.55 0.4
no 0.20 0.1
optout 0.25 0.5
To get the counts for each choice by group, you can use:
counts_grp <- aggregate(d[,-(1:2)], by=list(group=d$group), FUN=f)
Result:
> counts_grp
group choice1.yes choice1.no choice1.optout choice2.yes choice2.no choice2.optout
1 A 1 0 3 2 0 2
2 B 3 2 0 3 1 1
3 C 4 0 2 3 0 3
4 D 3 2 0 0 1 4
For percentages, you just switch the function:
> aggregate(d[,-(1:2)], by=list(group=d$group), FUN=g)
group choice1.yes choice1.no choice1.optout choice2.yes choice2.no choice2.optout
1 A 0.2500000 0.0000000 0.7500000 0.5 0.0 0.5
2 B 0.6000000 0.4000000 0.0000000 0.6 0.2 0.2
3 C 0.6666667 0.0000000 0.3333333 0.5 0.0 0.5
4 D 0.6000000 0.4000000 0.0000000 0.0 0.2 0.8
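One thing to be aware of when working with these results: because f returns a vector of three values, aggregate() stores each choice as a matrix column inside the data frame, even though the printout looks like separate columns. A sketch of flattening it into ordinary columns, assuming the same example setup (the seed and the name `flat` are illustrative):

```r
set.seed(1)  # arbitrary seed, only so the random example is repeatable

d <- data.frame(group = sample(LETTERS[1:4], 20, TRUE), individual = 1:20,
                choice1 = sample(c(0, 1, NA), 20, TRUE),
                choice2 = sample(c(0, 1, NA), 20, TRUE))

f <- function(x) c(yes    = sum(x == 1, na.rm = TRUE),
                   no     = sum(x == 0, na.rm = TRUE),
                   optout = sum(is.na(x)))

counts_grp <- aggregate(d[, -(1:2)], by = list(group = d$group), FUN = f)

ncol(counts_grp)  # 3: group, plus one matrix column per choice

# Rebuild the data frame so every yes/no/optout value becomes its own column
flat <- do.call(data.frame, counts_grp)

names(flat)  # group, choice1.yes, choice1.no, choice1.optout, choice2.yes, ...
```

With the flattened version, a single column such as `flat$choice1.optout` can be selected directly, which is not possible on the aggregate() result itself.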
Answer 1: (score: 0)
For something quick and dirty, you might want to look at aggregate and prop.table, like this:
#Some data:
df <- data.frame( group = c("A", "B", "C", "A", "C" ) ,
individual = c("1", "2", "3", "4", "5" ),
choice1 = c("1", "0", "1", "1", "NA"),
choice2 = c("1", "NA", "1", "0", "NA") ,
choice3 = c("1", "NA", "NA", "0", "NA") )
#Convert to ordered factor to keep order of values as 0<1<NA in all cases, no matter the order they appear in a column
df <- as.data.frame( lapply( df , factor , order = TRUE ) )
#Then aggregate by group and choice, and work out proportion of each response
# Order of values is 0, then 1, then NA
# But if there are choices with missing values it won't be very good because it isn't labelled which values are which, but if all choices have at least one value in each category then first value will be proportion of 0, next will be proportion of 1's and finally proportion of NAs
aggregate( cbind( choice1 , choice2 , choice3 ) ~ group , data = df , prop.table )
#group choice1 choice2 choice3
#1 A 0.5, 0.5 0.6666667, 0.3333333 0.6666667, 0.3333333
#2 B 1 1 1
#3 C 0.4, 0.6 0.4, 0.6 0.5, 0.5
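The labelling problem mentioned in the comments can be avoided by tabulating each choice against the group with a fixed set of levels. Note also that cbind() on factors passes along their integer codes, so prop.table() in the call above rescales those codes rather than counting responses. The following is a sketch of a labelled alternative, not the answer's method; `prop_by_group` and `props` are hypothetical names, and the "NA" strings are kept as in the question:

```r
df <- data.frame(group   = c("A", "B", "C", "A", "C"),
                 choice1 = c("1", "0", "1", "1", "NA"),
                 choice2 = c("1", "NA", "1", "0", "NA"),
                 choice3 = c("1", "NA", "NA", "0", "NA"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: fix the three levels up front so every table has the
# same labelled columns, even when a response never occurs in a column
prop_by_group <- function(x, grp) {
  x <- factor(x, levels = c("0", "1", "NA"), labels = c("no", "yes", "optout"))
  prop.table(table(group = grp, response = x), margin = 1)  # row proportions
}

# One group-by-response table of proportions per choice column
props <- lapply(df[c("choice1", "choice2", "choice3")], prop_by_group,
                grp = df$group)

props$choice1
```

Each row of a table in `props` sums to 1, and every cell is labelled with its group and response, so a missing category shows up as an explicit 0 instead of shifting the other values.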