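In my data frame I am trying to count certain text values per Name: '000', 'xxx', and everything that is neither (not 000 | xxx).
My data frame looks like this: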
Name per1 per2 per3
a1 000 xxx 230
a1 xxx 000 NA
a2 000 340 xxx
a3 000 xxx NA
Desired result (counts):
Name 000 xxx Others
a1 2 2 1
a2 1 1 1
a3 1 1 0
Using dplyr, I tried the following but got an error. Please help me achieve this:
df %>% groupby(Name) %>% filter(grepl('000')) %>% summarize(000 = n())
Answer 0 (score: 2)
One option is to convert the data to long format and then use reshape2::dcast to get the counts:
library(tidyverse)
library(reshape2)
df %>% gather(key, value, -Name) %>%
mutate(value = ifelse(is.na(value), "Others", value)) %>%
dcast(Name~value, fun.aggregate = length)
# Name 000 230 340 Others xxx
# 1 a1 2 1 0 1 2
# 2 a2 1 0 1 0 1
# 3 a3 1 0 0 1 1
Or, if the OP wants counts for only the 000, xxx, and Others categories:
library(tidyverse)
library(reshape2)
df %>% gather(key, value, -Name) %>%
mutate(value =
ifelse(is.na(value) | !(value %in% c("000", "xxx")), "Others", value)) %>%
dcast(Name~value, fun.aggregate = length)
# Name 000 Others xxx
# 1 a1 2 2 2
# 2 a2 1 1 1
# 3 a3 1 1 1
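Note that the output above counts NA as Others, whereas the desired result in the question does not. A minimal sketch of one way to get the question's counts is to drop the NA values before recoding. It assumes tidyr 1.1.0 or later (for pivot_longer/pivot_wider with a scalar values_fill) and swaps those functions in for gather/dcast; the name counts and the rebuilt df are only for illustration.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt here so the sketch is self-contained
df <- data.frame(Name = c("a1", "a1", "a2", "a3"),
                 per1 = c("000", "xxx", "000", "000"),
                 per2 = c("xxx", "000", "340", "xxx"),
                 per3 = c("230", NA, "xxx", NA),
                 stringsAsFactors = FALSE)

counts <- df %>%
  pivot_longer(-Name, names_to = "key", values_to = "value") %>%  # long format
  filter(!is.na(value)) %>%                                       # NA is not counted at all
  mutate(value = if_else(value %in% c("000", "xxx"), value, "Others")) %>%
  count(Name, value) %>%                                          # one row per Name/category
  pivot_wider(names_from = value, values_from = n, values_fill = 0L)

counts
```

With this data, a3 has no non-NA value other than 000 and xxx, so its Others count is filled with 0 rather than left missing.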
Data:
df<-read.table(text="
Name per1 per2 per3
a1 000 xxx 230
a1 xxx 000 NA
a2 000 340 xxx
a3 000 xxx NA",
header=TRUE, stringsAsFactors = FALSE)
Answer 1 (score: 1)
Here are some tidyverse possibilities, all variations on the same idea:
library(tidyverse)
df %>%
nest(-Name) %>%
rowwise %>%
summarize(`000` = sum(data =='000',na.rm=T),
xxx = sum(data =='xxx',na.rm=T),
Others = sum(!is.na(data))-`000` - xxx)
df %>%
nest(-Name) %>%
group_by(Name) %>%
summarize(`000` = sum(data[[1]]=='000',na.rm=T),
xxx = sum(data[[1]]=='xxx',na.rm=T),
Others = sum(!is.na(data[[1]]))-`000` - xxx)
df %>%
group_by(Name) %>%
do(tibble(`000` = sum(.[-1]=='000',na.rm=T),
xxx = sum(.[-1]=='xxx',na.rm=T),
Others = sum(!is.na(.[-1]))-`000` - xxx)) %>%
ungroup
# # A tibble: 3 x 4
# Name `000` xxx Others
# <chr> <int> <int> <int>
# 1 a1 2 2 1
# 2 a2 1 1 1
# 3 a3 1 1 0
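Note that rowwise and grouping by rows work slightly differently: with rowwise the nested column is used directly as data, while with group_by it has to be extracted with data[[1]].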
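The difference can be seen on a small toy example. This is only a sketch, assuming dplyr 1.0 or later; the tibble toy and the names by_row and by_group are hypothetical and not part of the answer.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: a list-column holding vectors of different lengths
toy <- tibble(id = c("a", "b"), data = list(1:3, 4:5))

# rowwise(): inside summarise(), `data` is the list element of the current row
by_row <- toy %>%
  rowwise() %>%
  summarise(n = length(data))

# group_by(): `data` is still a list of length 1, so the element is taken with [[1]]
by_group <- toy %>%
  group_by(id) %>%
  summarise(n = length(data[[1]]))
```

Both by_row$n and by_group$n hold the lengths of the two vectors (3 and 2); without the [[1]] in the grouped version, length(data) would be 1 for every group.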
Here is also a base R translation:
do.call(
rbind,
by(df,df$Name,function(x) data.frame(
Name = x$Name[1],
`000` = sum(x[-1]=='000',na.rm=T),
xxx = sum(x[-1]=='xxx',na.rm=T),
Others = sum(x[-1]!='000' & x[-1]!='xxx',na.rm=T))))
# Name X000 xxx Others
# a1 a1 2 2 1
# a2 a2 1 1 1
# a3 a3 1 1 0
Answer 2 (score: 1)
If I understood correctly, and the task is to count all xxx, 000, and !000&!xxx values by Name, we could also use base::table() to get the desired output:
df <- data.frame(Name = c("a1", "a1", "a2", "a3"),
per1 = c("000", "xxx", "000", "000"),
per2 = c("xxx", "000", 340, "xxx"),
per3 = c(230, NA, "xxx", NA),
stringsAsFactors = F
)
Vals <- unlist(df[,-1]) # convert to the vector
Vals[!(Vals %in% c("000", "xxx")) & !is.na(Vals)] <- "Others" # !(xxx|000) <- Others
#
as.data.frame.matrix( # group by Name, count
table(rep(df$Name, ncol(df) - 1), Vals, useNA = "no") # don't count NAs
) # convert to data.frame
# 000 Others xxx
#a1 2 1 2
#a2 1 1 1
#a3 1 0 1