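My data frame has an unknown number of columns (it may change often). For each column, I need to count the number of observations per ID across the years, and for each column create a custom "n" column telling me how many observations exist for that particular column.
I tried: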
library(dplyr)
count <- tally(group_by(final_database,ID,Year))
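But this counts the unique ID + Year combinations, whereas I need to know how many times each ID was observed across the years for each characteristic. Example: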
ID  Year  CHAR1  n_CHAR1
A   2016  0      3
A   2017  5      3
A   2018  2      3
A   2019         3
B   2016  1      2
B   2017         2
B   2018         2
B   2019  1      2
And so on for all characteristics. I will then add the "n_CHAR" columns to the original data frame.
It does not need to be tidy. Thanks!
Answer 0 (score: 3)
Try:
transform(final_database, n_CHAR1 = ave(CHAR1, ID, FUN = function(x) sum(x != "")))
If the blank entries are actually NA, just replace sum(x != "") with sum(!is.na(x)).
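As a minimal sketch of the NA variant, assuming a toy data frame (the name toy and its values are made up for illustration and mirror the example in the question, with NA instead of blanks):

```r
# Hypothetical toy data: NA marks a missing observation
toy <- data.frame(
  ID    = c("A", "A", "A", "A", "B", "B", "B", "B"),
  Year  = rep(2016:2019, times = 2),
  CHAR1 = c(0, 5, 2, NA, 1, NA, NA, 1)
)

# Count the non-missing CHAR1 values within each ID
# and repeat that count on every row of the group
toy <- transform(toy, n_CHAR1 = ave(CHAR1, ID, FUN = function(x) sum(!is.na(x))))

toy$n_CHAR1
# 3 3 3 3 2 2 2 2
```

Note that sum(x != "") would return NA for any group containing an NA, which is why the is.na() form is needed in that case.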
Edit:
If you need multiple n columns for multiple CHAR columns, you can do the following:
library(dplyr)
final_database %>%
  group_by(ID) %>%
  mutate_at(vars(starts_with("CHAR")),
            list(n = ~ sum(. != "")))
This example assumes that all the relevant CHAR columns start with the string CHAR (e.g. CHAR1, CHAR2, CHAR3, etc.).
If the columns you want to count are the third through the last, you can do the following:
library(dplyr)
finalDatabase <- final_database %>%
  group_by(ID) %>%
  # If there are few columns besides the CHAR ones, vars(-ID, -Year) also works, as suggested by @camille
  mutate_at(vars(3:ncol(.)),
            list(n = ~ sum(. != ""))) %>%
  select(ID, Year, ends_with("_n"))
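For what it's worth, mutate_at() is superseded in dplyr 1.0.0 and later, so the same idea can be sketched with across(). This is not part of the original answer, just an alternative under the assumption that dplyr >= 1.0.0 is installed. The data frame toy and the second column CHAR2 are made up for illustration:

```r
library(dplyr)

# Hypothetical toy data with two characteristic columns; "" marks a missing observation
toy <- tibble(
  ID    = c("A", "A", "A", "A", "B", "B", "B", "B"),
  Year  = rep(2016:2019, times = 2),
  CHAR1 = c("0", "5", "2", "", "1", "", "", "1"),
  CHAR2 = c("", "1", "", "", "4", "4", "", "2")
)

result <- toy %>%
  group_by(ID) %>%
  # For every column starting with "CHAR", count the non-blank entries per ID;
  # .names puts the prefix in front, giving n_CHAR1, n_CHAR2, ...
  mutate(across(starts_with("CHAR"), ~ sum(.x != ""), .names = "n_{.col}")) %>%
  ungroup()

result$n_CHAR1
# 3 3 3 3 2 2 2 2
result$n_CHAR2
# 1 1 1 1 3 3 3 3
```

The .names argument gives the n_CHAR1 naming from the question directly, so no renaming step is needed afterwards.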
Answer 1 (score: 0)
We can also do this with data.table:
library(data.table)
setDT(df)[, n_CHAR1 := sum(CHAR1 != ""), by = "ID"]
Output:
   ID Year CHAR1 n_CHAR1
1:  A 2016     0       3
2:  A 2017     5       3
3:  A 2018     2       3
4:  A 2019             3
5:  B 2016     1       2
6:  B 2017             2
7:  B 2018             2
8:  B 2019     1       2
Data:
df <- structure(list(
  ID = c("A", "A", "A", "A", "B", "B", "B", "B"),
  Year = c(2016L, 2017L, 2018L, 2019L, 2016L, 2017L, 2018L, 2019L),
  CHAR1 = c("0", "5", "2", "", "1", "", "", "1")
), row.names = c(NA, -8L), class = "data.frame")
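Since the question mentions an unknown number of columns, the data.table approach can probably be extended to all characteristic columns at once with .SDcols. A rough sketch, assuming every column other than ID and Year is a characteristic. The table dt, the column CHAR2 and the helper names char_cols and n_cols are made up for illustration:

```r
library(data.table)

# Same shape as the sample data, plus a hypothetical second column CHAR2
dt <- data.table(
  ID    = c("A", "A", "A", "A", "B", "B", "B", "B"),
  Year  = rep(2016:2019, times = 2),
  CHAR1 = c("0", "5", "2", "", "1", "", "", "1"),
  CHAR2 = c("", "1", "", "", "4", "4", "", "2")
)

# Assumption: every column except the keys is a characteristic
char_cols <- setdiff(names(dt), c("ID", "Year"))
n_cols    <- paste0("n_", char_cols)

# For each ID, count the non-blank entries of each characteristic column
# and add the counts by reference as n_CHAR1, n_CHAR2, ...
dt[, (n_cols) := lapply(.SD, function(x) sum(x != "")), by = ID, .SDcols = char_cols]

dt$n_CHAR1
# 3 3 3 3 2 2 2 2
dt$n_CHAR2
# 1 1 1 1 3 3 3 3
```

If the missing values are NA rather than empty strings, sum(!is.na(x)) would be the function to use instead.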