For the sample data frame:
df <- structure(
list(
country = structure(
c(1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L, 3L),
.Label = c("Austria", "France", "UK"),
class = "factor"
),
id = 1:10,
region.0 = structure(
c(1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L, 3L),
.Label = c("AT", "FR", "UK"),
class = "factor"
),
region.1 = structure(
c(1L, 1L, 2L, 3L, 3L, 3L, 4L, 4L, 6L,
5L),
.Label = c("AT1", "AT2", "FR1", "UK1", "UK4", "UK6"),
class = "factor"
),
region.2 = structure(
c(1L, 1L, 2L, 3L, 4L, 5L, NA, NA, NA,
NA),
.Label = c("AT11", "AT21", "FR12", "FR14", "FR19"),
class = "factor"
),
region.3 = structure(
c(NA, NA, NA, 1L, 2L, 3L, NA, NA, NA,
NA),
.Label = c("FR121", "FR142", "FR196"),
class = "factor"
)
),
.Names = c("country",
"id", "region.0", "region.1", "region.2", "region.3"),
class = "data.frame",
row.names = c(NA, -10L)
)
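I would like to produce a summary table showing which level of regional data is available for each country in my data frame df.
Regional data may be available at region.1, region.2, or region.3. At each level the data is either present or listed as NA. Within a country, the level at which regional data is available is the same regardless of id.
The end result I would like, as a data frame, is: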
country region.1 region.2 region.3
1 Austria Yes Yes No
2 France Yes Yes Yes
3 UK Yes No No
Can anyone suggest a specific package or some code to help me with this?
Answer 0 (score: 5)
We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)), group by 'country', specify the columns to check in .SDcols, and loop over those columns with lapply. For each column, if all of the values are NA, return "No"; else return "Yes".
library(data.table)
# setDT() converts df to a data.table by reference
setDT(df)[, lapply(.SD, function(x) if (all(is.na(x))) "No" else "Yes"),
          by = country, .SDcols = region.1:region.3]
#    country region.1 region.2 region.3
#1:  Austria      Yes      Yes       No
#2:   France      Yes      Yes      Yes
#3:       UK      Yes       No       No
Or, using dplyr, we can implement the same logic after grouping by 'country' (group_by).
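A minimal sketch of what the dplyr variant might look like, assuming dplyr 1.0.0 or later (for across()). The exact call is an assumption rather than the answer author's own code, and df_small is a hypothetical stand-in with the same columns as df, so the snippet runs on its own:

```r
library(dplyr)

# Hypothetical stand-in for the question's df (same columns, fewer rows).
df_small <- data.frame(
  country  = c("Austria", "Austria", "France", "France", "UK", "UK"),
  id       = 1:6,
  region.1 = c("AT1", "AT2", "FR1", "FR1", "UK1", "UK4"),
  region.2 = c("AT11", "AT21", "FR12", "FR14", NA, NA),
  region.3 = c(NA, NA, "FR121", "FR142", NA, NA)
)

# Group by country, then collapse each region column to "Yes"/"No":
# "No" when every value in the group is NA, "Yes" otherwise.
res <- df_small %>%
  group_by(country) %>%
  summarise(across(region.1:region.3,
                   ~ if (all(is.na(.x))) "No" else "Yes"))

print(res)
```

Replacing df_small with df should apply the same logic to the full example data; the result is a tibble with one row per country.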