Summarizing a data frame based on the presence of variables / NAs in R

Date: 2016-03-03 13:27:38

Tags: r dataframe

For the example data frame:

  df <- structure(
  list(
    country = structure(
      c(1L, 1L, 1L, 2L, 2L, 2L,
        3L, 3L, 3L, 3L),
      .Label = c("Austria", "France", "UK"),
      class = "factor"
    ),
    id = 1:10,
    region.0 = structure(
      c(1L, 1L, 1L, 2L, 2L, 2L,
        3L, 3L, 3L, 3L),
      .Label = c("AT", "FR", "UK"),
      class = "factor"
    ),
    region.1 = structure(
      c(1L, 1L, 2L, 3L, 3L, 3L, 4L, 4L, 6L,
        5L),
      .Label = c("AT1", "AT2", "FR1", "UK1", "UK4", "UK6"),
      class = "factor"
    ),
    region.2 = structure(
      c(1L, 1L, 2L, 3L, 4L, 5L, NA, NA, NA,
        NA),
      .Label = c("AT11", "AT21", "FR12", "FR14", "FR19"),
      class = "factor"
    ),
    region.3 = structure(
      c(NA, NA, NA, 1L, 2L, 3L, NA, NA, NA,
        NA),
      .Label = c("FR121", "FR142", "FR196"),
      class = "factor"
    )
  ),
  .Names = c("country",
             "id", "region.0", "region.1", "region.2", "region.3"),
  class = "data.frame",
  row.names = c(NA, -10L)
)

I would like to make a summary table showing, for each country, which levels of regional data are available in my df data frame.

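Regional data may be available in region.1, region.2 or region.3. At each level the data are either present or listed as NA. For a given country, the levels at which regional data are available are the same regardless of the row's id.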
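The premise that the NA pattern depends only on the country (not on id) is what lets a per-country "are all values NA?" test work. A minimal base R sketch can check it. It rebuilds a compact stand-in for df (an assumption: character columns instead of factors, region.0 dropped, NA positions as in the original) and, for each region column, tests whether every country's rows are either all NA or all non-NA.

```r
# Compact stand-in for the question's df (assumption: character columns
# instead of factors, region.0 omitted; NA positions match the original)
df <- data.frame(
  country  = rep(c("Austria", "France", "UK"), times = c(3, 3, 4)),
  id       = 1:10,
  region.1 = c("AT1", "AT1", "AT2", "FR1", "FR1", "FR1", "UK1", "UK1", "UK6", "UK4"),
  region.2 = c("AT11", "AT11", "AT21", "FR12", "FR14", "FR19", NA, NA, NA, NA),
  region.3 = c(NA, NA, NA, "FR121", "FR142", "FR196", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

region_cols <- c("region.1", "region.2", "region.3")

# For each region column, check within every country whether the rows are
# all NA or all non-NA (a single distinct is.na() state per group)
consistent <- sapply(region_cols, function(col) {
  tapply(is.na(df[[col]]), df$country, function(x) length(unique(x)) == 1L)
})

print(consistent)   # countries in rows, region columns in columns
all(consistent)     # TRUE when availability never varies with id
```

If any cell were FALSE, a country would have data for some ids and not others at that level, and an "all NA" test would then need a different rule (for example "any non-NA").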

The end result I want, as a data frame, is:

  country region.1 region.2 region.3
1 Austria      Yes      Yes       No
2  France      Yes      Yes      Yes
3      UK      Yes       No       No

Can anyone suggest a particular package or some code to help me?

1 Answer:

Answer 0 (score: 5)

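We can use data.table. Convert the "data.frame" to a "data.table" (setDT(df)), group by "country", and specify the columns to compare in .SDcols. We loop over those columns with lapply and check whether all the values in a column are NA: if so, the output is "No", else it is "Yes".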

  library(data.table)
  setDT(df)[, lapply(.SD, function(x) if (all(is.na(x))) "No" else "Yes"),
            by = country, .SDcols = region.1:region.3]
  #    country region.1 region.2 region.3
  # 1: Austria      Yes      Yes       No
  # 2:  France      Yes      Yes      Yes
  # 3:      UK      Yes       No       No

Or, using dplyr, we can implement the same logic after grouping by "country" (group_by).
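A minimal sketch of that dplyr approach follows. It is not the answerer's original code. It assumes dplyr 1.0.0 or later (for across()), and df is rebuilt compactly with character columns instead of factors so the block runs on its own.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, which provides across()

# Compact stand-in for the question's df (assumption: character columns,
# region.0 omitted; NA positions match the original)
df <- data.frame(
  country  = rep(c("Austria", "France", "UK"), times = c(3, 3, 4)),
  id       = 1:10,
  region.1 = c("AT1", "AT1", "AT2", "FR1", "FR1", "FR1", "UK1", "UK1", "UK6", "UK4"),
  region.2 = c("AT11", "AT11", "AT21", "FR12", "FR14", "FR19", NA, NA, NA, NA),
  region.3 = c(NA, NA, NA, "FR121", "FR142", "FR196", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

summary_df <- df %>%
  group_by(country) %>%
  # Within each country, "No" if every value in the column is NA, else "Yes"
  summarise(across(region.1:region.3,
                   ~ if (all(is.na(.x))) "No" else "Yes")) %>%
  as.data.frame()  # plain data frame, as requested in the question

print(summary_df)
```

The as.data.frame() call only converts the tibble returned by summarise() into an ordinary data frame. Dropping it gives the same columns and values.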