How to check whether certain columns exist and, if they don't, create them and fill them with zeros?

Time: 2018-07-11 13:40:20

Tags: r

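I have a data frame that I manipulate, and its final form sometimes changes shape depending on whether certain values are present.

Once the data frame has been processed and is in its final format, I want to reorder the columns into a specific order before writing it to a .csv.

However, because some columns are not always present, I would like to know whether I can check which columns exist and which don't, and create the ones that don't and fill them with zeros.

I have a solution that I think is clumsy and could probably be improved a lot: in this example, I check whether the column taken_offline exists in my dataset. If it does, I reorder the columns in a certain way, including this column; if it doesn't, I create taken_offline and fill it with zeros, while still reordering in the same way.

Ideally, I'd like to be able to say "the columns should appear in the following order. If a column doesn't exist, create it and fill it with zeros".

I know that a good approach might be to take the list of column names from my data frame (users) and check them against the desired column order (listed below). However, I'm not sure how to implement that idea.

How can I do this?

The output columns should be in the following order: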

"date",
"storeName",
"firstName",
"lastName",
"conversation-request",
"conversation-accepted",
"acceptance_rate",
"conversation-missed",
"taken_offline",
"conversation-already-accepted",
"total_missed",
"conversation-declined"

My code (checking whether taken_offline exists)

if("taken_offline" %in% colnames(users_final)){
  users_final <- users_final[, c(
    "date",
    "storeName",
    "firstName",
    "lastName",
    "conversation-request",
    "conversation-accepted",
    "acceptance_rate",
    "conversation-missed",
    "taken_offline",
    "conversation-already-accepted",
    "total_missed",
    "conversation-declined"
  )]
  print("Taken offline occurrences.")
} else {
  users_final$taken_offline <- 0
  users_final <- users_final[, c(
    "date",
    "storeName",
    "firstName",
    "lastName",
    "conversation-request",
    "conversation-accepted",
    "acceptance_rate",
    "conversation-missed",
    "taken_offline",
    "conversation-already-accepted",
    "total_missed",
    "conversation-declined"
  )]
  print("No taken offline occurrences.")
}

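2 Answers:

Answer 0: (score: 2)

A simpler version of the other answer. Thanks to R's vector recycling, this can be done in two quick lines.

Store the vector of column names in a variable, say all_cols. Then, calling your data dd: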

# add missing columns and set them equal to 0
dd[setdiff(all_cols, names(dd))] = 0
# put columns in desired order
dd = dd[all_cols]

Working example:

all_cols = c("date",
"storeName",
"firstName",
"lastName",
"conversation-request",
"conversation-accepted",
"acceptance_rate",
"conversation-missed",
"taken_offline",
"conversation-already-accepted",
"total_missed",
"conversation-declined")

dd = data.frame("date" = "yesterday",
"storeName" = "Kwik-E-Mart",
"firstName" = "Apu")

dd[setdiff(all_cols, names(dd))] = 0
dd = dd[all_cols]
dd
#        date   storeName firstName lastName conversation-request conversation-accepted acceptance_rate
# 1 yesterday Kwik-E-Mart       Apu        0                    0                     0               0
#   conversation-missed taken_offline conversation-already-accepted total_missed conversation-declined
# 1                   0             0                             0            0                     0
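
If this step is needed in more than one place, the two lines above could be wrapped in a small function. The following is only a sketch: the function name conform_columns, its fill argument, and the sample data are hypothetical and not part of the original answer. It also guards the assignment so nothing is assigned when no columns are missing.

```r
# Hypothetical helper: the name `conform_columns` and the `fill` argument
# are illustrative assumptions, not an existing API.
conform_columns <- function(df, all_cols, fill = 0) {
  # Columns that are wanted but not yet present in df
  missing_cols <- setdiff(all_cols, names(df))
  if (length(missing_cols) > 0) {
    # The single fill value is recycled down every row of each new column
    df[missing_cols] <- fill
  }
  # Select the columns in the desired order (columns not listed are dropped)
  df[all_cols]
}

# Made-up sample data with columns out of order and one column missing
users <- data.frame(date = c("2018-07-10", "2018-07-11"),
                    taken_offline = c(1, 2),
                    storeName = c("A", "B"),
                    stringsAsFactors = FALSE)
wanted <- c("date", "storeName", "taken_offline", "total_missed")

out <- conform_columns(users, wanted)
print(out)
```

Note that any column not listed in all_cols is dropped by the final selection, which may or may not be what you want before writing the .csv.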

Answer 1: (score: 0)

If you have a vector of names, say varname, giving the column names and order you want for your data s, then you can use:

var_not_present <- varname[which(!(varname %in% names(s)))]
h <- data.frame(matrix(0, ncol = length(var_not_present), nrow = dim(s)[1]))
colnames(h) <- var_not_present
s_updated <- cbind(s,h)
s_updated <- s_updated[varname]
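
As a quick sanity check, the cbind approach above can be compared with a setdiff-based assignment on a small data frame. This is a sketch on made-up data; the variable names via_cbind and via_setdiff are illustrative only.

```r
# Made-up data; `s` and `varname` follow the names used in the answer above.
s <- data.frame(date = c("2018-07-10", "2018-07-11"),
                total_missed = c(3, 1),
                stringsAsFactors = FALSE)
varname <- c("date", "taken_offline", "total_missed")

# Approach from this answer: build a zero-filled data frame and bind it on
var_not_present <- varname[!(varname %in% names(s))]
h <- data.frame(matrix(0, ncol = length(var_not_present), nrow = nrow(s)))
colnames(h) <- var_not_present
via_cbind <- cbind(s, h)[varname]

# setdiff-based approach: assign 0 to the missing columns directly
via_setdiff <- s
via_setdiff[setdiff(varname, names(s))] <- 0
via_setdiff <- via_setdiff[varname]

# Compare column names, then values column by column
same_names <- identical(names(via_cbind), names(via_setdiff))
same_values <- all(mapply(function(a, b) all(a == b), via_cbind, via_setdiff))
print(c(same_names = same_names, same_values = same_values))
```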