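I have a data frame that I manipulate, and its final form sometimes changes shape depending on whether certain values are present.
Once the data frame has been processed into its final format, I want to reorder the columns into a specific order before writing it to .csv.
However, some columns are not always present. I'd like to check which columns exist: the ones that do should follow the specified order, and the ones that don't should be created and filled with zeros.
I have a solution, but I think it's clunky and could be improved a lot. In this example I check whether the column taken_offline exists in my dataset. If it does, I reorder the columns in a certain way, including this one; if it doesn't, I create taken_offline, fill it with zeros, and still reorder the columns in the same way.
Ideally, I'd like to be able to say: "the columns appear in the following order; if a column doesn't exist, create it and fill it with zeros".
I know a good approach might be to take the list of column names from my data frame (users) and check it against the desired column order (listed below). However, I'm not sure how to implement this idea.
How can I do this?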
The output columns should be in this order:
"date",
"storeName",
"firstName",
"lastName",
"conversation-request",
"conversation-accepted",
"acceptance_rate",
"conversation-missed",
"taken_offline",
"conversation-already-accepted",
"total_missed",
"conversation-declined"
My code (checking whether taken_offline exists):
if("taken_offline" %in% colnames(users_final)){
users_final <- users_final[, c(
"date",
"storeName",
"firstName",
"lastName",
"conversation-request",
"conversation-accepted",
"acceptance_rate",
"conversation-missed",
"taken_offline",
"conversation-already-accepted",
"total_missed",
"conversation-declined"
)]
print("Taken offline occurrences.")
} else {
users_final$taken_offline <- 0
users_final <- users_final[, c(
"date",
"storeName",
"firstName",
"lastName",
"conversation-request",
"conversation-accepted",
"acceptance_rate",
"conversation-missed",
"taken_offline",
"conversation-already-accepted",
"total_missed",
"conversation-declined"
)]
print("No taken offline occurrences.")
}
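Even before generalizing, most of the duplication in the code above can be removed: the column order only needs to be written once, and only the creation of taken_offline depends on the check. A sketch of that restructuring, using a made-up one-row stand-in for users_final (the data and the name col_order are invented for illustration):

```r
# Desired order, written once (col_order is an illustrative name)
col_order <- c("date", "storeName", "firstName", "lastName",
               "conversation-request", "conversation-accepted",
               "acceptance_rate", "conversation-missed", "taken_offline",
               "conversation-already-accepted", "total_missed",
               "conversation-declined")

# Made-up stand-in for users_final: every column except taken_offline
others <- setdiff(col_order, "taken_offline")
users_final <- as.data.frame(matrix(1, nrow = 1, ncol = length(others)))
names(users_final) <- others

# Only the creation of the column depends on the check ...
if ("taken_offline" %in% colnames(users_final)) {
  print("Taken offline occurrences.")
} else {
  users_final$taken_offline <- 0
  print("No taken offline occurrences.")
}

# ... the reordering is identical in both cases, so it happens once
users_final <- users_final[, col_order]
```

This still handles only taken_offline; the answers below remove the special case entirely.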
Answer 0 (score: 2)
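A simpler version of the other answers. Thanks to R's vector recycling, this takes just two lines.
Put the desired column names in a vector, say all_cols. Then, calling your data dd: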
# add missing columns and set them equal to 0
dd[setdiff(all_cols, names(dd))] = 0
# put columns in desired order
dd = dd[all_cols]
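The first line works because assigning a single value to several new columns at once recycles it: the data frame assignment method repeats the scalar 0 down every row of every new column. A tiny illustration on a made-up data frame:

```r
df <- data.frame(a = 1:3)

# One scalar, two new columns: 0 is recycled to fill all 3 rows of each
df[c("b", "c")] <- 0
df
```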
Working example:
all_cols = c("date",
"storeName",
"firstName",
"lastName",
"conversation-request",
"conversation-accepted",
"acceptance_rate",
"conversation-missed",
"taken_offline",
"conversation-already-accepted",
"total_missed",
"conversation-declined")
dd = data.frame("date" = "yesterday",
"storeName" = "Kwik-E-Mart",
"firstName" = "Apu")
dd[setdiff(all_cols, names(dd))] = 0
dd = dd[all_cols]
dd
# date storeName firstName lastName conversation-request conversation-accepted acceptance_rate
# 1 yesterday Kwik-E-Mart Apu 0 0 0 0
# conversation-missed taken_offline conversation-already-accepted total_missed conversation-declined
# 1 0 0 0 0 0
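If this is needed in more than one place, the two lines can be wrapped in a small function. The name complete_and_order and its fill argument are hypothetical, chosen here for illustration. As a precaution, the sketch only assigns when setdiff() actually returns something, and like dd[all_cols] it drops any column not listed in cols:

```r
# Hypothetical helper: add the columns of `cols` that df lacks (filled with
# `fill`), then return only the columns in `cols`, in that order.
complete_and_order <- function(df, cols, fill = 0) {
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    df[missing_cols] <- fill  # scalar recycled over rows and new columns
  }
  df[cols]
}

all_cols <- c("date", "storeName", "firstName", "lastName",
              "conversation-request", "conversation-accepted",
              "acceptance_rate", "conversation-missed", "taken_offline",
              "conversation-already-accepted", "total_missed",
              "conversation-declined")

dd <- data.frame(date = "yesterday", storeName = "Kwik-E-Mart",
                 firstName = "Apu")
dd <- complete_and_order(dd, all_cols)
```

Passing fill = NA instead of 0 would mark the added columns as missing rather than zero, if that suits the CSV better.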
Answer 1 (score: 0)
If you have a character vector, say varname, holding the column names of your data s in the desired order, you can use:
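# desired columns that are not yet present in s
var_not_present <- varname[!(varname %in% names(s))]
# zero-filled data frame: one column per missing name, same number of rows as s
h <- data.frame(matrix(0, ncol = length(var_not_present), nrow = nrow(s)))
colnames(h) <- var_not_present
# append the new columns, then select all columns in the order given by varname
s_updated <- cbind(s, h)
s_updated <- s_updated[varname]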
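A quick check of this approach on a small made-up data frame s that has only three of the twelve columns (the values are invented for illustration). cbind() keeps names such as conversation-request intact because the data frame method calls data.frame() with check.names = FALSE:

```r
varname <- c("date", "storeName", "firstName", "lastName",
             "conversation-request", "conversation-accepted",
             "acceptance_rate", "conversation-missed", "taken_offline",
             "conversation-already-accepted", "total_missed",
             "conversation-declined")

# Made-up input with three of the twelve columns, in a different order
s <- data.frame(taken_offline = c(1, 0),
                date = c("2020-01-01", "2020-01-02"),
                storeName = c("A", "B"))

var_not_present <- varname[!(varname %in% names(s))]
h <- data.frame(matrix(0, ncol = length(var_not_present), nrow = nrow(s)))
colnames(h) <- var_not_present
s_updated <- cbind(s, h)[varname]
```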