Here is my data table.
library(data.table)
dt <- fread('
Name Video Webinar Meeting Conference Level NextStep
John 1 0 0 0 1 Webinar,Meeting,Conference
John 1 1 0 0 1 Meeting,Conference
John 1 1 1 0 2 Conference
Tom 0 0 1 0 1 Webinar,Conference,Video
Tom 0 0 1 1 2 Webinar,Video
Kyle 0 0 0 1 2 Webinar,Meeting,Video
')
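I am creating the nextstep column like this:
dt[, nextstep := paste0(names(.SD)[.SD==0], collapse = ','), by = 1:nrow(dt), .SDcols = 2:5][]
based on the solution here: Making a character string with column names with zero values
Now I want to change the order in which the elements appear in the NextStep column according to the Level field. For example, if it is Level 1, I want Conference to appear before Webinar and Meeting. If it is Level 2, I want Video to always appear last. Here is my attempt.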
dt<-dt[, NextStep := ifelse(Level1=="Level0",
(paste0(names(.SD)[.SD==0], collapse = ';'), 1:nrow(dt), .SDcols = c(5,2,3,4)),
ifelse(EngagementLevel1=="Level2",
(paste0(names(.SD)[.SD==0], collapse = ';'), 1:nrow(dt), .SDcols = c(3,4,5,2))))]
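I just want to reorder the elements in the 'NextStep' field according to the 'Level' field. Thank you sincerely for your help!
Answer 0 (score: 4)
Well, you could store your preferred orderings somewhere: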
levelmap = data.table(Level = 1:2, ord = list(
c("Conference", "Webinar", "Meeting", "Video"),
c("Webinar", "Meeting", "Conference", "Video")
))
Then use your earlier approach:
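dt[, r := .I]  # row id, so that each row forms its own group
for (ii in seq_len(nrow(levelmap))) {
  # for rows at this level, list the zero-valued columns in the level's preferred order
  dt[Level == levelmap$Level[ii],
     ns := paste0(names(.SD)[.SD == 0], collapse = ','),
     by = r, .SDcols = levelmap$ord[[ii]]]
}
dt[]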
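To check the loop in isolation, here is a minimal self-contained sketch on a three-row toy table. The table name `toy` and its values are made up for illustration; only the column names and the two orderings come from the question and the answer.

```r
library(data.table)

# Toy data with the same 0/1 columns as the question (values are illustrative)
toy <- data.table(
  Name       = c("John", "Tom", "Kyle"),
  Video      = c(1, 0, 0),
  Webinar    = c(0, 0, 0),
  Meeting    = c(0, 1, 0),
  Conference = c(0, 0, 1),
  Level      = c(1, 1, 2)
)

# One preferred column order per level, stored as a list column
levelmap <- data.table(Level = 1:2, ord = list(
  c("Conference", "Webinar", "Meeting", "Video"),
  c("Webinar", "Meeting", "Conference", "Video")
))

toy[, r := .I]  # row id, so that each row forms its own group
for (ii in seq_len(nrow(levelmap))) {
  # .SD follows the order given in .SDcols, so the pasted names do too
  toy[Level == levelmap$Level[ii],
      ns := paste0(names(.SD)[.SD == 0], collapse = ","),
      by = r, .SDcols = levelmap$ord[[ii]]]
}
toy[, r := NULL]  # drop the helper column again
print(toy[, .(Name, Level, ns)])
```

The ordering comes entirely from `.SDcols`: the zero-valued column names are pasted in whatever order the columns were handed to `.SD`.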
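But really, I don't think you should be doing this at all (neither in this question nor in the previous one). It is a messy way to work with data.
A comment on tidy data. To clarify what I mean, I suggest reviewing Hadley Wickham's paper on tidy data. Tidy data here might look like this: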
myDT = melt(
  dt[, !"NextStep", with=FALSE][, Seq := 1:.N, by=Name],
  id.vars = c("Name", "Seq", "Level"))
    Name Seq Level   variable value
 1: John   1     1      Video     1
 2: John   2     1      Video     1
 3: John   3     2      Video     1
 4:  Tom   1     1      Video     0
 5:  Tom   2     2      Video     0
 6: Kyle   1     2      Video     0
 7: John   1     1    Webinar     0
 8: John   2     1    Webinar     1
 9: John   3     2    Webinar     1
10:  Tom   1     1    Webinar     0
11:  Tom   2     2    Webinar     0
12: Kyle   1     2    Webinar     0
13: John   1     1    Meeting     0
14: John   2     1    Meeting     0
15: John   3     2    Meeting     1
16:  Tom   1     1    Meeting     1
17:  Tom   2     2    Meeting     1
18: Kyle   1     2    Meeting     0
19: John   1     1 Conference     0
20: John   2     1 Conference     0
21: John   3     2 Conference     0
22:  Tom   1     1 Conference     0
23:  Tom   2     2 Conference     1
24: Kyle   1     2 Conference     1
    Name Seq Level   variable value
Or you might even drop all the rows that are zero (or all that are one), since they are fairly redundant.
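As a rough sketch of that idea, the snippet below keeps only the value == 1 rows of a small long-format table and then recovers the missing items from the factor levels. The names `long`, `done` and `todo` and the values are hypothetical, chosen only to mirror the shape of the melted table.

```r
library(data.table)

# Small long-format table in the same shape as the melted data (illustrative values)
long <- data.table(
  Name     = c("John", "John", "Tom", "Tom"),
  Seq      = c(1L, 1L, 1L, 1L),
  Level    = c(1L, 1L, 1L, 1L),
  variable = factor(c("Video", "Webinar", "Video", "Meeting"),
                    levels = c("Video", "Webinar", "Meeting", "Conference")),
  value    = c(1L, 0L, 0L, 1L)
)

# Keep only the completed activities; the value column is now constant, so drop it
done <- long[value == 1L, .(Name, Seq, Level, variable)]

# The outstanding activities are the factor levels that do not appear in a group
todo <- done[, .(NextStep = toString(setdiff(levels(variable), as.character(variable)))),
             by = .(Name, Seq, Level)]
print(todo)
```

One caveat with this sketch: a Name/Seq combination with no completed activity has no rows left in `done`, so it would not show up in `todo` unless the full set of combinations is kept somewhere else.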
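The idea is that this would be the main data you use for any analysis or for building any summary tables. In your case, the goal is a summary table (as far as I can tell), like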
library(magrittr)
res = myDT[levelmap, on="Level"][, .( NextStep =
variable[value == 0] %>% factor(levels = ord[[1]]) %>% sort %>% toString
), keyby=.(Name, Seq, Level)]
   Name Seq Level                     NextStep
1: John   1     1 Conference, Webinar, Meeting
2: John   2     1          Conference, Meeting
3: John   3     2                   Conference
4: Kyle   1     2      Webinar, Meeting, Video
5:  Tom   1     1   Conference, Webinar, Video
6:  Tom   2     2               Webinar, Video
If you really want the 0/1 columns, you can also include them with dcast (which reshapes data from long to wide):
cbind(
res,
dcast(myDT, Name + Seq ~ variable, value.var="value")[, !c("Name", "Seq"), with=FALSE])
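The cbind call lines the two tables up by row position, so it relies on both being in the same Name/Seq order. If that ordering is in doubt, a keyed merge is one possible alternative. The sketch below uses made-up tables (`long`, `summ`) that are deliberately in different row orders; it is an illustration under those assumptions, not part of the original answer.

```r
library(data.table)

# Long-format 0/1 data (illustrative values)
long <- data.table(
  Name     = c("John", "John", "Tom", "Tom"),
  Seq      = 1L,
  variable = factor(c("Video", "Webinar", "Video", "Webinar"),
                    levels = c("Video", "Webinar")),
  value    = c(1L, 0L, 0L, 0L)
)

# Summary table, deliberately listed in a different row order than the wide data
summ <- data.table(
  Name     = c("Tom", "John"),
  Seq      = 1L,
  NextStep = c("Webinar, Video", "Webinar")
)

# Reshape long to wide, then match rows on the key columns instead of by position
wide <- dcast(long, Name + Seq ~ variable, value.var = "value")
out  <- merge(summ, wide, by = c("Name", "Seq"))
print(out)
```

Because the rows are matched on Name and Seq, the result stays correct even when the two inputs are sorted differently.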