The goal is to fill NA with "-" after the last value in each row
# Like this
SOURCE X__2 X__3 X__4 X__5 X__6 X__7 X__8 X__9 INFO
1: 04.xlsx David David - - - - - - A
2: 05.xlsx <NA> <NA> Tom Tom - - - - B
3: 06.xlsx <NA> <NA> <NA> <NA> Mary Mary - - C
4: 07.xlsx <NA> <NA> <NA> <NA> <NA> <NA> Peter Peter D
# Sample data
library(data.table)
dt <- data.table(SOURCE = c("04.xlsx","05.xlsx","06.xlsx","07.xlsx"),
                 X__2 = c("David",NA,NA,NA),
                 X__3 = c("David",NA,NA,NA),
                 X__4 = c(NA,"Tom",NA,NA),
                 X__5 = c(NA,"Tom",NA,NA),
                 X__6 = c(NA,NA,"Mary",NA),
                 X__7 = c(NA,NA,"Mary",NA),
                 X__8 = c(NA,NA,NA,"Peter"),
                 X__9 = c(NA,NA,NA,"Peter"),
                 INFO = LETTERS[1:4])
My attempt, which did not work:
# Find odd columns
TAR_COL <- grep("X__",colnames(dt))[!c(TRUE,FALSE)]
dt[!is.na(TAR_COL),(TAR_COL):="-",.SDcols =TAR_COL]
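To see why this attempt cannot work, it helps to print what TAR_COL and !is.na(TAR_COL) actually contain. This is a small diagnostic sketch that rebuilds the sample data so it runs on its own; row_filter is a name introduced here only for illustration.

```r
library(data.table)

# Sample data from the question
dt <- data.table(SOURCE = c("04.xlsx", "05.xlsx", "06.xlsx", "07.xlsx"),
                 X__2 = c("David", NA, NA, NA), X__3 = c("David", NA, NA, NA),
                 X__4 = c(NA, "Tom", NA, NA),   X__5 = c(NA, "Tom", NA, NA),
                 X__6 = c(NA, NA, "Mary", NA),  X__7 = c(NA, NA, "Mary", NA),
                 X__8 = c(NA, NA, NA, "Peter"), X__9 = c(NA, NA, NA, "Peter"),
                 INFO = LETTERS[1:4])

# grep() returns the positions 2:9; the recycled logical keeps every second one
TAR_COL <- grep("X__", colnames(dt))[!c(TRUE, FALSE)]
print(TAR_COL)      # integer positions of X__3, X__5, X__7, X__9

# is.na() is evaluated on those integers, not on the cells of the columns,
# so this "row filter" is all TRUE and does not depend on the data at all
row_filter <- !is.na(TAR_COL)
print(row_filter)
```

Because the filter never looks at the cell values, the NA test has to be applied to each column's contents instead.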
This script works when the column is specified explicitly, but it loses the ability to select the columns dynamically:
#
dt[!is.na(X__3),(grep("X__3",names(dt))+1):(grep("INFO",names(dt))-1) := "-"][]
SOURCE X__2 X__3 X__4 X__5 X__6 X__7 X__8 X__9 INFO
1: 04.xlsx David David - - - - - - A
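Because the real dataset is imported from several different xlsx files, selecting the odd columns dynamically is a must.
Is there any way to apply !is.na() over a vector of column indices and assign the value?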
Answer 0 (score: 4)
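We can use set. Looping over the column indices in TAR_COL, for each column we pass the column index (j), the row index (i) of the NA elements in that particular column, and set value to '-'.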
for(j in TAR_COL) set(dt, i = which(is.na(dt[[j]])), j= j, value = "-")
dt
# SOURCE X__2 X__3 X__4 X__5 X__6 X__7 X__8 X__9 INFO
#1: 04.xlsx David David <NA> - <NA> - <NA> - A
#2: 05.xlsx <NA> - Tom Tom <NA> - <NA> - B
#3: 06.xlsx <NA> - <NA> - Mary Mary <NA> - C
#4: 07.xlsx <NA> - <NA> - <NA> - Peter Peter D
Here, the NA elements in columns 3, 5, 7 and 9 are replaced with -.
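Since the question asks for a vectorized form, the same column-wise replacement can also be written as a single := call that applies is.na() to each column's values through lapply() over .SD, restricted to TAR_COL with .SDcols. This is a sketch rather than the answer's own code, and it rebuilds the sample data so it is self-contained.

```r
library(data.table)

# Sample data from the question
dt <- data.table(SOURCE = c("04.xlsx", "05.xlsx", "06.xlsx", "07.xlsx"),
                 X__2 = c("David", NA, NA, NA), X__3 = c("David", NA, NA, NA),
                 X__4 = c(NA, "Tom", NA, NA),   X__5 = c(NA, "Tom", NA, NA),
                 X__6 = c(NA, NA, "Mary", NA),  X__7 = c(NA, NA, "Mary", NA),
                 X__8 = c(NA, NA, NA, "Peter"), X__9 = c(NA, NA, NA, "Peter"),
                 INFO = LETTERS[1:4])

# Positions of the odd X__ columns (X__3, X__5, X__7, X__9)
TAR_COL <- grep("X__", colnames(dt))[!c(TRUE, FALSE)]

# lapply() hands each selected column to the function, so is.na() now
# tests the cell values; the result is assigned back by reference
dt[, (TAR_COL) := lapply(.SD, function(x) replace(x, is.na(x), "-")),
   .SDcols = TAR_COL]
print(dt)
```

Both forms modify dt by reference. set() is documented as a low-overhead, loopable version of :=, which is why it fits naturally inside a for() loop.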
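Note: !is.na(TAR_COL) does not help, because TAR_COL is only a vector of column indices, not the column values.
Based on the OP's clarification that the NAs must be replaced horizontally, from the last occurring value up to the last column before "INFO", we can create an index from the cumulative sum, as @markus suggested in the comments.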
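The answer's code for this last step is not included in this text, so the following is only a minimal sketch of the cumulative-sum idea, not the exact formulation from the comments. Walking the X__ columns from right to left, it keeps a running count of non-NA values seen so far; while that count is still 0, the cell lies after the row's last value and is set to "-". The names val_cols, seen and rows are introduced here for illustration.

```r
library(data.table)

# Sample data from the question
dt <- data.table(SOURCE = c("04.xlsx", "05.xlsx", "06.xlsx", "07.xlsx"),
                 X__2 = c("David", NA, NA, NA), X__3 = c("David", NA, NA, NA),
                 X__4 = c(NA, "Tom", NA, NA),   X__5 = c(NA, "Tom", NA, NA),
                 X__6 = c(NA, NA, "Mary", NA),  X__7 = c(NA, NA, "Mary", NA),
                 X__8 = c(NA, NA, NA, "Peter"), X__9 = c(NA, NA, NA, "Peter"),
                 INFO = LETTERS[1:4])

# All value columns between SOURCE and INFO, in their original order
val_cols <- grep("^X__", names(dt), value = TRUE)

# Running (right-to-left) cumulative count of non-NA values per row
seen <- integer(nrow(dt))
for (col in rev(val_cols)) {
  seen <- seen + !is.na(dt[[col]])
  rows <- which(seen == 0L)          # nothing found yet to the right: trailing NA
  if (length(rows) > 0L) set(dt, i = rows, j = col, value = "-")
}
print(dt)
```

NAs that sit before a row's first value (for example X__2 to X__7 in the Peter row) stay NA, which matches the desired output shown in the question.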