I have a data frame like this:
df1<-structure(list(q006_1 = c("1098686880", "18493806","9892464","96193586",
"37723803","13925456","37713534","1085246853"),
q006_2 = c("1098160170","89009521","9726314","28076230","63451251",
"1090421499","37124019"),
q006_3 = c("52118967","41915062","1088245358","79277706","91478662",
"80048634")),
class = "data.frame", row.names = c(NA, -8L))
I know how to extract the last five digits of a single column using substr in data.table, but I want to do the extraction in all columns.
n_last <- 5
df1[, `q006_1`:= substr(q006_1, nchar(q006_1) - n_last + 1, nchar(q006_1))]
How can I do this for all columns?
Answer 0 (score: 2)
In data.table, you can do it as follows. (Your sample data is incomplete: the first column has 8 entries, the second has 7, and the third has 6.)
library(data.table)

# Or use `cols <- names(df1)` to apply this to every column
cols <- c("q006_1", "q006_2", "q006_3")

# The lookahead (?=.{5}$) stops the greedy match five characters before the end,
# so everything except the last five characters is removed
setDT(df1)[, (cols) := lapply(.SD, function(x) {
  sub('.*(?=.{5}$)', '', x, perl = TRUE)
}), .SDcols = cols][]
#    q006_1 q006_2 q006_3
# 1:  86880  60170  18967
# 2:  93806  09521  15062
# 3:  92464  26314  45358
# 4:  93586  76230  77706
# 5:  23803  51251  78662
# 6:  25456  21499  48634
# 7:  13534  24019  76230
# 8:  46853  76230  76230
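The same `lapply(.SD, ...)` pattern should also work with the `substr` expression from the question instead of a regular expression. Below is a minimal sketch, assuming the data.table package is installed; the table name `dt` and its three-row contents are illustrative only (the values are taken from the sample data).

```r
library(data.table)

# Small illustrative table (hypothetical name `dt`, values from the sample data)
dt <- data.table(
  q006_1 = c("1098686880", "18493806", "9892464"),
  q006_2 = c("1098160170", "89009521", "9726314")
)

n_last <- 5
cols <- names(dt)  # assumption: every column should be transformed

# Apply the substr expression from the question to each column in .SDcols
dt[, (cols) := lapply(.SD, function(x) {
  substr(x, nchar(x) - n_last + 1, nchar(x))
}), .SDcols = cols]
```

Keeping `n_last` as a variable makes it easy to change how many trailing characters are kept, which the fixed `{5}` in the regular expression does not.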
Data:
df1<-structure(list(q006_1 = c("1098686880", "18493806","9892464","96193586",
"37723803","13925456","37713534","1085246853"),
q006_2 = c("1098160170","89009521","9726314","28076230",
"63451251","1090421499","37124019","28076230"),
q006_3 = c("52118967","41915062","1088245358","79277706",
"91478662","80048634","28076230","28076230")),
class = c("data.frame"), row.names = c(NA, -8L))
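If data.table is not a requirement, a similar result can probably be obtained in base R. This is a sketch of an alternative technique, not part of the answer above; the data frame `df` is a hypothetical three-row subset of the data.

```r
# Illustrative data frame (hypothetical name `df`, a subset of the values above)
df <- data.frame(
  q006_1 = c("1098686880", "18493806", "9892464"),
  q006_3 = c("52118967", "41915062", "1088245358"),
  stringsAsFactors = FALSE
)

n_last <- 5

# Assigning into df[] replaces each column while keeping the data.frame structure
df[] <- lapply(df, function(x) substr(x, nchar(x) - n_last + 1, nchar(x)))
```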