如何跨列提取部分单元格值?

时间:2019-08-08 20:20:04

标签: r data.table substring

我有一个像这样的数据框:

df1<-structure(list(q006_1 = c("1098686880", "18493806","9892464","96193586",
                               "37723803","13925456","37713534","1085246853"),
                    q006_2 = c("1098160170","89009521","9726314","28076230","63451251",
                               "1090421499","37124019"),
                    q006_3 = c("52118967","41915062","1088245358","79277706","91478662",
                               "80048634")), 
               class=data.frame, row.names = c(NA, -8L)))

我知道如何使用data.table中的substr提取每一列的最后五个数字,但是我想在所有列中进行提取。

n_last <- 5  

df1[, `q006_1`:= substr(q006_1, nchar(q006_1) - n_last + 1, nchar(q006_1))]

如何为所有列执行此操作?

1 个答案:

答案 0 :(得分:2)

data.table中,可以按照以下步骤进行操作:(您的样本数据不完整,因为第一列有8个,第二列有7个,第三列有6个条目。)

library(data.table)

#or `cols <- names(df1)` if you want to apply it on all columns and this is not just an example
cols <- c("q006_1", "q006_2", "q006_3") 

setDT(df1)[ , (cols):= lapply(.SD, function(x){
                                   sub('.*(?=.{5}$)', '', x, perl=T)}),
             .SDcols = cols][]

#     q006_1 q006_2 q006_3
# 1:  86880  60170  18967
# 2:  93806  09521  15062
# 3:  92464  26314  45358
# 4:  93586  76230  77706
# 5:  23803  51251  78662
# 6:  25456  21499  48634
# 7:  13534  24019  76230
# 8:  46853  76230  76230

数据:

df1<-structure(list(q006_1 = c("1098686880", "18493806","9892464","96193586",
                               "37723803","13925456","37713534","1085246853"),
                    q006_2 = c("1098160170","89009521","9726314","28076230",
                               "63451251","1090421499","37124019","28076230"),
                    q006_3 = c("52118967","41915062","1088245358","79277706",
                               "91478662","80048634","28076230","28076230")),
                class = c("data.frame"), row.names = c(NA, -8L))