Search for strings in a data frame in R using a predefined table and add a corresponding column

Time: 2018-08-15 23:48:37

Tags: python r regex string dataframe

Here is the sample data frame:

# create sample test data
data <- rbind(c("Cisco Catalyst 3850 Series Ethernet Stackable Switch","ports"), c("Cisco 7200 Series Routers", "Ports"),
              c("Cisco Catalyst 3560-CX Series Switches", "Ports"), c("Data Center Switches", "Ports"), 
              c("SW_3560_IDF1d_1879.ccs.ccctov.org", "Ports")) # sample data

Table mapping each search string to its value:

3850 - 50
7200 - 2
3560 - 60
Data Center - 240
3560 - 60

Sample of the desired final output:

finaldata <- rbind(c("Cisco Catalyst 3850 Series Ethernet Stackable Switch","ports", 50), c("Cisco 7200 Series Routers", "Ports", 2),
          c("Cisco Catalyst 3560-CX Series Switches", "Ports", 60), c("Data Center Switches", "Ports", 240), 
          c("SW_3560_IDF1d_1879.ccs.ccctov.org", "Ports", 60)) # sample data

Thanks for any help or guidance!

1 answer:

Answer 0 (score: 3)

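We can use a regular expression together with recode from dplyr. sub() pulls the matching key out of each description, and recode maps that key to its value: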

library(dplyr)
newColumn <- recode(sub(".*(3850|7200|3560|Data Center).*", "\\1", data[,1]), 
                    `3850` = 50, 
                    `7200` = 2,
                    `3560` = 60,
                    `Data Center` = 240)

cbind(data, newColumn)

                                                                    newColumn
[1,] "Cisco Catalyst 3850 Series Ethernet Stackable Switch" "ports" "50"     
[2,] "Cisco 7200 Series Routers"                            "Ports" "2"      
[3,] "Cisco Catalyst 3560-CX Series Switches"               "Ports" "60"     
[4,] "Data Center Switches"                                 "Ports" "240"    
[5,] "SW_3560_IDF1d_1879.ccs.ccctov.org"                    "Ports" "60"     
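
The same lookup can also be sketched in base R without dplyr, by indexing a named vector with the extracted keys. This is only an illustrative variant under the question's sample data; the name lookup is made up for the example and is not part of the original answer.

```r
# Sample data, as in the question
data <- rbind(
  c("Cisco Catalyst 3850 Series Ethernet Stackable Switch", "ports"),
  c("Cisco 7200 Series Routers", "Ports"),
  c("Cisco Catalyst 3560-CX Series Switches", "Ports"),
  c("Data Center Switches", "Ports"),
  c("SW_3560_IDF1d_1879.ccs.ccctov.org", "Ports")
)

# Hypothetical name: lookup maps each search string to its value
lookup <- c("3850" = 50, "7200" = 2, "3560" = 60, "Data Center" = 240)

# Extract the matching key from each description
codes <- sub(".*(3850|7200|3560|Data Center).*", "\\1", data[, 1])

# Index the named vector by key; unname() drops the key labels
newColumn <- unname(lookup[codes])

cbind(data, newColumn)
```

Because data is a character matrix, cbind() stores the new values as strings, just as in the output above. Converting to a data frame first would keep the column numeric.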

A slightly different approach, which keeps the mapping in a list and splices it into recode with !!!:

codes <- sub(".*(3850|7200|3560|Data Center).*", "\\1", data[,1])
newCodes <- list(`3850` = 50, `7200` = 2, `3560` = 60, `Data Center` = 240)
cbind(data, recode(codes, !!!newCodes))
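
Both snippets repeat the keys in the regex and in the mapping. A possible refinement, sketched here in base R under the assumption that the keys contain no regex metacharacters, is to build the pattern from the names of the mapping and to return NA for rows where nothing matches. The names lookup, pattern and descriptions, and the "Unknown device" row, are invented for illustration.

```r
# Hypothetical lookup table: search string -> value
lookup <- c("3850" = 50, "7200" = 2, "3560" = 60, "Data Center" = 240)

# Build the alternation from the keys so pattern and values stay in sync.
# Assumes the keys contain no regex metacharacters.
pattern <- paste(names(lookup), collapse = "|")

descriptions <- c("Cisco 7200 Series Routers",
                  "Data Center Switches",
                  "Unknown device")  # made-up row with no matching key

# regexpr() gives the position of the first match per string, -1 if none
m <- regexpr(pattern, descriptions)
codes <- rep(NA_character_, length(descriptions))
codes[m > 0] <- regmatches(descriptions, m)

# Rows without a match keep NA as their value
values <- unname(lookup[codes])
data.frame(description = descriptions, value = values)
```

Unlike sub(), which returns the input string unchanged when the pattern does not match, this version makes unmatched rows explicit as NA.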