I have a dataset df1.
I want to replace each occurrence of "One + one", "Two ; one", etc. with the corresponding numbers from the lookup table df2.
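Desired output:
Any ideas on how to do this? This is a follow-up to my original question, How to replace string values in a column based on a lookup table.
I tried the following, but it does not work. Thanks in advance!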
df1$New <- gsubfn::gsubfn("[A-z]+,;", as.list(setNames(df2$Node,df2$Label)), df1$Node)
Data:
df1 <- data.frame(ID = 1:5, Node = c("One + one > Two ; one > Three ; two", "One + two > Two ; two > Three ; one", "One + one > Two ; two > Three ; one", "One + two > Two ; one > Three ; two", "One + one > Two ; two > Three ; two"), stringsAsFactors = FALSE)
df2 <- data.frame(Label = c("One + one", "One + two", "Two ; one", "Two ; two", "Three ; one", "Three ; two"), Node = c("1.1", "1.2", "2.1", "2.2", "3.1", "3.2"), stringsAsFactors = FALSE)
Updated data:
df1 <- data.frame(ID = 1:5, Node = c("AO Ales + Bitter > Brown and Stout > Premium && Super Premium",
"Lager > Dry, Premium Strength, Style, Traditional > Mainstream & Value",
"AO Ales + Bitter > Dry, Premium Strength, Style, Traditional > Mainstream & Value",
"Lager > Brown and Stout > Dry, Premium Strength, Style, Traditional",
"AO Ales + Bitter > Dry, Premium Strength, Style, Traditional > Premium && Super Premium"), stringsAsFactors = FALSE)
df2 <- data.frame(Label = c("AO Ales + Bitter",
"Lager",
"Brown and Stout",
"Dry, Premium Strength, Style, Traditional",
"Mainstream & Value",
"Premium && Super Premium"), Node = c("1.1", "1.2", "2.1", "2.2", "3.1", "3.2"), stringsAsFactors = FALSE)
Answer 0 (score: 1)
We can do this more easily with the gsubfn and english packages:
library(gsubfn)
library(english)
gsubfn("([a-z]+)", as.list(setNames(1:9, as.character(as.english(1:9)))),
tolower(gsub("\\s*[+;]\\s*", ".", df1$Node)))
#[1] "1.1 > 2.1 > 3.2" "1.2 > 2.2 > 3.1" "1.1 > 2.2 > 3.1"
#[4] "1.2 > 2.1 > 3.2" "1.1 > 2.2 > 3.2"
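If the english package is not available, or the labels are not number words, the lookup table df2 can be applied directly. The following is only a sketch and not part of the original answer: it loops over the rows of df2 and replaces each label as a literal string (fixed = TRUE, because the labels contain regex metacharacters such as +). Replacing longer labels first is an added precaution in case one label is a substring of another. This also sidesteps the problem with the pattern in the question, "[A-z]+,;", which only matches letters followed by a literal ",;" and so never matches anything in df1$Node. The data below are the first two rows of df1 from the question.

```r
# First two rows of df1 and the full lookup table df2 from the question
df1 <- data.frame(ID = 1:2,
                  Node = c("One + one > Two ; one > Three ; two",
                           "One + two > Two ; two > Three ; one"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Label = c("One + one", "One + two", "Two ; one",
                            "Two ; two", "Three ; one", "Three ; two"),
                  Node = c("1.1", "1.2", "2.1", "2.2", "3.1", "3.2"),
                  stringsAsFactors = FALSE)

# Assumption: longest labels are replaced first, so that a short label
# cannot clobber part of a longer one
ord <- order(nchar(df2$Label), decreasing = TRUE)

out <- df1$Node
for (i in ord) {
  # fixed = TRUE treats the label as a literal string, not a regex
  out <- gsub(df2$Label[i], df2$Node[i], out, fixed = TRUE)
}
df1$New <- out
df1$New
# "1.1 > 2.1 > 3.2" "1.2 > 2.2 > 3.1"
```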
Based on the new example, we can do this in base R:
nm1 <- setNames(df2$Node, df2$Label)
sapply(strsplit(df1$Node, " > "), function(x) paste(nm1[x], collapse = " > "))
#[1] "1.1 > 2.1 > 3.2" "1.2 > 2.2 > 3.1" "1.1 > 2.2 > 3.1"
#[4] "1.2 > 2.1 > 2.2" "1.1 > 2.2 > 3.2"
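One caveat with the nm1[x] lookup: a piece that has no entry in df2 comes back as NA, which paste turns into the string "NA". A possible variant, sketched below, keeps the original text for unmatched pieces. Here replace_labels is a hypothetical helper name, and the small lookup table and the "Unknown" label are made up purely for illustration.

```r
# Hypothetical helper (not from the original answer): map each " > "-separated
# piece through the lookup table, leaving unmatched pieces unchanged
replace_labels <- function(x, lookup, sep = " > ") {
  key <- setNames(lookup$Node, lookup$Label)
  vapply(strsplit(x, sep, fixed = TRUE), function(parts) {
    parts <- trimws(parts)            # tolerate stray whitespace around pieces
    hit <- unname(key[parts])         # NA where a piece has no lookup entry
    paste(ifelse(is.na(hit), parts, hit), collapse = sep)
  }, character(1))
}

# Illustrative data: "Unknown" deliberately has no entry in the lookup table
lookup <- data.frame(Label = c("Lager", "Brown and Stout"),
                     Node = c("1.2", "2.1"),
                     stringsAsFactors = FALSE)
res <- replace_labels(c("Lager > Brown and Stout", "Lager > Unknown"), lookup)
res
# "1.2 > 2.1" "1.2 > Unknown"
```

With the updated data from the question, replace_labels(df1$Node, df2) should give the same result as the sapply call above, since every piece there has a matching label.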