How can I find specific strings in a data frame using a for loop?

Time: 2018-12-12 13:34:26

Tags: r

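I am using a for loop to look for each specific string (df2$x2) inside a column of another data frame (df1$x1). My goal is to create a new column, df1$test, and write the matching df2$x2 value into it.

For example: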

df1 <- data.frame(x1 = c("TE-T6-3 XYZ12X","TE-D31L-2 QWE12X","TE-H6-1 ABC12X","TE-D31L-2 QWE12X","EC20 QWX12X"),
                  Y = c(2017,2017,2018,2018,2017),
                  Sales = c(25,50,30,40,90))
df1$x1 <- as.character(as.factor(df1$x1))

df2 <- data.frame(x2 = c("TE-T6-5","TE-D31L-2","TE-H6-15","EC500","EC20","TE-D31L-2"),
                  Y = c(2018,2017,2018,2017,2018,2018),
                  P = c(100,300,200,50,150,300))
df2$x2 <- as.character(as.factor(df2$x2))

for(i in 1:nrow(df2)){

  f <- df2[i,1]

  df1$test <- ifelse(grepl(f, df1$x1),f,"not found")

}

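What should I do after the loop? I know the problem is that the result is overwritten on every iteration. I tried using an "if" statement to create a new data frame and save the outputs, but it did not work. It only writes one specific string.

Thanks.

Expected output: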

df1 <- data.frame(x1 = c("TE-T6-3 XYZ12X","TE-D31L-2 QWE12X","TE-H6-1 ABC12X","TE-D31L-2 QWE12X","EC20 QWX12X"),
             output = c("not found","TE-D31L-2","not found","TE-D31L-2","EC20"))


2 answers:

Answer 0: (score: 1)

Do you want to add a new column for each string? If so, your code should be:

df1 <- data.frame(x1 = c("TE-T6-3 XYZ12X","TE-D31L-2 QWE12X","TE-H6-1 ABC12X","TE-D31L-2 QWE12X","EC20 QWX12X"),
                  Y = c(2017,2017,2018,2018,2017),
                  Sales = c(25,50,30,40,90))
df1$x1 <- as.character(as.factor(df1$x1))

df2 <- data.frame(x2 = c("TE-T6-5","TE-D31L-2","TE-H6-15","EC500","EC20","TE-D31L-2"),
                  Y = c(2018,2017,2018,2017,2018,2018),
                  P = c(100,300,200,50,150,300))
df2$x2 <- as.character(as.factor(df2$x2))

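for(i in 1:nrow(df2)){
  f <- df2[i,1]
  # create the column under a temporary name; grepl() already returns TRUE/FALSE
  df1$test <- grepl(f, df1$x1)
  # rename the last (newly added) column to the string that was searched for
  colnames(df1)[ncol(df1)] <- f
}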

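This creates a new column under a temporary name and then renames it to the string that was evaluated. I also used a logical F (FALSE) in place of "not found", but you can use whatever value you want.

[Edit:] If you want to get the expected output, you can use the following code: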
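As a possible alternative to the loop, here is a minimal sketch that builds all the logical columns at once. It assumes the search strings should be matched literally (hence `fixed = TRUE`) and that duplicated strings in df2$x2 should produce only one column (hence `unique()`); the names `patterns` and `flags` are made up for illustration.

```r
df1 <- data.frame(
  x1 = c("TE-T6-3 XYZ12X", "TE-D31L-2 QWE12X", "TE-H6-1 ABC12X",
         "TE-D31L-2 QWE12X", "EC20 QWX12X"),
  stringsAsFactors = FALSE
)

# Search strings from df2$x2, de-duplicated so no column name repeats
patterns <- unique(c("TE-T6-5", "TE-D31L-2", "TE-H6-15", "EC500", "EC20", "TE-D31L-2"))

# One logical column per search string; sapply() names the columns after the strings
flags <- sapply(patterns, function(p) grepl(p, df1$x1, fixed = TRUE))

# Append the logical columns to the original data frame
df1 <- cbind(df1, flags)
```

Each new column is TRUE in the rows where that string occurs in df1$x1 and FALSE elsewhere.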

df1 <- data.frame(x1 = c("TE-T6-3 XYZ12X","TE-D31L-2 QWE12X","TE-H6-1 ABC12X","TE-D31L-2 QWE12X","EC20 QWX12X"),
                  Y = c(2017,2017,2018,2018,2017),
                  Sales = c(25,50,30,40,90))
df1$x1 <- as.character(as.factor(df1$x1))

df2 <- data.frame(x2 = c("TE-T6-5","TE-D31L-2","TE-H6-15","EC500","EC20","TE-D31L-2"),
                  Y = c(2018,2017,2018,2017,2018,2018),
                  P = c(100,300,200,50,150,300))
df2$x2 <- as.character(as.factor(df2$x2))
df1$output <- "not found"

for(i in 1:nrow(df2)){
  f <- df2[i,1]
  df1$output[grepl(f, df1$x1)]<-f

}

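This is very similar to what you did, but you need to index the rows that have to be written. It only works if each row can have a single match; if a row can have several matches, things get a bit more complicated. But I don't think that is your problem.

Answer 1: (score: 0)

You simply need to split the df1$x1 strings on the space and then merge on df2$x2 (or match, since you are only interested in one variable), i.e.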
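If rows can contain more than one of the strings, one possible approach is to collect every hit per row and paste them together. This is only a sketch: the three-row data frame below is invented so that one row has two matches, the comma separator is an arbitrary choice, and `patterns` is a hypothetical name.

```r
# Hypothetical data: the second row contains two of the search strings
df1 <- data.frame(
  x1 = c("TE-T6-3 XYZ12X", "TE-D31L-2 EC20", "EC20 QWX12X"),
  stringsAsFactors = FALSE
)
patterns <- unique(c("TE-T6-5", "TE-D31L-2", "EC20"))

df1$output <- vapply(df1$x1, function(s) {
  # keep every search string that occurs literally in this row
  hits <- patterns[vapply(patterns, grepl, logical(1), x = s, fixed = TRUE)]
  if (length(hits) == 0) "not found" else paste(hits, collapse = ", ")
}, character(1), USE.NAMES = FALSE)
```

With a single match per row this gives the same values as the loop above.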

v1 <- sub('\\s+.*', '', df1$x1)
df2$x2[match(v1, df2$x2)]  # match() returns positions in df2$x2, so index df2$x2 with them
#[1] NA          "TE-D31L-2" NA          "TE-D31L-2" "EC20"
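To turn that result into the output column from the question, the NA values can be replaced with "not found". A minimal sketch, assuming the code to look up is always the part of df1$x1 before the first whitespace and must equal a df2$x2 value exactly:

```r
df1 <- data.frame(
  x1 = c("TE-T6-3 XYZ12X", "TE-D31L-2 QWE12X", "TE-H6-1 ABC12X",
         "TE-D31L-2 QWE12X", "EC20 QWX12X"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  x2 = c("TE-T6-5", "TE-D31L-2", "TE-H6-15", "EC500", "EC20", "TE-D31L-2"),
  stringsAsFactors = FALSE
)

# Keep only the part of each string before the first whitespace
v1 <- sub("\\s+.*", "", df1$x1)

# Use the code when it appears in df2$x2, otherwise "not found"
df1$output <- ifelse(v1 %in% df2$x2, v1, "not found")
```

Unlike `grepl()`, this compares whole codes, so a code such as "TE-H6-1" is not treated as a match for "TE-H6-15".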