Filter rows by a string pattern across a list of data frames

Asked: 2019-10-23 10:57:27

Tags: r dataframe

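I'm new to R. I have a list of different data frames that all share the same column names, and each data frame has one column that I want to use to filter certain rows. Here are examples of the data frames in the list: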

dput(df1)
df1 <- structure(list(v1 = c("A", "B", "B"),
                      t2 = c("James","[Jane] ='1' and [jane]='2'", "[john] ='1' and [john]='2' or [sly]='34'"),
                      t3 = c("James","erick", "ger'")),
                 class = "data.frame", row.names = c(NA,  -3L))

dput(df2)
df2 <- structure(list(v1 = c("B", "C", "B"),
                      t2 = c("James","[Jane]='44' or [ellen]='1' and [ellen] ='2'", "Egg"),
                      t3 = c("James","Jane", "Egg")),
                 class = "data.frame", row.names = c(NA,  -3L))
dput(df3)
df3 <- structure(list(v1 = c("d", "e", "A"),
                      t2 = c("[James] ='2' and [james]='3' or '[rady] ='44'","([rock] = '51' and  [rock] = '53') and ([roger] = '0')", "Egg"),
                      t3 = c("James","Jane", "Egg")),
                 class = "data.frame", row.names = c(NA,  -3L))

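Looking at column t2 of each data frame, some rows follow a structured string pattern. In df1, for example, the rows "[Jane] ='1' and [jane]='2'" and "[john] ='1' and [john]='2' or [sly]='34'" follow it. I want to write a script that loops over every data frame in the list, finds column t2, and keeps only the rows with this pattern. Because the column contains many other patterns, the script should match only rows in which a name is repeated twice with "and" between the two occurrences. In df1, for instance, [Jane] ='1' is repeated with "and" in between, giving [Jane] ='1' and [jane]='2'.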

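In other words, I am interested in rows of column t2 where a name is repeated and the two occurrences are joined by "and". In df1 these are the rows "[Jane] ='1' and [jane]='2'" and "[john] ='1' and [john]='2' or [sly]='34'", because the names Jane and john each appear twice with "and" between them.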

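Note also that df2 could contain a row such as "[merc] ='44' or [merc] = '2' and [lean] ='7'", which repeats a name but with "or" between the two occurrences. I do not want such rows. I only want rows where the repeated name is joined by "and".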
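The rule described above can be sketched as a regular expression. This is only a minimal sketch under a few assumptions that the post does not state explicitly: names are compared ignoring case (inferred from the Jane/jane example), values are always wrapped in single quotes, and only whitespace separates the first condition, the word "and", and the second condition. The helper name has_repeated_and is hypothetical.

```r
# Hypothetical helper (not part of the original post).
# Returns TRUE for strings where a bracketed name is followed by
# = 'value', then "and", then the same bracketed name again.
has_repeated_and <- function(x) {
  # \[([^\]]+)\]  captures the name inside square brackets
  # \s*=\s*'[^']*' matches the comparison with a single-quoted value
  # \s*and\s+\[\1\] requires "and" followed by the same name (backreference)
  pattern <- "\\[([^\\]]+)\\]\\s*=\\s*'[^']*'\\s*and\\s+\\[\\1\\]"
  # Lowercasing first makes the comparison of the two names case-insensitive
  grepl(pattern, tolower(x), perl = TRUE)
}

examples <- c(
  "[Jane] ='1' and [jane]='2'",
  "([rock] = '51' and  [rock] = '53') and ([roger] = '0')",
  "[merc] ='44' or [merc] = '2' and [lean] ='7'",
  "James"
)
print(has_repeated_and(examples))
```

Because the pattern requires "and" directly between the two conditions, a repeated name joined by "or" does not match, even if an unrelated "and" appears later in the string.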

Expected output

dput(df1)
df1 <- structure(list(v1 = c("B", "B"),
                      t2 = c("[Jane] ='1' and [jane]='2'", "[john] ='1' and [john]='2' or [sly]='34'"),
                      t3 = c("erick", "ger'")),
                 class = "data.frame", row.names = c(NA, -2L))


dput(df2)
df2 <- structure(list(v1 = c( "B"),
                      t2 = c("[Jane]='44' or [ellen]='1' and [ellen] ='2'"),
                      t3 = c("Jane")),
                 class = "data.frame", row.names = c(NA, -1L))


dput(df3)
df3 <- structure(list(v1 = c("d", "e"),
                      t2 = c("[James] ='2' and [james]='3' or '[rady] ='44'","([rock] = '51' and  [rock] = '53') and ([roger] = '0')"),
                      t3 = c("James","Jane")),
                 class = "data.frame", row.names = c(NA, -2L))

How can I do this?
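One possible approach, offered as a sketch rather than a definitive answer, is to apply a pattern test to column t2 of every data frame with lapply. The function name filter_repeated_and and the list name dfs are hypothetical, the list below is a small stand-in built from the df1 and df2 values in the question, and the regular expression assumes single-quoted values and case-insensitive name matching.

```r
# Hypothetical function (not from the original post): keeps only the rows of
# one data frame whose t2 value repeats a bracketed name joined by "and".
filter_repeated_and <- function(d) {
  pattern <- "\\[([^\\]]+)\\]\\s*=\\s*'[^']*'\\s*and\\s+\\[\\1\\]"
  keep <- grepl(pattern, tolower(d$t2), perl = TRUE)
  out <- d[keep, , drop = FALSE]
  rownames(out) <- NULL  # renumber rows 1..n
  out
}

# Stand-in for the real list, using the df1 and df2 data from the question
dfs <- list(
  df1 = data.frame(
    v1 = c("A", "B", "B"),
    t2 = c("James", "[Jane] ='1' and [jane]='2'",
           "[john] ='1' and [john]='2' or [sly]='34'"),
    t3 = c("James", "erick", "ger'"),
    stringsAsFactors = FALSE
  ),
  df2 = data.frame(
    v1 = c("B", "C", "B"),
    t2 = c("James", "[Jane]='44' or [ellen]='1' and [ellen] ='2'", "Egg"),
    t3 = c("James", "Jane", "Egg"),
    stringsAsFactors = FALSE
  )
)

# Apply the same filter to every data frame; list names are preserved
filtered <- lapply(dfs, filter_repeated_and)
print(filtered)
```

lapply returns a list of the same length and with the same names as its input, so each filtered data frame can still be accessed as filtered$df1, filtered$df2, and so on.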

0 Answers:

No answers