Select the first instance of a vector in a column

Date: 2014-02-14 08:39:18

Tags: string r vector dataframe

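I have the output of a matching function. In some cases the function cannot pick one of two or more candidate names for a match, so it stores all of them as a vector (in text form) in the column.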

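What I want to do is select the first, second, third, ... instance of the vector in each row of the column so that I can continue working with it.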

Here is a reproducible data frame:

string <- c("c(\"Kaskazini 'A'\", \"Kaskazini 'B'\")", "c(\"Kabale\", \"Kabare\")", "c(\"Kisoko\", \"Kisoro Tc\")",
            "c(\"Luwero East\", \"Luwero West\")", "c(\"Marindi\", \"Malindi\")", "c(\"Mukongoro\", \"Mukono Tc\", \"Muko\")")

# keep the column as character rather than factor (the default before R 4.0)
testdf <- data.frame(string, stringsAsFactors = FALSE)

2 answers:

Answer 0: (score: 1)

Here is a simple approach using regular expressions:

# extract instances (in a list)
strings <- regmatches(testdf$string, 
                      gregexpr("(?<=\")[^\"]+?(?=\"[,)])", 
                               testdf$string, perl = TRUE))

[[1]]
[1] "Kaskazini 'A'" "Kaskazini 'B'"
[[2]]
[1] "Kabale" "Kabare"
[[3]]
[1] "Kisoko"    "Kisoro Tc"
[[4]]
[1] "Luwero East" "Luwero West"
[[5]]
[1] "Marindi" "Malindi"
[[6]]
[1] "Mukongoro" "Mukono Tc" "Muko"     
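
To see what the pattern does, here is a minimal sketch on a single string. The names `one`, `m`, and `hits` are illustrative and not part of the original answer. The lookbehind `(?<=\")` requires a double quote just before the match, `[^\"]+?` lazily matches the name itself, and the lookahead `(?=\"[,)])` requires a closing quote followed by a comma or a closing parenthesis. That last condition is what stops the `, ` separator between two names from being picked up as a match.

```r
# one element of testdf$string, written out by hand
one <- "c(\"Kisoko\", \"Kisoro Tc\")"

# positions of all matches; perl = TRUE is needed for lookbehind/lookahead
m <- gregexpr("(?<=\")[^\"]+?(?=\"[,)])", one, perl = TRUE)

# regmatches() returns a list with one element per input string
hits <- regmatches(one, m)[[1]]
hits
```

Note that this assumes the names themselves never contain a double quote.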


# add columns to `testdf`
testdf$first <- sapply(strings, "[", 1)
testdf$second <- sapply(strings, "[", 2)
testdf$third <- sapply(strings, "[", 3)

                               string         first        second third
1 c("Kaskazini 'A'", "Kaskazini 'B'") Kaskazini 'A' Kaskazini 'B'  <NA>
2               c("Kabale", "Kabare")        Kabale        Kabare  <NA>
3            c("Kisoko", "Kisoro Tc")        Kisoko     Kisoro Tc  <NA>
4     c("Luwero East", "Luwero West")   Luwero East   Luwero West  <NA>
5             c("Marindi", "Malindi")       Marindi       Malindi  <NA>
6 c("Mukongoro", "Mukono Tc", "Muko")     Mukongoro     Mukono Tc  Muko

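If you do not want to create all the columns by hand, or do not know the maximum number of instances in advance, you can use the following approach: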

res <- sapply(seq_len(max(lengths(strings))), function(i)
  sapply(strings, "[", i))

cbind(testdf, res)

                               string             1             2    3
1 c("Kaskazini 'A'", "Kaskazini 'B'") Kaskazini 'A' Kaskazini 'B' <NA>
2               c("Kabale", "Kabare")        Kabale        Kabare <NA>
3            c("Kisoko", "Kisoro Tc")        Kisoko     Kisoro Tc <NA>
4     c("Luwero East", "Luwero West")   Luwero East   Luwero West <NA>
5             c("Marindi", "Malindi")       Marindi       Malindi <NA>
6 c("Mukongoro", "Mukono Tc", "Muko")     Mukongoro     Mukono Tc Muko
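
The columns above end up named `1`, `2`, `3`, which are awkward to refer to later. As a possible variation (not from the original answer), the sketch below pads every vector to the same length with `NA` and gives the columns descriptive names. The prefix `inst_` and the names `n_max` and `padded` are assumptions chosen for illustration, and the small `strings` list is hand-written stand-in data.

```r
# stand-in for the list produced by regmatches()
strings <- list(c("Kabale", "Kabare"),
                c("Mukongoro", "Mukono Tc", "Muko"))

# largest number of instances in any row
n_max <- max(lengths(strings))

# `length<-` extends shorter vectors with NA up to n_max
padded <- lapply(strings, `length<-`, n_max)

# one row per input string, one column per instance
res <- do.call(rbind, padded)
colnames(res) <- paste0("inst_", seq_len(n_max))
res
```

The resulting matrix can be attached to the data frame with `cbind(testdf, res)` in the same way as before.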

Answer 1: (score: 0)

I think this is what you want.

string <- c("c(\"Kaskazini 'A'\", \"Kaskazini 'B'\")", "c(\"Kabale\", \"Kabare\")", "c(\"Kisoko\", \"Kisoro Tc\")",
            "c(\"Luwero East\", \"Luwero West\")", "c(\"Marindi\", \"Malindi\")", "c(\"Mukongoro\", \"Mukono Tc\", \"Muko\")")

testdf <- data.frame(string, stringsAsFactors = FALSE)
#convert all quotes into pipe symbol for use as a delimiter
testdf$string <- gsub('"',"|",testdf$string)
#split the string using pipe
testdf$strsplit <- strsplit(testdf$string, "|",fixed=TRUE)
#extract first name using sapply
testdf$first <- sapply(testdf$strsplit, function(x) x[[2]])
#extract second name using sapply
testdf$second <- sapply(testdf$strsplit, function(x) x[[4]])
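
The code above hard-codes positions 2 and 4, so it stops at the second name. A minimal sketch of one way to generalize it, assuming every string contains at least one quoted name and no name contains a double quote: split directly on the quote character and keep every even-numbered piece. The names `x`, `pieces`, and `names_only` are illustrative.

```r
# one element of the column, written out by hand
x <- "c(\"Mukongoro\", \"Mukono Tc\", \"Muko\")"

# splitting on the quote gives: "c(", name, ", ", name, ", ", name, ")"
pieces <- strsplit(x, "\"", fixed = TRUE)[[1]]

# the names sit at positions 2, 4, 6, ...
names_only <- pieces[seq(2, length(pieces), by = 2)]
names_only
```

Splitting on the quote itself also makes the intermediate `gsub()` step unnecessary and leaves the original `string` column unchanged.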