How do I apply the substring function in R?

Time: 2017-03-25 22:35:08

Tags: r apply lapply sapply

The dataset contains information about superheroes. This code removes the annoying bracketed part from the 'name' string:

# package loading
library(fivethirtyeight)

# data opening
data(package ="fivethirtyeight")
data(comic_characters)

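# position of the first opening bracket in each name
bracket <- integer(length(comic_characters$name))

for (i in seq_along(comic_characters$name))
{
  bracket[i]                <-  which(strsplit(comic_characters$name[i], "")[[1]] == "(")[1]
  comic_characters$name[i]  <-  substr(comic_characters$name[i], start = 1, stop = bracket[i] - 2)
}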

How can I do the same thing with the apply family of functions (without a for loop)? This is what I tried:

    bracket = sapply(sapply(strsplit(comic_characters$name, ''), function(x)
      which(x == '(')), `[`, 1)

    # here comes the problem:
    comic_characters$name <- lapply(x, function(x)
      substr(comic_characters$name, start=1, stop=bracket[i]-2))

How should I do this? Thanks in advance!

2 answers:

Answer 0: (score: 1)

Wouldn't this achieve the same thing?

df <- data.frame(comic_characters)
df$name <- sub("\\(.*", "", df$name)
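
One difference from the loop in the question: it cuts at `bracket - 2`, so it also drops the space before the bracket, whereas `sub("\\(.*", ...)` leaves that space in place. The sketch below is a minimal illustration, not the answerer's code. It uses a small made-up vector instead of `comic_characters$name`, and shows both a loop-free version of the original `substr` approach and a `sub` pattern that also removes the space.

```r
# Hypothetical sample names standing in for comic_characters$name
name <- c("Spider-Man (Peter Parker)", "Rogue (Anna Marie) (Earth-616)", "Thor")

# Position of the first "(" in each name; -1 where there is none.
# as.integer() drops the extra attributes that regexpr() attaches.
bracket <- as.integer(regexpr("(", name, fixed = TRUE))

# substr() is vectorised, so neither a loop nor an apply call is needed;
# names without a bracket are kept unchanged
cleaned <- ifelse(bracket > 0, substr(name, 1, bracket - 2), name)

# One regular expression that removes the bracket and the space before it
cleaned_sub <- sub("\\s*\\(.*", "", name)

print(cleaned)
print(cleaned_sub)
```

Both vectors should hold the bare names. The `ifelse` guard is an addition for names that contain no bracket at all.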

Answer 1: (score: 1)

You can use the stringr package to do this.

# package loading
library(fivethirtyeight)
library(stringr)

# data opening
data(package ="fivethirtyeight")
data(comic_characters)

# remove text enclosed in brackets from character names
cleaned_character_names <- str_replace_all(
  string = comic_characters$name,
  pattern = "\\(.*\\)",
  replacement = ""
)

# trim whitespace from start and ending of the character names
cleaned_character_names <- str_trim(
  string = cleaned_character_names
)

Some character names have two parts enclosed in brackets, e.g. "Rogue (Anna Marie) (Earth-616)". The code above removes both "(Anna Marie)" and "(Earth-616)" from the character name.
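
Both parts disappear because `.*` is greedy: the match runs from the first "(" to the last ")". The sketch below shows the effect on made-up names, using base R `gsub` as a stand-in for `str_replace_all` (the assumption is that both treat this pattern the same way). The second sample name and the `[^)]*` variant are illustrative additions and do not come from the dataset.

```r
# Hypothetical sample names; the second has text between two bracketed parts
name <- c("Rogue (Anna Marie) (Earth-616)", "A (b) c (d)")

# Greedy: ".*" spans from the first "(" to the last ")",
# so any text between two bracketed parts is removed as well
greedy <- trimws(gsub("\\(.*\\)", "", name))

# Bracket by bracket: "[^)]*" stops at the first ")",
# so text between bracketed parts survives
per_bracket <- trimws(gsub("\\([^)]*\\)", "", name))

print(greedy)
print(per_bracket)
```

For names that only have trailing bracketed parts the two patterns should give the same result. They differ only when ordinary text sits between two brackets.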