R regex: remove everything after the second occurrence of "/" in a data frame column

Posted: 2016-04-06 18:44:45

Tags: regex r

My data is stored in a data frame column, like this:

/travel
/food and drink/restaurants
/food and drink
/sports/outdoors/climbing

/news
/family

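Each row contains one or more "/" characters, and every non-blank row starts with "/". Some rows are blank. I need to reduce each value to the text after the first "/" and before the second "/" (or the end of the string if there is no second "/"). I would also like to capitalize the first letter of each word in the result. So I want the result to look like this: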
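To make the examples reproducible, the sample column can be rebuilt as a data frame. This is a sketch: the names `df` and `category` are hypothetical, since the question does not name the data frame or its column, and the blank row is assumed to be an empty string.

```r
# Hypothetical reconstruction of the question's data; `df` and `category`
# are illustrative names, not taken from the original post.
df <- data.frame(
  category = c("/travel", "/food and drink/restaurants", "/food and drink",
               "/sports/outdoors/climbing", "", "/news", "/family"),
  stringsAsFactors = FALSE  # keep plain character strings (the default since R 4.0)
)

# Desired result from the question, in the same row order
expected <- c("Travel", "Food And Drink", "Food And Drink", "Sports",
              "", "News", "Family")

str(df)
```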

Travel
Food And Drink
Food And Drink
Sports

News
Family

4 Answers:

Answer 0 (score: 4)

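Upcase each word:

# sample data (blank row omitted, matching the output below)
x <- c("/travel", "/food and drink/restaurants", "/food and drink", "/sports/outdoors/climbing", "/news", "/family")

gsub('(?<=\\b)([a-z])', '\\U\\1', x, perl = TRUE)

# [1] "/Travel"                     "/Food And Drink/Restaurants" "/Food And Drink"            
# [4] "/Sports/Outdoors/Climbing"   "/News"                       "/Family"   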
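The `\\U` in the replacement is a Perl-mode escape: with `perl = TRUE`, `gsub` upper-cases the replacement text that follows it (here the captured letter `\\1`). Because `\\b` is already zero-width, wrapping it in a lookbehind `(?<=\\b)` should not be necessary; a plain `\\b` behaves the same. A small sketch on a single string (the variable names `s`, `a` and `b` are illustrative):

```r
s <- "/food and drink/restaurants"

# With the lookbehind, as in the answer
a <- gsub("(?<=\\b)([a-z])", "\\U\\1", s, perl = TRUE)

# Plain word boundary: \b is zero-width either way
b <- gsub("\\b([a-z])", "\\U\\1", s, perl = TRUE)

print(a)  # "/Food And Drink/Restaurants"
identical(a, b)
```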

Extract the text between the first and second "/":

gsub('^/([^/]+)|.', '\\1', x)

# [1] "travel"         "food and drink" "food and drink" "sports"         "news"          
# [6] "family"
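The pattern `^/([^/]+)|.` works through its alternation: the first branch matches the leading "/segment" and keeps only the captured text, while the `.` branch matches every remaining character one at a time; group 1 does not take part in those matches, so `\\1` expands to an empty string and they are deleted. An equivalent sketch uses `sub` with a trailing `.*` to discard everything after the first segment in a single match. A blank row stays `""` because neither pattern matches it.

```r
x <- c("/travel", "/food and drink/restaurants", "/food and drink",
       "/sports/outdoors/climbing", "", "/news", "/family")

# One match per string: keep the first segment, drop the rest
first_seg <- sub("^/([^/]+).*", "\\1", x)

# The answer's alternation-based version, for comparison
alt <- gsub("^/([^/]+)|.", "\\1", x)

print(first_seg)
identical(first_seg, alt)
```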

Combine the two gsubs:

gsub('(?<=\\b)([a-z])', '\\U\\1', gsub('^/([^/]+)|.', '\\1', x), perl = TRUE)

# [1] "Travel"         "Food And Drink" "Food And Drink" "Sports"         "News"          
# [6] "Family"  
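The combined call can be wrapped in a small helper and applied directly to the data frame column. In this sketch, `first_segment_title`, `df`, `category` and `label` are hypothetical names; the two regex steps are the ones from this answer.

```r
# Hypothetical helper combining the answer's two gsub calls
first_segment_title <- function(x) {
  seg <- gsub("^/([^/]+)|.", "\\1", x)                 # keep text between the 1st and 2nd "/"
  gsub("(?<=\\b)([a-z])", "\\U\\1", seg, perl = TRUE)  # upper-case the first letter of each word
}

df <- data.frame(
  category = c("/travel", "/food and drink/restaurants", "/food and drink",
               "/sports/outdoors/climbing", "", "/news", "/family"),
  stringsAsFactors = FALSE
)
df$label <- first_segment_title(df$category)
print(df)
```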

If you don't care whether "and" is capitalized, you can use tools::toTitleCase in place of the second gsub:

tools::toTitleCase(gsub('^/([^/]+)|.', '\\1', x))

# [1] "Travel"         "Food and Drink" "Food and Drink" "Sports"         "News"          
# [6] "Family"
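The two capitalization strategies differ on minor words: `tools::toTitleCase` follows English title-case conventions and leaves words such as "and" in lower case, whereas the Perl `\\U` replacement upper-cases the first letter of every word. A small comparison sketch (the variable names `seg`, `tc` and `up` are illustrative, and the sample omits the blank row):

```r
seg <- c("travel", "food and drink", "sports")

# Title case by English rules: minor words like "and" stay lower case
tc <- tools::toTitleCase(seg)

# Perl \U replacement: every word gets an upper-case first letter
up <- gsub("\\b([a-z])", "\\U\\1", seg, perl = TRUE)

print(tc)
print(up)
```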


Answer 1 (score: 1)

require(magrittr)

txt <- c("/travel", "/food and drink/restaurants", "/food and drink", "/sports/outdoors/climbing", "", "/news", "/family")

strsplit(txt, "/") %>% sapply( '[', 2 )  #per Frank's suggestion

##  [1] "travel"         "food and drink" "food and drink" "sports"        
##  [5] NA               "news"           "family"        
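The `strsplit` approach returns lower-case text and `NA` for the blank row. A base-R sketch (no magrittr) that maps the blank row to `""` and adds the capitalization step from the first answer; the helper `second_piece` is a hypothetical name:

```r
txt <- c("/travel", "/food and drink/restaurants", "/food and drink",
         "/sports/outdoors/climbing", "", "/news", "/family")

# Hypothetical helper: take the piece after the first "/", or "" if there is none
second_piece <- function(parts) if (length(parts) >= 2) parts[2] else ""

pieces <- vapply(strsplit(txt, "/"), second_piece, character(1))

# Upper-case the first letter of each word (Perl-mode \U escape)
result <- gsub("\\b([a-z])", "\\U\\1", pieces, perl = TRUE)
print(result)
```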

Answer 2 (score: 0)

A quick approach is the following. I assume the part you want to capture contains only word characters (\w) and whitespace (\s):

char<- c("/travel","/food and drink/restaurants","/food and drink","/sports/outdoors/climbing","","/news","/family")

match <- regexpr("[\\w\\s]+",char,perl=TRUE)
regmatches(char,match)

## regmatches(char,match)
## [1] "travel"         "food and drink" "food and drink" "sports"        
## [5] "news"           "family"   
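Note that `regmatches` drops elements with no match, so the blank row disappears and the result has six elements instead of seven, which would no longer line up with the data frame rows. One way to keep the alignment, sketched here with the illustrative names `m` and `out`, is to fill only the positions where `regexpr` found a match (it returns `-1` otherwise):

```r
char <- c("/travel", "/food and drink/restaurants", "/food and drink",
          "/sports/outdoors/climbing", "", "/news", "/family")

m <- regexpr("[\\w\\s]+", char, perl = TRUE)

# Start with empty strings, then fill the rows that matched
out <- rep("", length(char))
out[m != -1] <- regmatches(char, m)
print(out)
```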

Answer 3 (score: 0)

You need the stringi package installed (you should have it anyway :)), but the following should do the trick:

stringi::stri_trans_totitle(gsub("^/([^/]+).*", "\\1", data))

The gsub keeps only the text after the first / up to the second / or the end of the string, discarding the rest. stringi::stri_trans_totitle then does the case conversion for you.

> s <- c("/food and drink/restaurants", "/beer and wine", "", "/news")
> stringi::stri_trans_totitle(gsub("^/([^/]+).*", "\\1", s))
[1] "Food And Drink" "Beer And Wine"  ""               "News"
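Without the `^` anchor and a trailing `.*`, a pattern like `"/([^/]+)"` glues the segments together (for example `"food and drinkrestaurants"`), because `gsub` replaces every "/segment" in the string, not just the first one. A base-R comparison sketch (no stringi needed; the variable name `s` is illustrative):

```r
s <- c("/food and drink/restaurants", "/beer and wine", "", "/news")

# Unanchored, global: every "/segment" is replaced by "segment"
glued <- gsub("/([^/]+)", "\\1", s)

# Anchored at the start, rest of the string discarded: first segment only
first <- sub("^/([^/]+).*", "\\1", s)

print(glued)
print(first)
```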