I have a data frame with a category column. The data in that column looks like this:
Application Platforms|Real Time|Social Network Media
Apps|Games|Mobile
Curated Web
Software
Games
Biotechnology
Analytics
Mobile
E-Commerce
Entertainment|Games|Software
Networking|Real Estate|Web Hosting
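Each category value is a list of sub-sectors separated by pipes (vertical bars, "|"). I want to extract the primary sector, which is the string before the first vertical bar.
That is, the output I want is: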
Application Platforms
Apps
Curated Web
Software
Games
Biotechnology
Analytics
Mobile
E-Commerce
Entertainment
Networking
Please help me achieve this with any function. I have tried the stringr package functions.
Answer 0 (score: 2)
We can use sub here:
df$category <- sub("^([^|]+).*", "\\1", df$category)
Here is another variant that does not use a capture group:
df$category <- sub("\\|.*", "", df$category)
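As a quick check, the sketch below runs both sub calls on a few of the sample rows from the question. The data frame name df and the stringsAsFactors = FALSE setting are assumptions made for illustration; results are stored in separate vectors so the two variants can be compared.

```r
# Sample rows taken from the question; `df` is an assumed name.
df <- data.frame(
  category = c("Application Platforms|Real Time|Social Network Media",
               "Apps|Games|Mobile",
               "Curated Web",
               "Entertainment|Games|Software"),
  stringsAsFactors = FALSE
)

# Variant 1: capture everything before the first "|" and keep only that.
first_capture <- sub("^([^|]+).*", "\\1", df$category)

# Variant 2: delete the first "|" and everything after it.
first_delete <- sub("\\|.*", "", df$category)

print(first_capture)
print(identical(first_capture, first_delete))
```

Values without a pipe, such as "Curated Web", pass through both variants unchanged. The two only differ on edge cases such as a value that starts with "|": the first leaves it untouched, the second returns an empty string.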
Answer 1 (score: 2)
Using strsplit:
category1 <- strsplit(df$category, "|", fixed = TRUE)
df$category <- sapply(category1, `[[`, 1) # or, purrr::map_chr(category1, 1)
I think this solution states your intent more clearly than using sub. On the other hand, it needs an extra line.
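If the column might be a factor or contain empty strings, a slightly more defensive version of the same idea could look like the sketch below. The vector name category and the choice of vapply over sapply are assumptions for illustration, not part of the original answer.

```r
# Sample values from the question, stored as a factor to show the coercion step.
category <- factor(c("Apps|Games|Mobile",
                     "Curated Web",
                     "Networking|Real Estate|Web Hosting"))

# strsplit() requires a character vector, so convert explicitly.
parts <- strsplit(as.character(category), "|", fixed = TRUE)

# vapply() enforces one string per element; p[1] yields NA (not an error)
# when a value was empty and produced a zero-length split.
primary <- vapply(parts, function(p) p[1], character(1))

print(primary)
```

Using p[1] instead of `[[` trades an error on empty input for an NA, which may or may not be what you want.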
Answer 2 (score: 1)
Or, using stringr...
str_match("Application Platforms|Real Time|Social Network Media",
"^(.+?)(?:\\||$)")[,2] #match start of string up to first | or end of string
[1] "Application Platforms"
...or
str_replace("Application Platforms|Real Time|Social Network Media",
"\\|.*$","") #replace the first | and any subsequent characters with ""
[1] "Application Platforms"
...or
str_extract("Application Platforms|Real Time|Social Network Media",
"[^|]+") #extract first sequence of characters that are not a |
[1] "Application Platforms"
...or
str_split_fixed("Application Platforms|Real Time|Social Network Media",
"\\|",2)[,1] #split at first | and take the first section
[1] "Application Platforms"
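The calls above work on a single string, but the stringr functions are vectorised, so they can be applied to the whole column directly. Below is a minimal sketch, assuming the stringr package is installed and using the hypothetical names df, primary and primary2.

```r
library(stringr)

# Sample rows from the question; `df` is an assumed name.
df <- data.frame(
  category = c("Application Platforms|Real Time|Social Network Media",
               "Software",
               "Entertainment|Games|Software"),
  stringsAsFactors = FALSE
)

# str_extract() returns one result per element of the column.
df$primary <- str_extract(df$category, "[^|]+")

# str_split_fixed() returns a character matrix; column 1 holds the text
# before the first "|".
df$primary2 <- str_split_fixed(df$category, "\\|", 2)[, 1]

print(df[, c("primary", "primary2")])
```

Both new columns should hold the same values for this sample, including the row that has no pipe at all.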