Question

I have a data frame with a column called category. The values in the category column are:

Application Platforms|Real Time|Social Network Media
Apps|Games|Mobile
Curated Web
Software
Games
Biotechnology
Analytics
Mobile
E-Commerce
Entertainment|Games|Software
Networking|Real Estate|Web Hosting

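Each category value is a list of sub-sectors separated by pipes (vertical bars, |). I want to extract the main sector, which is the first string before the first vertical bar ("|").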

In other words, I want the output to be:

Application Platforms
Apps
Curated Web
Software
Games
Biotechnology
Analytics
Mobile
E-Commerce
Entertainment
Networking
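To make the question reproducible, here is a minimal sketch that rebuilds the input as a data frame. The name df and the column name category are assumptions taken from the answers below; expected is a hypothetical helper vector holding the desired output, useful for checking any of the solutions.

```r
# Hypothetical reproduction of the question's data; the column name
# `category` is assumed from the answers.
df <- data.frame(
  category = c(
    "Application Platforms|Real Time|Social Network Media",
    "Apps|Games|Mobile",
    "Curated Web",
    "Software",
    "Games",
    "Biotechnology",
    "Analytics",
    "Mobile",
    "E-Commerce",
    "Entertainment|Games|Software",
    "Networking|Real Estate|Web Hosting"
  ),
  stringsAsFactors = FALSE  # keep a character column (the default was TRUE before R 4.0)
)

# The desired result: everything before the first "|"
expected <- c(
  "Application Platforms", "Apps", "Curated Web", "Software", "Games",
  "Biotechnology", "Analytics", "Mobile", "E-Commerce", "Entertainment",
  "Networking"
)
```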

Please help me do this with any function. I have tried functions from the stringr package.

Answer 1

We can use sub here:

df$category <- sub("^([^|]+).*", "\\1", df$category)
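How the pattern works: ([^|]+) captures the run of non-pipe characters at the start of the string, .* consumes the rest, and "\\1" replaces the whole match with the captured run. A value with no pipe is matched in full, so it comes back unchanged. A self-contained check on a few of the question's values (the vector x is only illustrative):

```r
x <- c("Application Platforms|Real Time|Social Network Media",
       "Curated Web",
       "Entertainment|Games|Software")

# Replace each string with its first capture group:
# the leading run of characters that are not "|".
first_sector <- sub("^([^|]+).*", "\\1", x)
print(first_sector)
```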

Here is another variant that does not use a capture group:

df$category <- sub("\\|.*", "", df$category)
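The capture-free variant deletes instead of keeping: "\\|.*" matches the first pipe and everything after it, and sub leaves strings with no pipe untouched. A quick sketch confirming that the two patterns agree on all of the question's values:

```r
x <- c("Application Platforms|Real Time|Social Network Media",
       "Apps|Games|Mobile", "Curated Web", "Software", "Games",
       "Biotechnology", "Analytics", "Mobile", "E-Commerce",
       "Entertainment|Games|Software", "Networking|Real Estate|Web Hosting")

with_group    <- sub("^([^|]+).*", "\\1", x)  # keep the leading non-pipe run
without_group <- sub("\\|.*", "", x)          # drop the first "|" and everything after it

# Both patterns give the same result on this data
stopifnot(identical(with_group, without_group))
```

They differ only on edge cases such as a value that starts with "|": the first pattern needs at least one non-pipe character, so it leaves "|Games" unchanged, while the second returns "".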


Answer 2

Using strsplit:

category1 <- strsplit(as.character(df$category), "|", fixed = TRUE)  # as.character() guards against a factor column
df$category <- sapply(category1, `[[`, 1)     # or, purrr::map_chr(category1, 1)
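sapply picks its return type from the results it gets; vapply with FUN.VALUE = character(1) instead guarantees a character vector and errors if any element does not yield exactly one string. A sketch of that base-R alternative, assuming a character vector x (strsplit rejects factors, so convert with as.character() first if needed):

```r
x <- c("Apps|Games|Mobile", "Curated Web", "Networking|Real Estate|Web Hosting")

# fixed = TRUE treats "|" as a literal character, not regex alternation
parts <- strsplit(x, "|", fixed = TRUE)

# Take the first piece of each split; vapply checks that each result
# is exactly one character string.
first_sector <- vapply(parts, `[[`, character(1), 1)
```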

I think this solution states your intent more clearly than sub does. On the other hand, it takes an extra line.

Answer 3

Or, using stringr ...

library(stringr)
str_match("Application Platforms|Real Time|Social Network Media",
       "^(.+?)(\\||$)")[,2] #match start of string up to first | or end of string
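The end-of-string alternative has to sit outside a bracket expression: inside brackets, $ is a literal dollar sign, so [|$] matches only a literal "|" or "$" and never the end of the string. That matters for values with no pipe, such as "Curated Web". A quick base-R check of the difference:

```r
x <- "Curated Web"  # a value with no "|"

# Inside brackets, "$" is a literal character, not an anchor
in_brackets <- grepl("[|$]", x)       # FALSE: x has no literal "|" or "$"

# As an alternation outside brackets, "$" still anchors the end
as_alternation <- grepl("(\\||$)", x) # TRUE: the end of the string matches
```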

[1] "Application Platforms"

...or

str_replace("Application Platforms|Real Time|Social Network Media",
       "\\|.*$","") #replace | and any subsequent characters with ""

[1] "Application Platforms"

...or

str_extract("Application Platforms|Real Time|Social Network Media",
       "[^|]+") #extract first sequence of characters that are not a |
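To avoid the stringr dependency, base R's regexpr() together with regmatches() extracts the first match of the same [^|]+ pattern. Note that regmatches() drops elements with no match, so this sketch assumes every value contains at least one non-pipe character.

```r
x <- c("Application Platforms|Real Time|Social Network Media",
       "E-Commerce", "Entertainment|Games|Software")

# regexpr() locates the first run of non-"|" characters in each string;
# regmatches() pulls those substrings out.
first_sector <- regmatches(x, regexpr("[^|]+", x))
```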

[1] "Application Platforms"

...or

str_split_fixed("Application Platforms|Real Time|Social Network Media",
       "\\|",2)[,1] #split at first | and take the first section
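The same idea of cutting at the first "|" can be done in base R without a regex: locate the first pipe with regexpr(..., fixed = TRUE) and keep the substring before it. A sketch, where x is an illustrative vector:

```r
x <- c("Apps|Games|Mobile", "Biotechnology", "Networking|Real Estate|Web Hosting")

# Position of the first "|" in each string (-1 when there is none);
# as.vector() drops the match.length attributes regexpr() attaches.
pos <- as.vector(regexpr("|", x, fixed = TRUE))
has_pipe <- pos > 0

# Keep everything before the first "|"; pipe-free strings stay unchanged.
first_sector <- x
first_sector[has_pipe] <- substr(x[has_pipe], 1, pos[has_pipe] - 1)
```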

[1] "Application Platforms"

Extracting data from a string with R
