Question

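I have the following data frame dat:

brand (column name)
Channel clothes
Gucci perfume
Channel shoes
LV purses
LV scarves

I want to create a new column containing only the brand name, that is, whatever the first word of the brand column is. I would like the output to look like this:

brand (column name)
Channel
Gucci
Channel
LV
LV

I tried using sub with the following code, but it does not work. Can you help me figure out what is wrong with my code?

'^\(Name_\)\?[A-Z0-9]\{12,13\}[0-9]\{6\}\(BN\|CT\|PL\|XC\|XF\).zip$'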

Answer 1

We can use word from the stringr package:

library(stringr)
# Take the first space-delimited word of each element of the brand column
word(dat$brand, 1)
#[1] "Channel" "Gucci"   "Channel" "LV"      "LV"
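
Since the question asks for a new column rather than a printed vector, the call above could be stored back into the data frame. This is only a minimal sketch: the data frame is rebuilt here from the values in the question, and brand_name is a hypothetical column name chosen for illustration.

```r
library(stringr)

# Rebuild the example data from the question (column name assumed to be "brand")
dat <- data.frame(
  brand = c("Channel clothes", "Gucci perfume", "Channel shoes",
            "LV purses", "LV scarves"),
  stringsAsFactors = FALSE
)

# brand_name is a hypothetical column name; word() splits on a single space
# by default and the second argument selects the first word
dat$brand_name <- word(dat$brand, 1)
dat
```

The original brand column is left untouched, so the full text stays available next to the extracted first word.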

Answer 2

This should do it:

dat <- data.frame(Brand = c('Channel clothes',
                           'Gucci perfume',
                           'Channel shoes',
                           'LV purses',
                           'LV scarves'))
# Capture the leading word, then replace the whole string with that capture
dat$brand <- sub('(^\\w+)\\s.+','\\1',dat$Brand)
dat$brand
#[1] "Channel" "Gucci"   "Channel" "LV"      "LV"
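
The pattern above captures the leading run of word characters and requires whitespace plus at least one more character after it. A slightly simpler base R variant, sketched below, just deletes everything from the first whitespace character onward. The last two input values are hypothetical edge cases that do not appear in the question: a one-word entry and a brand name made of two words.

```r
# Sample values; "Gucci" and "Louis Vuitton bags" are hypothetical edge cases
x <- c("Channel clothes", "LV purses", "Gucci", "Louis Vuitton bags")

# Delete everything from the first whitespace character to the end;
# strings without whitespace are returned unchanged
first_word <- sub("\\s.*$", "", x)
first_word

# Equivalent approach: split on whitespace and keep the first piece
first_word_split <- vapply(strsplit(x, "\\s+"), `[`, character(1), 1)
first_word_split
```

Both approaches treat the first word as the brand, so a multi-word brand such as the hypothetical "Louis Vuitton" is cut down to "Louis". Handling that case would need a list of known brands, which the question does not provide.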

Answer 3

I prefer the tidyverse approach.

Using this dataset:

library(tidyverse)

df <- tribble(
  ~brand,
  "Channel clothes",
  "Gucci perfume",
  "Channel shoes",
  "LV purses",
  "LV scarves"
)

We can separate the column with the following:

df %>% 
  separate(brand, into = c("brand", "item"), sep = " ")

Which returns:

# A tibble: 5 x 2
    brand    item
*   <chr>   <chr>
1 Channel clothes
2   Gucci perfume
3 Channel   shoes
4      LV  purses
5      LV scarves
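
Note that sep = " " splits at every space, so an entry whose item has more than one word would yield more than two pieces; by default separate() warns and drops the extras. A hedged variant, assuming such entries may occur, passes extra = "merge" so that everything after the first space stays in the item column. The "LV leather purses" row is a hypothetical value added for illustration.

```r
library(tidyr)
library(dplyr)

# Small example; the third row is a hypothetical multi-word item
df <- tibble(brand = c("Channel clothes", "Gucci perfume", "LV leather purses"))

# extra = "merge" splits at most length(into) - 1 times, so the remainder
# of the string is kept together in the last column
result <- df %>%
  separate(brand, into = c("brand", "item"), sep = " ", extra = "merge")

result
```

With this setting the brand column still holds only the first word, while the item column keeps the rest of the text intact.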

Extract the first word
