How can I extract part of a string based on another column?

Asked: 2019-02-06 20:25:36

Tags: r string stringr

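I am transforming the output of a scheduling system. I want to create two new columns from one initial column. The first desired column holds the first six characters of the initial column. The second desired column should contain a mix of blank rows and partial string matches. I am a beginner, so pointers to resources are welcome wherever I am making naive syntax mistakes.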

The initial data looks like this:

# A tibble: 6 x 2
  `Schedule Title` `Staff or Resource Name`                                
  <chr>            <chr>                                                   
1 Consultation     BIO210 (Bio stat); Carl; LSP143 (computer lab)          
2 Weekly           PHY111; (Physics I); Noah/Prof Stubbin                  
3 Weekly           CHM111 (Gen Chem); Ali/Prof Van Arman                   
4 Workshops        CHM111 Quant Skills Workshop, KAU104                    
5 Workshops        CPS111 Study Jam (Computer Science)                     
6 Workshops        CHM211 Organic Chem Study Tips from Q&SC Tutors, HARWOOD

Desired output:

# A tibble: 6 x 3
  `Schedule Title` `Course` `WorkshopName`                            
  <chr>            <chr>  <chr>                                   
1 Consultation     BIO210                                      
2 Weekly           PHY111                                      
3 Weekly           CHM111                                     
4 Workshops        CHM111 Quant Skills Workshop                   
5 Workshops        CPS111 Study Jam                               
6 Workshops        CHM211 Organic Chem Study Tips from Q&SC Tutors

I have no problem creating "Course" with str_sub.

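However, I have not managed to create "WorkshopName". I tried a) filtering by "Type" and then extracting everything from character 8 to the end. I tried b) extracting everything from character 8 onward and then replacing anything after a , ; or ( with empty text. Both approaches, even if they worked, would only be partial solutions.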

#1. This works and creates the Course column
QSC$Course <- str_sub(QSC$Staff.or.Resource.Name, start = 1, end = 6)

#2. This does not work.  I was trying to filter by 'Type', then create WorkshopName for only those of Type: Workshop. I would still need to clean up the WorkshopNames to eliminate everything after a , ; or (.
QSC %>%
filter(str_detect(Type, 'Workshops') ) %>%
WorkshopName = str_sub('Staff or Resource Name', start = 8, end = -1)

#3. This also does not work.  I tried to extract characters 8-end, then replace anything after , ; or (.  
#I have not been able to successfully escape the character (. 
#I haven't even gotten to the part where I intended to filter and replace the strings with blanks for all but Type:Workshop.
QSC$WorkshopName <- str_sub(QSC$`Staff or Resource Name`, start = 8, end = -1)
QSC$WorkshopName <- str_split(QSC$WorkshopName,",", 1)
QSC$WorkshopName <- str_split(QSC$WorkshopName,";", 1)
QSC$WorkshopName <- str_split(QSC$WorkshopName,"\(", 1)

Results of the three attempts:

  1. Creates the desired "Course" column.
  2. Yields

    Error in QSC %>% WorkshopName = str_sub("Staff or Resource Name", start = 8, : could not find function "%>%<-"

  3. Yields

    Error: '\(' is an unrecognized escape in character string starting ""\("

1 answer:

Answer 0 (score: 0)

QSC %>%
filter(str_detect(Type, 'Workshops') ) %>%
WorkshopName = str_sub('Staff or Resource Name', start = 8, end = -1)

fails because you need to use mutate to create a new variable with dplyr:

QSC %>%
  filter(str_detect(Type, 'Workshops')) %>%
  # backticks refer to the column; quotes would pass the literal string
  mutate(WorkshopName = str_sub(`Staff or Resource Name`, start = 8, end = -1))
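One detail worth checking when adapting this call is how the column is referred to. The sketch below uses a one-row toy data frame (the value is copied from the question; `df`, `literal_slice` and `column_slice` are hypothetical names) to show that a quoted name is treated as a plain string, while a backticked name refers to the column.

```r
library(stringr)

# A quoted name is just a string, so str_sub() slices the name itself.
literal_slice <- str_sub("Staff or Resource Name", start = 8, end = -1)
print(literal_slice)  # "r Resource Name"

# Toy data frame with a non-syntactic column name (hypothetical example data).
df <- data.frame(
  `Staff or Resource Name` = "CHM111 Quant Skills Workshop, KAU104",
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Backticks refer to the column, so str_sub() slices the column values.
column_slice <- with(df, str_sub(`Staff or Resource Name`, start = 8, end = -1))
print(column_slice)  # "Quant Skills Workshop, KAU104"
```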

Also, to get the desired result with blanks for the non-workshop rows, you can use if_else:

QSC %>%
  # NA_character_ keeps both if_else() branches of type character
  mutate(WorkshopName = if_else(str_detect(Type, 'Workshops'), str_sub(`Staff or Resource Name`, start = 8, end = -1), NA_character_))
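As a small check of the conditional approach, here is a sketch on two rows taken from the question. It assumes that the `Type` column used in the code corresponds to `Schedule Title` in the printed data, which the question does not confirm; `toy` and `result` are hypothetical names. Because `if_else()` is stricter about types than base `ifelse()`, the sketch uses `NA_character_` so both branches are character vectors.

```r
library(dplyr)
library(stringr)

# Two rows from the question: one non-workshop, one workshop.
toy <- tibble(
  `Schedule Title` = c("Weekly", "Workshops"),
  `Staff or Resource Name` = c(
    "CHM111 (Gen Chem); Ali/Prof Van Arman",
    "CPS111 Study Jam (Computer Science)"
  )
)

result <- toy %>%
  mutate(WorkshopName = if_else(
    str_detect(`Schedule Title`, "Workshops"),
    str_sub(`Staff or Resource Name`, start = 8, end = -1),
    NA_character_  # typed missing value for the non-workshop rows
  ))

print(result$WorkshopName)
```

The workshop row keeps everything after the course code, while the other row becomes a missing value; replacing `NA_character_` with `""` would give empty strings instead.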

Now:

QSC$WorkshopName <- str_split(QSC$WorkshopName,"\(", 1)

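fails because the backslash itself has to be escaped in an R string, so you need an extra \ for R to interpret the ( correctly: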

QSC$WorkshopName <- str_split(QSC$WorkshopName,"\\(", 1)
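To illustrate the escaping, the sketch below splits one string from the question on a literal parenthesis in three equivalent ways. The variable names are hypothetical; `fixed()` and the character class `[(]` are alternatives that avoid the double backslash, not something the original code used.

```r
library(stringr)

x <- "CPS111 Study Jam (Computer Science)"

# "\\(" in R source is a two-character string: a backslash and "(".
# The regex engine then reads it as an escaped, literal "(".
pattern_length <- nchar("\\(")

escaped <- str_split(x, "\\(")[[1]]

# Alternatives without the double backslash:
bracket <- str_split(x, "[(]")[[1]]       # "(" is literal inside a character class
literal <- str_split(x, fixed("("))[[1]]  # fixed() disables regex interpretation

print(escaped)
```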

Finally, the last argument in all three str_split calls is 1, which I don't think makes sense. If you read ?str_split:

  

n: number of pieces to return. Default (Inf) uses all possible split positions.
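The effect of `n` described in that help entry can be checked directly. This is a minimal sketch on one string from the question; `one` and `two` are hypothetical names.

```r
library(stringr)

x <- "CHM111 Quant Skills Workshop, KAU104"

# n = 1 asks for a single piece, so nothing is actually split off.
one <- str_split(x, ",", n = 1)[[1]]

# n = 2 splits at the first comma only.
two <- str_split(x, ",", n = 2)[[1]]

print(one)
print(two)
```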

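So if you pass 1 it returns the whole string; you need 2 or more. If what you want is the first piece after splitting, you need to do this, which keeps the piece before the (first) split and discards the rest: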

str_split(QSC$WorkshopName, "\\(")[[1]][1]
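Note that `[[1]][1]` picks the first piece of the first row only. For a whole column, one possible sketch is shown below. It swaps `str_split` for `str_remove` with a character class, which is a different technique from the one above, and it assumes that `Type` in the question corresponds to `Schedule Title` in the printed data. The data is retyped from the question, and `out` is a hypothetical name.

```r
library(dplyr)
library(stringr)

# Data retyped from the question.
QSC <- tibble(
  `Schedule Title` = c("Consultation", "Weekly", "Weekly",
                       "Workshops", "Workshops", "Workshops"),
  `Staff or Resource Name` = c(
    "BIO210 (Bio stat); Carl; LSP143 (computer lab)",
    "PHY111; (Physics I); Noah/Prof Stubbin",
    "CHM111 (Gen Chem); Ali/Prof Van Arman",
    "CHM111 Quant Skills Workshop, KAU104",
    "CPS111 Study Jam (Computer Science)",
    "CHM211 Organic Chem Study Tips from Q&SC Tutors, HARWOOD"
  )
)

out <- QSC %>%
  mutate(
    # First six characters are the course code.
    Course = str_sub(`Staff or Resource Name`, 1, 6),
    WorkshopName = if_else(
      `Schedule Title` == "Workshops",
      # Drop the course code, then everything from the first , ; or ( onward,
      # then trim any whitespace left at the ends.
      str_trim(str_remove(str_sub(`Staff or Resource Name`, 8, -1), "[,;(].*$")),
      ""  # blank for non-workshop rows
    )
  ) %>%
  select(`Schedule Title`, Course, WorkshopName)

print(out)
```

Using `""` for the non-workshop rows matches the blank cells in the desired output; `NA_character_` would work as well if missing values are preferred.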

Best