Using R to split a cell's text into multiple cells on commas

Date: 2018-10-08 12:48:35

Tags: r

I have a data frame with a single column like this:

Office 365,MS SQL Server,ASP.NET

Microsoft Azure,ITIL,Project Management

Infrastructure services,AWS solution architect

I need to split each cell on the commas, so that the result looks like this:

Office 365                 MS SQL Server              ASP.NET

Microsoft Azure            ITIL                       Project Management

Infrastructure services    AWS solution architect     NA

3 answers:

Answer 0 (score: 1)

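How about something like this, using dplyr together with tidyr (separate() comes from tidyr, not dplyr):

library(dplyr)
library(tidyr)
# fill = "right" pads rows that have fewer than three values with NA, without a warning
df %>% separate(text, c("a", "b", "c"), sep = ",", fill = "right")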

                        a                      b                  c
1              Office 365          MS SQL Server            ASP.NET
2         Microsoft Azure                   ITIL Project Management
3 Infrastructure services AWS solution architect               <NA>

Since we don't have your data, this assumes you have a data.frame and want a data.frame as the result:

df <- data.frame(text = c("Office 365,MS SQL Server,ASP.NET",
                          "Microsoft Azure,ITIL,Project Management",
                          "Infrastructure services,AWS solution architect"))
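
If the number of comma-separated values per row is not known in advance, the same split can also be done in base R. The following is only a minimal sketch and not part of the original answer: the column name `text` and the generic output names `V1`, `V2`, ... are assumptions made for illustration.

```r
# Sample data copied from the question; the column name `text` is an assumption.
df <- data.frame(
  text = c("Office 365,MS SQL Server,ASP.NET",
           "Microsoft Azure,ITIL,Project Management",
           "Infrastructure services,AWS solution architect"),
  stringsAsFactors = FALSE
)

# Split every row on literal commas; the result is a list of character vectors.
parts <- strsplit(df$text, ",", fixed = TRUE)

# Pad shorter rows with NA so that every row has the same number of fields.
n_cols <- max(lengths(parts))
padded <- lapply(parts, function(p) c(p, rep(NA_character_, n_cols - length(p))))

# Bind the rows into a data.frame with generic column names V1, V2, ...
result <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(result) <- paste0("V", seq_len(n_cols))
print(result)
```

Because the number of columns is taken from the longest row, nothing has to be hard-coded as it is with c("a","b","c").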

Answer 1 (score: 1)

Here is a workable solution that gets the job done. I'm sure others will come up with more concise approaches.

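  • Essentially, you can use a regular expression inside gsub() to extract the value before the first comma.
  • Once that is done, remove everything up to and including the first comma, then repeat the process to extract the value before the second comma.
  • Repeat as many times as needed.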

    #Load packages
    library(dplyr)
    library(stringr)
    
    #Replicating your dataset
    df<-data.frame(Strings=
               c("Office 365,MS SQL Server,ASP.NET",
                 "Microsoft Azure,ITIL,Project Management",
                 "Infrastructure services,AWS solution architect"))
    
    #Extract string before the first comma
        df<-mutate(df, FirstComma = gsub(",.*$", "", Strings))
    
    #Extract string between first & second commas
    
        #Create a vector identifying end position of First String
        #(fixed() matches the value literally instead of treating it as a regex)
        df$EndPosOf1stStr<-str_locate(df$Strings,fixed(df$FirstComma))[,2]
    
        #Extract string between first & second comma
        df<-mutate(df, STRWithoutFirst = substring(Strings,EndPosOf1stStr+2), 
                   SecondComma = gsub(",.*$", "", STRWithoutFirst))
    
    #Extract value after second comma
    
        #Create a vector identifying end position of Second String
        #(fixed() matches the value literally instead of treating it as a regex)
        df$EndPosOf2ndStr<-str_locate(df$Strings,fixed(df$SecondComma))[,2]
    
        #Extract string after second comma
        #(rows without a third value yield "", which na_if() converts to NA)
        df<-mutate(df,STRWithoutFirstSecond = substring(Strings,EndPosOf2ndStr+2),
          ThirdComma = na_if(gsub(",.*$", "", STRWithoutFirstSecond), ""))
    
    #Keep variables of interest
    df<-select(df, Strings, FirstComma, SecondComma, ThirdComma)
    
    print(df)
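
The "extract, drop, repeat" steps described in the bullets above can also be written as a loop, so the code does not have to be copied for each additional comma. This is a minimal base R sketch under stated assumptions: the helper name `split_by_peeling` and its `n_fields` argument are hypothetical and do not come from the original answer.

```r
# Hypothetical helper (not from the original answer): peel off one field per pass.
split_by_peeling <- function(x, n_fields) {
  out <- matrix(NA_character_, nrow = length(x), ncol = n_fields)
  remaining <- x
  for (i in seq_len(n_fields)) {
    # Value before the first comma (or the whole string if no comma is left).
    out[, i] <- sub(",.*$", "", remaining)
    # Drop that value and its comma; rows with no comma left have nothing
    # more to extract, so they become NA for all later passes.
    has_comma <- grepl(",", remaining, fixed = TRUE)
    remaining <- ifelse(has_comma, sub("^[^,]*,", "", remaining), NA_character_)
  }
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Usage with the strings from the question.
strings <- c("Office 365,MS SQL Server,ASP.NET",
             "Microsoft Azure,ITIL,Project Management",
             "Infrastructure services,AWS solution architect")
res <- split_by_peeling(strings, n_fields = 3)
print(res)
```

Tracking the not-yet-consumed remainder of each string avoids searching for an extracted value's position in the full string, which could match the wrong place if the same text appears earlier in the row.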
    

Answer 2 (score: 0)

install.packages("splitstackshape")
library(splitstackshape)

New_Data = concat.split(Old_Data, split.col = 1, sep = ",", structure = "compact", mode = NULL, type = NULL, drop = FALSE, fixed = FALSE, fill = NA)

This worked perfectly for me.