How to separate a string column based on the first whitespace

Time: 2017-05-16 09:24:44

Tags: r string substring grepl

This is my data.

    mtcars$brand = row.names(mtcars)
    mtcars$brand
    ##  [1] "Mazda RX4"           "Mazda RX4 Wag"       "Datsun 710"         
    ##  [4] "Hornet 4 Drive"      "Hornet Sportabout"   "Valiant"            
    ##  [7] "Duster 360"          "Merc 240D"           "Merc 230"           
    ## [10] "Merc 280"            "Merc 280C"           "Merc 450SE"         
    ## [13] "Merc 450SL"          "Merc 450SLC"         "Cadillac Fleetwood" 
    ## [16] "Lincoln Continental" "Chrysler Imperial"   "Fiat 128"           
    ## [19] "Honda Civic"         "Toyota Corolla"      "Toyota Corona"      
    ## [22] "Dodge Challenger"    "AMC Javelin"         "Camaro Z28"         
    ## [25] "Pontiac Firebird"    "Fiat X1-9"           "Porsche 914-2"      
    ## [28] "Lotus Europa"        "Ford Pantera L"      "Ferrari Dino"       
    ## [31] "Maserati Bora"       "Volvo 142E"

I want to extract only the first word of each brand (the substring up to the first whitespace), like this. How can I do it?

##                               brand    brand2
## Mazda RX4                 Mazda RX4     Mazda 
## Mazda RX4 Wag         Mazda RX4 Wag     Mazda 
## Datsun 710               Datsun 710     Datsun 
## Hornet 4 Drive       Hornet 4 Drive     Hornet 
## Hornet Sportabout Hornet Sportabout     Hornet 
## Valiant                     Valiant     Valiant
## Duster 360               Duster 360     Duster 
## Merc 240D                 Merc 240D     Merc 
## Merc 230                   Merc 230     Merc 
## Merc 280                   Merc 280     Merc

These are four possible ways:

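    mtcars$brand2 = gsub(" .*$", "", mtcars$brand)
    mtcars$brand3 = sapply(strsplit(mtcars$brand, "\\s+"), "[", 1)

    # regexpr() gives the position of the first space; subtract 1 to drop the space itself
    mtcars$brand4 = substr(mtcars$brand, 1, regexpr(" ", mtcars$brand) - 1)
    # names without a space (e.g. "Valiant") come back empty, so fall back to the full name
    mtcars$brand4 = ifelse(mtcars$brand4 == "", mtcars$brand, mtcars$brand4)

    library(stringr)
    mtcars$brand5 = str_extract(mtcars$brand, boundary("word"))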
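As a quick sanity check, the base-R approaches can be compared against each other. This is only a sketch: the variable names (`first_sub`, `first_split`, `first_substr`) are made up for illustration, and the `substr` variant subtracts 1 from the position so that the space itself is not kept.

```r
# mtcars ships with base R (datasets package), so no extra packages are needed.
brand <- rownames(mtcars)

# 1. Delete everything from the first space onward.
first_sub <- sub(" .*$", "", brand)

# 2. Split on whitespace and keep the first piece of each element.
first_split <- vapply(strsplit(brand, "\\s+"), `[`, character(1), 1)

# 3. Cut just before the first space.
#    regexpr() returns -1 when there is no space; as.integer() drops its attributes.
pos <- as.integer(regexpr(" ", brand, fixed = TRUE))
has_space <- pos > 0
first_substr <- brand
first_substr[has_space] <- substr(brand[has_space], 1, pos[has_space] - 1)

# All three should agree element by element.
all_agree <- identical(first_sub, first_split) && identical(first_sub, first_substr)
print(all_agree)
```

`sub` is used instead of `gsub` here because only one replacement per string is needed; on this data both give the same result.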

2 answers:

Answer 0 (score: 2)

You can use str_extract:

library("stringr")
brand <- row.names(mtcars)
str_extract(brand, boundary("word"))
##  [1] "Mazda"    "Mazda"    "Datsun"   "Hornet"   "Hornet"   "Valiant" 
##  [7] "Duster"   "Merc"     "Merc"     "Merc"     "Merc"     "Merc"    
## [13] "Merc"     "Merc"     "Cadillac" "Lincoln"  "Chrysler" "Fiat"    
## [19] "Honda"    "Toyota"   "Toyota"   "Dodge"    "AMC"      "Camaro"  
## [25] "Pontiac"  "Fiat"     "Porsche"  "Lotus"    "Ford"     "Ferrari" 
## [31] "Maserati" "Volvo"
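If stringr is not available, the same "extract the first match" idea can be sketched in base R with `regexpr` and `regmatches`. The pattern `^\\S+` (everything up to the first whitespace) is my own choice here, not part of the answer above, and `first_word` is just an illustrative name.

```r
brand <- rownames(mtcars)

# regexpr() locates the first run of non-whitespace characters at the start
# of each string; regmatches() pulls those substrings out.
# Note: regmatches() silently drops elements with no match, which cannot
# happen here because every name starts with a non-whitespace character.
first_word <- regmatches(brand, regexpr("^\\S+", brand))

head(first_word)
```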

Answer 1 (score: 1)

Use strsplit:

unlist(lapply(strsplit(rownames(mtcars)," "),function(x) x[1]))

or gsub:

gsub("^([A-Za-z]+) .*","\\1",rownames(mtcars),perl=TRUE)
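One thing to keep in mind with the `gsub` pattern: `[A-Za-z]+` only matches a first word made of letters, so a name that starts with a digit is returned unchanged. The sketch below illustrates this with a made-up third value ("3M Scotch" is not in mtcars) and shows a looser pattern as one possible alternative.

```r
# "3M Scotch" is a hypothetical value added only to show the edge case.
x <- c("Mazda RX4", "Valiant", "3M Scotch")

# Pattern from the answer: requires the first word to be letters only.
letters_only <- gsub("^([A-Za-z]+) .*", "\\1", x, perl = TRUE)

# Looser alternative: drop everything from the first whitespace onward,
# whatever characters the first word contains.
any_chars <- sub("\\s.*$", "", x)

print(letters_only)
print(any_chars)
```

On the mtcars row names both patterns give the same result, because every first word there consists of letters only.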