Split a character field at the first space without dropping the other fields in R

Time: 2017-11-19 21:45:06

Tags: r regex

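I want to split the field "Fare_class" at the first space without removing any of the other fields. I know similar questions exist, but when I tried those approaches they dropped every field except "Fare_class".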

Travel_class    Fare_class          Avios_awarded
First           Flexible F          300% of miles flown
First           Lowest A            250% of miles flown
Business        Flexible J, C, D    250% of miles flown
Business        Lowest R, I         150% of miles flown

Below is the table I want to create: "Fare_class" is split at the first space into two new fields, "Fare" and "Booking".

Travel_class    Fare_class          Fare        Booking    Avios_awarded
First           Flexible F          Flexible    F          300% of miles flown
First           Lowest A            Lowest      A          250% of miles flown
Business        Flexible J, C, D    Flexible    J,C,D      250% of miles flown
Business        Lowest R, I         Lowest      R,I        150% of miles flown

3 answers:

Answer 0: (score: 2)

Option 1:

library(stringr)
str_split_fixed(Fare_class, " ", 2)

#     [,1]        [,2]     
#[1,] "Flexible"  "F"      
#[2,] "Lowest"    "A"      
#[3,] "Flexible"  "J, C, D"
#[4,] "Lowest"    "R, I" 

Option 2:

library(reshape2)
colsplit(Fare_class," ",c("Fare", "Booking"))

#      Fare  Booking
#1 Flexible        F
#2   Lowest        A
#3 Flexible  J, C, D
#4   Lowest     R, I
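Both calls return only the two new pieces, so they still have to be attached to the original data frame if the other columns are to survive. A minimal sketch of that step, using base R's sub() in place of the packages above so that it runs without extra dependencies; the data frame df is a hand-typed reconstruction of the question's table, not the asker's actual object:

```r
# Assumed reconstruction of the table from the question.
df <- data.frame(
  Travel_class  = c("First", "First", "Business", "Business"),
  Fare_class    = c("Flexible F", "Lowest A", "Flexible J, C, D", "Lowest R, I"),
  Avios_awarded = c("300% of miles flown", "250% of miles flown",
                    "250% of miles flown", "150% of miles flown"),
  stringsAsFactors = FALSE
)

# Fare: delete everything from the first space to the end of the string.
df$Fare <- sub(" .*$", "", df$Fare_class)

# Booking: delete everything up to and including the first space.
df$Booking <- sub("^[^ ]* ", "", df$Fare_class)

# Reorder so the new columns sit next to Fare_class, as in the desired table.
df <- df[, c("Travel_class", "Fare_class", "Fare", "Booking", "Avios_awarded")]
```

Because the new columns are assigned onto df, every existing column is kept. The desired table shows the booking codes without spaces (J,C,D); if that matters, gsub(" ", "", df$Booking) would strip them.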

Answer 1: (score: 0)

library(stringr)

Fare_class <- c('Flexible F',
 'Lowest A',
 'Flexible J, C, D',
 'Lowest R, I')

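# str_split() takes the separator as `pattern` (it has no `sep` argument);
# n = 2 limits the split to the first space.
parts <- str_split(Fare_class, pattern = ' ', n = 2)
fare <- sapply(parts, '[[', 1)
class <- sapply(parts, '[[', 2)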

str_split splits each string into (n =) 2 pieces. Its output is a list of 2-element vectors. sapply(..., '[[', i) then returns the first or second sub-element of each list element.
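The same list-of-pairs idea can be sketched in base R, for anyone who prefers not to load stringr. This is an alternative technique, not what the answer uses: regexpr() locates only the first space, and regmatches(..., invert = TRUE) returns the text on either side of it. The variable names parts and booking are illustrative.

```r
Fare_class <- c("Flexible F", "Lowest A", "Flexible J, C, D", "Lowest R, I")

# regexpr() reports only the first match per string, so later spaces are ignored.
first_space <- regexpr(" ", Fare_class)

# invert = TRUE returns the text before and after that match:
# a list with one 2-element character vector per input string.
parts <- regmatches(Fare_class, first_space, invert = TRUE)

# Pull out the first and second element of each pair.
fare    <- sapply(parts, `[[`, 1)
booking <- sapply(parts, `[[`, 2)
```

Note that a string containing no space would yield a 1-element vector, so the second sapply() call assumes every value has at least one space, as is the case in the question's data.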

Answer 2: (score: 0)

Here is a solution using separate from tidyr, which can split a column on a regular expression:

library(tidyr)

separate(df, Fare_class, c("Fare", "Booking"), sep = "\\b\\s\\b", remove = FALSE)

Or use extract to split more complex patterns into capture groups:

extract(df, Fare_class, c("Fare", "Booking"), regex = "(^\\p{L}+\\b)\\s(.+$)", remove = FALSE)

Result:

  Travel_class       Fare_class     Fare Booking        Avios_awarded
1        First       Flexible F Flexible       F  300% of miles flown
2        First         Lowest A   Lowest       A  250% of miles flown
3     Business Flexible J, C, D Flexible J, C, D  250% of miles flown
4     Business      Lowest R, I   Lowest    R, I  150% of miles flown

Note:

If you don't want to keep the original column Fare_class, just drop remove = FALSE from the separate or extract call.
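To see what the capture groups in the extract call are doing, here is a rough base R sketch of the same idea with regexec() and regmatches(). It is an illustration only: the POSIX class [[:alpha:]] stands in for \\p{L} so that the default regex engine can be used, and the names m, groups and result are made up for this example.

```r
Fare_class <- c("Flexible F", "Lowest A", "Flexible J, C, D", "Lowest R, I")

# Group 1: leading run of letters; group 2: everything after the first whitespace.
m <- regexec("^([[:alpha:]]+)\\s(.+)$", Fare_class)

# regmatches() gives, per string, c(full match, group 1, group 2);
# rbind stacks those vectors into a 4 x 3 character matrix.
groups <- do.call(rbind, regmatches(Fare_class, m))

# Column 1 is the full match, so the capture groups are columns 2 and 3.
result <- data.frame(Fare = groups[, 2], Booking = groups[, 3],
                     stringsAsFactors = FALSE)
```

Each capture group becomes one new column, which is the behaviour extract provides directly on a data frame.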

Data:

df = structure(list(Travel_class = structure(c(2L, 2L, 1L, 1L), .Label = c("Business", 
"First"), class = "factor"), Fare_class = structure(c(1L, 3L, 
2L, 4L), .Label = c("Flexible F", "Flexible J, C, D", "Lowest A", 
"Lowest R, I"), class = "factor"), Avios_awarded = structure(c(4L, 
1L, 3L, 2L), .Label = c(" 250% of miles flown", "150% of miles flown", 
"250% of miles flown", "300% of miles flown"), class = "factor")), .Names = c("Travel_class", 
"Fare_class", "Avios_awarded"), class = "data.frame", row.names = c(NA, 
-4L))