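I want to split the field "Fare_class" at the first space without dropping any of the other fields. I know similar questions exist, but when I tried that approach it dropped every field except "Fare_class".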
Travel_class  Fare_class        Avios_awarded
First         Flexible F        300% of miles flown
First         Lowest A          250% of miles flown
Business      Flexible J, C, D  250% of miles flown
Business      Lowest R, I       150% of miles flown
Below is the table I want to create: "Fare_class" is split at its first space into two new fields, "Fare" and "Booking".
Travel_class  Fare_class        Fare      Booking  Avios_awarded
First         Flexible F        Flexible  F        300% of miles flown
First         Lowest A          Lowest    A        250% of miles flown
Business      Flexible J, C, D  Flexible  J,C,D    250% of miles flown
Business      Lowest R, I       Lowest    R,I      150% of miles flown
Answer 0 (score: 2)
Alternative 1:
library(stringr)
str_split_fixed(Fare_class, " ", 2)
# [,1] [,2]
#[1,] "Flexible" "F"
#[2,] "Lowest" "A"
#[3,] "Flexible" "J, C, D"
#[4,] "Lowest" "R, I"
Alternative 2:
library(reshape2)
colsplit(Fare_class," ",c("Fare", "Booking"))
# Fare Booking
#1 Flexible F
#2 Lowest A
#3 Flexible J, C, D
#4 Lowest R, I
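Neither alternative modifies the original data frame, so the two pieces still have to be attached to it for the other fields to survive. A minimal sketch, assuming the table lives in a data frame called df (a hypothetical name, rebuilt here with character columns so the snippet runs on its own):

```r
library(stringr)

# Hypothetical data frame mirroring the table in the question
df <- data.frame(
  Travel_class  = c("First", "First", "Business", "Business"),
  Fare_class    = c("Flexible F", "Lowest A", "Flexible J, C, D", "Lowest R, I"),
  Avios_awarded = c("300% of miles flown", "250% of miles flown",
                    "250% of miles flown", "150% of miles flown"),
  stringsAsFactors = FALSE
)

# Split at the first space only; the result is a 4 x 2 character matrix
parts <- str_split_fixed(df$Fare_class, " ", 2)

# Add the pieces as new columns, leaving the existing columns in place
df$Fare    <- parts[, 1]
df$Booking <- parts[, 2]

# Reorder so the new columns sit next to Fare_class
df <- df[, c("Travel_class", "Fare_class", "Fare", "Booking", "Avios_awarded")]
```

Assigning into df rather than overwriting it is what keeps Travel_class and Avios_awarded intact.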
Answer 1 (score: 0)
library(stringr)
Fare_class <- c('Flexible F',
'Lowest A',
'Flexible J, C, D',
'Lowest R, I')
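# str_split() takes `pattern`, not `sep`; n = 2 stops after the first space
fare <- sapply(str_split(Fare_class, pattern = ' ', n = 2), '[[', 1)
# named `booking` rather than `class` to avoid masking base::class
booking <- sapply(str_split(Fare_class, pattern = ' ', n = 2), '[[', 2)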
str_split splits each string into (n =) 2 pieces. Its output is a list of 2-element vectors. sapply(..., '[[', i) then returns the first (i = 1) or second (i = 2) sub-element of each list element.
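The same list-of-pairs idea can be sketched in base R without stringr. This is a swapped-in technique, not the code from this answer: regmatches with regexpr and invert = TRUE splits each string around its first match only.

```r
# Same strings as in the answer above
Fare_class <- c("Flexible F", "Lowest A", "Flexible J, C, D", "Lowest R, I")

# regexpr() locates only the first space; invert = TRUE returns the text
# on either side of it, giving a list of 2-element character vectors
pieces <- regmatches(Fare_class, regexpr(" ", Fare_class), invert = TRUE)

# Pull the first and second sub-element out of every list element
fare    <- vapply(pieces, `[[`, character(1), 1)
booking <- vapply(pieces, `[[`, character(1), 2)
```

vapply is used here instead of sapply only so that the result is guaranteed to be a character vector.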
Answer 2 (score: 0)
Here is a solution using separate from tidyr, which can split a column on a regular expression:
library(tidyr)
separate(df, Fare_class, c("Fare", "Booking"), sep = "\\b\\s\\b", remove = FALSE)
Or use extract, which handles more complex patterns by splitting the column into capture groups:
extract(df, Fare_class, c("Fare", "Booking"), regex = "(^\\p{L}+\\b)\\s(.+$)", remove = FALSE)
Result:
  Travel_class        Fare_class      Fare  Booking         Avios_awarded
1        First        Flexible F  Flexible        F   300% of miles flown
2        First          Lowest A    Lowest        A   250% of miles flown
3     Business  Flexible J, C, D  Flexible  J, C, D   250% of miles flown
4     Business       Lowest R, I    Lowest     R, I   150% of miles flown
Note:
If you do not want to keep the original column Fare_class, just drop remove = FALSE from the separate or extract call.
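As a sketch of that variant, the call below leaves remove at its default so Fare_class is replaced by the two new columns. It also swaps the word-boundary regex for a plain space with extra = "merge", which is an assumption of this example rather than part of the answer; it tells separate to split at most once per value. The data frame is rebuilt with character columns so the snippet is self-contained.

```r
library(tidyr)

# Hypothetical character-column copy of the question's table
df <- data.frame(
  Travel_class  = c("First", "First", "Business", "Business"),
  Fare_class    = c("Flexible F", "Lowest A", "Flexible J, C, D", "Lowest R, I"),
  Avios_awarded = c("300% of miles flown", "250% of miles flown",
                    "250% of miles flown", "150% of miles flown"),
  stringsAsFactors = FALSE
)

# remove defaults to TRUE, so Fare_class is dropped from the result;
# extra = "merge" keeps everything after the first space in Booking
out <- separate(df, Fare_class, c("Fare", "Booking"), sep = " ", extra = "merge")
```

The other columns, Travel_class and Avios_awarded, pass through unchanged.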
Data:
df = structure(list(Travel_class = structure(c(2L, 2L, 1L, 1L), .Label = c("Business",
"First"), class = "factor"), Fare_class = structure(c(1L, 3L,
2L, 4L), .Label = c("Flexible F", "Flexible J, C, D", "Lowest A",
"Lowest R, I"), class = "factor"), Avios_awarded = structure(c(4L,
1L, 3L, 2L), .Label = c(" 250% of miles flown", "150% of miles flown",
"250% of miles flown", "300% of miles flown"), class = "factor")), .Names = c("Travel_class",
"Fare_class", "Avios_awarded"), class = "data.frame", row.names = c(NA,
-4L))