My data contains variables stored in the following format:
V2 V3
1 Price : 33,990 Size : 16, 17 & 18.5"
2 Price : 30,830 Size : 13, 16, 18 & 19.5"
3 Price : 48,560 Sizes : 21 & 21.5"
4 Price : 33,790 Size : 17.5, 18.5, 19.5 & 21.5
5 Price : 37,990 Size : 17.5, 18.5 & 19.5
6 Price : 43,690 Size : 17.5, 18.5 & 19.5"
The variables I need are Price, Size, and so on. What is the most concise way in R to convert the raw data into the following format?
Price Size
1 33,990 16, 17 & 18.5"
2 30,830 13, 16, 18 & 19.5"
3 48,560 21 & 21.5"
4 33,790 17.5, 18.5, 19.5 & 21.5
5 37,990 17.5, 18.5 & 19.5
6 43,690 17.5, 18.5 & 19.5"
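Also, the variable name in the third row is misspelled as Sizes instead of Size. How can I handle this, given that other variables have the same kind of error?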
Edit
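I can't use a column-specific strategy (for example, gsub() on a single column), because the variables within a given column are not consistent. Specifically,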
V20
1 Grips : Bontrager SSR
2 Headset : 1-1/8" threadless
3
4 Brakeset : Tektro alloy linear-pull
5 Brakeset : HL 280 mechanical disc
6 Brakeset : Tektro M290 hydraulic disc brakes
Column V20 has three unique variables (Grips, Headset, and Brakeset) plus blanks. The tidy data frame should look like this:
Grips Headset Brakeset
1 Bontrager SSR NA NA
2 NA 1-1/8" threadless NA
3 NA NA NA
4 NA NA Tektro alloy linear-pull
5 NA NA HL 280 mechanical disc
6 NA NA Tektro M290 hydraulic disc brakes
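This is an oversimplification, because I'm assuming Brakeset has no value in the first three rows. That may or may not be true, since the value could be stored in a different column. If a particular row has no value for a given variable, NA should be used. I hope the question is clear.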
Answer 0 (score: 3)
library(tidyr)
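# convert = TRUE runs type.convert() on the new columns; values such as "33,990"
# keep their comma and therefore stay character
# sep also swallows the spaces around ":"; the value is taken by name ("y"),
# because its position shifts once df has more than one column
df$Price <- separate(df, Price, into = c("x","y"), sep = "\\s*:\\s*", convert = TRUE)[["y"]]
df$Size <- separate(df, Size, into = c("x","y"), sep = "\\s*:\\s*")[["y"]]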
# Alternative with gsub():
# whatever appears before the ":" is removed, so a misspelled name does not matter
df$Price <- trimws(gsub(".*:", "", df$Price))
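If the name in front of the colon is needed as well (for example to spot misspellings such as Sizes), both parts can be pulled out with base R. This is only a minimal sketch: the vector v2 is a made-up stand-in for the question's V2 column, and the comma is assumed to be a thousands separator.

```r
# Hypothetical sample mirroring the question's V2 column.
v2 <- c("Price : 33,990", "Price : 30,830", "Price : 48,560")

# Text before the first colon is the key, text after it is the value.
key   <- trimws(sub(":.*$", "", v2))
value <- trimws(sub("^[^:]*:", "", v2))

# Strip the (assumed) thousands separator before converting to numeric.
price <- as.numeric(gsub(",", "", value, fixed = TRUE))
```

unique(key) then lists every name that occurs, which makes inconsistent spellings easy to find.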
# I'm using the below data to explain. This is obtained after using separate() once.
df1
x y
1 Grips Bontrager SSR
2 Grips Bontrager SSR
3 Headset 1-1/8 threadless
4 Brakeset Tektro M290 hydraulic disc brakes
# need to add a unique key to the data
> df1[["id"]] <- 1:nrow(df1)
> df1
x y id
1 Grips Bontrager SSR 1
2 Grips Bontrager SSR 2
3 Headset 1-1/8 threadless 3
4 Brakeset Tektro M290 hydraulic disc brakes 4
# using spread() from the tidyr package
# (superseded in tidyr >= 1.0.0 by pivot_wider(df1, names_from = x, values_from = y))
> spread(df1, x, y)
id Brakeset Grips Headset
1 1 <NA> Bontrager SSR <NA>
2 2 <NA> Bontrager SSR <NA>
3 3 <NA> <NA> 1-1/8 threadless
4 4 Tektro M290 hydraulic disc brakes <NA> <NA>
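To cover the case from the edit, where one column mixes several variables and some names are misspelled, the same idea can be applied to every cell at once: go to long form (row id, key, value), normalise the keys, then go back to wide form. The following is a rough base-R sketch, not a tested solution for the full data set. The data frame raw and the lookup aliases are invented for illustration, and if a row contains the same key twice only the first value is kept.

```r
# Hypothetical raw data: every cell holds "Name : value", blank cells are "".
raw <- data.frame(
  V2  = c("Price : 33,990", "Price : 30,830", "Price : 48,560"),
  V3  = c('Size : 16, 17 & 18.5"', 'Size : 13, 16, 18 & 19.5"', 'Sizes : 21 & 21.5"'),
  V20 = c("Grips : Bontrager SSR", 'Headset : 1-1/8" threadless', ""),
  stringsAsFactors = FALSE
)

# Long form: one record per cell, tagged with the row it came from.
cells <- unlist(raw, use.names = FALSE)   # column-major order
long <- data.frame(
  id    = rep(seq_len(nrow(raw)), times = ncol(raw)),
  key   = trimws(sub(":.*$", "", cells)),
  value = trimws(sub("^[^:]*:", "", cells)),
  stringsAsFactors = FALSE
)
long <- long[grepl(":", cells, fixed = TRUE), ]   # drop blank cells

# Map misspelled names onto the intended ones (assumed lookup table).
aliases <- c(Sizes = "Size")
hit <- long$key %in% names(aliases)
long$key[hit] <- unname(aliases[long$key[hit]])

# Wide form: one column per key, NA where a row has no value for that key.
ids <- seq_len(nrow(raw))
wide <- data.frame(
  lapply(split(long, factor(long$key, levels = unique(long$key))),
         function(d) d$value[match(ids, d$id)]),
  stringsAsFactors = FALSE
)
```

With tidyr loaded, the last step could presumably be done with spread(long, key, value) instead, as in the answer above.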