I have a data frame with the following structure:
library(tidyverse)
df <- tibble::tribble(
  ~var1, ~var2, ~var3,
  "ano 2005", "km 128000", "marca chevrolet",
  "ano 2019", "marca hyundai tucson", "km 50000",
  "marca grand vitara sz", "ano 2012", "NA"
)
I need to create new variables and assign the corresponding information to each of them. I tried the following code:
df %>%
  stack() %>%
  select(-ind) %>%
  separate(values, into = c("column", "value")) %>%
  pivot_wider(value, column, values_fn = list(value = list)) %>%
  unnest(cols = c(marca, ano, km))
But it does not work. I get the following error: no common size for marca, size 120, and km, size 119.
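There is a second problem as well: only the first word of each value is returned and the remaining words are dropped, as shown below.
If anyone can help me, I would be very grateful.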
ano marca km
2005 chevrolet 128000
2019 hyundai 50000
2012 grand
Answer 0 (score: 1)
Here is my data.table approach:
library( data.table )
#set to data.table format
setDT(df)
#create row_id's
df[, id := .I][]
#melt to long
ans <- melt( df, id.vars = "id" )
#split strings, using first space as separator
ans[, c("col_name", "col_value") := as.data.table( stringr::str_split_fixed( value, " ", 2 ) ) ]
#cast to wide
dcast( ans[!col_name == "NA",], id ~ col_name, value.var = "col_value")
# id ano km marca
# 1: 1 2005 128000 chevrolet
# 2: 2 2019 50000 hyundai tucson
# 3: 3 2012 <NA> grand vitara sz
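The key step in the answer above is splitting each string on the first space only. As a minimal sketch (the vector x is a made-up sample, not part of the original data), this shows what stringr::str_split_fixed() does when it is limited to two pieces:

```r
library(stringr)

# Hypothetical sample values, mirroring the shape of the cells in df
x <- c("ano 2005", "marca grand vitara sz", "NA")

# n = 2 means: split into at most two pieces, so only the first space is used
parts <- str_split_fixed(x, " ", 2)

parts[, 1]  # the future column names: "ano", "marca", "NA"
parts[, 2]  # the values; multi-word names stay intact, missing pieces become ""
```

Because the quoted "NA" has no space, it ends up in the first column with an empty value, which is why the answer filters out rows where col_name is "NA" before casting to wide.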
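Answer 1 (score: 0)
Here is a tidyverse solution. It requires the missing values in the data frame to be NA_character_ rather than the quoted string "NA". The argument extra = 'merge' tells separate() not to drop the additional words in the car model names: it splits only as many times as needed to fill the columns named in into, and keeps the rest of the string in the last column.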
# replace quoted NA with true NA
df[df == 'NA'] <- NA_character_
df %>%
  mutate(id = 1:nrow(df)) %>%
  pivot_longer(-id) %>%
  separate(value, into = c('column', 'value'), extra = 'merge') %>%
  select(-name) %>%
  filter(!is.na(column)) %>%
  pivot_wider(id_cols = id, names_from = column)
# A tibble: 3 x 4
# id ano km marca
# <int> <chr> <chr> <chr>
#1 1 2005 128000 chevrolet
#2 2 2019 50000 hyundai tucson
#3 3 2012 NA grand vitara sz
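To illustrate why extra = 'merge' fixes the "only the first word is returned" problem from the question, here is a small sketch comparing the two behaviours of separate(). The tibble x is a made-up sample, and suppressWarnings() is only there to silence the warning that the default setting emits when it discards pieces.

```r
library(tidyr)
library(tibble)

# Hypothetical sample, mirroring two cells of the original data
x <- tibble(value = c("ano 2012", "marca grand vitara sz"))

# Default (extra = "warn"): pieces beyond the second are discarded
dropped <- suppressWarnings(
  separate(x, value, into = c("column", "value"))
)

# extra = "merge": split once, keep the remainder in the last column
merged <- separate(x, value, into = c("column", "value"), extra = "merge")

dropped$value  # "2012" "grand"
merged$value   # "2012" "grand vitara sz"
```

The row id added with mutate() in the answer addresses the other problem: it lets pivot_wider() place each value in its own row rather than collecting columns of different lengths.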