Create new variables from messy data where the variables are spread across multiple columns

Time: 2020-02-21 15:11:21

Tags: r

I have a data frame with the following structure:

var1               var2                var3

año: 2005          km: 128000          marca: chevrolet
año: 2019          marca: hyundai      km: 50000
marca: toyota      año: 2012           km: 340000

I need to create new variables, each holding its corresponding information:

año            marca            km

2005           chevrolet        128000
2019           hyundai          50000
2012           toyota           340000

I would appreciate it if anyone could help me with this.

3 answers:

Answer 0: (score: 0)

library(tidyverse)

df <- tibble::tribble(
              ~var1,            ~var2,              ~var3,
        "ano: 2005",     "km: 128000", "marca: chevrolet",
        "ano: 2019", "marca: hyundai",        "km: 50000",
    "marca: toyota",      "ano: 2012",       "km: 340000"
    )


df %>% 
    mutate(id = row_number()) %>%    # remember which row each cell came from
    pivot_longer(-id, names_to = "var", values_to = "values") %>% 
    separate(values, into = c("column", "value"), sep = ": ") %>% 
    pivot_wider(id_cols = id, names_from = column, values_from = value) %>% 
    select(ano, marca, km)
#> # A tibble: 3 x 3
#>   ano   marca     km    
#>   <chr> <chr>     <chr> 
#> 1 2005  chevrolet 128000
#> 2 2019  hyundai   50000 
#> 3 2012  toyota    340000

Answer 1: (score: 0)

Here is a base R solution:

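pat <- c("ano", "marca", "km")
# For each row, strip the "key:" prefix from every cell, then reorder the
# values so that they line up with the keys listed in pat.
dfout <- setNames(data.frame(t(apply(df,
                                     1,
                                     function(v) trimws(gsub(".*:", "", v))[match(pat, gsub(":.*", "", v))]))), pat)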

which gives

> dfout
   ano     marca     km
1 2005 chevrolet 128000
2 2019   hyundai  50000
3 2012    toyota 340000

Data

df <- structure(list(var1 = c("ano: 2005", "ano: 2019", "marca: toyota"
), var2 = c("km: 128000", "marca: hyundai", "ano: 2012"), var3 = c("marca: chevrolet", 
"km: 50000", "km: 340000")), class = "data.frame", row.names = c(NA, 
-3L))
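
The one-liner in this answer can be unpacked into explicit steps. The following is only a minimal sketch, assuming every cell holds exactly one "key: value" pair and every row contains each key once; `parse_row` is a hypothetical helper name introduced here for illustration, not part of the original answer.

```r
# Sample data from the question (keys written without the tilde).
df <- data.frame(
  var1 = c("ano: 2005", "ano: 2019", "marca: toyota"),
  var2 = c("km: 128000", "marca: hyundai", "ano: 2012"),
  var3 = c("marca: chevrolet", "km: 50000", "km: 340000"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split the "key: value" cells of one row and
# return the values in the order given by `keys`.
parse_row <- function(cells, keys) {
  parts <- strsplit(cells, ":", fixed = TRUE)
  k <- trimws(vapply(parts, `[`, character(1), 1))  # text before the colon
  v <- trimws(vapply(parts, `[`, character(1), 2))  # text after the colon
  unname(v[match(keys, k)])  # for each wanted key, pick the matching value
}

keys <- c("ano", "marca", "km")

# apply() returns one column per input row, so transpose before
# converting back to a data frame.
tidy <- as.data.frame(t(apply(df, 1, parse_row, keys = keys)),
                      stringsAsFactors = FALSE)
names(tidy) <- keys
tidy
```

Looking up each wanted key among the keys found in the row (`match(keys, k)`) keeps the result correct whatever order the keys appear in within a row.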

Answer 2: (score: 0)

One way to solve this with purrr, dplyr and tidyr could be:

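library(purrr)
library(dplyr)
library(tidyr)

# Handle one column at a time: tag each cell with its row number, split
# "key: value", then spread the keys into columns (rowid keeps rows together).
map_dfr(.x = split.default(df, seq_along(df)), 
        ~ .x %>%
         mutate(rowid = row_number()) %>%
         separate(1, sep = ": ", into = c("column", "variable"))) %>%
 pivot_wider(names_from = "column", values_from = "variable")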

  rowid ano   marca     km    
  <int> <chr> <chr>     <chr> 
1     1 2005  chevrolet 128000
2     2 2019  hyundai   50000 
3     3 2012  toyota    340000
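
All of the results above come back as character columns. If the year and mileage are needed as numbers, one possible follow-up step is base R's `type.convert()`. This is a sketch only; the data frame `res` below is an assumed stand-in for the output of any of the answers, not something taken from them.

```r
# Stand-in for the reshaped result: every column is still character.
res <- data.frame(
  ano   = c("2005", "2019", "2012"),
  marca = c("chevrolet", "hyundai", "toyota"),
  km    = c("128000", "50000", "340000"),
  stringsAsFactors = FALSE
)

# type.convert() picks a type for each column; as.is = TRUE leaves text
# columns as character instead of converting them to factors.
res[] <- lapply(res, type.convert, as.is = TRUE)
str(res)
```

After this step `ano` and `km` are integer columns, while `marca` stays character.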