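I'm trying to convert a wide dataset into a long, tidy one. I normally use tidyr::gather() for tasks like this, but now I have a really odd dataset. Below is a small version of it. As you can imagine, the columns after the __1 ones repeat up to __16 or so in my actual data frame. Can this be fixed with tidyr or dplyr tools?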
# A tibble: 1 x 10
   code city      party_short party_long           votes seats party_short__1 party_long__1    votes__1 seats__1
  <dbl> <chr>     <chr>       <chr>                <dbl> <dbl> <chr>          <chr>               <dbl>    <dbl>
1  3630 Amsterdam PVDA        Partij van de Arbeid  1833  5.00 HARLBEL        Harlinger Belang      942     2.00
For reproducibility:
library(tidyverse)
df <- tibble(code = 3630,
             city = "Amsterdam",
             party_short = "PVDA",
             party_long = "Partij van de Arbeid",
             votes = 1833,
             seats = 5,
             party_short__1 = "HARLBEL",
             party_long__1 = "Harlinger Belang",
             votes__1 = 942,
             seats__1 = 2)
With the desired output:
# A tibble: 2 x 6
   code city      party_short party_long           votes seats
  <dbl> <chr>     <chr>       <chr>                <dbl> <dbl>
1  3630 Amsterdam PVDA        Partij van de Arbeid  1833  5.00
2  3630 Amsterdam HARLBEL     Harlinger Belang       942  2.00
Answer 0 (score: 1)
We can gather all the columns, separate the column names on "__", and then spread the data frame.
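library(tidyverse)
df2 <- df %>%
  # stack every column except the identifiers into name/value pairs
  gather(Column, Value, -code, -city) %>%
  # split e.g. "votes__1" into "votes" and "1"; unsuffixed names get Number = NA
  separate(Column, into = c("Column", "Number"), sep = "__") %>%
  # one column per variable again, one row per Number
  spread(Column, Value) %>%
  select(-Number)
df2
# # A tibble: 2 x 6
#    code city      party_long           party_short seats votes
#   <dbl> <chr>     <chr>                <chr>       <chr> <chr>
# 1 3630. Amsterdam Harlinger Belang     HARLBEL     2     942
# 2 3630. Amsterdam Partij van de Arbeid PVDA        5     1833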
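Note that in the output above votes and seats come back as <chr>, because gather() has to put text and numbers into a single Value column. If keeping the original column types matters, here is a possible alternative sketch. It is not part of the answer above and it assumes newer package versions than the thread itself uses (tidyr >= 1.0.0 for pivot_longer(), dplyr >= 1.0.0 for rename_with()). The "__0" suffix and the helper column name "set" are made up for illustration only.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- tibble(code = 3630,
             city = "Amsterdam",
             party_short = "PVDA",
             party_long = "Partij van de Arbeid",
             votes = 1833,
             seats = 5,
             party_short__1 = "HARLBEL",
             party_long__1 = "Harlinger Belang",
             votes__1 = 942,
             seats__1 = 2)

df_long <- df %>%
  # give the unsuffixed columns an explicit (hypothetical) "__0" suffix
  # so that every measure column has the form <name>__<number>
  rename_with(~ paste0(.x, "__0"),
              .cols = c(party_short, party_long, votes, seats)) %>%
  # ".value" keeps the part before "__" as the output column name;
  # the part after "__" goes into the helper column "set"
  pivot_longer(cols = -c(code, city),
               names_to = c(".value", "set"),
               names_sep = "__") %>%
  select(-set)

df_long
```

Because each output column is built only from input columns of the same type, votes and seats should stay <dbl>, and the same call should also cover suffixes up to __16 without changes.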
Answer 1 (score: 0)
I'm using a combination of data.table and tidyr below.
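library(data.table)
library(tidyr)
library(dplyr)  # needed for mutate()
setDT(df)
# melt to long, split the variable name on '__', then cast back to one column per variable
melt(df, id.vars = c('code', 'city')) %>%
  separate(variable, c('vv', 'bb'), '__') %>%
  dcast(code + city + bb ~ vv, value.var = 'value') %>%
  mutate(bb = NULL)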
  code      city           party_long party_short seats votes
1 3630 Amsterdam     Harlinger Belang     HARLBEL     2   942
2 3630 Amsterdam Partij van de Arbeid        PVDA     5  1833
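As with the first answer, melting everything into one value column coerces votes and seats to character. A possible variation that stays in data.table and keeps the types is to melt several groups of columns at once with patterns(). This is only a sketch under the assumption that every repeated column starts with one of the four base names; the object names dt and long are made up for illustration.

```r
library(data.table)

# Same example data as in the question, built directly as a data.table
dt <- data.table(code = 3630,
                 city = "Amsterdam",
                 party_short = "PVDA",
                 party_long = "Partij van de Arbeid",
                 votes = 1833,
                 seats = 5,
                 party_short__1 = "HARLBEL",
                 party_long__1 = "Harlinger Belang",
                 votes__1 = 942,
                 seats__1 = 2)

# Each pattern selects one group of columns (e.g. votes, votes__1, ...);
# every group is melted into its own value column, so types are not mixed
long <- melt(dt,
             id.vars = c("code", "city"),
             measure.vars = patterns("^party_short", "^party_long",
                                     "^votes", "^seats"),
             value.name = c("party_short", "party_long", "votes", "seats"))

# "variable" only holds the position within each group; drop it
long[, variable := NULL]
long
```

The row order follows the column order within each group, so the unsuffixed columns should come first, followed by the __1 columns.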