R pivot_wider so that repeated row values become column headers

Time: 2021-04-05 13:04:58

Tags: r dplyr

I am trying to reshape long data so that the repeated row values become column headers. The data looks like this:

# A tibble: 12 x 2
   x1       x2           
   <chr>    <chr>        
 1 Position 1            
 2 Name     Jon Ellis    
 3 Sex      m            
 4 Year     2017         
 5 Category Open         
 6 Time     06:37:27     
 7 Position 2            
 8 Name     Craig Holgate
 9 Sex      m            
10 Year     2015         
11 Category Open         
12 Time     06:43:45 

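I want my repeated row values ("Position", "Name", "Sex", "Year", "Category", "Time") to become column headers, but despite several attempts I have not worked out how to spread/pivot the data to achieve this. Any pointers would be appreciated, thanks.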

structure(list(x1 = c("Position", "Name", "Sex", "Year", "Category", 
"Time", "Position", "Name", "Sex", "Year", "Category", "Time", 
"Position", "Name", "Sex", "Year", "Category", "Time", "Position", 
"Name", "Sex", "Year", "Category", "Time"), x2 = c("1", "Jon Ellis", 
"m", "2017", "Open", "06:37:27", "2", "Craig Holgate", "m", "2015", 
"Open", "06:43:45", "3", "Stuart Leaney", "m", "2018", "Open", 
"06:46:03", "4", "Craig Holgate", "m", "2013", "Open", "06:47:19"
)), row.names = c(NA, -24L), class = c("tbl_df", "tbl", "data.frame"
))

3 answers:

Answer 0: (score: 5)

1) dplyr/tidyr Add a grouping column row that identifies each record, convert from long to wide format, drop row, and convert the column types.

library(dplyr)
library(tidyr)

DF %>%
  mutate(row = cumsum(x1 == "Position")) %>%
  pivot_wider(names_from = x1, values_from = x2) %>%
  select(-row) %>%
  type.convert(as.is = TRUE) 

giving:

# A tibble: 2 x 6
  Position Name          Sex    Year Category Time    
     <int> <chr>         <chr> <int> <chr>    <chr>   
1        1 Jon Ellis     m      2017 Open     06:37:27
2        2 Craig Holgate m      2015 Open     06:43:45
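
To see why the grouping column is enough, it may help to look at what cumsum(x1 == "Position") produces on its own. The following is a minimal base R sketch on the same 12-row input; it assumes, as the answer does, that every record starts with a "Position" row.

```r
# Toy input mirroring the two-column layout used in this answer
DF <- data.frame(
  x1 = rep(c("Position", "Name", "Sex", "Year", "Category", "Time"), 2),
  x2 = c("1", "Jon Ellis", "m", "2017", "Open", "06:37:27",
         "2", "Craig Holgate", "m", "2015", "Open", "06:43:45")
)

# The comparison is TRUE on the first row of each record;
# cumsum() turns those TRUEs into a running record number.
row <- cumsum(DF$x1 == "Position")
print(row)
```

Each record's six rows share one value of row, so pivot_wider can use it to decide which values belong on the same output line.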

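2) Base R Use string manipulation to convert the data to Debian control file (DCF) format, read it with read.dcf, which returns a character matrix, then convert that to a data frame and fix the types.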

txt <- with(DF, sub("Position", "\nPosition", sprintf("%s: %s", x1, x2)))
type.convert(as.data.frame(read.dcf(textConnection(txt))), as.is = TRUE)

giving:

  Position          Name Sex Year Category     Time
1        1     Jon Ellis   m 2017     Open 06:37:27
2        2 Craig Holgate   m 2015     Open 06:43:45
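
As a rough illustration of the intermediate step, the sketch below prints the text that is handed to read.dcf. It rebuilds the same 12-row DF so that it runs on its own. Note that sub() replaces the first occurrence of "Position" anywhere in each string, so this relies on the assumption that no other key or value contains that word.

```r
# Same toy input as in this answer
DF <- data.frame(
  x1 = rep(c("Position", "Name", "Sex", "Year", "Category", "Time"), 2),
  x2 = c("1", "Jon Ellis", "m", "2017", "Open", "06:37:27",
         "2", "Craig Holgate", "m", "2015", "Open", "06:43:45")
)

# Build "key: value" lines, then put an empty line before each "Position"
# line so that records are separated by blank lines, as DCF expects.
txt <- with(DF, sub("Position", "\nPosition", sprintf("%s: %s", x1, x2)))

# Show the text exactly as read.dcf would receive it
writeLines(txt)
```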

Or expressed with the Bizarro pipe, which requires only base R:

DF ->.;
  with(., sub("Position", "\nPosition", sprintf("%s: %s", x1, x2))) ->.;
  textConnection(.) ->.;
  read.dcf(.) ->.;
  as.data.frame(.) ->.;
  type.convert(., as.is = TRUE)

Note

DF <- structure(list(x1 = c("Position", "Name", "Sex", "Year", "Category", 
"Time", "Position", "Name", "Sex", "Year", "Category", "Time"
), x2 = c("1", "Jon Ellis", "m", "2017", "Open", "06:37:27", 
"2", "Craig Holgate", "m", "2015", "Open", "06:43:45")), class = "data.frame", row.names = c(NA, 
-12L))

Answer 1: (score: 4)

Assuming that no attribute is missing from any record in between, this will work:

library(tidyverse)
df %>% pivot_wider(names_from = x1, values_from = x2, values_fn = list) %>%
  unnest(cols = everything())

# A tibble: 4 x 6
  Position Name          Sex   Year  Category Time    
  <chr>    <chr>         <chr> <chr> <chr>    <chr>   
1 1        Jon Ellis     m     2017  Open     06:37:27
2 2        Craig Holgate m     2015  Open     06:43:45
3 3        Stuart Leaney m     2018  Open     06:46:03
4 4        Craig Holgate m     2013  Open     06:47:19
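
The assumption that nothing is missing can be checked before pivoting. The sketch below uses base R only and counts how often each key occurs in the question's data; if the counts differ, the list columns would have different lengths and the values could not be lined up row by row. The names counts, balanced, and df_missing are illustrative, not part of the answer.

```r
# The question's data as a plain data frame
df <- data.frame(
  x1 = rep(c("Position", "Name", "Sex", "Year", "Category", "Time"), 4),
  x2 = c("1", "Jon Ellis", "m", "2017", "Open", "06:37:27",
         "2", "Craig Holgate", "m", "2015", "Open", "06:43:45",
         "3", "Stuart Leaney", "m", "2018", "Open", "06:46:03",
         "4", "Craig Holgate", "m", "2013", "Open", "06:47:19")
)

# Number of values recorded for each key
counts <- lengths(split(df$x2, df$x1))
print(counts)

# TRUE when every key occurs equally often
balanced <- length(unique(counts)) == 1

# Hypothetical damaged input: drop the first "Sex" row
df_missing <- df[-3, ]
counts_missing <- lengths(split(df_missing$x2, df_missing$x1))
balanced_missing <- length(unique(counts_missing)) == 1
```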

Answer 2: (score: 0)

Mine is much like dear AnilGoyal's explanation, with two slight modifications:

library(dplyr)
library(tidyr)

df %>%
  group_by(x1) %>%
  mutate(id = row_number()) %>%      
  pivot_wider(id_cols = id, names_from = x1, values_from = x2)

# A tibble: 4 x 7
     id Position Name          Sex   Year  Category Time    
  <int> <chr>    <chr>         <chr> <chr> <chr>    <chr>   
1     1 1        Jon Ellis     m     2017  Open     06:37:27
2     2 2        Craig Holgate m     2015  Open     06:43:45
3     3 3        Stuart Leaney m     2018  Open     06:46:03
4     4 4        Craig Holgate m     2013  Open     06:47:19
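
The id here counts occurrences of each key, which differs from counting records. The base R sketch below compares the two on a hypothetical input where the first record has no "Sex" row; ave() with seq_along stands in for group_by(x1) plus row_number(), and the vector names are illustrative.

```r
# Hypothetical keys: record 1 lacks "Sex", record 2 is complete
x1 <- c("Position", "Name", "Year", "Category", "Time",
        "Position", "Name", "Sex", "Year", "Category", "Time")

# Occurrence number within each key (base R analogue of row_number() by group)
by_key <- ave(seq_along(x1), x1, FUN = seq_along)

# Record number, assuming each record starts with "Position"
by_record <- cumsum(x1 == "Position")

# Compare the two ids side by side
print(data.frame(x1, by_key, by_record))
```

With complete records the two ids agree. With a gap, the per-key id assigns the only "Sex" value to id 1 even though it belongs to the second record, so this approach shares the assumption that no attribute is missing.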


Data:

df <- structure(list(x1 = c("Position", "Name", "Sex", "Year", "Category", 
                      "Time", "Position", "Name", "Sex", "Year", "Category", "Time", 
                      "Position", "Name", "Sex", "Year", "Category", "Time", "Position", 
                      "Name", "Sex", "Year", "Category", "Time"), x2 = c("1", "Jon Ellis", 
                                                                         "m", "2017", "Open", "06:37:27", "2", "Craig Holgate", "m", "2015", 
                                                                         "Open", "06:43:45", "3", "Stuart Leaney", "m", "2018", "Open", 
                                                                         "06:46:03", "4", "Craig Holgate", "m", "2013", "Open", "06:47:19"
                      )), row.names = c(NA, -24L), class = c("tbl_df", "tbl", "data.frame"
                      ))