Converting a wide-format data frame to long format in R

Time: 2021-06-28 09:29:58

Tags: r

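I have a wide-format data frame, abc.csv, with the variables ID, pc_2007 to pc_2011 (the values are postcodes for the different years) and rd_2007 to rd_2011 (the values are the review dates for each year).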

ID pc_2007 pc_2008 pc_2009 pc_2010 pc_2011 rd_2007 rd_2008 rd_2009 rd_2010 rd_2011
A        1       4       7      10      13      16      19      22      25      28
B        2       5       8      11      14      17      20      23      26      29
C        3       6       9      12      15      18      21      24      27      30

I want to convert this data frame to long format:

ID year pc rd
A  2007  1 16
A  2008  4 19
A  2009  7 22

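3 answers:

Answer 0 (score: 2)

You can use pivot_longer with the names_sep argument. The special '.value' entry in names_to means that the part of each column name before the separator (pc, rd) becomes the name of an output column, while the part after it goes into year.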

df2 <- tidyr::pivot_longer(df1, 
                    cols = -ID, 
                    names_to = c('.value', 'year'), 
                    names_sep = '_')

df2
#   ID    year     pc    rd
#   <chr> <chr> <int> <int>
# 1 A     2007      1    16
# 2 A     2008      4    19
# 3 A     2009      7    22
# 4 A     2010     10    25
# 5 A     2011     13    28
# 6 B     2007      2    17
# 7 B     2008      5    20
# 8 B     2009      8    23
# 9 B     2010     11    26
#10 B     2011     14    29
#11 C     2007      3    18
#12 C     2008      6    21
#13 C     2009      9    24
#14 C     2010     12    27
#15 C     2011     15    30
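
In the output above, year is a character column because it is parsed out of the column names. The following is a minimal sketch, assuming tidyr 1.1.0 or later (where the names_transform argument is available), of converting it to integer in the same call. The df1 here simply rebuilds the question's sample data so the snippet runs on its own.

```r
library(tidyr)

# Sample data from the question, rebuilt so the snippet is self-contained
df1 <- data.frame(
  ID = c("A", "B", "C"),
  pc_2007 = 1:3, pc_2008 = 4:6, pc_2009 = 7:9, pc_2010 = 10:12, pc_2011 = 13:15,
  rd_2007 = 16:18, rd_2008 = 19:21, rd_2009 = 22:24, rd_2010 = 25:27, rd_2011 = 28:30,
  stringsAsFactors = FALSE
)

df2 <- pivot_longer(
  df1,
  cols = -ID,
  names_to = c(".value", "year"),
  names_sep = "_",
  # convert the year parsed from the column names from character to integer
  names_transform = list(year = as.integer)
)
```

With an integer year, numeric filters such as df2[df2$year >= 2009, ] compare numbers rather than strings.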

Answer 1 (score: 0)

Please see the suggestion below.

# your data
df <- structure(list(ID = c("A", "B", "C"), pc_2007 = 1:3, pc_2008 = 4:6, 
                     pc_2009 = 7:9, pc_2010 = 10:12, pc_2011 = 13:15, rd_2007 = 16:18, 
                     rd_2008 = 19:21, rd_2009 = 22:24, rd_2010 = 25:27, rd_2011 = 28:30),
                class = "data.frame", row.names = c(NA, -3L))
# packages needed
library(dplyr)
library(tidyr)
library(stringr)

# suggestion
df %>% 
  # each column name encodes two things (variable and year), so go through an
  # intermediate "transition" table and use regular expressions to pull out each part ...
  pivot_longer(cols = 2:last_col(), names_to = "year_code", values_to = "value") %>% 
  mutate(year = str_extract(year_code, "[0-9]{4}$"),
         code = str_extract(year_code, "^[a-z]{2}")) %>% 
  select(-year_code) %>% 
  # ...and then pivot your table back
  pivot_wider(names_from = code, values_from = value)

Output:

   ID    year     pc    rd
   <chr> <chr> <int> <int>
 1 A     2007      1    16
 2 A     2008      4    19
 3 A     2009      7    22
 4 A     2010     10    25
 5 A     2011     13    28
 6 B     2007      2    17
 7 B     2008      5    20
 8 B     2009      8    23
 9 B     2010     11    26
10 B     2011     14    29
11 C     2007      3    18
12 C     2008      6    21
13 C     2009      9    24
14 C     2010     12    27
15 C     2011     15    30
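
The regular expressions used in this answer can also be passed to pivot_longer directly through its names_pattern argument, which avoids the intermediate table and the pivot_wider step. This is a sketch of that variant, not the answerer's own code; the pattern assumes every non-ID column is named as two lowercase letters, an underscore, and a four-digit year, as in the sample data.

```r
library(tidyr)

# Sample data from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  ID = c("A", "B", "C"),
  pc_2007 = 1:3, pc_2008 = 4:6, pc_2009 = 7:9, pc_2010 = 10:12, pc_2011 = 13:15,
  rd_2007 = 16:18, rd_2008 = 19:21, rd_2009 = 22:24, rd_2010 = 25:27, rd_2011 = 28:30,
  stringsAsFactors = FALSE
)

df_long <- pivot_longer(
  df,
  cols = -ID,
  # first capture group -> output column name (pc / rd), second -> year
  names_to = c(".value", "year"),
  names_pattern = "^([a-z]{2})_([0-9]{4})$"
)
```

names_pattern is the more flexible choice when the variable part of the name may itself contain the separator, since the capture groups decide where the split happens.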

Answer 2 (score: 0)

Here is a tidyverse way, with pivot_longer followed by pivot_wider.

library(dplyr)
library(tidyr)

df1 %>%
  pivot_longer(
    cols = -ID,
    names_to = c("name", "year"),
    names_sep = "_"
  ) %>%
  pivot_wider(
    id_cols = c(ID, year),
    names_from = name,
    values_from = value
  )
# A tibble: 15 x 4
#   ID    year     pc    rd
#   <chr> <chr> <int> <int>
# 1 A     2007      1    16
# 2 A     2008      4    19
# 3 A     2009      7    22
# 4 A     2010     10    25
# 5 A     2011     13    28
# 6 B     2007      2    17
# 7 B     2008      5    20
# 8 B     2009      8    23
# 9 B     2010     11    26
#10 B     2011     14    29
#11 C     2007      3    18
#12 C     2008      6    21
#13 C     2009      9    24
#14 C     2010     12    27
#15 C     2011     15    30

Data in dput format:

df1 <- 
structure(list(ID = c("A", "B", "C"), pc_2007 = 1:3, pc_2008 = 4:6, 
    pc_2009 = 7:9, pc_2010 = 10:12, pc_2011 = 13:15, rd_2007 = 16:18, 
    rd_2008 = 19:21, rd_2009 = 22:24, rd_2010 = 25:27, rd_2011 = 28:30), 
    class = "data.frame", row.names = c(NA, -3L))
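
For readers without the tidyverse installed, the same reshaping can be sketched in base R with stats::reshape. This is a different technique from the answers above, shown only as an alternative; it relies on the column names following the name_year pattern so that sep = "_" can split them, and the final ordering step is an added convenience to match the row order of the tidyr output.

```r
# Sample data, identical to the dput output above
df1 <- structure(list(ID = c("A", "B", "C"), pc_2007 = 1:3, pc_2008 = 4:6,
    pc_2009 = 7:9, pc_2010 = 10:12, pc_2011 = 13:15, rd_2007 = 16:18,
    rd_2008 = 19:21, rd_2009 = 22:24, rd_2010 = 25:27, rd_2011 = 28:30),
    class = "data.frame", row.names = c(NA, -3L))

long <- reshape(
  df1,
  direction = "long",
  varying = names(df1)[-1],  # every column except ID varies over time
  sep = "_",                 # split "pc_2007" into variable "pc" and time 2007
  idvar = "ID",
  timevar = "year"
)

# reshape() stacks the data year by year; sort by ID then year and drop row names
long <- long[order(long$ID, long$year), ]
rownames(long) <- NULL
```

Unlike the pivot_longer calls, reshape converts the year to a numeric column when every suffix parses as a number.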