My data looks like this:
   Year Categories January   February March April      May      June      July    August September   October   November     December
 1 1990          A  4564.0   465465.0    12   468   4884.0  12788.00   4218.00 -58445.86 -90643.00 -122840.1 -155037.29 -187234.4286
 2 1990          B  6487.0   421214.0   878  2112 421283.0  56456.00  54654.00    515.00    212.00     515.0     212.00     515.0000
 3 1990          C 42862.0      512.0   484    48    515.0    212.00    515.00 137858.33     48.00  137858.3      48.00     465.0000
 4 1990          D    15.0  -169222.7    90   456 137858.3     48.00    465.00 135673.83    778.00  135673.8     778.00      12.0000
 5 1990          E 19164.0  -401699.2  -304   246 135673.8    778.00     12.00 133489.33     57.00  133489.3      57.00     478.0000
 6 1991          A 21436.8  -634175.7  -698    36 133489.3     57.00    478.00 131304.83      3.00  131304.8       3.00     331.3333
 7 1991          B 23709.6  -866652.2 -1092  -174 131304.8      3.00  -8210.60 129120.33  30425.33  129120.3  -11463.57     337.8333
11 1992          A 32800.8 -1796558.2 -2668 -1014 122566.8 -27597.89 -29087.86 292051.00  82253.33  331147.5  -12728.17     363.8333
12 1992          B 35073.6 -2029034.7 -3062 -1224 120382.3 -32976.00 -34307.17 321333.47  95210.33  367329.4  -14420.56     370.3333
13 1992          C 37346.4 -2261511.2 -3456 -1434 118197.8 -38354.11 -39526.49 350615.94 108167.33  403511.2  -16112.96     376.8333
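I would like to reshape this data frame with tidyverse as follows:
First, the years do not all have the same number of categories. Every category should still be shown, even for a year that does not contain it. As you can see, 1990 has 5 categories while 1991 has only 2.
The months should be laid out side by side as columns rather than row by row, i.e. Jan.90, Feb.90, ..., Dec.90, Jan.91, Feb.91, ..., Dec.91, Jan.92, ..., Dec.92 (these would be the column names).
That is how I would like to see the columns. The Year column should be dropped, and the unique categories should appear in the leftmost column (under Categories). If a category has no data for a given month of a given year, the value for that month can be 0.
I want to do this with tidyverse in R, but I could not work out how to write the code. Any help would be appreciated.
This is the expected version of the data; as I said, the months should be side by side: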
  Categories Jan.90    Feb.90 Mar.90 Apr.90   May.90 June.90 July.90    Aug.90 Sep.90    Oct.90    Nov.90    Dec.90  Jan.91    Feb.91 Mar.91
1          A   4564  465465.0     12    468   4884.0   12788    4218 -58445.86 -90643 -122840.1 -155037.3 -187234.4 21436.8 -634175.7   -698
2          B   6487  421214.0    878   2112 421283.0   56456   54654    515.00    212     515.0     212.0     515.0 23709.6 -866652.2  -1092
3          C  42862     512.0    484     48    515.0     212     515 137858.33     48  137858.3      48.0     465.0     0.0       0.0      0
4          D     15 -169222.7     90    456 137858.3      48     465 135673.83    778  135673.8     778.0      12.0     0.0       0.0      0
5          E  19164 -401699.2   -304    246 135673.8     778      12 133489.33     57  133489.3      57.0     478.0     0.0       0.0      0
  Apr.91   May.91 June.91 July.91   Aug.91   Sep.91   Oct.91    Nov.91   Dec.91  Jan.92   Feb.92 Mar.92 Apr.92   May.92   June.92   July.92
1     36 133489.3      57   478.0 131304.8     3.00 131304.8      3.00 331.3333 32800.8 -1796558  -2668  -1014 122566.8 -27597.89 -29087.86
2   -174 131304.8       3 -8210.6 129120.3 30425.33 129120.3 -11463.57 337.8333 35073.6 -2029035  -3062  -1224 120382.3 -32976.00 -34307.17
3      0      0.0       0     0.0      0.0     0.00      0.0      0.00   0.0000 37346.4 -2261511  -3456  -1434 118197.8 -38354.11 -39526.49
4      0      0.0       0     0.0      0.0     0.00      0.0      0.00   0.0000     0.0        0      0      0      0.0      0.00      0.00
5      0      0.0       0     0.0      0.0     0.00      0.0      0.00   0.0000     0.0        0      0      0      0.0      0.00      0.00
    Aug.92    Sep.92   Oct.92    Nov.92   Dec.92
1 292051.0  82253.33 331147.5 -12728.17 363.8333
2 321333.5  95210.33 367329.4 -14420.56 370.3333
3 350615.9 108167.33 403511.2 -16112.96 376.8333
4      0.0      0.00      0.0      0.00   0.0000
5      0.0      0.00      0.0      0.00   0.0000
Answer 0 (score: 4)
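You can first gather() the data to convert it to long format, group_by() Year, and complete() the missing Categories. We then unite() the month and the year into a single key and finally spread() the result to wide format.
library(tidyverse)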
df %>%
  gather(key, value, -Year, -Categories) %>%
  group_by(Year) %>%
  complete(Categories) %>%
  unite(MonthYear, key, Year) %>%
  spread(MonthYear, value, fill = 0)
# Categories April_1990 April_1991 April_1992 August_1990 ....
# <fct> <dbl> <dbl> <dbl> <dbl> ....
#1 A 468 36 -1014 -58446. ....
#2 B 2112 -174 -1224 515 ....
#3 C 48 0 -1434 137858. ....
#4 D 456 0 0 135674. ....
#5 E 246 0 0 133489. ....
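If explicit rows for the missing combinations are wanted while the data is still in long format (rather than relying only on fill = 0 at the spreading step), one option is an ungrouped complete() over Year, Categories and the month key. The following is a minimal sketch under that assumption; the toy data frame and the name long_full are hypothetical and not part of the original question.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: 1991 has no category "C"
toy <- tibble(
  Year       = c(1990L, 1990L, 1990L, 1991L, 1991L),
  Categories = c("A", "B", "C", "A", "B"),
  January    = c(1, 2, 3, 4, 5),
  February   = c(10, 20, 30, 40, 50)
)

long_full <- toy %>%
  gather(key, value, -Year, -Categories) %>%
  # Ungrouped complete(): every Year x Categories x month combination
  # gets a row, and the newly created rows receive value = 0
  complete(Year, Categories, key, fill = list(value = 0))

# The rows that were missing in the input are now present with value 0
long_full %>% filter(Year == 1991, Categories == "C")
```

From here, unite() and spread() can be applied as in the answer.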
spread() fills the empty values with 0 (fill = 0) as it casts the data to wide format, but the columns come out in alphabetical order. If we want to maintain the original order of the columns, one simple way is to convert MonthYear to a factor before spreading:
df %>%
  gather(key, value, -Year, -Categories) %>%
  group_by(Year) %>%
  complete(Categories) %>%
  unite(MonthYear, key, Year) %>%
  mutate(MonthYear = factor(MonthYear, levels = unique(MonthYear))) %>%
  spread(MonthYear, value, fill = 0)
# Categories January_1990 February_1990 March_1990 April_1990 ....
# <chr> <dbl> <dbl> <dbl> <dbl> ....
#1 A 4564 465465 12 468 ....
#2 B 6487 421214 878 2112 ....
#3 C 42862 512 484 48 ....
#4 D 15 -169223. 90 456 ....
#5 E 19164 -401699. -304 246 ....
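In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). As a rough sketch of the same reshaping with the newer functions: the toy data below is hypothetical, and the scalar values_fill = 0 assumes tidyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: 1991 has no category "C"
toy <- tibble(
  Year       = c(1990L, 1990L, 1990L, 1991L, 1991L),
  Categories = c("A", "B", "C", "A", "B"),
  January    = c(1, 2, 3, 4, 5),
  February   = c(10, 20, 30, 40, 50)
)

wide <- toy %>%
  # Long format: one row per Year / Categories / Month
  pivot_longer(-c(Year, Categories), names_to = "Month", values_to = "value") %>%
  # Wide format: one column per Month_Year, missing cells filled with 0
  pivot_wider(
    id_cols     = Categories,
    names_from  = c(Month, Year),
    values_from = value,
    values_fill = 0
  )

names(wide)
```

By default pivot_wider() should keep the new columns in order of first appearance, so with input sorted by Year the months stay in calendar order without the factor conversion.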
EDIT
As the OP mentioned in the comments, the real data raises a duplicate identifiers error. To get around it, we can create a unique index for each MonthYear before spreading:
df %>%
  gather(key, value, -Year, -Categories) %>%
  group_by(Year) %>%
  complete(Categories) %>%
  unite(MonthYear, key, Year) %>%
  mutate(MonthYear = factor(MonthYear, levels = unique(MonthYear))) %>%
  group_by(MonthYear) %>%
  mutate(i = row_number()) %>%
  spread(MonthYear, value) %>%
  ungroup() %>%
  select(-i)
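Before adding an index, it may help to check which Year/Categories pairs are actually duplicated in the real data. The following is a small sketch on hypothetical toy data; summing the duplicated rows is only one possible way to resolve them and is an assumption here, not something the OP asked for.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with a duplicated Year/Categories pair (1990, "A")
toy <- tibble(
  Year       = c(1990L, 1990L, 1990L, 1991L),
  Categories = c("A", "A", "B", "A"),
  January    = c(1, 2, 3, 4),
  February   = c(10, 20, 30, 40)
)

# Which Year/Categories pairs occur more than once?
dupes <- toy %>%
  count(Year, Categories) %>%
  filter(n > 1)

# One possible resolution (an assumption): sum the duplicated rows
# so that each Year/Categories pair is unique before reshaping
deduped <- toy %>%
  group_by(Year, Categories) %>%
  summarise_all(sum) %>%
  ungroup()
```

If dupes is empty, the reshaping should not hit the duplicate identifiers error in the first place.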
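Answer 1 (score: 0)
How about gathering, then pasting the month and year together, and then spreading? I use a ridiculous workaround to keep the columns in the right order. Try: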
library(dplyr)
library(tidyr)
df %>%
  gather(k, v, -Year, -Categories) %>%
  arrange(Categories, Year) %>%
  group_by(Categories) %>%
  # prefix each label with a running index so the columns sort in order
  mutate(n = row_number(),
         col = paste0("n", 1000+n, substr(k, 1, 3), ".", substr(Year, 3, 4))) %>%
  ungroup() %>%
  arrange(col) %>%
  select(-Year, -k, -n) %>%
  spread(col, v, fill = 0) %>%
  # strip the 5-character sorting prefix again
  rename_at(vars(-Categories), ~substr(., 6, nchar(.)))
Result
# A tibble: 5 x 49
Categories Jan.90 Feb.90 Mar.90 Apr.90 May.90 Jun.90 Jul.90 Aug.90 Sep.90 Oct.90 Nov.90 Dec.90 Jan.91 Jan.92 Feb.91 Feb.92 Mar.91 Mar.92 Apr.91 Apr.92 May.91
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 4564 4.65e5 12 468 4.88e3 12788 4218 -58446. -90643 -1.23e5 -1.55e5 -1.87e5 21437. 0 -6.34e5 0. -698 0 36 0 1.33e5
2 B 6487 4.21e5 878 2112 4.21e5 56456 54654 515 212 5.15e2 2.12e2 5.15e2 23710. 0 -8.67e5 0. -1092 0 -174 0 1.31e5
3 C 42862 5.12e2 484 48 5.15e2 212 515 137858. 48 1.38e5 4.80e1 4.65e2 0 37346. 0. -2.26e6 0 -3456 0 -1434 0.
4 D 15 -1.69e5 90 456 1.38e5 48 465 135674. 778 1.36e5 7.78e2 1.20e1 0 0 0. 0. 0 0 0 0 0.
5 E 19164 -4.02e5 -304 246 1.36e5 778 12 133489. 57 1.33e5 5.70e1 4.78e2 0 0 0. 0. 0 0 0 0 0.
# … with 27 more variables: May.92 <dbl>, Jun.91 <dbl>, Jun.92 <dbl>, Jul.91 <dbl>, Jul.92 <dbl>, Aug.91 <dbl>, Aug.92 <dbl>, Sep.91 <dbl>, Sep.92 <dbl>, Oct.91 <dbl>,
# Oct.92 <dbl>, Nov.91 <dbl>, Nov.92 <dbl>, Dec.91 <dbl>, Dec.92 <dbl>, Jan.92 <dbl>, Feb.92 <dbl>, Mar.92 <dbl>, Apr.92 <dbl>, May.92 <dbl>, Jun.92 <dbl>, Jul.92 <dbl>,
# Aug.92 <dbl>, Sep.92 <dbl>, Oct.92 <dbl>, Nov.92 <dbl>, Dec.92 <dbl>
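Note that the result above is 5 x 49 rather than 5 x 37, and labels such as Jan.92 appear more than once. This appears to happen because n is numbered within each category, so the same month and year can receive a different prefix in different categories (C has no 1991 rows, so its 1992 months get the indices that A uses for 1991). A minimal sketch of one way to get the Jan.90-style labels in chronological order without the prefix: it uses hypothetical toy data, relies on R's built-in month.name constant, and the names long, lvls and wide are illustrative.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: categories differ between years
toy <- tibble(
  Year       = c(1990L, 1990L, 1990L, 1991L, 1992L, 1992L),
  Categories = c("A", "B", "C", "A", "A", "C"),
  January    = c(1, 2, 3, 4, 5, 6),
  February   = c(10, 20, 30, 40, 50, 60)
)

long <- toy %>%
  gather(k, v, -Year, -Categories) %>%
  # Label such as "Jan.90": first 3 letters of the month, last 2 digits of the year
  mutate(col = paste0(substr(k, 1, 3), ".", substr(Year, 3, 4)))

# Column labels in chronological order: by year, then by calendar month
lvls <- long %>%
  distinct(Year, k, col) %>%
  arrange(Year, match(k, month.name)) %>%
  pull(col)

wide <- long %>%
  # spread() orders the new columns by factor level
  mutate(col = factor(col, levels = lvls)) %>%
  select(-Year, -k) %>%
  spread(col, v, fill = 0)

names(wide)
```

Because the label is the same for every category, each month and year yields exactly one column.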
Data
df <- structure(list(Year = c(1990L, 1990L, 1990L, 1990L, 1990L, 1991L,
1991L, 1992L, 1992L, 1992L), Categories = c("A", "B", "C", "D",
"E", "A", "B", "A", "B", "C"), January = c(4564, 6487, 42862,
15, 19164, 21436.8, 23709.6, 32800.8, 35073.6, 37346.4), February = c(465465,
421214, 512, -169222.7, -401699.2, -634175.7, -866652.2, -1796558.2,
-2029034.7, -2261511.2), March = c(12L, 878L, 484L, 90L, -304L,
-698L, -1092L, -2668L, -3062L, -3456L), April = c(468L, 2112L,
48L, 456L, 246L, 36L, -174L, -1014L, -1224L, -1434L), May = c(4884,
421283, 515, 137858.3, 135673.8, 133489.3, 131304.8, 122566.8,
120382.3, 118197.8), June = c(12788, 56456, 212, 48, 778, 57,
3, -27597.89, -32976, -38354.11), July = c(4218, 54654, 515,
465, 12, 478, -8210.6, -29087.86, -34307.17, -39526.49), August = c(-58445.86,
515, 137858.33, 135673.83, 133489.33, 131304.83, 129120.33, 292051,
321333.47, 350615.94), September = c(-90643, 212, 48, 778, 57,
3, 30425.33, 82253.33, 95210.33, 108167.33), October = c(-122840.1,
515, 137858.3, 135673.8, 133489.3, 131304.8, 129120.3, 331147.5,
367329.4, 403511.2), November = c(-155037.29, 212, 48, 778, 57,
3, -11463.57, -12728.17, -14420.56, -16112.96), December = c(-187234.4286,
515, 465, 12, 478, 331.3333, 337.8333, 363.8333, 370.3333, 376.8333
)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))