I have a large data frame in which some of the column names are dates stored as text, for example:
name <- c("John ", "Jay", "Carla")
X3.12.2010 <- c(20, 10, 9)
X3.19.2010 <- c(19, 8, 44)
X3.26.2010 <- c(10, 100, 999)
X4.3.2010 <- c(8, 1, 23)
X4.10.2010 <- c(8, 10, 238)
X4.17.2010 <- c(28, 17, 27)
X4.24.2010 <- c(11, 12, 45)
g <- data.frame(name, X3.12.2010, X3.19.2010, X3.26.2010, X4.3.2010, X4.10.2010, X4.17.2010, X4.24.2010)
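However, I want the date columns to be labelled in "yyyymm" format, and then to take the mean for each unique combination of date and name. I converted the date column names with the following code: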
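library(stringr)  # str_extract() comes from the stringr package

# return the last n characters of each string
substrRight <- function(x, n){
  substr(x, nchar(x)-n+1, nchar(x))
}
# e.g. "X3.12.2010" -> "2010 X3" -> "201003": " X" becomes 0 for one-digit months, otherwise it is removed
colnames(g)[2:8] <- ifelse(nchar(sub(" X", "", paste(substrRight(colnames(g)[2:8], 4),str_extract(colnames(g)[2:8], "[^.]+")))) < 6,
sub(" X", 0, paste(substrRight(colnames(g)[2:8], 4),str_extract(colnames(g)[2:8], "[^.]+"))),
sub(" X", "", paste(substrRight(colnames(g)[2:8], 4),str_extract(colnames(g)[2:8], "[^.]+"))))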
print(g)
name 201003 201003 201003 201004 201004 201004 201004
1 John 20 19 10 8 8 28 11
2 Jay 10 8 100 1 10 17 12
3 Carla 9 44 999 23 238 27 45
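As a side note, the same yyyymm labels can probably be built more directly by parsing each name as a real date and formatting it with "%Y%m", which pads one-digit months automatically and removes the need for stringr. This is only a sketch that assumes every date column is named X<month>.<day>.<year>; the helper name to_yyyymm is made up for illustration.

```r
# to_yyyymm is a hypothetical helper, not part of the question's code.
# Assumes names look like "X<month>.<day>.<year>", e.g. "X3.12.2010".
to_yyyymm <- function(x) {
  dates <- as.Date(sub("^X", "", x), format = "%m.%d.%Y")  # "3.12.2010" -> 2010-03-12
  format(dates, "%Y%m")                                      # 2010-03-12 -> "201003"
}

date_cols <- c("X3.12.2010", "X3.19.2010", "X3.26.2010", "X4.3.2010",
               "X4.10.2010", "X4.17.2010", "X4.24.2010")
to_yyyymm(date_cols)
```

With the question's g this would be used as colnames(g)[2:8] <- to_yyyymm(colnames(g)[2:8]).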
The output I want is:
name X201003 X201004
1 John 16.33 13.75
2 Jay 39.33 10.00
3 Carla 350.67 83.25
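Is there a way to produce this? Thanks.
Answer (score: 1)
A note on how the data is stored
It is good practice not to have several columns with the same name: it makes little sense, and it is best corrected at the source (i.e. wherever you get the data from). Stored in long format, with one row per observation, the data would look like this: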
d = data.frame(name = c("John", "Jay", "Carla","John", "Jay", "Carla","John", "Jay", "Carla"),
month = c(201003, 201003, 201003,201003, 201003, 201003,201004, 201004, 201004),
order = c(1,1,1,2,2,2,1,1,1),
value = c(20,10,9,19,8,44,8,10,238))
# name month order value
# 1 John 201003 1 20
# 2 Jay 201003 1 10
# 3 Carla 201003 1 9
# 4 John 201003 2 19
# 5 Jay 201003 2 8
# 6 Carla 201003 2 44
# 7 John 201004 1 8
# 8 Jay 201004 1 10
# 9 Carla 201004 1 238
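Once the data is in this long shape, the mean per name and month needs no renaming at all. A minimal base-R sketch using aggregate() on the d above (rebuilt here so the block runs on its own):

```r
d <- data.frame(name  = c("John", "Jay", "Carla", "John", "Jay", "Carla", "John", "Jay", "Carla"),
                month = c(201003, 201003, 201003, 201003, 201003, 201003, 201004, 201004, 201004),
                order = c(1, 1, 1, 2, 2, 2, 1, 1, 1),
                value = c(20, 10, 9, 19, 8, 44, 8, 10, 238))

# mean of value for every unique (name, month) combination
m <- aggregate(value ~ name + month, data = d, FUN = mean)
m
```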
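Solution to the question as posted
To reshape the data we first have to give your columns distinct names; at a later stage we extract the month from those names to group the data and compute the means: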
library(tidyverse)
# data.frame() runs make.names(unique = TRUE) on the column names, so the duplicated 201003, 201003, ... become X201003, X201003.1, X201003.2, ...
g = data.frame(g)
g %>%
gather(time,value,-name) %>% # reshape data
mutate(time = gsub('X([^.]+)|.', '\\1', time)) %>% # get time from column names (everything between "X" and ".")
group_by(name, time) %>% # for each name and time
summarise(MEAN = mean(value)) %>% # get mean value
ungroup() %>% # forget the grouping
spread(time, MEAN) # reshape again
# # A tibble: 3 x 3
# name `201003` `201004`
# <fct> <dbl> <dbl>
# 1 Carla 351. 83.2
# 2 Jay 39.3 10
# 3 John 16.3 13.8
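gather() and spread() still work but have been superseded in tidyr (since version 1.0.0) by pivot_longer() and pivot_wider(). A possible variant with those functions parses the month straight from the original column names instead of relying on data.frame() to de-duplicate them. It is a sketch that assumes the names follow the X<month>.<day>.<year> pattern and dplyr 1.0.0 or later (for the .groups argument of summarise()).

```r
library(dplyr)
library(tidyr)

# The question's original data, before any renaming of the columns
g <- data.frame(
  name       = c("John", "Jay", "Carla"),
  X3.12.2010 = c(20, 10, 9),
  X3.19.2010 = c(19, 8, 44),
  X3.26.2010 = c(10, 100, 999),
  X4.3.2010  = c(8, 1, 23),
  X4.10.2010 = c(8, 10, 238),
  X4.17.2010 = c(28, 17, 27),
  X4.24.2010 = c(11, 12, 45)
)

result <- g %>%
  # one row per (name, date column)
  pivot_longer(-name, names_to = "date", values_to = "value") %>%
  # "X3.12.2010" -> Date 2010-03-12 -> "201003"
  mutate(month = format(as.Date(sub("^X", "", date), format = "%m.%d.%Y"), "%Y%m")) %>%
  group_by(name, month) %>%
  summarise(MEAN = mean(value), .groups = "drop") %>%
  # back to one column per month, named like the desired output
  pivot_wider(names_from = month, values_from = MEAN, names_prefix = "X")

result
```

names_prefix = "X" reproduces the X201003/X201004 headers from the desired output; without it the columns are simply 201003 and 201004.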