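I have a data frame containing a set of columns that are indicator variables for a given year. For example, the "d80" column is 1 for rows whose year is 1980 and 0 otherwise.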
for(i in names(df)[31:35]){
  print(c(i, df[[i]][1:5]))  # column name, then its first five values (R indexes from 1)
}
[1] "d80" "1" "0" "0" "0" "0"
[1] "d81" "0" "1" "0" "0" "0"
[1] "d82" "0" "0" "1" "0" "0"
[1] "d83" "0" "0" "0" "1" "0"
[1] "d84" "0" "0" "0" "0" "1"
Shown another way:
head(df$d80)
[1] 1 0 0 0 0 0
head(df$d81)
[1] 0 1 0 0 0 0
A third way:
> x = df[1:3, 31:55]
> dput(x)
structure(list(d80 = c(1L, 0L, 0L), d81 = c(0L, 1L, 0L), d82 = c(0L,
0L, 1L), d83 = c(0L, 0L, 0L), d84 = c(0L, 0L, 0L), d85 = c(0L,
0L, 0L), d86 = c(0L, 0L, 0L), d87 = c(0L, 0L, 0L), d88 = c(0L,
0L, 0L), d89 = c(0L, 0L, 0L), d90 = c(0L, 0L, 0L), d91 = c(0L,
0L, 0L), d92 = c(0L, 0L, 0L), d93 = c(0L, 0L, 0L), d94 = c(0L,
0L, 0L), d95 = c(0L, 0L, 0L), d96 = c(0L, 0L, 0L), d97 = c(0L,
0L, 0L), d98 = c(0L, 0L, 0L), d99 = c(0L, 0L, 0L), d00 = c(0L,
0L, 0L), d01 = c(0L, 0L, 0L), d02 = c(0L, 0L, 0L), d03 = c(0L,
0L, 0L), d04 = c(0L, 0L, 0L)), row.names = c("1", "2", "3"), class = "data.frame")
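My end goal is to compute the mean of a given column for each year, so I want to add a column whose value in each row is that row's year. In other words, I want to collapse the set of year-indicator columns into a single column. For example, the data above would become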
80
81
82
83
84
What is the best way to do this? Thanks for your help!
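For reference, once such a year column exists, the per-year mean itself is a single grouped aggregation. A minimal base R sketch with made-up data; `toy` and its column `income` are hypothetical stand-ins for the real data frame and "a given column":

```r
# Made-up data: a `year` column (as produced by collapsing the indicators)
# and a hypothetical numeric column `income` whose per-year mean we want
toy <- data.frame(
  year   = c(80, 80, 81, 82, 82),
  income = c(10, 20, 30, 40, 60)
)

# Mean of `income` for each distinct year, returned as a data frame
means <- aggregate(income ~ year, data = toy, FUN = mean)
print(means)
```

`tapply(toy$income, toy$year, mean)` gives the same numbers as a named vector if a data frame is not needed.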
Answer 0 (score: 0)
Assuming your dataset is called df, you can use the following:
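library(tidyverse)
df %>%
  group_by(id = row_number()) %>% # add a row id and group by it
  nest() %>% # nest each row's values into a list-column called data
  mutate(year = map(data, ~as.numeric(gsub("d", "", names(.)[. == 1])))) %>% # keep the name of the column holding 1, remove "d" and make it numeric
  unnest(cols = c(data, year)) %>% # unnest both list-columns (tidyr >= 1.0 expects cols)
  ungroup() %>% # drop the grouping so id can be removed
  select(year, everything(), -id) # put year first and remove the row id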
# # A tibble: 3 x 26
# year d80 d81 d82 d83 d84 d85 d86 d87 d88 d89 d90 d91 d92 d93
# <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
# 1 80 1 0 0 0 0 0 0 0 0 0 0 0 0 0
# 2 81 0 1 0 0 0 0 0 0 0 0 0 0 0 0
# 3 82 0 0 1 0 0 0 0 0 0 0 0 0 0 0
# # ... with 11 more variables: d94 <int>, d95 <int>, d96 <int>, d97 <int>, d98 <int>, d99 <int>,
# # d00 <int>, d01 <int>, d02 <int>, d03 <int>, d04 <int>
The new column is called year, and it is placed at the start of the dataset.
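One caveat with stripping the "d": columns d00 to d04 yield years 0 to 4, which sort before 80. If the year is later sorted or plotted, it may help to map it to four digits. A small sketch, assuming (from the d80 to d04 range) that 80 to 99 mean the 1900s and 00 to 79 the 2000s:

```r
# Two-digit years as produced from column names like d80 ... d04
yy <- as.numeric(sub("^d", "", c("d80", "d95", "d99", "d00", "d04")))

# Assumed cutoff: 80-99 -> 1900s, 00-79 -> 2000s
year4 <- ifelse(yy >= 80, 1900 + yy, 2000 + yy)
print(year4)
```

For the per-year mean itself this does not change the groups, only their order.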
Another approach is to do some reshaping and joining:
library(tidyverse)
# add a row id (used to join the reshaped result back to the original rows)
df = df %>% mutate(id = row_number())
df %>%
  pivot_longer(-id, names_to = "year", values_to = "value") %>% # reshape to long: one row per (id, indicator column)
  filter(value == 1) %>% # keep only the 1s
  mutate(year = as.numeric(gsub("d", "", year))) %>% # "d80" -> 80
  left_join(df, by = "id") %>% # join back the original dataset
  select(-id, -value) # remove the helper columns
# year d80 d81 d82 d83 d84 d85 d86 d87 d88 d89 d90 d91 d92 d93 d94 d95 d96 d97 d98 d99 d00 d01
# 1 80 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 2 81 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 3 82 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# d02 d03 d04
# 1 0 0 0
# 2 0 0 0
# 3 0 0 0
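A base R solution would be the following. It is restricted to the indicator columns, because any other column of df that happens to contain a 1 (including an id column like the one added above) would otherwise be picked up too:
# keep only the indicator columns d80 ... d04 (assumes they are the only columns named "d" + two digits)
ind = df[grepl("^d[0-9]{2}$", names(df))]
df$year = as.numeric(gsub("d", "", apply(ind, 1, function(x) names(x)[x == 1])))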
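A vectorized base R variant uses max.col() instead of calling a function on every row. This sketch rebuilds a few of the question's columns by hand in a hypothetical `toy` data frame and assumes each row should have exactly one 1; rows that don't are set to NA rather than silently assigned the first year:

```r
# First three rows of some of the question's indicator columns (subset of the dput above)
toy <- data.frame(
  d80 = c(1L, 0L, 0L),
  d81 = c(0L, 1L, 0L),
  d82 = c(0L, 0L, 1L),
  d83 = c(0L, 0L, 0L)
)

# Indicator columns, selected by the assumed naming pattern "d" + two digits
ind <- toy[grepl("^d[0-9]{2}$", names(toy))]

# Index of the column holding the largest value in each row, i.e. the 1
hit <- max.col(ind, ties.method = "first")

# Rows without exactly one 1 get NA instead of a wrong year
hit[rowSums(ind) != 1] <- NA

toy$year <- as.numeric(sub("^d", "", names(ind)[hit]))
print(toy$year)
```

Working on a numeric matrix of just the indicator columns also avoids apply() coercing a mixed-type data frame to a character matrix.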