Recode and spread data in columns based on the values in another column

Time: 2018-07-03 10:22:07

Tags: r dataframe dplyr

I have a table that looks like this:

Year   Tax1    Tax2    Tax3    Tax4
2004     12     123     145     104
2004    145      99      90      56
2005    212     300     240     123

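etc...

The Tax# columns give the tax paid in successive years, starting with the year in the Year column (Tax1 is the tax paid in Year, Tax2 in Year + 1, and so on). I would like to rearrange the table and rename the columns so that it looks like this: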

Year   Tax2004    Tax2005    Tax2006    Tax2007    Tax2008
2004        12        123        145        104         NA
2004       145         99         90         56         NA 
2005        NA        212        300        240        123

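I was thinking of splitting the table into separate tables by the Year column, renaming the Tax# columns in each, and then merging them back together. But that seems a bit convoluted, and I wonder whether there is a simpler way to do this?

Any help is much appreciated.

3 Answers:

Answer 0: (score: 4)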

library(dplyr)
library(tidyr)

df <- read.table(text = "
Year   Tax1    Tax2    Tax3    Tax4
2004     12     123     145     104
2004    145      99      90      56
2005    212     300     240     123
", header = TRUE)


df %>% 
  mutate(id = row_number()) %>% 
  gather(rel_year, amount, contains("Tax")) %>% 
  mutate(rel_year = as.integer(gsub("Tax", "", rel_year)),
         pay_year = Year + rel_year - 1,
         pay_year = paste0("Tax", pay_year)) %>% 
  select(-rel_year) %>% 
  spread(pay_year, amount)

Result:

  Year id Tax2004 Tax2005 Tax2006 Tax2007 Tax2008
1 2004  1      12     123     145     104      NA
2 2004  2     145      99      90      56      NA
3 2005  3      NA     212     300     240     123
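
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a minimal sketch of the same reshaping with the newer verbs, assuming the same df as above; the name wide is only illustrative.

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "
Year   Tax1    Tax2    Tax3    Tax4
2004     12     123     145     104
2004    145      99      90      56
2005    212     300     240     123
", header = TRUE)

wide <- df %>%
  mutate(id = row_number()) %>%                 # keeps the two 2004 rows distinct
  pivot_longer(starts_with("Tax"),
               names_to = "rel_year",
               names_prefix = "Tax",            # "Tax3" becomes "3"
               values_to = "amount") %>%
  mutate(rel_year = as.integer(rel_year),
         pay_year = paste0("Tax", Year + rel_year - 1L)) %>%
  select(-rel_year) %>%
  pivot_wider(names_from = pay_year, values_from = amount)

wide
```

pivot_wider() orders the new columns by first appearance, so with this data they come out as Tax2004 through Tax2008; combinations that never occur (such as Tax2004 for the 2005 row) are filled with NA.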

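Answer 1: (score: 1)

library(dplyr)
library(tidyr)

# dat1 is the asker's data frame (the same data as df in the answer above)
dat1 %>%
  gather(key, value, -Year) %>%
  group_by(key) %>%
  mutate(col = 1:n()) %>%   # row id within each Tax# column keeps the two 2004 rows apart
  ungroup() %>%
  # look up the target name: the Tax# number gives the position, and Year == 2005
  # shifts it by one (hard-coded for the 2004-2005 range of the example)
  mutate(key = paste0("Tax", 2004:2008)[(Year == 2005) +
         as.numeric(sub("\\D+", "", key))]) %>%
  spread(key, value)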

# A tibble: 3 x 7
   Year   col Tax2004 Tax2005 Tax2006 Tax2007 Tax2008
  <int> <int>   <int>   <int>   <int>   <int>   <int>
1  2004     1      12     123     145     104      NA
2  2004     2     145      99      90      56      NA
3  2005     3      NA     212     300     240     123
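
The lookup in this answer is tied to the years 2004 to 2008. A possible generalization, sketched here under the assumption that every Tax# column is named Tax followed by a number, derives the label vector and the offset from the data instead; first_year, n_tax, labels and res are illustrative names, not part of the original answer.

```r
library(dplyr)
library(tidyr)

dat1 <- read.table(text = "
Year   Tax1    Tax2    Tax3    Tax4
2004     12     123     145     104
2004    145      99      90      56
2005    212     300     240     123
", header = TRUE)

first_year <- min(dat1$Year)
n_tax      <- sum(grepl("^Tax", names(dat1)))   # number of Tax# columns
# every payment year that can occur, from the first Year to the last Year + n_tax - 1
labels <- paste0("Tax", first_year:(max(dat1$Year) + n_tax - 1))

res <- dat1 %>%
  mutate(col = row_number()) %>%                # row id keeps duplicate Years apart
  gather(key, value, starts_with("Tax")) %>%
  # offset of the row's Year from the first year, plus the Tax# number
  mutate(key = labels[(Year - first_year) + as.numeric(sub("\\D+", "", key))]) %>%
  spread(key, value)

res
```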

Answer 2: (score: 1)

Here is an option using data.table:

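library(data.table)
library(readr)

# setDT() converts df by reference; the row names become the "rn" id column
dcast(melt(setDT(df, keep.rownames = TRUE), id.vars = c("rn", "Year"))[,
  # as.character() because melt() returns `variable` as a factor
  newYear := paste0("Tax", Year + parse_number(as.character(variable)) - 1)],
     rn + Year ~ newYear, value.var = 'value')[, rn := NULL][]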
#    Year Tax2004 Tax2005 Tax2006 Tax2007 Tax2008
#1: 2004      12     123     145     104      NA
#2: 2004     145      99      90      56      NA
#3: 2005      NA     212     300     240     123