I have a data frame df that looks like the data below (in reality I have many more columns, going all the way up to 2020_Actual and 2020_Budget):
Company Code 2013_Actual 2013_Budget 2014_Actual 2014_Budget
CompanyX 100 - Salary & Wages 1500 1601 1620 1680
CompanyX 102 - Bonus & Incentives 3000 3500 3150 3300
CompanyX 104 - Overtime 60 70 78 82
CompanyX 110 - Temporary Help 35 55 48 56
CompanyX 112 - Taxes & Benefits 800 840 880 900
I would like to collapse all of the columns after Code into Actual, Budget and Year columns, as shown below:
Company Code Actual Budget Year
CompanyX 100 - Salary & Wages 1500 1601 2013
CompanyX 102 - Bonus & Incentives 3000 3500 2013
CompanyX 104 - Overtime 60 70 2013
CompanyX 110 - Temporary Help 35 55 2013
CompanyX 112 - Taxes & Benefits 800 840 2013
CompanyX 100 - Salary & Wages 1620 1680 2014
CompanyX 102 - Bonus & Incentives 3150 3300 2014
CompanyX 104 - Overtime 78 82 2014
CompanyX 110 - Temporary Help 48 56 2014
CompanyX 112 - Taxes & Benefits 880 900 2014
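This has been my approach so far, but it does not produce the correct result. It would also be nice not to have to name every column to gather, because the column names of the input table may change: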
library(dplyr)
library(tidyr)
# Read in data
df <- read.csv("inputTable.csv", header=T, stringsAsFactors=F)
# Collapse columns
df_actuals <- gather(df,
key = "Year",
value = "Actuals",
2013_Actual,2014_Actual,2015_Actual,2016_Actual,2017_Actual,2018_Actual,2019_Actual,2020_Actual)
df_budget <- gather(df,
key = "Year",
value = "Budget",
2013_Budget,2014_Budget,2015_Budget,2016_Budget,2017_Budget,2018_Budget,2019_Budget,2020_Budget)
# grab only relevant columns
df_actuals_final <- select(df_actuals, Company, Code, Year, Actuals)
df_budget_final <- select(df_budget, Company, Code, Year, Budget)
# merge
m <- merge(df_actuals_final, df_budget_final, by=c("Company","Code", "Year"))
write.csv(m, "inputTable.mod.csv", quote=F, row.names=F)
Answer 0 (score: 0)
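gather has been superseded by pivot_longer, which makes this kind of reshaping straightforward. With names_sep = '_', each column name is split at the underscore: the first piece (2013, 2014, ...) goes into a Year column, while the special '.value' entry tells pivot_longer to use the second piece (Actual, Budget) as the names of the output value columns. Selecting cols = -c(Company, Code) also means that no year column has to be listed by name.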
tidyr::pivot_longer(df, cols = -c(Company, Code),
                    names_to = c('Year', '.value'),
                    names_sep = '_')
# A tibble: 10 x 5
# Company Code Year Actual Budget
# <chr> <chr> <chr> <int> <int>
# 1 CompanyX 100-Salary&Wages 2013 1500 1601
# 2 CompanyX 100-Salary&Wages 2014 1620 1680
# 3 CompanyX 102-Bonus&Incentives 2013 3000 3500
# 4 CompanyX 102-Bonus&Incentives 2014 3150 3300
# 5 CompanyX 104-Overtime 2013 60 70
# 6 CompanyX 104-Overtime 2014 78 82
# 7 CompanyX 110-TemporaryHelp 2013 35 55
# 8 CompanyX 110-TemporaryHelp 2014 48 56
# 9 CompanyX 112-Taxes&Benefits 2013 800 840
#10 CompanyX 112-Taxes&Benefits 2014 880 900
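The output above keeps Year as a character column and orders the rows by Code, whereas the layout asked for in the question groups rows by year and puts Year last. One possible follow-up is sketched below, assuming dplyr is available alongside tidyr; the object name long is only illustrative, and the sample data is rebuilt inline so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the "Data" section, rebuilt here so the snippet is self-contained.
# check.names = FALSE keeps column names such as 2013_Actual unchanged.
df <- data.frame(
  Company = rep("CompanyX", 5),
  Code = c("100-Salary&Wages", "102-Bonus&Incentives", "104-Overtime",
           "110-TemporaryHelp", "112-Taxes&Benefits"),
  `2013_Actual` = c(1500L, 3000L, 60L, 35L, 800L),
  `2013_Budget` = c(1601L, 3500L, 70L, 55L, 840L),
  `2014_Actual` = c(1620L, 3150L, 78L, 48L, 880L),
  `2014_Budget` = c(1680L, 3300L, 82L, 56L, 900L),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

long <- df %>%
  pivot_longer(cols = -c(Company, Code),
               names_to = c("Year", ".value"),
               names_sep = "_") %>%
  mutate(Year = as.integer(Year)) %>%          # "2013" -> 2013L
  arrange(Year, Code) %>%                      # all 2013 rows first, then 2014
  select(Company, Code, Actual, Budget, Year)  # column order from the question

print(long)
```

Converting Year with as.integer is a choice, not a requirement; leaving it as character works equally well if the column is only used as a label.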
Data
df <- structure(list(Company = c("CompanyX", "CompanyX", "CompanyX",
"CompanyX", "CompanyX"), Code = c("100-Salary&Wages", "102-Bonus&Incentives",
"104-Overtime", "110-TemporaryHelp", "112-Taxes&Benefits"),
`2013_Actual` = c(1500L, 3000L, 60L, 35L, 800L), `2013_Budget` = c(1601L,
3500L, 70L, 55L, 840L), `2014_Actual` = c(1620L, 3150L, 78L, 48L, 880L),
`2014_Budget` = c(1680L, 3300L, 82L, 56L, 900L)),
class = "data.frame", row.names = c(NA, -5L))
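The question loads the table with read.csv, which by default (check.names = TRUE) rewrites a header such as 2013_Actual to X2013_Actual, so Year would come out as X2013. A minimal sketch of reading the file with the names left untouched follows; the inline csv_text is a hypothetical stand-in for inputTable.csv, assuming that file is comma-separated with the headers shown in the question.

```r
library(tidyr)

# Hypothetical stand-in for inputTable.csv (assumed comma-separated)
csv_text <- "Company,Code,2013_Actual,2013_Budget,2014_Actual,2014_Budget
CompanyX,100 - Salary & Wages,1500,1601,1620,1680
CompanyX,104 - Overtime,60,70,78,82"

# check.names = FALSE keeps headers that start with a digit exactly as written
df_csv <- read.csv(text = csv_text, check.names = FALSE, stringsAsFactors = FALSE)

long_csv <- pivot_longer(df_csv, cols = -c(Company, Code),
                         names_to = c("Year", ".value"),
                         names_sep = "_")

print(names(df_csv))
print(long_csv)
```

With a real file, read.csv("inputTable.csv", check.names = FALSE, stringsAsFactors = FALSE) would take the place of the text = argument.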