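I am trying to dcast my data so that only the Actual values are split out into a new column. However, the only way I have managed to do this is to dcast first and then melt. I would like to know whether there is a more efficient solution.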
Step 1:
I have already done some preparation on the data, and this is what it looks like:
> test_m <- melt(test, id.vars = c("category", "Budget_year", "State"))
> test_m <- test_m[,c("Year", "Type_of_observation"):= tstrsplit(variable, " ", fixed = TRUE)]
> test_m[,variable := NULL]
> head(test_m, n = 10)
          category Budget_year State value    Year Type_of_observation
 1:  Transfer Duty     2000_01     N  1916 1998-99              Actual
 2:       Land Tax     2000_01     N   948 1998-99              Actual
 3:    Payroll Tax     2000_01     N  3605 1998-99              Actual
 4: Total Gambling     2000_01     N  1419 1998-99              Actual
 5:            GST     2000_01     N  4705 1998-99              Actual
 6:  Transfer Duty     2000_01     N  1747 1999-00              Budget
 7:       Land Tax     2000_01     N   830 1999-00              Budget
 8:    Payroll Tax     2000_01     N  3616 1999-00              Budget
 9: Total Gambling     2000_01     N  1558 1999-00              Budget
10:            GST     2000_01     N  5162 1999-00              Budget
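Now I want to add a new column derived from the Type_of_observation column, but using only the Actual observations and leaving every other observation type where it is. My current approach is to dcast first and then melt, as follows: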
Step 2: desired output
> test_c <- dcast(test_m, category + Budget_year + State + Year ~ Type_of_observation)
> test_mc <- melt(test_c, id.vars = c("category", "Budget_year", "State", "Year", "Actual"), measure.vars = c("Budget", "Estimate", "Revised"))
> head(test_mc, n = 10)
    category Budget_year State    Year Actual variable value
 1:      GST     2000_01     N 1998-99   4705   Budget    NA
 2:      GST     2000_01     N 1999-00     NA   Budget  5162
 3:      GST     2000_01     N 2000-01     NA   Budget  8318
 4:      GST     2000_01     N 2001-02     NA   Budget    NA
 5:      GST     2000_01     N 2002-03     NA   Budget    NA
 6:      GST     2000_01     N 2003-04     NA   Budget    NA
 7: Land Tax     2000_01     N 1998-99    948   Budget    NA
 8: Land Tax     2000_01     N 1999-00     NA   Budget   830
 9: Land Tax     2000_01     N 2000-01     NA   Budget   921
10: Land Tax     2000_01     N 2001-02     NA   Budget    NA
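So now I have a column for Actuals, and all the other observation types stay in the variable column.
Is there a way to get from test_m to test_mc without doing both a dcast and a melt? I would prefer a data.table solution, but I am open to anything.
Here is the dput of the data: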
Answer 0 (score: 1)
You can complete the cases first and then join them to the dataset.
Finally, you do an update join to look up the Actual values.
#create complete cases
ans <- test_m[CJ(category=category, Budget_year=Budget_year, State=State, Year=Year, Type_of_observation=c("Budget", "Estimate", "Revised"), unique=TRUE),
on=.(category, Budget_year, State, Year, Type_of_observation)][
#update join
test_m[Type_of_observation=="Actual"],
Actual := i.value,
on=.(category, Budget_year, State, Year)]
#order to match test_mc
setorder(ans, category, Budget_year, State, Year, Type_of_observation)[]
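To make the two joins above easier to follow, here is a minimal sketch on a tiny made-up table. The name toy and its values are hypothetical stand-ins for test_m (the real data is not reproduced here), and the join is split into separate steps purely for readability.

```r
library(data.table)

# Hypothetical toy data standing in for test_m (values are invented)
toy <- data.table(
  category = "GST",
  Year = c("1998-99", "1999-00", "1999-00"),
  Type_of_observation = c("Actual", "Budget", "Revised"),
  value = c(10, 20, 30)
)

# Every combination of the id columns with the non-Actual observation types
grid <- CJ(category = toy$category, Year = toy$Year,
           Type_of_observation = c("Budget", "Revised"), unique = TRUE)

# Join toy onto the grid: combinations with no matching row get value = NA
ans <- toy[grid, on = .(category, Year, Type_of_observation)]

# Update join: copy each category/Year's Actual value into a new column
ans[toy[Type_of_observation == "Actual"], Actual := i.value,
    on = .(category, Year)]

print(ans)
```

In this sketch the 1998-99 rows carry the Actual value with an NA in value, while the 1999-00 rows keep their Budget and Revised values with an NA in Actual, which mirrors the shape of test_mc.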
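Answer 1 (score: 0)
I think I have a simple data.table approach that uses setkey and a join inside the square brackets.
I will use a simpler data.table. The goal is to move interest_rate into its own column.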
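library(data.table)
samp <- data.table(
  group=c("a","a","a","b","b","b","c","c","c"),
  # repeat the three variable names once per group instead of relying on recycling
  variable=rep(c("balance", "end_balance", "interest_rate"), times=3),
  value=c(1000, 940, .05, 1200, 1040, .08, 980, 970, .10)
)
setkey(samp, group)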
# This will create a data.table with just our desired variable value, interest_rate, by group
samp[variable=="interest_rate", .(interest_rate=unique(value)), by=.(group)]
# We then join this to the original data.table using the already set key and
# drop the interest_rate rows in the final data.table
samp[samp[variable=="interest_rate", .(interest_rate=unique(value)), by=.(group)]][variable!="interest_rate"]
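As a variation on the keyed join above, the same result can probably be reached with an update join using on=, which adds the column by reference and does not need setkey. This is only a sketch of that alternative, not the answer's original method; the names rates and res are made up for illustration.

```r
library(data.table)

samp <- data.table(
  group = rep(c("a", "b", "c"), each = 3),
  variable = rep(c("balance", "end_balance", "interest_rate"), times = 3),
  value = c(1000, 940, .05, 1200, 1040, .08, 980, 970, .10)
)

# Look-up table with one interest_rate per group
rates <- samp[variable == "interest_rate", .(group, interest_rate = value)]

# Update join: add interest_rate to every row of its group, by reference
samp[rates, interest_rate := i.interest_rate, on = "group"]

# Drop the rows that only held the interest rate
res <- samp[variable != "interest_rate"]

print(res)
```

Unlike the complete-cases approach in the other answer, this does not create rows for combinations that are absent from the data, so it only fits when those NA filler rows are not needed.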