I have a data.table:
ID  REGION  INCOME_BAND    RESIDENCY_YEARS
1   SW      Under 5,000    10-15
2   Wales   Over 70,000    1-5
3   Center  15,000-19,999  6-9
4   SE      15,000-19,999  15-19
5   North   15,000-19,999  10-15
6   North   15,000-19,999  6-9
It can be created with:
library(data.table)
exp = data.table(
  ID = c(1,2,3,4,5,6),
  REGION = c("SW", "Wales", "Center", "SE", "North", "North"),
  INCOME_BAND = c("Under ?5,000", "Over ?70,000", "?15,000-?19,999", "?15,000-?19,999", "?15,000-?19,999", "?15,000-?19,999"),
  RESIDENCY_YEARS = c("10-15", "1-5", "6-9", "15-19", "10-15", "6-9"))
I want to reshape it to wide format.
I managed most of the work with dcast:
exp.dcast = dcast(exp,ID~REGION+INCOME_BAND+RESIDENCY_YEARS, fun=length,
value.var=c('REGION', 'INCOME_BAND', 'RESIDENCY_YEARS'))
But I need some help creating sensible column headers. Currently I have
"ID"
"REGION.1_Center_?15,000-?19,999_6-9"
"REGION.1_North_?15,000-?19,999_10-15"
"REGION.1_North_?15,000-?19,999_6-9"
"REGION.1_SE_?15,000-?19,999_15-19"
"REGION.1_SW_Under ?5,000_10-15"
"REGION.1_Wales_Over ?70,000_1-5"
"INCOME_BAND.1_Center_?15,000-?19,999_6-9"
"INCOME_BAND.1_North_?15,000-?19,999_10-15"
"INCOME_BAND.1_North_?15,000-?19,999_6-9"
"INCOME_BAND.1_SE_?15,000-?19,999_15-19"
"INCOME_BAND.1_SW_Under ?5,000_10-15"
"INCOME_BAND.1_Wales_Over ?70,000_1-5"
"RESIDENCY_YEARS.1_Center_?15,000-?19,999_6-9"
"RESIDENCY_YEARS.1_North_?15,000-?19,999_10-15"
"RESIDENCY_YEARS.1_North_?15,000-?19,999_6-9"
"RESIDENCY_YEARS.1_SE_?15,000-?19,999_15-19"
"RESIDENCY_YEARS.1_SW_Under ?5,000_10-15"
"RESIDENCY_YEARS.1_Wales_Over ?70,000_1-5"
I would like the column headers to be
ID SW Wales Center SE North Under 5,000 Over 70,000 15,000-19,999 1-5 6-9 10-15 15-19
Can anyone advise?
Answer 0 (score: 0)
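This seemingly simple question is not easy to answer, so we will proceed step by step.
First, the OP tried to reshape several value columns at once, which produces an unwanted cross product of all available combinations.
To treat all values in the same way, we need to melt() all value columns before reshaping: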
melt(exp, id.vars = "ID")[, dcast(.SD, ID ~ value, length)]
   ID 1-5 10-15 15-19 6-9 ?15,000-?19,999 Center North Over ?70,000 SE SW Under ?5,000 Wales
1:  1   0     1     0   0               0      0     0            0  0  1            1     0
2:  2   1     0     0   0               0      0     0            1  0  0            0     1
3:  3   0     0     0   1               1      1     0            0  0  0            0     0
4:  4   0     0     1   0               1      0     0            0  1  0            0     0
5:  5   0     1     0   0               1      0     1            0  0  0            0     0
6:  6   0     0     0   1               1      0     1            0  0  0            0     0
Now the result has 13 columns instead of 19, and each column is named after the value it counts.
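The chained call above can also be written as two separate steps, which makes the intermediate long format easier to inspect. The following is only a sketch under the assumption that the data are exactly those from the question; the names long and wide are illustrative and do not appear in the original answer.

```r
library(data.table)

# Sample data as given in the question (the "?" characters are kept verbatim)
exp <- data.table(
  ID = c(1, 2, 3, 4, 5, 6),
  REGION = c("SW", "Wales", "Center", "SE", "North", "North"),
  INCOME_BAND = c("Under ?5,000", "Over ?70,000", rep("?15,000-?19,999", 4)),
  RESIDENCY_YEARS = c("10-15", "1-5", "6-9", "15-19", "10-15", "6-9")
)

# Step 1: stack all three value columns into one long 'value' column
long <- melt(exp, id.vars = "ID")

# Step 2: one indicator column per distinct value, counting occurrences per ID
wide <- dcast(long, ID ~ value, fun.aggregate = length, value.var = "value")

ncol(wide)   # ID plus one column per distinct value
names(wide)
```

Because every ID contributes exactly one REGION, one INCOME_BAND and one RESIDENCY_YEARS value, each row of wide should contain exactly three ones.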
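Unfortunately, the columns appear in the wrong order because they are sorted alphabetically. There are two ways to change the order.
The setcolorder() function rearranges the columns of a data.table by reference, i.e. without copying: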
# define column order = order of values
col_order <- c("North", "Wales", "Center", "SW", "SE", "Under ?5,000", "?15,000-?19,999", "Over ?70,000", "1-5", "6-9", "10-15", "15-19")
melt(exp, id.vars = "ID")[, dcast(.SD, ID ~ value, length)][
# reorder columns
, setcolorder(.SD, c("ID", col_order))]
   ID North Wales Center SW SE Under ?5,000 ?15,000-?19,999 Over ?70,000 1-5 6-9 10-15 15-19
1:  1     0     0      0  1  0            1               0            0   0   0     1     0
2:  2     0     1      0  0  0            0               0            1   1   0     0     0
3:  3     0     0      1  0  0            0               1            0   0   1     0     0
4:  4     0     0      0  0  1            0               1            0   0   0     0     1
5:  5     1     0      0  0  0            0               1            0   0   0     1     0
6:  6     1     0      0  0  0            0               1            0   0   1     0     0
Now all REGION columns appear first, followed by the INCOME_BAND and RESIDENCY_YEARS columns, each in the specified order.
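As a variation on the reordering step, setcolorder() can also be applied to a stored result rather than to .SD inside a chain. This is a sketch, not the answer's own code: the variable name wide is an assumption, and the data and col_order are copied from the text above.

```r
library(data.table)

# Sample data as given in the question
exp <- data.table(
  ID = c(1, 2, 3, 4, 5, 6),
  REGION = c("SW", "Wales", "Center", "SE", "North", "North"),
  INCOME_BAND = c("Under ?5,000", "Over ?70,000", rep("?15,000-?19,999", 4)),
  RESIDENCY_YEARS = c("10-15", "1-5", "6-9", "15-19", "10-15", "6-9")
)

# Desired column order, as defined in the answer
col_order <- c("North", "Wales", "Center", "SW", "SE",
               "Under ?5,000", "?15,000-?19,999", "Over ?70,000",
               "1-5", "6-9", "10-15", "15-19")

# Reshape first and keep the result in a variable
wide <- dcast(melt(exp, id.vars = "ID"), ID ~ value,
              fun.aggregate = length, value.var = "value")

# Reorder the columns of 'wide' by reference; no copy is made
setcolorder(wide, c("ID", col_order))
names(wide)
```

Keeping the reshaped table in a variable makes it explicit which object is modified by reference.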
Alternatively, if value is converted to a factor with appropriately ordered levels, dcast() will use the factor levels to order the columns:
melt(exp, id.vars = "ID")[, value := factor(value, col_order)][
, dcast(.SD, ID ~ value, length)]
   ID North Wales Center SW SE Under ?5,000 ?15,000-?19,999 Over ?70,000 1-5 6-9 10-15 15-19
1:  1     0     0      0  1  0            1               0            0   0   0     1     0
2:  2     0     1      0  0  0            0               0            1   1   0     0     0
3:  3     0     0      1  0  0            0               1            0   0   1     0     0
4:  4     0     0      0  0  1            0               1            0   0   0     0     1
5:  5     1     0      0  0  0            0               1            0   0   0     1     0
6:  6     1     0      0  0  0            0               1            0   0   1     0     0
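If it is sufficient for the columns to be grouped by REGION, INCOME_BAND, and RESIDENCY_YEARS, we can use a shortcut that avoids listing every value in col_order. The fct_inorder() function from the forcats package orders factor levels by their first appearance in a vector: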
melt(exp, id.vars = "ID")[, value := forcats::fct_inorder(value)][
  , dcast(.SD, ID ~ value, length)]
   ID SW Wales Center SE North Under ?5,000 Over ?70,000 ?15,000-?19,999 10-15 1-5 6-9 15-19
1:  1  1     0      0  0     0            1            0               0     1   0   0     0
2:  2  0     1      0  0     0            0            1               0     0   1   0     0
3:  3  0     0      1  0     0            0            0               1     0   0   1     0
4:  4  0     0      0  1     0            0            0               1     0   0   0     1
5:  5  0     0      0  0     1            0            0               1     1   0   0     0
6:  6  0     0      0  0     1            0            0               1     0   0   1     0
This works because the output of melt() is ordered by variable:
melt(exp, id.vars = "ID")
    ID        variable           value
 1:  1          REGION              SW
 2:  2          REGION           Wales
 3:  3          REGION          Center
 4:  4          REGION              SE
 5:  5          REGION           North
 6:  6          REGION           North
 7:  1     INCOME_BAND    Under ?5,000
 8:  2     INCOME_BAND    Over ?70,000
 9:  3     INCOME_BAND ?15,000-?19,999
10:  4     INCOME_BAND ?15,000-?19,999
11:  5     INCOME_BAND ?15,000-?19,999
12:  6     INCOME_BAND ?15,000-?19,999
13:  1 RESIDENCY_YEARS           10-15
14:  2 RESIDENCY_YEARS             1-5
15:  3 RESIDENCY_YEARS             6-9
16:  4 RESIDENCY_YEARS           15-19
17:  5 RESIDENCY_YEARS           10-15
18:  6 RESIDENCY_YEARS             6-9
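If adding forcats as a dependency is not wanted, the same first-appearance ordering can probably be obtained in base R with factor(value, levels = unique(value)). This is offered as a sketch of an alternative, not as part of the original answer; the names long and wide are illustrative.

```r
library(data.table)

# Sample data as given in the question
exp <- data.table(
  ID = c(1, 2, 3, 4, 5, 6),
  REGION = c("SW", "Wales", "Center", "SE", "North", "North"),
  INCOME_BAND = c("Under ?5,000", "Over ?70,000", rep("?15,000-?19,999", 4)),
  RESIDENCY_YEARS = c("10-15", "1-5", "6-9", "15-19", "10-15", "6-9")
)

long <- melt(exp, id.vars = "ID")

# unique() keeps values in order of first appearance, so the factor levels
# follow the row order of the melted table (REGION, then INCOME_BAND, ...)
long[, value := factor(value, levels = unique(value))]

wide <- dcast(long, ID ~ value, fun.aggregate = length, value.var = "value")
names(wide)
```

Like the fct_inorder() version, this relies on melt() returning all rows of one variable before the next, so the columns are grouped by variable but, within each group, ordered by first appearance rather than by magnitude.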