My data looks like this:
library(data.table)
DF <- structure(list(toberevised = c("Number of returns", "Number of joint returns",
"Number with paid preparer's signature"), `SOUTH DAKOTA_All returns` = c(135257620,
52607676, 80455243), `SOUTH DAKOTA_Under_50000` = c(92150166,
20743943, 53622647)), row.names = c(NA, -3L), class = c("data.table",
"data.frame"))
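I want the values in the first column to become the variables (column names), and the current column names to become the rows, so I did this: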
DF<- as.data.frame(t(DF))
setnames(DF, DF[1,])
But I get the error:
Passed a vector of type 'list'. Needs to be type 'character'
I have tried everything I could think of to unlist it, but to no avail.
What am I doing wrong?
Answer 0 (score: 0)
Transposing a data.frame is dangerous because t() returns a matrix, and in a matrix all elements ("cells") are coerced to the same data type:
t(DF)
                         [,1]                [,2]                      [,3]
toberevised              "Number of returns" "Number of joint returns" "Number with paid preparer's signature"
SOUTH DAKOTA_All returns "135257620"         " 52607676"               " 80455243"
SOUTH DAKOTA_Under_50000 "92150166"          "20743943"                "53622647"
All numeric values have now been coerced to character, which is probably not what you want.
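If a transposed table is really what you need, the original error can be avoided as well: after the transpose, DF[1, ] on a data.frame is itself a one-row data.frame, i.e. a list, which is why setnames() complains about type 'list'. The following is a minimal sketch (not part of the original answer) that transposes only the numeric columns, so that t() keeps them numeric, and takes the new column names directly from the toberevised character vector. The object names num_cols, m and wide are illustrative.

```r
library(data.table)

# Sample data from the question, rebuilt with data.table() for brevity
DF <- data.table(
  toberevised = c("Number of returns", "Number of joint returns",
                  "Number with paid preparer's signature"),
  `SOUTH DAKOTA_All returns` = c(135257620, 52607676, 80455243),
  `SOUTH DAKOTA_Under_50000` = c(92150166, 20743943, 53622647)
)

# Transpose only the numeric columns, so t() yields a numeric matrix
num_cols <- setdiff(names(DF), "toberevised")
m <- t(as.matrix(DF[, num_cols, with = FALSE]))

# setnames()/colnames() need a character vector, not a one-row data.frame (a list);
# DF$toberevised is already a character vector
colnames(m) <- DF$toberevised

# Turn the matrix back into a data.table, keeping the former column names as a column
wide <- as.data.table(m, keep.rownames = "variable")
wide
```

The values stay numeric because the character column never enters the matrix.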
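As has been mentioned several times before ({{3}} and here), I recommend reshaping the data into a tidy format, i.e. long format, which makes the data much easier to work with: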
library(data.table)
long <- melt(DF, id.vars = "toberevised")
long
                             toberevised                 variable     value
1:                     Number of returns SOUTH DAKOTA_All returns 135257620
2:               Number of joint returns SOUTH DAKOTA_All returns  52607676
3: Number with paid preparer's signature SOUTH DAKOTA_All returns  80455243
4:                     Number of returns SOUTH DAKOTA_Under_50000  92150166
5:               Number of joint returns SOUTH DAKOTA_Under_50000  20743943
6: Number with paid preparer's signature SOUTH DAKOTA_Under_50000  53622647
Starting from the long format, we can reshape to the desired wide format:
dcast(long, variable ~ toberevised, value.var = "value")
                   variable Number of joint returns Number of returns Number with paid preparer's signature
1: SOUTH DAKOTA_All returns                52607676         135257620                              80455243
2: SOUTH DAKOTA_Under_50000                20743943          92150166                              53622647
The numbers are still of numeric type.
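For completeness, data.table also provides a transpose() function. Assuming a reasonably recent data.table version (the keep.names and make.names arguments are, to my knowledge, not available in very old releases), it can build the same wide table in one step: make.names names the column whose values become the new column names, and keep.names stores the old column names in a new column. Unlike dcast(), which sorts the new columns alphabetically, this keeps the original row order of toberevised. This is a sketch of an alternative, not the approach of the original answer.

```r
library(data.table)

# Sample data from the question
DF <- data.table(
  toberevised = c("Number of returns", "Number of joint returns",
                  "Number with paid preparer's signature"),
  `SOUTH DAKOTA_All returns` = c(135257620, 52607676, 80455243),
  `SOUTH DAKOTA_Under_50000` = c(92150166, 20743943, 53622647)
)

# make.names: use the values of 'toberevised' as the new column names
# keep.names: keep the old column names in a new column called 'variable'
wide <- transpose(DF, keep.names = "variable", make.names = "toberevised")

# Check that the value columns kept their numeric type
sapply(wide, class)
```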
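As a rule of thumb, whenever column names are treated as attributes, like SOUTH DAKOTA_Under_50000, the data is probably not in a tidy format. Attributes should be stored and handled as data items so that they can be used for subsetting, grouping and aggregation.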
In fact, SOUTH DAKOTA_Under_50000 encodes two attributes: a region and a category.
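To make that concrete, here is a minimal sketch that splits the combined column name into two proper columns, region and category (both names chosen here for illustration). It assumes that everything before the first underscore is the region and everything after it is the category, which holds for the two columns in the sample but is not guaranteed for the full data set. Once split, the attributes can be used for subsetting and reshaping like any other column.

```r
library(data.table)

# Sample data from the question
DF <- data.table(
  toberevised = c("Number of returns", "Number of joint returns",
                  "Number with paid preparer's signature"),
  `SOUTH DAKOTA_All returns` = c(135257620, 52607676, 80455243),
  `SOUTH DAKOTA_Under_50000` = c(92150166, 20743943, 53622647)
)

# Long format; keep 'variable' as character instead of factor
long <- melt(DF, id.vars = "toberevised", variable.factor = FALSE)

# Assumption: text before the first "_" is the region, the rest is the category
long[, region   := sub("_.*$", "", variable)]
long[, category := sub("^[^_]*_", "", variable)]

# The attributes can now be used for subsetting ...
under_50k <- long[category == "Under_50000"]

# ... and for reshaping with region and category as separate columns
by_region <- dcast(long, region + category ~ toberevised, value.var = "value")
by_region
```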