Edit: the original dataset can be found here: link
I have a matrix like this:
data <- matrix(c("a","1","10",
"b","1","20",
"c","1","30",
"a","2","10",
"b","2","20",
"a","3","10",
"c","3","20"),
ncol=3, byrow=TRUE)
I want to reshape it into a data frame in which the missing combinations are filled with zero:
data <- matrix(c("a","1","10",
"b","1","20",
"c","1","30",
"a","2","10",
"b","2","20",
"c","2","0",
"a","3","10",
"b","3","0",
"c","3","20"),
ncol=3, byrow=TRUE)
How can I do this with the reshape package? Thanks
Answer 0 (score: 1)
We can use complete from tidyr after converting your data:
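library(tidyr)
# Convert the character matrix to a data frame (columns V1, V2, V3)
data <- as.data.frame(data)
# V3 holds the values; make it numeric so it can be filled with 0
data$V3 <- as.numeric(as.character(data$V3))
# Add every missing V1/V2 combination and fill its V3 with 0
complete(data, V1, V2, fill = list(V3 = 0))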
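For comparison, here is a minimal base-R sketch of the same idea, assuming no packages are available: build every V1/V2 combination with expand.grid(), left-join the observed rows onto it with merge(), and replace the NAs the join introduces with 0. The names df, grid and full are only illustrative.

```r
# Question data: a character matrix with columns V1, V2, V3
data <- matrix(c("a","1","10",
                 "b","1","20",
                 "c","1","30",
                 "a","2","10",
                 "b","2","20",
                 "a","3","10",
                 "c","3","20"),
               ncol = 3, byrow = TRUE)

df <- as.data.frame(data, stringsAsFactors = FALSE)
df$V3 <- as.numeric(df$V3)

# Every combination of the observed V1 and V2 values
grid <- expand.grid(V1 = unique(df$V1), V2 = unique(df$V2),
                    stringsAsFactors = FALSE)

# Left join: combinations absent from df get V3 = NA
full <- merge(grid, df, by = c("V1", "V2"), all.x = TRUE)

# Replace the NAs introduced by the join with 0
full$V3[is.na(full$V3)] <- 0
full
```

merge() sorts the result by V1 and then V2, so the rows come out grouped by letter rather than in the order shown in the question.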
Answer 1 (score: 1)
tidyr is better, but if you want to use reshape, you can do:
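library(reshape2)
# Cast to wide format: one row per V1, one column per V2 value;
# combinations that do not occur become NA
data2 <- dcast(data = as.data.frame(data), V1 ~ V2, value.var = "V3")
# Melt back to long format, keeping V1 as the id column
data3 <- melt(data2, id.vars = "V1", measure.vars = colnames(data2)[-1])
# The values are still character strings, so fill with "0"
data3[is.na(data3)] <- "0"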
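The cast-then-melt round trip above can also be sketched in base R, as an alternative to reshape2 rather than what the answer uses: xtabs() cross-tabulates V3 by V1 and V2 and puts 0 in every cell with no observation. Each cell is the sum of V3 for that pair, which equals the value itself only when every pair occurs at most once, as it does here. as.data.frame() then turns the table back into long format.

```r
# Question data
data <- matrix(c("a","1","10",
                 "b","1","20",
                 "c","1","30",
                 "a","2","10",
                 "b","2","20",
                 "a","3","10",
                 "c","3","20"),
               ncol = 3, byrow = TRUE)
df <- as.data.frame(data, stringsAsFactors = FALSE)
df$V3 <- as.numeric(df$V3)

# Wide 3 x 3 table (rows V1, columns V2); empty cells are 0
wide <- xtabs(V3 ~ V1 + V2, data = df)
wide

# Back to long format, similar to melt(); name the value column V3
long <- as.data.frame(wide, responseName = "V3", stringsAsFactors = FALSE)
long
```

Because V1 varies fastest in as.data.frame(), the long result comes out in the same row order as the desired output in the question.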
Answer 2 (score: 1)
It looks to me like you are dealing with something like a multivariate time series, so I suggest using a proper time-series object.
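library(zoo)
# Split the rows by column 1 (a, b, c) into separate series,
# indexed by column 2 converted to numeric
res=read.zoo(data.frame(data,stringsAsFactors=FALSE),
split=1,
index.column=2,
FUN=as.numeric)
# The values are still character; convert them to numeric
coredata(res)=as.numeric(coredata(res))
# Time points missing from a series are NA; set them to 0
coredata(res)[is.na(res)]=0
This gives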
res
# a b c
#1 10 20 30
#2 10 20 0
#3 10 0 20
Answer 3 (score: 1)
I think the problem is that you are storing columns of several types in a matrix. A matrix can hold only one type, so everything gets coerced to character.
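A small illustration of that coercion, using made-up values rather than the question's data: combining letters and numbers with cbind() yields a character matrix, while a data.frame keeps a separate type for each column.

```r
# Letters and numbers in one matrix: every cell becomes character
m <- cbind(V1 = c("a", "b"), V2 = c(1, 2), V3 = c(10, 20))
typeof(m)      # "character"
m[, "V3"]      # "10" "20": text, not numbers

# A data.frame keeps one type per column
df <- data.frame(V1 = c("a", "b"), V2 = c(1, 2), V3 = c(10, 20),
                 stringsAsFactors = FALSE)
sapply(df, class)   # V1 "character", V2 "numeric", V3 "numeric"
```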
First, I would convert it to a data.frame or data.table, and then convert every column to its correct type. Something like:
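library(data.table) # V 1.9.6+
# Convert to data.table
DT <- as.data.table(data)
# Convert to correct column types (as.is = TRUE keeps strings as character
# and avoids the warning newer R versions give when as.is is not set)
for(j in names(DT)) set(DT, j = j, value = type.convert(DT[[j]], as.is = TRUE))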
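If data.table is not needed, the same per-column conversion can be sketched on a plain data.frame with lapply(); this is a base-R variant, not part of the answer's data.table approach. Assigning into df[] keeps the data.frame structure while replacing each column.

```r
# Question data
data <- matrix(c("a","1","10",
                 "b","1","20",
                 "c","1","30",
                 "a","2","10",
                 "b","2","20",
                 "a","3","10",
                 "c","3","20"),
               ncol = 3, byrow = TRUE)

df <- as.data.frame(data, stringsAsFactors = FALSE)
# Let type.convert() pick a type for each column;
# as.is = TRUE leaves the letter column as character
df[] <- lapply(df, type.convert, as.is = TRUE)
sapply(df, class)   # V1 "character", V2 "integer", V3 "integer"
```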
Then we can use data.table::CJ to expand the rows and assign zeros to the NA values:
## Cross join all columns except the third
DT <- DT[do.call(CJ, c(unique = TRUE, DT[, -3, with = FALSE])), on = names(DT)[-3]]
## Or if you want only to operate on these two columns you can alternatively do
# DT <- DT[CJ(V1, V2, unique = TRUE), on = c("V1", "V2")]
## Fill with zeroes (0L because V3 is an integer column)
DT[is.na(V3), V3 := 0L]
DT
# V1 V2 V3
# 1: a 1 10
# 2: a 2 10
# 3: a 3 10
# 4: b 1 20
# 5: b 2 20
# 6: b 3 0
# 7: c 1 30
# 8: c 2 0
# 9: c 3 20