我有这些数据:
structure(list(type = c("journal", "all", "similar_age_1m", "similar_age_3m",
"similar_age_journal_1m", "similar_age_journal_3m"), count = c("13972",
"754555", "22408", "56213", "508", "1035"), rank = c("13759",
"754043", "22339", "56074", "459", "947"), pct = c("98.48", "99.93",
"99.69", "99.75", "90.35", "91.50")), .Names = c("type", "count",
"rank", "pct"), row.names = c(NA, -6L), class = "data.frame")
我想把它变成一行,列2:4
的名称以相应的类型为前缀。例如journal.count
,journal.rank
...最快的方法是什么?由于某种原因dcast
和reshape
没有为我做这些,我的解决方案有点过于繁琐。
答案 0 :(得分:5)
你提到reshape2
,所以这是一种方法:
library("reshape2")
dcast(melt(dat, id.var="type"), 1~variable+type)
这给出了:
1 count_all count_journal count_similar_age_1m count_similar_age_3m
1 1 754555 13972 22408 56213
count_similar_age_journal_1m count_similar_age_journal_3m rank_all
1 508 1035 754043
rank_journal rank_similar_age_1m rank_similar_age_3m
1 13759 22339 56074
rank_similar_age_journal_1m rank_similar_age_journal_3m pct_all pct_journal
1 459 947 99.93 98.48
pct_similar_age_1m pct_similar_age_3m pct_similar_age_journal_1m
1 99.69 99.75 90.35
pct_similar_age_journal_3m
1 91.50
type
和变量用_
分隔,而不是.
。
答案 1 :(得分:2)
这是另一种方式:
y <- as.numeric(as.matrix(x[-1])) # flatten the data.frame
names(y) <- as.vector(outer(x[['type']], names(x)[-1], paste, sep='.'))
答案 2 :(得分:2)
假设您可以为重塑添加虚拟“时间”变量,您也可以使用基数R轻松完成此操作。假设您的data.frame
被调用:
mydf$id <- 1
(mydfw <- reshape(mydf, direction = "wide", idvar="id", timevar="type"))
# id count.journal rank.journal pct.journal count.all rank.all pct.all
# 1 1 13972 13759 98.48 754555 754043 99.93
# count.similar_age_1m rank.similar_age_1m pct.similar_age_1m
# 1 22408 22339 99.69
# count.similar_age_3m rank.similar_age_3m pct.similar_age_3m
# 1 56213 56074 99.75
# count.similar_age_journal_1m rank.similar_age_journal_1m
# 1 508 459
# pct.similar_age_journal_1m count.similar_age_journal_3m
# 1 90.35 1035
# rank.similar_age_journal_3m pct.similar_age_journal_3m
# 1 947 91.50
如果要重新排序列,清理也不算太糟糕。
mydfw <- mydfw[, unlist(sapply(names(mydf), grep, names(mydfw)))]
答案 3 :(得分:1)
以下是使用expand.grid
获取姓名的解决方案。
要获取数据,首先是子集以删除包含名称的第一列。然后,转置并转换为数字。
> eg <- expand.grid(colnames(x[, -1]), x[, 1])
> setNames(as.numeric(t(x[, -1])), paste(eg[[2]], eg[[1]], sep="."))
journal.count journal.rank
13972.00 13759.00
journal.pct all.count
98.48 754555.00
all.rank all.pct
754043.00 99.93
similar_age_1m.count similar_age_1m.rank
22408.00 22339.00
similar_age_1m.pct similar_age_3m.count
99.69 56213.00
similar_age_3m.rank similar_age_3m.pct
56074.00 99.75
similar_age_journal_1m.count similar_age_journal_1m.rank
508.00 459.00
similar_age_journal_1m.pct similar_age_journal_3m.count
90.35 1035.00
similar_age_journal_3m.rank similar_age_journal_3m.pct
947.00 91.50
答案 4 :(得分:1)
#assuming your data is called "test"
result <- as.data.frame(matrix(t(test[-1]),nrow=1),stringsAsFactors=FALSE)
names(result) <- as.vector(t(outer(unique(test$type),names(test[-1]),paste,sep=".")))
str(result)
'data.frame': 1 obs. of 18 variables:
$ journal.count : chr "13972"
$ journal.rank : chr "13759"
$ journal.pct : chr "98.48"
$ all.count : chr "754555"
$ all.rank : chr "754043"
$ all.pct : chr "99.93"
$ similar_age_1m.count : chr "22408"
$ similar_age_1m.rank : chr "22339"
$ similar_age_1m.pct : chr "99.69"
$ similar_age_3m.count : chr "56213"
$ similar_age_3m.rank : chr "56074"
$ similar_age_3m.pct : chr "99.75"
$ similar_age_journal_1m.count: chr "508"
$ similar_age_journal_1m.rank : chr "459"
$ similar_age_journal_1m.pct : chr "90.35"
$ similar_age_journal_3m.count: chr "1035"
$ similar_age_journal_3m.rank : chr "947"
$ similar_age_journal_3m.pct : chr "91.50"
答案 5 :(得分:0)
假设您的数据框称为dat,这是一个解决方案。这有点粗糙,可能不是你想要的:
dat2 <- data.frame(matrix(unlist(lapply(1:nrow(dat), function(i) dat[i, -1])), nrow=1))
colnames(dat2) <- paste0(rep(dat[, 1], each=ncol(dat)-1), ".", 1:(ncol(dat)-1))
dat2
如果它不一定是数据框,这也可以起作用:
dat3 <- as.numeric(unlist(lapply(1:nrow(dat), function(i) dat[i, -1])))
names(dat3) <- paste0(rep(dat[, 1], each=ncol(dat)-1), ".", 1:(ncol(dat)-1))
dat3