I have a data frame like this:
Name Position Value
a 1 0.2
a 3 0.4
a 4 0.3
b 1 0.5
b 2 0.4
b 5 0.3
c 2 0.3
c 3 0.4
c 5 0.1
d 1 0.2
d 2 0.4
d 3 0.5
I would like Position to always run from 1 to 5 for each Name, with NA filled into Value for the added rows, like so:
Name Position Value
a 1 0.2
a 2 NA
a 3 0.4
a 4 0.3
a 5 NA
b 1 0.5
b 2 0.4
b 3 NA
b 4 NA
b 5 0.3
c 1 NA
c 2 0.3
c 3 0.4
c 4 NA
c 5 0.1
d 1 0.2
d 2 0.4
d 3 0.5
d 4 NA
d 5 NA
Is there a way to do this without creating a dummy data frame of the first two columns and then doing some kind of outer join with merge?
Thanks.
Answer 0 (score: 5)
I would use data.table, but in a different way from the one @akrun highlighted:
library(data.table)
dt = as.data.table(df)
setkey(dt, Name, Position)
dt[CJ(unique(Name),unique(Position))]
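The cross join above builds its grid from unique(Position), so a position that is missing for every name would not be added. A minimal sketch, assuming the target range 1:5 is known in advance, passes that range to CJ() directly and joins with on= rather than a key. The names grid and filled are illustrative, not part of the original answer.

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  Name     = rep(c("a", "b", "c", "d"), each = 3),
  Position = c(1L, 3L, 4L, 1L, 2L, 5L, 2L, 3L, 5L, 1L, 2L, 3L),
  Value    = c(0.2, 0.4, 0.3, 0.5, 0.4, 0.3, 0.3, 0.4, 0.1, 0.2, 0.4, 0.5)
)

# Every Name paired with every Position in the assumed range 1:5
grid <- CJ(Name = unique(dt$Name), Position = 1:5)

# Join dt onto the grid; grid rows with no match in dt get NA in Value
filled <- dt[grid, on = .(Name, Position)]
filled
```

Because the grid is given explicitly, the result has five rows per name regardless of which positions happen to occur in the data.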
Answer 1 (score: 2)
You can use the reshape2 package:
# make sample data frame
df <- read.table(text = "Name Position Value
a 1 0.2
a 3 0.4
a 4 0.3
b 1 0.5
b 2 0.4
b 5 0.3
c 2 0.3
c 3 0.4
c 5 0.1
d 1 0.2
d 2 0.4
d 3 0.5", header = TRUE, stringsAsFactors = FALSE)
library('reshape2')
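# cast to wide format: one column per Position, NA where a combination is missing
df2 <- dcast(df, Name ~ Position, value.var = "Value")
# melt back to long format; note that Position comes back as a factor
df3 <- melt(df2, id.vars = "Name", value.name = "Value", variable.name = "Position")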
df3[order(df3$Name), ]
# Name Position Value
# 1 a 1 0.2
# 5 a 2 NA
# 9 a 3 0.4
# 13 a 4 0.3
# 17 a 5 NA
# 2 b 1 0.5
# 6 b 2 0.4
# 10 b 3 NA
# 14 b 4 NA
# 18 b 5 0.3
# 3 c 1 NA
# 7 c 2 0.3
# 11 c 3 0.4
# 15 c 4 NA
# 19 c 5 0.1
# 4 d 1 0.2
# 8 d 2 0.4
# 12 d 3 0.5
# 16 d 4 NA
# 20 d 5 NA
Answer 2 (score: 2)
You can use data.table:
library(data.table)
DT <- data.table(df)
setkey(DT, Position)
DT[, .SD[J(1:5), roll=FALSE], by=Name][order(Name, Position),]
# Name Position Value
#1: a 1 0.2
#2: a 2 NA
#3: a 3 0.4
#4: a 4 0.3
#5: a 5 NA
#6: b 1 0.5
#7: b 2 0.4
#8: b 3 NA
#9: b 4 NA
#10: b 5 0.3
#11: c 1 NA
#12: c 2 0.3
#13: c 3 0.4
#14: c 4 NA
#15: c 5 0.1
#16: d 1 0.2
#17: d 2 0.4
#18: d 3 0.5
#19: d 4 NA
#20: d 5 NA
Or you can use tidyr/dplyr:
library(dplyr)
library(tidyr)
df %>%
spread(Position, Value) %>%
gather(Position, Value, `1`:`5`) %>%
arrange(Name, Position)
# Data used in the examples above
df <- structure(list(Name = c("a", "a", "a", "b", "b", "b", "c", "c",
"c", "d", "d", "d"), Position = c(1L, 3L, 4L, 1L, 2L, 5L, 2L,
3L, 5L, 1L, 2L, 3L), Value = c(0.2, 0.4, 0.3, 0.5, 0.4, 0.3,
0.3, 0.4, 0.1, 0.2, 0.4, 0.5)), .Names = c("Name", "Position",
"Value"), class = "data.frame", row.names = c(NA, -12L))
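tidyr also provides complete(), which fills in missing combinations directly and avoids the spread/gather round trip. The following is a sketch under the assumption that the target range 1:5 is fixed; filled is an illustrative name.

```r
library(tidyr)

# Sample data from the question
df <- data.frame(
  Name     = rep(c("a", "b", "c", "d"), each = 3),
  Position = c(1L, 3L, 4L, 1L, 2L, 5L, 2L, 3L, 5L, 1L, 2L, 3L),
  Value    = c(0.2, 0.4, 0.3, 0.5, 0.4, 0.3, 0.3, 0.4, 0.1, 0.2, 0.4, 0.5),
  stringsAsFactors = FALSE
)

# Add a row for every Name x Position (1:5) combination that is missing;
# Value is NA in the added rows
filled <- complete(df, Name, Position = 1:5)
filled
```

Unlike the gather() step above, which turns Position into a character column, this keeps Position as an integer.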
Answer 3 (score: 2)
Maybe it is a bit of overkill, but I think you can use sqldf to do this:
library(sqldf)
# Your data frame:
df <- data.frame(
name = c('a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd'),
position = c(1, 3, 4, 1, 2, 5, 2, 3, 5, 1, 2, 3),
value = c(0.2, 0.4, 0.3, 0.5, 0.4, 0.3, 0.3, 0.4, 0.1, 0.2, 0.4, 0.5)
)
# A data frame to hold the positions you want to fill:
pos = data.frame(pos = 1:5)
# sqldf lets you write SQL statements that treat data frames as SQL tables:
df2 <- sqldf(
"select a.*, b.value as value
from (
select a.name, p.pos as position
from (select distinct name from df) as a, pos as p
) as a
left join df as b on a.name = b.name and a.position = b.position"
)
df2
## Result:
## name position value
##1 a 1 0.2
##2 a 2 NA
##3 a 3 0.4
##4 a 4 0.3
##5 a 5 NA
##6 b 1 0.5
##7 b 2 0.4
##8 b 3 NA
##9 b 4 NA
##10 b 5 0.3
##11 c 1 NA
##12 c 2 0.3
##13 c 3 0.4
##14 c 4 NA
##15 c 5 0.1
##16 d 1 0.2
##17 d 2 0.4
##18 d 3 0.5
##19 d 4 NA
##20 d 5 NA
Of course, you can assign the result of sqldf() directly to df to overwrite the original data frame.
Answer 4 (score: 1)
Here are a couple of base R solutions:
as.data.frame.table(tapply(df[[3]], df[2:1], c))
and
merge(df,
expand.grid(Position = unique(df$Position), Name = unique(df$Name)),
all = TRUE)
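The merge() call above returns rows sorted by the join columns and only covers positions that occur somewhere in the data. A hedged variant, assuming the range 1:5 is wanted regardless of what is present, builds the grid explicitly and restores the Name/Position ordering from the question. The names grid and filled are illustrative.

```r
# Sample data from the question
df <- data.frame(
  Name     = rep(c("a", "b", "c", "d"), each = 3),
  Position = c(1L, 3L, 4L, 1L, 2L, 5L, 2L, 3L, 5L, 1L, 2L, 3L),
  Value    = c(0.2, 0.4, 0.3, 0.5, 0.4, 0.3, 0.3, 0.4, 0.1, 0.2, 0.4, 0.5),
  stringsAsFactors = FALSE
)

# All Name x Position combinations for the assumed range 1:5
grid <- expand.grid(Position = 1:5, Name = unique(df$Name),
                    stringsAsFactors = FALSE)

# Left join: keep every grid row, NA in Value where df has no match
filled <- merge(grid, df, all.x = TRUE)

# Restore the column and row order used in the question
filled <- filled[order(filled$Name, filled$Position),
                 c("Name", "Position", "Value")]
rownames(filled) <- NULL
filled
```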