I have a data.frame that looks like this:
timestamp value.x station value.y parameter.x value parameter.y
1 1/1/2010 0.6 abc 188,000 AREA PLANTED 22 PROGRESS
2 1/1/2010 0.6 abc 156.3 YIELD NA NA
3 1/1/2010 -10 def 188,000 AREA PLANTED 22 PROGRESS
4 1/1/2010 -10 def 156.3 YIELD NA NA
I would like to use reshape to make it look like this:
timestamp value.x station AREA PLANTED YIELD PROGRESS
1 1/1/2010 0.6 abc 188,000 156.3 22
3 1/1/2010 -10 def 188,000 156.3 22
I tried:
reshape(data = b, varying = list(c('value.y', 'parameter.x', 'value', 'parameter.y')),
v.names = c('AREA PLANTED', 'YIELD', 'PROGRESS'),
timevar = row.names(b),
times = b$timestamp, direction = 'wide', idvar = b$station)
but it says:
Error in [.data.frame(data, , idvar) : undefined columns selected
I have tried changing things around, but no matter what I do it keeps throwing this error.
Answer 0 (score: 2)
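This uses reshape2. I don't think it is possible to cast the data frame in a single step. Note that the input appears to be the result of some earlier join operation (some of the names carry .x and .y suffixes). I imagine that join could be improved to avoid this complication.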
df <- read.table(header=TRUE, stringsAsFactors = FALSE, text =
"timestamp value.x station value.y parameter.x value parameter.y
1/1/2010 0.6 abc 188,000 AREAPLANTED 22 PROGRESS
1/1/2010 0.6 abc 156.3 YIELD NA NA
1/1/2010 -10 def 188,000 AREAPLANTED 22 PROGRESS
1/1/2010 -10 def 156.3 YIELD NA NA
")
library(reshape2)
# extract the last two columns into a variable/value and make unique
df1 <- unique(df[!is.na(df$value),c("timestamp", "value.x", "station", "parameter.y", "value")])
names(df1) <- c("timestamp", "value.x", "station", "variable", "value")
# extract columns 4,5 into a variable value
df2 <- df[,c("timestamp", "value.x", "station", "parameter.x", "value.y")]
names(df2) <- c("timestamp", "value.x", "station", "variable", "value")
# cast
dcast(rbind(df1, df2), timestamp + value.x + station ~ variable, value.var = "value")
# timestamp value.x station AREAPLANTED PROGRESS YIELD
# 1 1/1/2010 -10.0 def 188,000 22 156.3
# 2 1/1/2010 0.6 abc 188,000 22 156.3
Answer 1 (score: 2)
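Staying in base R, consider a merge of two reshaped data frames. Your current call uses arguments (varying, times) that are meant for reshaping from wide to long, not from long to wide as needed here. Note also that idvar and timevar expect column names, not column values such as b$station or row.names(b).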
mdf <- merge(
reshape(b, timevar="parameter.x",
v.names = c("value.y"),
idvar = c("timestamp", "value.x", "station"),
direction = "wide",
drop = c("value", "parameter.y")),
reshape(b[!is.na(b$value),], timevar="parameter.y",
v.names = c("value"),
idvar = c("timestamp", "value.x", "station"),
direction = "wide",
drop = c("value.y", "parameter.x")),
by=c("timestamp", "value.x", "station")
)
names(mdf) <- gsub("(value\\.y\\.|value\\.)", "", names(mdf))
mdf
# timestamp x station AREA PLANTED YIELD PROGRESS
# 1 1/1/2010 -10.0 def 188,000 156.3 22
# 2 1/1/2010 0.6 abc 188,000 156.3 22
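To illustrate why the original call fails, here is a minimal sketch on a small made-up data frame (toy and its columns are hypothetical, not the asker's data). It assumes the error comes from passing column values to idvar, which makes reshape look for columns named after those values.

```r
# Hypothetical toy data in long format (not the asker's real data frame)
toy <- data.frame(
  station   = c("abc", "abc", "def", "def"),
  parameter = c("AREA PLANTED", "YIELD", "AREA PLANTED", "YIELD"),
  value     = c("188,000", "156.3", "188,000", "156.3"),
  stringsAsFactors = FALSE
)

# Passing column *values* as idvar: reshape() tries to select columns
# called "abc" and "def", which do not exist, so the call fails.
bad <- tryCatch(
  reshape(toy, idvar = toy$station, timevar = "parameter",
          v.names = "value", direction = "wide"),
  error = function(e) conditionMessage(e)
)
print(bad)

# Passing column *names* as idvar and timevar works.
wide <- reshape(toy, idvar = "station", timevar = "parameter",
                v.names = "value", direction = "wide")
print(wide)
```

The same reasoning applies to timevar: it should name the column whose values become the new column headers.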
Answer 2 (score: 0)
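I agree with @epi99 that the task needs to be broken into steps and recombined. Here is a tidyverse way, assuming your data frame is called b, as in your example code: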
library(tidyverse)
b = read.csv("C:\\Temp\\stack_overflow_sample_data_which_I_hacked_together_in_Excel.csv")
df1 = b %>% select(timestamp, value.x, station, value.y, parameter.x) %>% spread(key = parameter.x, value = value.y)
df2 = b %>% select(timestamp, value.x, station, value, parameter.y) %>% filter(!is.na(value)) %>% spread(key = parameter.y, value = value)
df.answer = merge(df1, df2, by = c("timestamp", "value.x", "station"))
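The same two-step idea can also be sketched with pivot_wider, which tidyr now recommends over spread. This is only an illustration under assumptions: the inline data below is copied from the first answer's read.table call (with AREAPLANTED written as one word), and the names ids and answer are made up for this sketch.

```r
library(dplyr)
library(tidyr)

# Sample data taken from the first answer, so the sketch is reproducible
b <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
timestamp value.x station value.y parameter.x value parameter.y
1/1/2010 0.6 abc 188,000 AREAPLANTED 22 PROGRESS
1/1/2010 0.6 abc 156.3 YIELD NA NA
1/1/2010 -10 def 188,000 AREAPLANTED 22 PROGRESS
1/1/2010 -10 def 156.3 YIELD NA NA
")

# Columns that identify one output row (hypothetical helper name)
ids <- c("timestamp", "value.x", "station")

# Widen the first parameter/value pair
df1 <- b %>%
  select(all_of(ids), parameter.x, value.y) %>%
  pivot_wider(names_from = parameter.x, values_from = value.y)

# Widen the second pair, dropping the rows where it is missing
df2 <- b %>%
  filter(!is.na(value)) %>%
  select(all_of(ids), parameter.y, value) %>%
  pivot_wider(names_from = parameter.y, values_from = value)

# Recombine on the identifying columns
answer <- inner_join(df1, df2, by = ids)
print(answer)
```

Note that value.y stays a character column here because of the thousands separator in "188,000"; converting it to numeric would be a separate step.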