I have a data frame that looks like this:
Name Visit Arrival Departure
Jack week 1 8:00 NA
Jack week 1 NA 8:30
Sally week 5 9:00 NA
Sally week 5 NA 9:30
Adam week 2 2:00 NA
Adam week 2 NA 3:00
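The arrival and departure times were originally in rows and I pivoted them into columns, which is why there are NA values. I want to merge rows based on Name and Visit so that Arrival and Departure end up on the same row, like this: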
Name Visit Arrival Departure
Jack week 1 8:00 8:30
Sally week 5 9:00 9:30
Adam week 2 2:00 3:00
Any solution would be appreciated; I keep running into trouble when I try to merge the rows.
Answer 0 (score: 5)
Just use aggregate with na.omit as the aggregation function:
aggregate(dat[c("Arrival","Departure")], dat[c("Name","Visit")], FUN=na.omit)
# or
aggregate(cbind(Arrival,Departure) ~ ., data=dat, FUN=na.omit, na.action=na.pass)
# Name Visit Arrival Departure
#1 Jack week1 8:00 8:30
#2 Adam week2 2:00 3:00
#3 Sally week5 9:00 9:30
The same logic applies in data.table (with dat converted to a data.table first):
dat[, lapply(.SD,na.omit), by=.(Name,Visit)]
...or in dplyr:
dat %>% group_by(Name,Visit) %>% summarise_all(na.omit)
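For anyone who wants to try the aggregate approach end to end, here is a minimal self-contained sketch. It rebuilds the question's data by hand and swaps na.omit for a small helper, first_non_na, which is a hypothetical name introduced only for this illustration and not part of the answer above. The helper returns NA when a group has no non-missing value, on the assumption that each Name/Visit pair should collapse to exactly one row.

```r
# Rebuild the data frame from the question (values copied from the post).
dat <- data.frame(
  Name      = c("Jack", "Jack", "Sally", "Sally", "Adam", "Adam"),
  Visit     = c("week 1", "week 1", "week 5", "week 5", "week 2", "week 2"),
  Arrival   = c("8:00", NA, "9:00", NA, "2:00", NA),
  Departure = c(NA, "8:30", NA, "9:30", NA, "3:00"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the original answer):
# first non-missing value of a vector, or NA if every value is missing.
first_non_na <- function(x) {
  keep <- x[!is.na(x)]
  if (length(keep) == 0) NA else keep[1]
}

# Collapse the two half-filled rows of each Name/Visit group into one row.
merged <- aggregate(
  dat[c("Arrival", "Departure")],
  by  = dat[c("Name", "Visit")],
  FUN = first_non_na
)
print(merged)
```

Unlike na.omit, the helper always returns a single value per group, so a group with two non-missing arrivals would silently keep only the first one; whether that is acceptable depends on the data.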
Answer 1 (score: 1)
Here is one approach, assuming each visitor has exactly two rows of data:
library(dplyr)
df = readr::read_table("Name Visit Arrival Departure
Jack week 1 8:00 NA
Jack week 1 NA 8:30
Sally week 5 9:00 NA
Sally week 5 NA 9:30
Adam week 2 2:00 NA
Adam week 2 NA 3:00", col_types="cccc")
df %>%
  group_by(Name, Visit) %>%
  mutate(Arrival = ifelse(is.na(Arrival), lag(Arrival), Arrival),
         Departure = ifelse(is.na(Departure), lead(Departure), Departure)) %>%
  ungroup() %>%
  distinct(Name, Visit, .keep_all=TRUE)
# A tibble: 3 × 4
Name Visit Arrival Departure
<chr> <chr> <chr> <chr>
1 Jack week 1 8:00 8:30
2 Sally week 5 9:00 9:30
3 Adam week 2 2:00 3:00
Answer 2 (score: 0)
I'm sure there is a prettier way to do this, but this worked for me:
library(data.table)
library(reshape2)
test <- data.table(
  Name = c("Jack", "Jack", "Sally", "Sally", "Adam", "Adam"),
  Visit = c("week 1", "week 1", "week 5", "week 5", "week 2", "week 2"),
  Arrival = c("8:00", NA, "9:00", NA, "2:00", NA),
  Departure = c(NA, "8:30", NA, "9:30", NA, "3:00")
)
test_m <- melt(test,id.vars = c("Name", "Visit"))
test_m <- test_m[!is.na(value),]
test_c <- dcast(test_m, Name + Visit ~ variable)
> test_c
Name Visit Arrival Departure
1 Adam week 2 2:00 3:00
2 Jack week 1 8:00 8:30
3 Sally week 5 9:00 9:30
Hope that helps.
Answer 3 (score: 0)
Actually, if you are able to go back to the data as it was before you pivoted it, tidyr::spread will do this nicely.
library(tidyr)  # provides spread() and re-exports %>%
Name <- c("Jack", "Jack","Sally", "Sally", "Adam", "Adam")
Visit <- c("week1", "week1", "week5", "week5", "week2", "week2")
Itenary <- rep(c("Arrival", "Departure"), 3)
Time <- c("8:00", "8:30", "9:00", "9:30", "2:00", "2:30")
df <- data.frame(Name, Visit, Itenary, Time)
df
Name Visit Itenary Time
1 Jack week1 Arrival 8:00
2 Jack week1 Departure 8:30
3 Sally week5 Arrival 9:00
4 Sally week5 Departure 9:30
5 Adam week2 Arrival 2:00
6 Adam week2 Departure 2:30
df %>%
spread(key = Itenary, value = Time)
Name Visit Arrival Departure
1 Adam week2 2:00 2:30
2 Jack week1 8:00 8:30
3 Sally week5 9:00 9:30
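In tidyr 1.0.0 and later, spread is superseded by pivot_wider, so the same reshaping can also be sketched as follows. This is only an illustration using the same long-format data as the answer above (identifiers kept exactly as written there) and assumes tidyr is installed.

```r
library(tidyr)

# Long-format data as in the answer above: one row per arrival or departure.
df <- data.frame(
  Name    = c("Jack", "Jack", "Sally", "Sally", "Adam", "Adam"),
  Visit   = c("week1", "week1", "week5", "week5", "week2", "week2"),
  Itenary = rep(c("Arrival", "Departure"), 3),
  Time    = c("8:00", "8:30", "9:00", "9:30", "2:00", "2:30"),
  stringsAsFactors = FALSE
)

# One column per distinct value of Itenary, filled from Time;
# the remaining columns (Name, Visit) identify each output row.
wide <- pivot_wider(df, names_from = Itenary, values_from = Time)
print(wide)
```

Because the data never passes through the half-filled wide form, there are no NA rows to merge afterwards.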