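I want to merge the two data frames below, ordered by time from smallest to largest, with no duplicated time values. I also need the result to contain two new variables.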
df1
time freq
1 1.5 1
2 3.5 1
3 4.5 2
4 5.5 1
5 8.5 2
6 9.5 1
7 10.5 1
8 11.5 1
9 15.5 1
10 16.5 1
11 18.5 1
12 23.5 1
13 26.5 1
df2
time freq
1 0.5 6
2 2.5 2
3 3.5 1
4 6.5 1
5 15.5 1
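Please help me with code that creates the two new columns:
If a time value is present in df1, the first new variable (var1) should record the corresponding freq value from df1; if that time does not exist in df1, var1 should be 0.
If a time value is present in df2, the second new variable (var2) should record the corresponding freq value from df2; if that time does not exist in df2, var2 should be 0.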
So I would end up with a table like this:
time var1 var2
0.5 0 6
1.5 1 0
2.5 0 2
3.5 1 1
4.5 2 0
5.5 1 0
...
Answer 0 (score: 1)
If I understand the layout of your data frames correctly (they can be created as follows):
df1 = data.frame(time = c(1.5, 3.5, 4.5, 5.5, 8.5, 9.5, 10.5, 11.5, 15.5, 16.5, 18.5, 23.5, 26.5), freq = c(1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1))
df2 = data.frame(time = c(0.5, 2.5, 3.5, 6.5, 15.5), freq = c(6, 2, 1, 1, 1))
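then the following gives you what you want:
# Collect the distinct times first; inside data.frame() a bare `time` would refer to the base function stats::time, not to this vector
times = sort(unique(c(df1$time, df2$time)))
# Return the freq recorded for time t in df, or 0 if t does not occur in df
freq_or_zero = function(t, df) {x = df$freq[df$time == t]; if (length(x) == 0) 0 else x[1]}
df_new = data.frame(time = times, var1 = sapply(times, freq_or_zero, df = df1), var2 = sapply(times, freq_or_zero, df = df2))
Hope this helps.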
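The per-element lookup above can also be written in vectorized form. The following is only a sketch in base R, not part of the original answer: `freq_at` is a hypothetical helper name, and it assumes each time occurs at most once per data frame and that the times compare exactly (true for values such as 1.5 or 3.5, but not guaranteed for arbitrary floating-point input).

```r
# Same data as in the answer above
df1 <- data.frame(time = c(1.5, 3.5, 4.5, 5.5, 8.5, 9.5, 10.5, 11.5, 15.5, 16.5, 18.5, 23.5, 26.5),
                  freq = c(1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1))
df2 <- data.frame(time = c(0.5, 2.5, 3.5, 6.5, 15.5), freq = c(6, 2, 1, 1, 1))

# Hypothetical helper (an assumption, not from the answer): look up freq for
# every requested time at once, returning 0 where the time is absent from df
freq_at <- function(df, times) {
  idx <- match(times, df$time)  # NA where a time has no match in df
  out <- df$freq[idx]           # indexing with NA yields NA
  out[is.na(idx)] <- 0          # replace the unmatched entries with 0
  out
}

times <- sort(unique(c(df1$time, df2$time)))
df_new <- data.frame(time = times,
                     var1 = freq_at(df1, times),
                     var2 = freq_at(df2, times))
print(head(df_new))
```

`match()` does a single pass over the data instead of one comparison per time value, which may matter for larger inputs.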
Answer 1 (score: 1)
Code: base R
df3 <- merge(x = df1, df2, by.x = 'time', by.y = 'time', all = TRUE, sort = TRUE)
df3$freq.x[is.na(df3$freq.x)] <- 0
df3$freq.y[is.na(df3$freq.y)] <- 0
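Code: data.table library
library('data.table')
# Convert both inputs to data.table by reference
setDT(df1)
setDT(df2)
setkey(df1, time)
# Full outer join on time; name the join column explicitly rather than relying on the key
df3 <- merge(x = df1, y = df2, by = 'time', all = TRUE, sort = TRUE)
# Times missing from one side come back as NA; replace them with 0
df3[is.na(freq.x), freq.x := 0 ]
df3[is.na(freq.y), freq.y := 0 ]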
Output
df3
# time freq.x freq.y
# 1: 0.5 0 6
# 2: 1.5 1 0
# 3: 2.5 0 2
# 4: 3.5 1 1
# 5: 4.5 2 0
# 6: 5.5 1 0
# 7: 6.5 0 1
# 8: 8.5 2 0
# 9: 9.5 1 0
# 10: 10.5 1 0
# 11: 11.5 1 0
# 12: 15.5 1 1
# 13: 16.5 1 0
# 14: 18.5 1 0
# 15: 23.5 1 0
# 16: 26.5 1 0
Data
df1 <- read.table(text =
'time freq
1 1.5 1
2 3.5 1
3 4.5 2
4 5.5 1
5 8.5 2
6 9.5 1
7 10.5 1
8 11.5 1
9 15.5 1
10 16.5 1
11 18.5 1
12 23.5 1
13 26.5 1', header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text =
'time freq
1 0.5 6
2 2.5 2
3 3.5 1
4 6.5 1
5 15.5 1', header = TRUE, stringsAsFactors = FALSE)
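The merged columns in this answer are named freq.x and freq.y, whereas the question asks for var1 and var2. One way to get those names, sketched here in base R on a shortened subset of the data (the subset and the variable name `res` are illustrative assumptions), is to rename freq in each input before merging:

```r
# Shortened subset of the question's data, for illustration only
df1 <- data.frame(time = c(1.5, 3.5, 4.5, 5.5), freq = c(1, 1, 2, 1))
df2 <- data.frame(time = c(0.5, 2.5, 3.5, 6.5), freq = c(6, 2, 1, 1))

# Rename freq before merging so the result carries var1 / var2 directly
names(df1)[names(df1) == "freq"] <- "var1"
names(df2)[names(df2) == "freq"] <- "var2"

# Full outer join on time, sorted by time
res <- merge(df1, df2, by = "time", all = TRUE, sort = TRUE)
res[is.na(res)] <- 0  # times missing from one side become 0
print(res)
```

Because the value columns no longer share a name, merge() does not need to append the .x / .y suffixes.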
Answer 2 (score: 0)
A more direct approach using tidyverse or dplyr:
library(tidyverse)
df1 <- tibble(time = c(1.5, 3.5, 4.5, 5.5), freq = c(1, 1, 2, 1))
df2 <- tibble(time = c(0.5, 2.5, 3.5, 6.5), freq = c(6, 2, 1, 1))
full_join(df1, df2, by = "time", suffix = c("_1", "_2")) %>%
mutate_all(~ .x %>% replace_na(0)) %>%
arrange(time)
# A tibble: 7 x 3
#    time freq_1 freq_2
#   <dbl>  <dbl>  <dbl>
# 1   0.5      0      6
# 2   1.5      1      0
# 3   2.5      0      2
# 4   3.5      1      1
# 5   4.5      2      0
# 6   5.5      1      0
# 7   6.5      0      1
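All of the joins in this thread give one row per time only if each time appears at most once within each input; a repeated time on either side would produce repeated rows in the result. The sketch below, in base R, shows one way to check for that and collapse repeats first. Summing freq per time is an assumption about what a repeat should mean, not something stated in the question, and the small data frames are made up for illustration.

```r
# Illustrative data: time 3.5 is repeated in df1 on purpose
df1 <- data.frame(time = c(1.5, 3.5, 3.5), freq = c(1, 1, 4))
df2 <- data.frame(time = c(0.5, 3.5), freq = c(6, 1))

# anyDuplicated() returns 0 when every value is unique
has_dup <- anyDuplicated(df1$time) > 0

# Assumed policy: add up freq for repeated times before joining
df1_unique <- aggregate(freq ~ time, data = df1, FUN = sum)

# Full outer join on time, then fill the gaps with 0
res <- merge(df1_unique, df2, by = "time", all = TRUE, suffixes = c("_1", "_2"))
res[is.na(res)] <- 0
print(res)
```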