How to create variables in R based on which data frame a value comes from?

Date: 2020-03-29 15:59:41

Tags: r dataframe statistics data-manipulation

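I want to merge the two data frames below into one, ordered by time from smallest to largest, with no duplicate times. I also want the result to have two new variables.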

df1
   time   freq                  
1   1.5    1
2   3.5    1
3   4.5    2
4   5.5    1
5   8.5    2
6   9.5    1
7  10.5    1
8  11.5    1
9  15.5    1
10 16.5    1
11 18.5    1
12 23.5    1
13 26.5    1

df2
  time freq
1  0.5    6
2  2.5    2
3  3.5    1
4  6.5    1
5 15.5    1

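Please help me with code that creates the two new columns:

  1. The first new variable (var1) records the freq value from df1 for the matching time, or 0 if that time does not exist in df1.

  2. The second new variable (var2) records the freq value from df2 for the matching time, or 0 if that time does not exist in df2.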

So I would end up with a table like this:

time var1 var2
0.5   0    6
1.5   1    0
2.5   0    2
3.5   1    1
4.5   2    0
5.5   1    0
...

3 answers:

Answer 0 (score: 1)

If I understand correctly what your data frames look like (they can be created like this):

df1 = data.frame(time = c(1.5, 3.5, 4.5, 5.5, 8.5, 9.5, 10.5, 11.5, 15.5, 16.5, 18.5, 23.5, 26.5), freq = c(1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1))
df2 = data.frame(time = c(0.5, 2.5, 3.5, 6.5, 15.5), freq = c(6, 2, 1, 1, 1))

then the following gives you what you need:

# Build the sorted, de-duplicated time vector first: inside data.frame(),
# `time` does not refer to the column being created (it would find stats::time)
times <- sort(unique(c(df1$time, df2$time)))
df_new <- data.frame(time = times,
                     var1 = sapply(times, function(x) {f <- df1$freq[df1$time == x]; if (length(f) == 0) 0 else f}),
                     var2 = sapply(times, function(x) {f <- df2$freq[df2$time == x]; if (length(f) == 0) 0 else f}))
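The per-element sapply() lookups can also be written as one vectorized match() call per data frame: match() returns the position of each time in df$time, or NA when the time is absent, and the NA positions become 0. A minimal sketch, assuming the same df1 and df2 as above; the helper name lookup_freq is hypothetical and only used for illustration:

```r
# Example data, copied from the data frames defined above
df1 <- data.frame(time = c(1.5, 3.5, 4.5, 5.5, 8.5, 9.5, 10.5, 11.5, 15.5, 16.5, 18.5, 23.5, 26.5),
                  freq = c(1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1))
df2 <- data.frame(time = c(0.5, 2.5, 3.5, 6.5, 15.5),
                  freq = c(6, 2, 1, 1, 1))

# Hypothetical helper: position of each time in df$time (NA if absent),
# then the matching freq, or 0 where there is no match
lookup_freq <- function(times, df) {
  idx <- match(times, df$time)
  ifelse(is.na(idx), 0, df$freq[idx])
}

# Every distinct time from both data frames, in ascending order
times <- sort(unique(c(df1$time, df2$time)))

df_new <- data.frame(time = times,
                     var1 = lookup_freq(times, df1),
                     var2 = lookup_freq(times, df2))
head(df_new)
```

match() compares values exactly, just like `==` in the answer above. That is fine for values such as 1.5 that are stored exactly, but times produced by arithmetic may need round() before matching.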

Hope this helps.

Answer 1 (score: 1)

Code - base R

df3 <- merge(x = df1, df2, by.x = 'time', by.y = 'time', all = TRUE, sort = TRUE)
df3$freq.x[is.na(df3$freq.x)] <- 0
df3$freq.y[is.na(df3$freq.y)] <- 0
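The question asks for columns called var1 and var2, whereas merge() names the two clashing freq columns freq.x and freq.y. A sketch of the same base-R merge with a renaming step and a check of the "sorted, no duplicates" requirement, assuming the same df1 and df2 (the renaming and the check are additions, not part of the original answer):

```r
df1 <- data.frame(time = c(1.5, 3.5, 4.5, 5.5, 8.5, 9.5, 10.5, 11.5, 15.5, 16.5, 18.5, 23.5, 26.5),
                  freq = c(1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1))
df2 <- data.frame(time = c(0.5, 2.5, 3.5, 6.5, 15.5),
                  freq = c(6, 2, 1, 1, 1))

# Full outer join on time (all = TRUE); sort = TRUE orders the rows by time
df3 <- merge(df1, df2, by = "time", all = TRUE, sort = TRUE)

# A time present in only one data frame gets NA in the other's column; use 0
df3[is.na(df3)] <- 0

# merge() suffixes the clashing freq columns as freq.x / freq.y;
# rename them to the names asked for in the question
names(df3)[names(df3) == "freq.x"] <- "var1"
names(df3)[names(df3) == "freq.y"] <- "var2"

# The merged times should be unique and in ascending order
stopifnot(!anyDuplicated(df3$time), !is.unsorted(df3$time))
```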

Code - data.table package

library('data.table')
setDT(df1)  
setkey(df1, time)
df3 <- merge(x = df1, df2, all = TRUE, sort = TRUE)
df3[is.na(freq.x), freq.x := 0 ]
df3[is.na(freq.y), freq.y := 0 ]

Output

df3
#    time freq.x freq.y
# 1:  0.5      0      6
# 2:  1.5      1      0
# 3:  2.5      0      2
# 4:  3.5      1      1
# 5:  4.5      2      0
# 6:  5.5      1      0
# 7:  6.5      0      1
# 8:  8.5      2      0
# 9:  9.5      1      0
# 10: 10.5      1      0
# 11: 11.5      1      0
# 12: 15.5      1      1
# 13: 16.5      1      0
# 14: 18.5      1      0
# 15: 23.5      1      0
# 16: 26.5      1      0

Data

df1 <- read.table(text = 
'time   freq                  
1   1.5    1
2   3.5    1
3   4.5    2
4   5.5    1
5   8.5    2
6   9.5    1
7  10.5    1
8  11.5    1
9  15.5    1
10 16.5    1
11 18.5    1
12 23.5    1
13 26.5    1', header = TRUE, stringsAsFactors = FALSE)

df2 <- read.table(text = 
'time freq
1  0.5    6
2  2.5    2
3  3.5    1
4  6.5    1
5 15.5    1', header = TRUE, stringsAsFactors = FALSE)

Answer 2 (score: 0)

A more direct approach using the tidyverse (dplyr):

library(tidyverse)

df1 <- tibble(time = c(1.5, 3.5, 4.5, 5.5), freq = c(1, 1, 2, 1))
df2 <- tibble(time = c(0.5, 2.5, 3.5, 6.5), freq = c(6, 2, 1, 1))

full_join(df1, df2, by = "time", suffix = c("_1", "_2")) %>% 
  mutate_all(~ .x %>% replace_na(0)) %>% 
  arrange(time)
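mutate_all() still works but is superseded in current dplyr by across(). A sketch of the same join that also renames the columns to the var1/var2 names from the question, assuming dplyr 1.0 or later. The rename step is an addition to the original answer, and dplyr's coalesce() is used here instead of tidyr's replace_na() so that only dplyr needs to be loaded:

```r
library(dplyr)

# Same shortened example data as the answer above
df1 <- tibble(time = c(1.5, 3.5, 4.5, 5.5), freq = c(1, 1, 2, 1))
df2 <- tibble(time = c(0.5, 2.5, 3.5, 6.5), freq = c(6, 2, 1, 1))

result <- full_join(df1, df2, by = "time", suffix = c("_1", "_2")) %>%
  # rename(new = old): give the columns the names used in the question
  rename(var1 = freq_1, var2 = freq_2) %>%
  # across() replaces the superseded mutate_all(); NA (no match) becomes 0
  mutate(across(c(var1, var2), ~ coalesce(.x, 0))) %>%
  arrange(time)

result
```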

# A tibble: 7 x 3
   time freq_1 freq_2
  <dbl>  <dbl>  <dbl>
1   0.5      0      6
2   1.5      1      0
3   2.5      0      2
4   3.5      1      1
5   4.5      2      0
6   5.5      1      0
7   6.5      0      1