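I have 2 data frames, each with 5 columns. The first 4 column names are the same and the last column differs. The last column holds a flag (T) indicating an outlier: for Avg in the first data frame and for Sig (sigma) in the second.
My first data frame - df1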
TimeStamp <- c("2015-04-01 11:40:13", "2015-04-03 02:54:45")
ID <- c("DL1X8", "DL202")
Avg <- c(38.1517, 0.7131)
Sig <- c(11.45880000, 0.01257816)
Outlier_Avg <- c("T","T")
df1 <- data.frame(TimeStamp, ID, Avg, Sig,Outlier_Avg)
+---------------------+-------+---------+-------------+-------------+
| TimeStamp | ID | Avg | Sig | Outlier_Avg |
+---------------------+-------+---------+-------------+-------------+
| 2015-04-01 11:40:13 | DL1X8 | 38.1517 | 11.45880000 | T |
| 2015-04-03 02:54:45 | DL202 | 0.7131 | 0.01257816 | T |
+---------------------+-------+---------+-------------+-------------+
My second data frame - df2
TimeStamp <- c("2015-04-01 11:40:13", "2015-04-04 02:57:45", "2015-04-06 09:54:45")
ID <- c("DL1X8", "DP308","DM3X8")
Avg <- c(38.1517, 24.7131, 0.0234)
Sig <- c(11.4588, 6.0175,0.0665)
Outlier_Sig <- c("T","T","T")
df2 <- data.frame(TimeStamp, ID, Avg, Sig,Outlier_Sig)
+---------------------+-------+---------+---------+-------------+
| TimeStamp | ID | Avg | Sig | Outlier_Sig |
+---------------------+-------+---------+---------+-------------+
| 2015-04-01 11:40:13 | DL1X8 | 38.1517 | 11.4588 | T |
| 2015-04-04 02:57:45 | DP308 | 24.7131 | 6.0175 | T |
| 2015-04-06 09:54:45 | DM3X8 | 0.0234 | 0.0665 | T |
+---------------------+-------+---------+---------+-------------+
Desired output:
I am looking for a df3 that looks like this:
+---------------------+-------+---------+-------------+-------------+-------------+
| TimeStamp | ID | Avg | Sig | Outlier_Avg | Outlier_Sig |
+---------------------+-------+---------+-------------+-------------+-------------+
| 2015-04-01 11:40:13 | DL1X8 | 38.1517 | 11.45880000 | T | T |
| 2015-04-03 02:54:45 | DL202 | 0.7131 | 0.01257816 | T | N/A |
| 2015-04-04 02:57:45 | DP308 | 24.7131 | 6.0175 | N/A | T |
| 2015-04-06 09:54:45 | DM3X8 | 0.0234 | 0.0665 | N/A | T |
+---------------------+-------+---------+-------------+-------------+-------------+
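I tried merge(df1, df2). It returns only the matching rows, so I get just 1 row. I need all rows returned, with N/A filled in where there is no match, as shown above. Can you help me with this?
Answer 0: (score: 3)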
Use the all argument of merge():
merge(df1, df2, all = TRUE)
# TimeStamp ID Avg Sig Outlier_Avg Outlier_Sig
# 1 2015-04-01 11:40:13 DL1X8 38.1517 11.45880000 T T
# 2 2015-04-03 02:54:45 DL202 0.7131 0.01257816 T <NA>
# 3 2015-04-04 02:57:45 DP308 24.7131 6.01750000 <NA> T
# 4 2015-04-06 09:54:45 DM3X8 0.0234 0.06650000 <NA> T
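When no by argument is given, merge() joins on the column names the two data frames share. As a minimal sketch, the same call can be written with those columns spelled out, which makes the join keys visible. The name key_cols is introduced here for illustration only, and stringsAsFactors = FALSE is an assumption made so the columns stay character on older R versions.

```r
# Rebuild the two example data frames from the question.
df1 <- data.frame(
  TimeStamp   = c("2015-04-01 11:40:13", "2015-04-03 02:54:45"),
  ID          = c("DL1X8", "DL202"),
  Avg         = c(38.1517, 0.7131),
  Sig         = c(11.45880000, 0.01257816),
  Outlier_Avg = c("T", "T"),
  stringsAsFactors = FALSE  # assumption: keep text columns as character
)
df2 <- data.frame(
  TimeStamp   = c("2015-04-01 11:40:13", "2015-04-04 02:57:45", "2015-04-06 09:54:45"),
  ID          = c("DL1X8", "DP308", "DM3X8"),
  Avg         = c(38.1517, 24.7131, 0.0234),
  Sig         = c(11.4588, 6.0175, 0.0665),
  Outlier_Sig = c("T", "T", "T"),
  stringsAsFactors = FALSE
)

# key_cols (illustrative name): the columns present in both data frames.
key_cols <- intersect(names(df1), names(df2))

# Full outer join on the shared columns; unmatched cells become NA.
df3 <- merge(df1, df2, by = key_cols, all = TRUE)
print(df3)
```

Because Avg and Sig are among the join keys here, a row only matches when those numeric values agree in both data frames, as they do for DL1X8 in this example.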
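all = TRUE is shorthand for setting both all.x = TRUE and all.y = TRUE. These are separate arguments that control whether unmatched rows from x (df1 in your case) and y (df2 in your case) are kept in the merged data.frame. For example: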
merge(df1, df2, all.x = TRUE)
# TimeStamp ID Avg Sig Outlier_Avg Outlier_Sig
# 1 2015-04-01 11:40:13 DL1X8 38.1517 11.45880000 T T
# 2 2015-04-03 02:54:45 DL202 0.7131 0.01257816 T <NA>
merge(df1, df2, all.y = TRUE)
# TimeStamp ID Avg Sig Outlier_Avg Outlier_Sig
# 1 2015-04-01 11:40:13 DL1X8 38.1517 11.4588 T T
# 2 2015-04-04 02:57:45 DP308 24.7131 6.0175 <NA> T
# 3 2015-04-06 09:54:45 DM3X8 0.0234 0.0665 <NA> T
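The merged result marks missing flags as NA, while the desired table in the question shows the text N/A. If that literal text is wanted for display, one possible sketch is to replace NA in the two flag columns after merging. The name flag_cols is introduced here for illustration, and note that once the values are strings, is.na() will no longer detect them.

```r
# Rebuild the two example data frames from the question.
df1 <- data.frame(
  TimeStamp   = c("2015-04-01 11:40:13", "2015-04-03 02:54:45"),
  ID          = c("DL1X8", "DL202"),
  Avg         = c(38.1517, 0.7131),
  Sig         = c(11.45880000, 0.01257816),
  Outlier_Avg = c("T", "T")
)
df2 <- data.frame(
  TimeStamp   = c("2015-04-01 11:40:13", "2015-04-04 02:57:45", "2015-04-06 09:54:45"),
  ID          = c("DL1X8", "DP308", "DM3X8"),
  Avg         = c(38.1517, 24.7131, 0.0234),
  Sig         = c(11.4588, 6.0175, 0.0665),
  Outlier_Sig = c("T", "T", "T")
)

# Full outer join, as in the answer above.
df3 <- merge(df1, df2, all = TRUE)

# flag_cols (illustrative name): the columns whose NA should read "N/A".
flag_cols <- c("Outlier_Avg", "Outlier_Sig")
df3[flag_cols] <- lapply(df3[flag_cols], function(col) {
  col <- as.character(col)  # also handles factor columns on older R versions
  col[is.na(col)] <- "N/A"  # display-only replacement of missing values
  col
})
print(df3)
```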