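I have raw data in the format of df below, and I need to condense it into a data.frame with fewer rows. The data will eventually be GIS point data (the coordinate columns are omitted here), so I want to avoid plotting any duplicate points. Each row represents a point for which agency 1, agency 2, or both publish data. A simpler case with fewer columns is answered here.
Where rows share the same agency3_id, I would like to condense the two rows into one. For example, rows 3 and 4 of the original data.frame (both with agency3_id abcde) should become a single row (row 3) in the desired data.frame below. I am open to any R approach. I am sure there is a better title for this question, too. Thanks for any help.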
library(tidyverse)
library(lubridate)
agency1_id <- as.double(c("1500", NA, "2007", NA, "4501", NA))
agency2_id <- c(NA, "zxc", NA, "xcv", NA, "bnm")
agency3_id <- c("qwert", "ertyu", "abcde", "abcde", NA, NA)
agency1_val <- c(0.21, 1.5, 0.23, NA, 4.3, NA)
agency2_val <- c(0.05, 4.4, NA, 6.3, NA, 2.0)
agency1_desc <- c("st", NA, "ko", NA, "ui", NA)
agency2_desc <- c(NA, "lo", NA, "vf", NA, "kl")
agency1_dtm <- ymd_hm(c("2019-05-30 04:30", NA, "2019-05-30 04:35",
                        NA, "2019-05-30 04:33", NA))
agency2_dtm <- ymd_hm(c(NA, "2019-05-30 04:20", NA, "2019-05-30 04:29",
                        NA, "2019-05-30 04:31"))
df <- data.frame(agency1_id, agency2_id, agency3_id, agency1_val,
                 agency2_val, agency1_desc, agency2_desc, agency1_dtm,
                 agency2_dtm)
as_tibble(df)
# agency1_id agency2_id agency3_id agency1_val agency2_val agency1_desc agency2_desc agency1_dtm agency2_dtm
# <dbl> <fct> <fct> <dbl> <dbl> <fct> <fct> <dttm> <dttm>
# 1 1500 NA qwert 0.21 0.05 st NA 2019-05-30 04:30:00 NA
# 2 NA zxc ertyu 1.5 4.4 NA lo NA 2019-05-30 04:20:00
# 3 2007 NA abcde 0.23 NA ko NA 2019-05-30 04:35:00 NA
# 4 NA xcv abcde NA 6.3 NA vf NA 2019-05-30 04:29:00
# 5 4501 NA NA 4.3 NA ui NA 2019-05-30 04:33:00 NA
# 6 NA bnm NA NA 2 NA kl NA 2019-05-30 04:31:00
Desired df:
# agency1_id agency2_id agency3_id agency1_val agency2_val agency1_desc agency2_desc agency1_dtm agency2_dtm
# <dbl> <fct> <fct> <dbl> <dbl> <fct> <fct> <dttm> <dttm>
# 1 1500 NA qwert 0.21 0.05 st NA 2019-05-30 04:30:00 NA
# 2 NA zxc ertyu 1.5 4.4 NA lo NA 2019-05-30 04:20:00
# 3 2007 xcv abcde 0.23 6.3 ko vf 2019-05-30 04:35:00 2019-05-30 04:29:00
# 4 4501 NA NA 4.3 NA ui NA 2019-05-30 04:33:00 NA
# 5 NA bnm NA NA 2 NA kl NA 2019-05-30 04:31:00
Answer 0 (score: 1)
You can use this. It may not be the most concise solution.
# Data with NA values in column - agency3_id
df_na <- df[is.na(df$agency3_id), ]
# Logic
df <- df[!is.na(df$agency3_id), ] %>%
  group_by(agency3_id) %>%
  summarise_all(list(~ if(all(is.na(.))) NA else .[!is.na(.)][1]))
# Merge dataframes
rbind(df, df_na)
# Result
# A tibble: 5 x 9
# agency3_id agency1_id agency2_id agency1_val agency2_val agency1_desc agency2_desc agency1_dtm
# * <fct> <dbl> <fct> <dbl> <dbl> <fct> <fct> <dttm>
# 1 abcde 2007 xcv 0.23 6.3 ko vf 2019-05-30 04:35:00
# 2 ertyu NA zxc 1.5 4.4 NA lo NA
# 3 qwert 1500 NA 0.21 0.05 st NA 2019-05-30 04:30:00
# 4 NA 4501 NA 4.3 NA ui NA 2019-05-30 04:33:00
# 5 NA NA bnm NA 2 NA kl NA
# … with 1 more variable: agency2_dtm <dttm>
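For reference, the same idea (take the first non-missing value per column within each agency3_id group, and leave rows with a missing agency3_id alone) can be sketched in base R without splitting and re-binding the data by hand. This is only a rough sketch, not a tested replacement for the code above: first_non_na and collapse_by_key are hypothetical helper names, the "__row_" placeholder prefix is an assumption that no real key starts with that text, and the demo data is a cut-down version of the question's columns.

```r
# Hypothetical helper: first non-missing value of a vector.
# Returns an NA of the same type/class when every value is missing.
first_non_na <- function(x) {
  x[which(!is.na(x))[1]]
}

# Hypothetical helper: collapse rows that share a key, keeping rows
# whose key is NA as separate rows.
collapse_by_key <- function(data, key_col) {
  key <- as.character(data[[key_col]])
  # Give each row with a missing key its own placeholder group so it is
  # never merged (assumes no real key starts with "__row_").
  missing_key <- is.na(key)
  key[missing_key] <- paste0("__row_", which(missing_key))
  # Group row indices, preserving the order of first appearance.
  groups <- split(seq_len(nrow(data)), factor(key, levels = unique(key)))
  pieces <- lapply(groups, function(idx) {
    block <- data[idx, , drop = FALSE]
    collapsed <- block[1, , drop = FALSE]
    for (col in names(block)) {
      collapsed[[col]] <- first_non_na(block[[col]])
    }
    collapsed
  })
  out <- do.call(rbind, unname(pieces))
  rownames(out) <- NULL
  out
}

# Cut-down demo data modelled on the question's columns.
demo <- data.frame(
  agency3_id  = c("qwert", "abcde", "abcde", NA, NA),
  agency1_val = c(0.21, 0.23, NA, 4.3, NA),
  agency2_val = c(0.05, NA, 6.3, NA, 2.0),
  stringsAsFactors = FALSE
)
result <- collapse_by_key(demo, "agency3_id")
print(result)
```

Unlike the group_by() version, this keeps the rows in their original order of first appearance, which may or may not matter for plotting. If two rows in a group both have a non-missing value in the same column, only the first one is kept, the same as in the code above.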