How do I merge multiple data frames based on two columns?

Asked: 2017-07-21 16:17:08

Tags: r dataframe merge gps dplyr

I'm very new to R and need some help.

I have multiple data frames containing data collected over 4 days. Each data frame looks like this (very simplified):

Lat           Long       PM
-33.9174    151.2263     8
-33.9175    151.2264     10
-33.9176    151.2265     9
-33.9177    151.2266     8

I would like to merge the data frames on matching Long and Lat values so that I can average all the "PM" values for a given location. The end result would look something like this (for Feb 13 to 16):

Lat         Long    PM.13th Feb  PM.14th Feb  PM.15th Feb   Mean
-33.9174   151.2263     8            9           11         9.33
-33.9175   151.2264     10           11          12          11
-33.9176   151.2265     9            14          13          12
-33.9177   151.2266     8            10          11         9.67

As far as I know, merging 2 data frames is quite simple, but how do I merge multiple data frames based on matching longitude and latitude values?

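Also, is there a way to filter the data so that readings whose Lat / Long values are within 0.001 of each other are treated as a match? (At the moment I round the Lat / Long data to 3 decimal places, but that leaves me with duplicated rows.)

Thanks!

2 Answers:

Answer 0 (score: 1)

Here is one possible answer, although it is a bit verbose and will not scale well to a large number of data frames: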

library(tidyverse)

# Toy data: one data frame per day, same four locations in each.
# tibble() replaces the deprecated data_frame().
feb_13 <- tibble(lat = c(-33.9174, -33.9175, -33.9176, -33.9177),
                 long = c(151.2263, 151.2264, 151.2265, 151.2266),
                 pm = c(8, 10, 9, 8))

feb_14 <- tibble(lat = c(-33.9174, -33.9175, -33.9176, -33.9177),
                 long = c(151.2263, 151.2264, 151.2265, 151.2266),
                 pm = c(7, 3, 4, 5))

feb_15 <- tibble(lat = c(-33.9174, -33.9175, -33.9176, -33.9177),
                 long = c(151.2263, 151.2264, 151.2265, 151.2266),
                 pm = c(1, 4, 10, 12))

Here is the first technique. It is simple, but the way the mean is computed here is ugly...

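# Join the three days on both coordinate columns.
# The first join has two clashing pm columns, so they become pm.x and pm.y;
# feb_15's pm no longer clashes and keeps its name.
df <- left_join(feb_13, feb_14, by = c("lat", "long")) %>%
        left_join(feb_15, by = c("lat", "long")) %>%
        rename(
         pm_feb13 = pm.x,
         pm_feb14 = pm.y,
         pm_feb15 = pm
        ) %>%
        # Row-by-row mean, written out by hand for each of the four rows
        mutate(
         mean = c((pm_feb13[1] + pm_feb14[1] + pm_feb15[1])/3,
                  (pm_feb13[2] + pm_feb14[2] + pm_feb15[2])/3,
                  (pm_feb13[3] + pm_feb14[3] + pm_feb15[3])/3,
                  (pm_feb13[4] + pm_feb14[4] + pm_feb15[4])/3)
        )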
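
The hand-written mean above can probably be avoided. As a minimal sketch (not part of the original answer), the joins' `suffix` argument can name the per-day columns directly, and base R's `rowMeans()` can average across them for every row at once. The name `df_vec` is only illustrative, and the toy data is repeated so the snippet runs on its own.

```r
library(dplyr)

# Same toy data as in the answer above
lat  <- c(-33.9174, -33.9175, -33.9176, -33.9177)
long <- c(151.2263, 151.2264, 151.2265, 151.2266)
feb_13 <- tibble(lat = lat, long = long, pm = c(8, 10, 9, 8))
feb_14 <- tibble(lat = lat, long = long, pm = c(7, 3, 4, 5))
feb_15 <- tibble(lat = lat, long = long, pm = c(1, 4, 10, 12))

df_vec <- feb_13 %>%
  # suffix renames the clashing pm columns as part of the join
  left_join(feb_14, by = c("lat", "long"), suffix = c("_feb13", "_feb14")) %>%
  left_join(feb_15, by = c("lat", "long")) %>%
  rename(pm_feb15 = pm) %>%
  # rowMeans() works on a matrix, so bind the three columns together first
  mutate(mean = rowMeans(cbind(pm_feb13, pm_feb14, pm_feb15), na.rm = TRUE))
```

With `na.rm = TRUE`, a location that is missing on one day is averaged over the days that do have a reading rather than becoming `NA`.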

Here is the second option. It has a lot of pipes, but it uses summarise:

df_2 <- left_join(feb_13, feb_14, by = c("lat", "long")) %>%
          left_join(feb_15, by = c("lat", "long")) %>%
          # One group per location, so mean() sees that location's three readings
          group_by(lat, long) %>%
          summarise(
            mean = mean(c(pm.x, pm.y, pm), na.rm = TRUE)
          ) %>%
          ungroup() %>%
          # Join the daily readings back on so they sit next to the mean
          full_join(feb_13, by = c("lat", "long")) %>%
          full_join(feb_14, by = c("lat", "long")) %>%
          full_join(feb_15, by = c("lat", "long")) %>%
          rename(
            pm_feb13 = pm.x,
            pm_feb14 = pm.y,
            pm_feb15 = pm
          ) %>%
          arrange(long)
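
Both options above need one join per day, which is why they get unwieldy with many data frames. One alternative, offered here as a sketch rather than as the answerer's method, is to stack the days into a single long table with `bind_rows()` and average per location. The names `days`, `all_days` and `means` are illustrative.

```r
library(dplyr)

# Same toy data as above, kept in a named list so more days are easy to add
lat  <- c(-33.9174, -33.9175, -33.9176, -33.9177)
long <- c(151.2263, 151.2264, 151.2265, 151.2266)
days <- list(
  feb13 = tibble(lat = lat, long = long, pm = c(8, 10, 9, 8)),
  feb14 = tibble(lat = lat, long = long, pm = c(7, 3, 4, 5)),
  feb15 = tibble(lat = lat, long = long, pm = c(1, 4, 10, 12))
)

# Stack every day into one table; .id stores the list names in a "day" column
all_days <- bind_rows(days, .id = "day")

# One row per location, averaged over however many days reported it
means <- all_days %>%
  group_by(lat, long) %>%
  summarise(mean = mean(pm, na.rm = TRUE), n_days = n()) %>%
  ungroup()
```

This scales to any number of days without extra joins. If the per-day columns are still wanted next to the mean, the long table could be spread back out (for example with tidyr's `pivot_wider()`) and joined to `means`.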

Answer 1 (score: 0)

For the matching, maybe inner_join from dplyr?
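
As a rough sketch of how that might look for the approximate-matching part of the question: round the coordinates to 3 decimal places, average the readings that fall into the same rounded cell so each cell appears only once per day, and then `inner_join()` the days on the rounded coordinates. The readings, the helper `to_cells()` and the names `day_a`, `day_b` and `matched` are all hypothetical.

```r
library(dplyr)

# Hypothetical readings whose coordinates differ slightly between days
day_a <- tibble(lat  = c(-33.9171, -33.9173, -33.9201),
                long = c(151.2263, 151.2261, 151.2302),
                pm   = c(8, 10, 9))
day_b <- tibble(lat  = c(-33.9172, -33.9203),
                long = c(151.2262, 151.2304),
                pm   = c(6, 11))

# Snap each reading to a 0.001-degree cell and average within the cell,
# so rounding can no longer produce duplicate rows for one location
to_cells <- function(df) {
  df %>%
    mutate(lat = round(lat, 3), long = round(long, 3)) %>%
    group_by(lat, long) %>%
    summarise(pm = mean(pm, na.rm = TRUE)) %>%
    ungroup()
}

# Keep only the cells that have a reading on both days
matched <- inner_join(to_cells(day_a), to_cells(day_b),
                      by = c("lat", "long"), suffix = c("_a", "_b"))
```

Note that rounding is only an approximation of "within 0.001": two points that are very close but sit on opposite sides of a cell boundary end up in different cells and will not match.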
