Concatenating two data frames with a condition (matching coordinates and year)

Time: 2018-04-09 18:47:15

Tags: r dataframe concatenation conditional-statements

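I am having trouble concatenating two data frames under certain conditions. I have looked around, but I could not find a solution that helped me.

Here is my data: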

Dataframe 1 :

"year","var","x","y","info"
"1992","mean_ndvi","4878686.57157449","5393968.15997648","0.386875003576279"
"1992","mean_ndvi","4896433.83572102","5398120.2484886","0.373374998569489"
"1992","mean_ndvi","4900572.93504345","5370687.20427196","0.394125014543533"
"1992","mean_ndvi","4902934.77310431","5361773.82267221","0.271333336830139"
"1992","mean_ndvi","4763325.11415408","5286260.42907455","0.341958343982697"
"1992","mean_ndvi","4659782.7218849","5251960.76092113","0.407333344221115"
"1992","mean_ndvi","4672416.53746615","5253639.4841048","0.443416655063629"
"1992","mean_ndvi","4688194.71187035","5255824.40292703","0.334916681051254"
"1992","mean_ndvi","4697653.82879809","5257181.46577816","0.367166668176651"

Dataframe 2 :

"year"         "x"             "y"             "species"
 "2014" "4001758.3924046" "3138415.9463486"     "Sus scrofa"
 "2016" "3990684.89200331" "3088575.79671371" "Capreolus capreolus"
 "2014" "4002641.44272945" "3078682.12799716" "Capreolus capreolus"
 "2014" "3946723.09681777" "3153792.59524072" "Capreolus capreolus"
 "2014" "3975356.46700669" "2974349.6604129" "Cervus elaphus"
 "2014" "4001283.9265329" "3137527.57584417" "Capreolus capreolus"
 "2014" "3946723.09681777" "3153792.59524072" "Capreolus capreolus"
 "2014" "3946723.09681777" "3153792.59524072" "Capreolus capreolus"
 "2017" "4000195.01511827" "3103181.07855945" "Capreolus capreolus"

The first dataframe contains far more data than the second. What I want to do is concatenate the two dataframes and keep only the rows of the first dataframe that also appear in the second (same x, y and year):

Dataframe 1 :

"1992","mean_ndvi","4688194.71187035","5255824.40292703","0.334916681051254"
"1992","mean_ndvi","4697653.82879809","5257181.46577816","0.367166668176651"
"1992","mean_ndvi","4657938.8843526","5242452.09422199","0.43491667509079"
"1992","mean_ndvi","4661111.26475011","5242863.65256642","0.523041665554047"
"1992","mean_ndvi","4692800.91855509","5247191.53424558","0.405791670084"

Dataframe 2 :

"2014" "4001758.3924046" "3138415.9463486" "Sus scrofa"
"2016" "3990684.89200331" "3088575.79671371" "Capreolus capreolus"
"1992" "4657938.8843526" "5242452.09422199" "Capreolus capreolus"
"2017" "4000167.53545378" "3103446.42513062" "Sus scrofa"
"1992" "4688194.71187035" "5255824.40292703" "Capreolus capreolus"

Result :

"1992" "4657938.8843526" "5242452.09422199" "Capreolus capreolus" "0.43491667509079"
"1992" "4688194.71187035" "5255824.40292703" "Capreolus capreolus" "0.334916681051254"

I tried different approaches: select and filter, merge, cbind, "by hand" with for loops, but I could not get any of them to work.

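I also spent a lot of time looking for a solution online. Either I am too dumb to see how to apply an existing solution to my problem, or nobody has had the same problem (which I doubt), or I have not researched enough.

Any clue on how to do this would be welcome. I know it may be very simple.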

Here is the result of dput (first 10 rows):

First dataframe (with a lot of data)
structure(list(x = c(4878686.57157449, 4896433.83572102, 4900572.93504345, 
4902934.77310431, 4763325.11415408, 4659782.7218849, 4672416.53746615, 
4688194.71187035, 4697653.82879809, 4657938.8843526), y = c(5393968.15997648, 
5398120.2484886, 5370687.20427196, 5361773.82267221, 5286260.42907455, 
5251960.76092113, 5253639.4841048, 5255824.40292703, 5257181.46577816, 
5242452.09422199), year = c(1993L, 1993L, 1993L, 1993L, 1993L, 
1993L, 1993L, 1993L, 1993L, 1993L), info = c(0.396166652441025, 
0.373374998569489, 0.394125014543533, 0.28979167342186, 0.344375014305115, 
0.414458334445953, 0.416541665792465, 0.342583328485489, 0.378208339214325, 
0.440750002861023)), .Names = c("x", "y", "year", "info"), row.names = c(NA, 
10L), class = "data.frame")


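It returned the whole data frame matched against the other data frame. I do not understand why, and I am not posting the result because it does not make any sense.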

1 Answer:

Answer 0: (score: 0)

Try this:

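# Build two example data frames by sampling rows from the built-in iris data
set.seed(666)

df1 <- iris[sample(x = 1:10, size = 6, replace = FALSE), ]
df2 <- iris[sample(x = 1:10, size = 6, replace = FALSE), ]

# Collapse each row into one string, then look up each df1 row in df2
index <- match(apply(df1, 1, paste, collapse = "-"),
               apply(df2, 1, paste, collapse = "-"))
# Drop the df1 rows that have no match in df2
index <- index[!is.na(index)]

# Keep only the rows of df2 that also occur in df1
df3 <- df2[index, ]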

As you can see, df3 will be a data.frame containing only the common rows.
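Note that the snippet above compares entire rows, while the two data frames in the question have different columns and only share year, x and y. A minimal sketch of an alternative, assuming the goal is an inner join on those three columns, could use base R's merge(). The toy data is copied from the example in the question, with values kept as character strings as they appear there; the names df1, df2 and result are only illustrative.

```r
# Toy data copied from the example in the question (values kept as character).
df1 <- data.frame(
  year = c("1992", "1992", "1992", "1992", "1992"),
  var  = "mean_ndvi",
  x    = c("4688194.71187035", "4697653.82879809", "4657938.8843526",
           "4661111.26475011", "4692800.91855509"),
  y    = c("5255824.40292703", "5257181.46577816", "5242452.09422199",
           "5242863.65256642", "5247191.53424558"),
  info = c("0.334916681051254", "0.367166668176651", "0.43491667509079",
           "0.523041665554047", "0.405791670084"),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  year    = c("2014", "2016", "1992", "2017", "1992"),
  x       = c("4001758.3924046", "3990684.89200331", "4657938.8843526",
              "4000167.53545378", "4688194.71187035"),
  y       = c("3138415.9463486", "3088575.79671371", "5242452.09422199",
              "3103446.42513062", "5255824.40292703"),
  species = c("Sus scrofa", "Capreolus capreolus", "Capreolus capreolus",
              "Sus scrofa", "Capreolus capreolus"),
  stringsAsFactors = FALSE
)

# Inner join: keep only the rows whose year, x and y occur in both data frames,
# and carry over the info column from df1.
result <- merge(df2, df1[, c("year", "x", "y", "info")],
                by = c("year", "x", "y"))
print(result)
```

With this toy data, only the two 1992 rows present in both data frames remain, each with its species and info value. If the coordinates are stored as numeric values that come from different computations, exact matching may miss rows; rounding both sides to a common precision before merging is one possible workaround.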