Merging two datasets with different numbers of rows in R

Time: 2014-03-02 21:48:37

Tags: r merge

I have run into a problem merging two data frames of different lengths. To keep the datasets as simple as possible:

Dataset A - Persons: http://pastebin.com/HbaeqACi
Dataset B - Water features: http://pastebin.com/UdDvNtHs
Dataset C - Cities: http://pastebin.com/nATnkMRk

I have some R code that is not directly related to my problem, but I will paste all of it so that you have exactly the same situation:

library(fossil)  # provides earth.dist()
library(plyr)    # provides ldply(), which is used further down
#load data
persons = read.csv("person.csv", header = TRUE, stringsAsFactors=FALSE)
water = read.csv("water.csv", header =TRUE, stringsAsFactors=FALSE)
city = read.csv("city.csv", header =TRUE)

#### calculate distance
# Generate unique coordinates dataframe
UniqueCoordinates <- data.frame(unique(persons[,4:5]))
UniqueCoordinates$Id <- formatC((1:nrow(UniqueCoordinates)), width=3,flag=0)

# Define a function that finds the closest water feature for the coordinates of each id
# and returns the distance to it
NearestW <- function(id){
  tmp <- UniqueCoordinates[UniqueCoordinates$Id==id, 1:2]
  WaterFeatures <- rbind(tmp,water[,2:3])
  # The first nrow-1 entries of the dist object are the distances from row 1 (tmp) to every water feature
  disnw <- earth.dist(WaterFeatures, dist=TRUE)[1:(nrow(WaterFeatures)-1)]
  disnw <- min(disnw)
  disnw <- data.frame(disnw, WaterFeature=tmp)
  return(disnw)
}

# apply the distance calculation function to each id and then merge
CoordinatesWaterFeature <- ldply(UniqueCoordinates$Id, NearestW)
persons <- merge(persons, CoordinatesWaterFeature, by.x=c(4,5), by.y=c(2,3))

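Now I want to copy the calculated distance to the city dataset. I tried using merge (both datasets have a city attribute). The persons dataset only contains cities that are also in the city dataset.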

city_all_parameters = city
city_all_parameters = merge(city_all_parameters, persons[,c("city", "disnw")], all=TRUE)

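Unfortunately, this is not the result I want. I get 164 rows, but I only want the 5 city rows plus the variable disnw and its corresponding value.
I also tried rbind, but I get the error:
"Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match"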

Any hints on how to solve this?

1 Answer:

Answer 0: (score: 1)

Your code works as intended, but I would like to show you a more elegant way in base R. I have commented the code:

library(fossil)
# If you want to use pastebin, you can make it easy to load in for us like this:
# But I recommend using dput(persons) and pasting the results in.
persons = read.csv("http://pastebin.com/raw.php?i=HbaeqACi", header = TRUE, stringsAsFactors=FALSE)
water = read.csv("http://pastebin.com/raw.php?i=UdDvNtHs", header =TRUE, stringsAsFactors=FALSE)
city = read.csv("http://pastebin.com/raw.php?i=nATnkMRk", header =TRUE)

# Use column names instead of column indices to clarify your code
UniqueCoordinates <- data.frame(unique(persons[,c('POINT_X','POINT_Y')]))
# I didn't understand why you wanted to format the Id,
# but you don't need the Id in this code
# UniqueCoordinates$Id <- formatC((1:nrow(UniqueCoordinates)), width=3,flag=0)

# Instead of calculating the pairwise distance between all
# the water points every time, use deg.dist with mapply:
UniqueCoordinates$disnw <- mapply(function(x,y) min(deg.dist(long1=x,lat1=y,
                                                             long2=water$POINT_X,
                                                             lat2=water$POINT_Y)),
                                  UniqueCoordinates$POINT_X,
                                  UniqueCoordinates$POINT_Y)

persons <- merge(UniqueCoordinates,persons)
# I think this is what you wanted...
unique(persons[c('city','disnw')])

#       city     disnw
# 1   City E 6.4865635
# 20  City A 1.6604204
# 69  City B 0.9893909
# 113 City D 0.6001968
# 148 City C 0.5308953

# If you want to merge with the city dataset (note: this keeps one row per person)
merge(persons,city,by='city')
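
For anyone who wants to try this without the pastebin files, here is a minimal self-contained sketch of the same workflow on toy data. It is only an illustration under stated assumptions: all data values are invented, haversine_km is a hypothetical base R helper standing in for deg.dist (so fossil is not needed), and the population column is made up. It also shows why merging persons directly into city gives one row per person, and one possible way to get one row per city instead.

```r
# Toy stand-ins for the pastebin data (all values are invented for illustration).
persons <- data.frame(
  id      = 1:6,
  city    = c("City A", "City A", "City A", "City B", "City B", "City C"),
  POINT_X = c(10.0, 10.0, 10.0, 11.0, 11.0, 12.0),
  POINT_Y = c(50.0, 50.0, 50.0, 51.0, 51.0, 52.0),
  stringsAsFactors = FALSE
)
water <- data.frame(
  POINT_X = c(10.0, 11.5, 13.0),
  POINT_Y = c(50.1, 51.0, 52.0)
)
city <- data.frame(
  city       = c("City A", "City B", "City C"),
  population = c(1000, 2000, 3000),  # hypothetical attribute
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from fossil): haversine great-circle distance in km,
# assuming a spherical Earth with radius 6371 km. Vectorised over long2/lat2.
haversine_km <- function(long1, lat1, long2, lat2) {
  to_rad <- pi / 180
  dlat   <- (lat2 - lat1) * to_rad
  dlong  <- (long2 - long1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlong / 2)^2
  2 * 6371 * asin(pmin(1, sqrt(a)))
}

# One row per distinct coordinate pair, with the distance to the nearest water point.
coords <- unique(persons[, c("POINT_X", "POINT_Y")])
coords$disnw <- mapply(
  function(x, y) min(haversine_km(x, y, water$POINT_X, water$POINT_Y)),
  coords$POINT_X, coords$POINT_Y
)

# Attach the distance to every person (merge uses the shared POINT_X/POINT_Y columns).
persons <- merge(coords, persons)

# Merging persons straight into city yields one row per person (6 here), not per city.
per_person <- merge(city, persons[, c("city", "disnw")])
nrow(per_person)

# Collapsing to one row per city first keeps the result at one row per city (3 here).
city_dist <- unique(persons[, c("city", "disnw")])
city_all_parameters <- merge(city, city_dist, by = "city", all.x = TRUE)
nrow(city_all_parameters)
```

In this toy data every city has a single coordinate pair, so unique() is enough to get one row per city. If a city had persons at several coordinates, something like aggregate(disnw ~ city, data = persons, FUN = min) would be one way to reduce it to a single value per city before merging.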