I ran into a problem when merging two data frames of different lengths. To keep the data sets as simple as possible:
Data set A - Persons:
http://pastebin.com/HbaeqACi
Data set B - Water features:
http://pastebin.com/UdDvNtHs
Data set C - Cities:
http://pastebin.com/nATnkMRk
I have some R code that is not directly related to my problem, but I'll paste it in full so that you have exactly the same situation:
library(fossil)  # provides earth.dist()
library(plyr)    # provides ldply(), used below
#load data
persons = read.csv("person.csv", header = TRUE, stringsAsFactors=FALSE)
water = read.csv("water.csv", header =TRUE, stringsAsFactors=FALSE)
city = read.csv("city.csv", header =TRUE)
#### calculate distance
# Generate unique coordinates dataframe
UniqueCoordinates <- data.frame(unique(persons[,4:5]))
UniqueCoordinates$Id <- formatC((1:nrow(UniqueCoordinates)), width=3,flag=0)
#Generate a function that looks for the closest waterfeature for each id coordinates and calculate/save the distance
NearestW <- function(id){
  tmp <- UniqueCoordinates[UniqueCoordinates$Id==id, 1:2]
  WaterFeatures <- rbind(tmp,water[,2:3])
  disnw <- earth.dist(WaterFeatures, dist=TRUE)[1:(nrow(WaterFeatures)-1)]
  disnw <- min(disnw)
  disnw <- data.frame(disnw, WaterFeature=tmp)
  return(disnw)
}
# apply the distance calculation function to each id, then merge
CoordinatesWaterFeature <- ldply(UniqueCoordinates$Id, NearestW)
persons <- merge(persons, CoordinatesWaterFeature, by.x=c(4,5), by.y=c(2,3))
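Now I want to copy the calculated distance to the city data set. I tried using merge (both data sets have a city attribute); persons only contains cities that also appear in the city data set.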
city_all_parameters = city
city_all_parameters = merge(city_all_parameters, persons[,c("city", "disnw")], all=TRUE)
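Unfortunately, this is not the result I want. I get 164 rows, but I only want the 5 city rows plus the variable disnw and its corresponding value.
I also tried rbind, but I got this error:
"Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match"
Any hints on how to solve this?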
Answer 0 (score: 1)
Your code works as expected, but I'd like to show you a more elegant way in base. I've commented the code:
library(fossil)
# If you want to use pastebin, you can make it easy to load in for us like this:
# But I recommend using dput(persons) and pasting the results in.
persons = read.csv("http://pastebin.com/raw.php?i=HbaeqACi", header = TRUE, stringsAsFactors=FALSE)
water = read.csv("http://pastebin.com/raw.php?i=UdDvNtHs", header =TRUE, stringsAsFactors=FALSE)
city = read.csv("http://pastebin.com/raw.php?i=nATnkMRk", header =TRUE)
# Use column names instead of column indices to clarify your code
UniqueCoordinates <- data.frame(unique(persons[,c('POINT_X','POINT_Y')]))
# I didn't understand why you wanted to format the Id,
# but you don't need the Id in this code
# UniqueCoordinates$Id <- formatC((1:nrow(UniqueCoordinates)), width=3,flag=0)
# Instead of calculating the pairwise distance between all
# the water points every time, use deg.dist with mapply:
UniqueCoordinates$disnw <- mapply(function(x,y) min(deg.dist(long1=x,lat1=y,
                                                             long2=water$POINT_X,
                                                             lat2=water$POINT_Y)),
                                  UniqueCoordinates$POINT_X,
                                  UniqueCoordinates$POINT_Y)
persons <- merge(UniqueCoordinates,persons)
# I think this is what you wanted...
unique(persons[c('city','disnw')])
#       city     disnw
# 1   City E 6.4865635
# 20  City A 1.6604204
# 69  City B 0.9893909
# 113 City D 0.6001968
# 148 City C 0.5308953
# If you want to merge to the city
merge(persons,city,by='city')
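
If the goal is one row per city, as the question asks, it may help to reduce persons to one distance per city before merging. The sketch below uses small made-up data frames (persons_toy, city_toy and the population column are hypothetical stand-ins, not the pastebin data) and assumes that disnw is constant within each city, as it is in the output above.

```r
# Toy stand-ins for the real data; all values are made up for illustration
persons_toy <- data.frame(
  city  = c("City A", "City A", "City B", "City B", "City B"),
  disnw = c(1.66, 1.66, 0.99, 0.99, 0.99),
  stringsAsFactors = FALSE
)
city_toy <- data.frame(
  city       = c("City A", "City B"),
  population = c(1000, 2500),  # hypothetical extra column
  stringsAsFactors = FALSE
)

# Merging directly gives one row per person, because every
# person row matches its city row
rows_direct <- nrow(merge(city_toy, persons_toy, by = "city"))

# Collapse to one distance per city first, then merge;
# all.x = TRUE keeps cities that have no persons (disnw becomes NA)
city_dist <- unique(persons_toy[c("city", "disnw")])
city_all_parameters <- merge(city_toy, city_dist, by = "city", all.x = TRUE)
city_all_parameters
```

If the distances could differ within a city, something like aggregate(disnw ~ city, data = persons_toy, FUN = min) would be one way to pick a single value per city in place of unique().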