I have the following data frame with 1051 observations:
customer_id long lat
11111 111.320 110.574
11112 111.243 110.311
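I need to reshape the data frame so that each observation is paired with every other observation. This will let me compute the distance between each pair of observations. The result should look like this:
customer_id_a long_a lat_a customer_id_b long_b lat_b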
11111 111.320 110.574 11112 111.243 110.311
How can I do this in R?
Answer 0 (score: 1)
A base R solution. First, I create some toy data:
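n <- 50
# Toy data: IDs "100001" to "100050" with random coordinates (no seed is set, so your values will differ)
df <- data.frame(customer_id = sprintf("1%0.5d", seq_len(n)),
                 long = rnorm(n) + 105, lat = rnorm(n) + 110)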
head(df)
# customer_id long lat
#1 100001 105.7532 109.4935
#2 100002 102.0772 110.9918
#3 100003 102.8655 110.7422
#4 100004 103.3984 111.1385
#5 100005 102.8614 111.8068
#6 100006 105.1860 110.3117
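Using this data, we can get all combinations of row indices, subset df accordingly (once for each side of the pair), and then bind the two results together: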
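cs <- combn(nrow(df), 2)  # 2-row matrix: each column is one pair of row indices (i, j) with i < j
new_df <- cbind(a = df[cs[1,], ], b = df[cs[2,], ])  # "a" side and "b" side of every pair
rownames(new_df) <- NULL # Remove default rownames
head(new_df)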
# a.customer_id a.long a.lat b.customer_id b.long b.lat
#1 100001 105.7532 109.4935 100002 102.0772 110.9918
#2 100001 105.7532 109.4935 100003 102.8655 110.7422
#3 100001 105.7532 109.4935 100004 103.3984 111.1385
#4 100001 105.7532 109.4935 100005 102.8614 111.8068
#5 100001 105.7532 109.4935 100006 105.1860 110.3117
#6 100001 105.7532 109.4935 100007 103.8722 111.2530
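A base R alternative, offered as a sketch rather than as part of the answer above: merge() with by = NULL returns the Cartesian product of two data frames, and dropping self-pairs and duplicates then leaves one row per pair. The three-row data frame is made up for illustration, and the filter assumes customer_id values are unique and comparable with <.

```r
# Made-up example data: the two rows from the question plus one invented row.
df <- data.frame(customer_id = c(11111, 11112, 11113),
                 long = c(111.320, 111.243, 111.100),
                 lat  = c(110.574, 110.311, 110.200))

# merge() with by = NULL returns every row of x combined with every row of y;
# the suffixes rename the duplicated columns to the *_a / *_b layout.
all_pairs <- merge(df, df, by = NULL, suffixes = c("_a", "_b"))

# Drop self-pairs and keep each unordered pair once (assumes unique IDs).
pairs <- all_pairs[all_pairs$customer_id_a < all_pairs$customer_id_b, ]
rownames(pairs) <- NULL
pairs
```

This builds all n^2 combinations before filtering, so with 1051 rows it first creates about 1.1 million rows; combn() avoids that intermediate step.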
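Answer 1 (score: 0)
We can use dcast from data.table. Note that this spreads all rows of df1 into a single wide row, so it reproduces the desired layout for the two-row example (df1, defined at the end), but it does not enumerate every pair when there are more rows.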
library(data.table)
# Give every row the same newid, then cast the rows into columns suffixed a, b, ...
dcast(setDT(df1)[, newid := 1], newid ~ letters[rowid(newid)],
      value.var = c('customer_id', 'long', 'lat'))[, newid := NULL][]
# customer_id_a customer_id_b long_a long_b lat_a lat_b
#1: 11111 11112 111.32 111.243 110.574 110.311
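The dcast call above handles the two-row example only. A sketch of a data.table version that works for any number of rows, assuming each row is a distinct customer: cross join the row indices with CJ(), keep the pairs with row_a < row_b, and bind the two sides together. The three-row table and the index names row_a/row_b are made up for illustration.

```r
library(data.table)

# Made-up example data with three customers.
dt <- data.table(customer_id = 11111:11113,
                 long = c(111.320, 111.243, 111.100),
                 lat  = c(110.574, 110.311, 110.200))

# All combinations of row indices; keep each unordered pair once.
idx <- CJ(row_a = seq_len(nrow(dt)), row_b = seq_len(nrow(dt)))[row_a < row_b]

# Pull the "a" side and the "b" side by row index and rename their columns.
a <- dt[idx$row_a]
b <- dt[idx$row_b]
setnames(a, paste0(names(a), "_a"))
setnames(b, paste0(names(b), "_b"))

# Bind side by side: one row per pair, in the layout asked for in the question.
pairs <- cbind(a, b)
pairs
```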
Or use reshape from base R:
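# Add a constant id and a sequence label ("a", "b", ...) for each row
df2 <- transform(df1, newid = 1)
df2$Seq <- with(df2, letters[ave(newid, newid, FUN = seq_along)])
# Spread the labelled rows into columns; [-1] drops the newid column
reshape(df2, idvar = 'newid', timevar = 'Seq', direction = 'wide')[-1]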
# customer_id.a long.a lat.a customer_id.b long.b lat.b
#1 11111 111.32 110.574 11112 111.243 110.311
# Data used above (the two rows from the question)
df1 <- structure(list(customer_id = 11111:11112, long = c(111.32, 111.243),
                      lat = c(110.574, 110.311)),
                 class = "data.frame", row.names = c(NA, -2L))
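Since the stated goal is the distance between each pair, here is a minimal sketch of that last step. It assumes planar (Euclidean) coordinates: the lat values in the question exceed 90, so they cannot be latitudes in degrees as given, and for real longitude/latitude a great-circle formula would be needed instead. The toy data are made up.

```r
# Made-up toy coordinates for three customers.
df <- data.frame(customer_id = 11111:11113,
                 long = c(111.320, 111.243, 111.100),
                 lat  = c(110.574, 110.311, 110.200))

# Pair table built as in the combn-based answer: one row per unordered pair.
cs <- combn(nrow(df), 2)
new_df <- cbind(a = df[cs[1, ], ], b = df[cs[2, ], ])
rownames(new_df) <- NULL

# Planar (Euclidean) distance for each pair.
new_df$dist <- sqrt((new_df$a.long - new_df$b.long)^2 +
                    (new_df$a.lat  - new_df$b.lat)^2)

# stats::dist() gives the same distances without building the pair table.
# Its lower triangle is stored column by column: (1,2), (1,3), ..., (2,3), ...,
# which is the same order as the columns of combn(n, 2).
d <- as.vector(dist(df[, c("long", "lat")]))
all.equal(new_df$dist, d)  # TRUE

# With the question's 1051 rows, the pair table has choose(1051, 2) rows.
choose(1051, 2)  # 551775
```

Because dist() returns its values in combn() order, d could also be attached directly as a column of the pair table.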