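I am studying fish caught by fishermen on the high seas. Recently I have started measuring the distance between individual fish of known stock, to assess whether stocks (> 15 unique stocks) travel together.
My problem is that when I compare stocks I paste the two stock names together, and the pasted ID is sometimes stock.1_stock.2 and other times stock.2_stock.1. I need both orderings to map to the same unique identifier, but I am not sure how best to do this in R. Does anyone have any suggestions?
My actual data frame is large (> 100,000 rows), which may influence how you answer the question.
Here is some code that generates a smaller example dataset: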
#making generic ids
ids <- rep("stock",times=3)
ids <- paste(ids,1:3, sep=".")
#making simple example
tmp <- expand.grid(ids,ids)
tmp <- tmp[tmp$Var1 != tmp$Var2, ]
tmp$dist <- c(1,2,1,4,2,4)
#comparing stocks
tmp$both <- paste(tmp$Var1,tmp$Var2, sep="_")
tmp
#      Var1    Var2 dist            both
# 2 stock.2 stock.1    1 stock.2_stock.1
# 3 stock.3 stock.1    2 stock.3_stock.1
# 4 stock.1 stock.2    1 stock.1_stock.2
# 6 stock.3 stock.2    4 stock.3_stock.2
# 7 stock.1 stock.3    2 stock.1_stock.3
# 8 stock.2 stock.3    4 stock.2_stock.3
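Answer 0 (score: 2)
If you sort the pair of IDs within each row, each pairing ends up with a single unique combined ID, regardless of which stock came first: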
tmp$both <- paste(pmin(as.character(tmp$Var1), as.character(tmp$Var2)),
                  pmax(as.character(tmp$Var1), as.character(tmp$Var2)), sep="_")
tmp
#      Var1    Var2 dist            both
# 2 stock.2 stock.1    1 stock.1_stock.2
# 3 stock.3 stock.1    2 stock.1_stock.3
# 4 stock.1 stock.2    1 stock.1_stock.2
# 6 stock.3 stock.2    4 stock.2_stock.3
# 7 stock.1 stock.3    2 stock.1_stock.3
# 8 stock.2 stock.3    4 stock.2_stock.3
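As a sanity check on the sorted-pair approach, here is a minimal sketch that rebuilds the example data, creates the order-independent key, and confirms that the six directed rows collapse to three unordered pairs. The names `n_keys` and `pair_summary` are hypothetical and introduced only for illustration, and the per-pair mean distance is an assumed follow-up step, not something the question asked for. `pmin()` and `pmax()` are vectorised, so this should cope with > 100,000 rows without a row-wise `apply()`. Note that the comparison is on strings, so stock.10 would sort before stock.2; the key is still consistent for a given pair, which is all that matters here.

```r
# Rebuild the example from the question; stringsAsFactors = FALSE keeps the
# IDs as character so no as.character() conversion is needed later.
ids <- paste("stock", 1:3, sep = ".")
tmp <- expand.grid(Var1 = ids, Var2 = ids, stringsAsFactors = FALSE)
tmp <- tmp[tmp$Var1 != tmp$Var2, ]
tmp$dist <- c(1, 2, 1, 4, 2, 4)

# Order-independent key: the alphabetically smaller ID always comes first.
tmp$both <- paste(pmin(tmp$Var1, tmp$Var2), pmax(tmp$Var1, tmp$Var2), sep = "_")

# Each unordered pair should now appear under exactly one key.
n_keys <- length(unique(tmp$both))
stopifnot(n_keys == choose(length(ids), 2))

# Hypothetical follow-up: one row per pair with its mean distance.
pair_summary <- aggregate(dist ~ both, data = tmp, FUN = mean)
pair_summary
```

If the ID columns are factors, as they are by default with `expand.grid()`, convert them with `as.character()` first, as the answer does; `pmin()` and `pmax()` are not meaningful on unordered factors.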