Question

I have two lists in Mathematica:

list1 = {{a1, b1, c1}, ... , {an, bn, cn}}

and

list2 = {{d1, e1, f1}, ... , {dn, en, fn}}

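The lists contain numerical results and consist of roughly 50000 triplets each. Each triplet represents two coordinates and the numerical value of some property at those coordinates. The lists have different lengths, and the coordinates do not span exactly the same range. My goal is to correlate the values of the third element (the property) between the two lists, so I need to scan the lists and identify the properties whose coordinates match. My output would be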

list3 = {{ci, fj}, ... , {cl, fm}}

where

{ai, bi}, ..., {al, bl}

will be (roughly) equal to, respectively,

{dj, ej}, ..., {dm, em}

By "roughly" I mean that the coordinates will match once they are rounded to a certain precision:

list1 = Round[{#[[1]], #[[2]], #[[3]]}, {1000, 500, 0.1}] & /@ list1
list2 = Round[{#[[1]], #[[2]], #[[3]]}, {1000, 500, 0.1}] & /@ list2

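After this step I have two lists that contain some matching coordinates. My question is how to identify them and pick out the pairs of properties in the most efficient way.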

An example of a 6-element list is

list1 = {{-1.16371*10^6, 548315., 14903.}, {-1.16371*10^6, 548322., 14903.9}, 
   {-1.16371*10^6, 548330., 14904.2}, {-1.16371*10^6, 548337., 14904.8}, 
   {-1.16371*10^6, 548345., 14905.5}, {-1.16371*10^6, 548352., 14911.5}}

Answer 1

You might want to use something like this:

{Round[{#, #2}], #3} & @@@ Join[list1, list2];

% ~GatherBy~ First ~Select~ (Length@# > 1 &)

This groups all data points whose coordinates match after rounding. You can use the second argument of Round to specify the multiple to round to.
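As a minimal sketch of the second argument of Round, the snippet below rounds the first coordinate to the nearest 1000 and the second to the nearest 500 before grouping, mirroring the precision used in the question. The names sample1, sample2, rounded and groups and the data values are hypothetical and only serve as an illustration.

```mathematica
(* hypothetical sample data, not taken from the question *)
sample1 = {{1200., 740., 1.5}, {5100., 1300., 2.5}};
sample2 = {{900., 600., 10.}, {9000., 200., 20.}};

(* Round threads over lists: x is rounded to a multiple of 1000, y to a multiple of 500 *)
rounded = {Round[{#, #2}, {1000, 500}], #3} & @@@ Join[sample1, sample2];

(* keep only the rounded coordinates that occur more than once *)
groups = Select[GatherBy[rounded, First], Length@# > 1 &]
```

Here the first point of each sample list should end up in the same group, because both round to the same coordinate pair, while the remaining points are dropped.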

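This assumes that there are no duplicate points within a single list. If there are, you will need to remove them to get useful pairs. Let me know if that is the case and I will update my answer.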

Here is another approach using Sow and Reap. The same caveats apply. Both examples are meant as guides for implementing your own function.

Reap[
  Sow[#3, {Round[{#, #2}]}] & @@@ Join[list1, list2],
  _,
  List
][[2]] ~Cases~ {_, {_, __}}

To handle elements that become duplicates after rounding within each list, you can apply Round and GatherBy to each list separately, as follows.

newList1 = GatherBy[{Round[{#, #2}], #3} & @@@ list1, First][[All, 1]];

newList2 = GatherBy[{Round[{#, #2}], #3} & @@@ list2, First][[All, 1]];

Then proceed with:

newList1 ~Join~ newList2 ~GatherBy~ First ~Select~ (Length@# > 1 &)
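To get from these groups to the {c, f} pairs requested in the question, one option is to keep only the property value of each group member. This is a sketch that assumes newList1 and newList2 are defined as above; the names grouped and list3 are chosen here for illustration.

```mathematica
(* same grouping as above, written with explicit brackets *)
grouped = Select[GatherBy[Join[newList1, newList2], First], Length@# > 1 &];

(* each group member has the form {roundedCoordinates, property}; keep the property *)
list3 = #[[All, 2]] & /@ grouped
```

Because newList1 comes first in Join and GatherBy preserves the order of appearance, the value from list1 should come first in each pair.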

Answer 2

Here is my approach, which relies on Nearest to match the points.

Let's assume that list1 has fewer elements than list2. (Otherwise you can swap them using {list1, list2} = {list2, list1}.)

(* extract points *)

points1=list1[[All,{1,2}]];
points2=list2[[All,{1,2}]];

(* build a "nearest-function" for matching them *)

nf=Nearest[points1]

(* two points match only if they're closer than threshold *)
threshold=100;

(* This function will find the match of a point from points2 in points1.  
   If there's no match, the point is discarded using Sequence[]. *)
match[point_]:= 
   With[{m=First@nf[point]}, 
       If[Norm[m-point]<threshold, {m,point}, Unevaluated@Sequence[]]
   ]

(* find matching point-pairs *)
matches=match/@points2;

(* build hash tables to retrieve the properties associated with points quickly *)
Clear[values1,values2]
Set[values1[{#1,#2}],#3]&@@@list1;
Set[values2[{#1,#2}],#3]&@@@list2;

(* get the property-pairs *)
{values1[#1],values2[#2]}&@@@matches

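Another approach is to use a custom DistanceFunction in Nearest, which avoids the need for values1 and values2 and gives a shorter program. This may be slower or faster; I have not tested it on large data at all.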
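A rough sketch of that variant is shown below. It is untested on large data and assumes that list1, list2 and threshold are defined as above. The names coordDistance, nfTriples and matchTriple are hypothetical helpers introduced only for this illustration.

```mathematica
(* hypothetical helper: distance between two triplets, using only the two coordinates *)
coordDistance[p_, q_] := Norm[p[[{1, 2}]] - q[[{1, 2}]]]

(* nearest-function over the full triplets of list1, ignoring the property value *)
nfTriples = Nearest[list1, DistanceFunction -> coordDistance];

(* pair the property of a triplet from list2 with that of its nearest triplet in list1;
   triplets without a close enough match are discarded using Sequence[] *)
matchTriple[t_] :=
  With[{m = First@nfTriples[t]},
    If[coordDistance[m, t] < threshold, {m[[3]], t[[3]]}, Unevaluated@Sequence[]]
  ]

(* property pairs {c, f} *)
pairs = matchTriple /@ list2
```

Since the nearest-function returns whole triplets, the property values can be read off directly and no lookup tables are needed.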

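Note: how complicated the implementation needs to be depends on your particular dataset. Does every point in the first set have a match in the second? Are there duplicates? How close can points within the same dataset be to each other? And so on. I tried to provide something that can be tweaked to be relatively robust, at the cost of longer code.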

What is the best way to match list entries after rounding in Mathematica?
