Question

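I have a specific function for extracting parts of a list, of the form Give[list, elem]: it returns the part of list that corresponds to the position of elem in a global $Reference variable (if one is defined). I use this function heavily throughout my code, so I decided to optimize it. This is as far as I have managed to get, but frankly I have no idea how to go further.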

ClearAll[Give, $Reference, set];

Give::noref = "No, non-list or empty $Reference was defined to refer to by Give.";
Give::noelem = "Element (or some of the elements in) `1` is not part of the reference set `2`.";
Give::nodepth = "Give cannot return all the elements corresponding to `1` as the list only has depth `2`.";

give[list_, elem_List, ref_] := Flatten[Pick[list, ref, #] & /@ elem, 1];
give[list_, elem_, ref_] := First@Pick[list, ref, elem];

Options[Give] = {Reference :> $Reference}; (* RuleDelayed is necessary, for it is possible that $Reference changes between two subsequent Give calls, and without delaying its assignment, ref would use previous value of $Reference instead of actual one. *)
Give[list_List, elem___, opts___?OptionQ] := Module[{ref, pos},
   ref = Reference /. {opts} /. Options@Give;
   Which[
      Or[ref === {}, Head@ref =!= List], Message[Give::noref]; {},
      Complement[Union@Flatten@{elem}, ref] =!= {}, Message[Give::noelem, elem, ref]; {},
      Length@{elem} > Depth@list - 1, Message[Give::nodepth, {elem}, Depth@list]; {},
      True, Fold[give[#1, #2, ref] &, list, {elem}]
]];



In[106]:= $Reference = {"A", "B", "C"};
set = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};

Give[set, "B"](* return specified row *)
Out[108]= {4, 5, 6}

In[109]:= Give[set, "B", "A"] (* return entry at specified row & column *)
Out[109]= 4

In[110]:= Give[set, {"B", "A"}] (* return multiple rows *)
Out[110]= {{4, 5, 6}, {1, 2, 3}}

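I decided against having Give call itself through different signatures, because a list version might call the non-list version, which would mean doing the error handling several times (once for each element of the list). Unfortunately, the error handling cannot be dropped. If the improved version turns out to be more robust (for example, if it can handle more dimensions), that is not a problem, but the examples above are sufficient.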

In[139]:= First@Timing[Give[set, RandomChoice[$Reference, 10000]]] (* 1D test *)

Out[139]= 0.031

In[138]:= First@Timing[Table[Give[set, Sequence @@ RandomChoice[$Reference, 2]], {10000}]] (* 2d test *)

Out[138]= 0.499

I am sure this is not efficient code, so feel free to improve it. Any help is appreciated, even if it shaves off only a few nanoseconds.

Answer 1

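The main efficiency problem for large lists seems to come from mapping Pick over the elements. You can avoid it by replacing the corresponding definition of give with this one: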

give[list_, elem_List, ref_] := 
    list[[elem /. Dispatch[Thread[ref -> Range[Length[ref]]]]]];
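
To make the replacement step concrete, here is a minimal sketch using the 3x3 example from the question. The names rules and idx are illustrative only, and the sketch assumes ref contains no duplicates (with duplicates, only the first position would be used, whereas Pick would return every match).

```mathematica
ref = {"A", "B", "C"};
set = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};

(* Map each reference element to its position: "A" -> 1, "B" -> 2, "C" -> 3 *)
rules = Dispatch[Thread[ref -> Range[Length[ref]]]];

(* Translate the requested elements into positions in one pass *)
idx = {"B", "A"} /. rules   (* {2, 1} *)

(* A single Part call then extracts all rows at once *)
set[[idx]]                  (* {{4, 5, 6}, {1, 2, 3}} *)
```

The point is that the lookup cost is paid once per reference set, rather than once per requested element as with mapping Pick.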

Here is my test code:

In[114]:= 
  Block[{$Reference = Range[100000],set = Range[100000]^2,rnd,ftiming,stiming},
      rnd = RandomSample[$Reference,10000];
      ftiming = First@Timing[res1 = Give[set,rnd]];
      Block[{give},
        give[list_,elem_List,ref_]:=list[[elem/.Dispatch[Thread[ref->Range[Length[ref]]]]]];
        give[list_,elem_,ref_]:=First@Pick[list,ref,elem];
        stiming = First@Timing[res2 = Give[set,rnd]];];
   {ftiming,stiming,res1===res2}
]

Out[114]= {1.703,0.188,True}

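For this use case that is roughly a 9x speedup (1.703 s versus 0.188 s). I did not test the 2D case, but I would guess it helps there too.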

Edit

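You can improve things further by caching the dispatched table Dispatch[Thread[ref -> Range[Length[ref]]]] for $Reference once, at the start of the body of Give, and then passing it to give (either explicitly, or by making give an inner function, through Module variables, that refers to it). That way the table is not recomputed on every call that Fold makes to give. You could also do this conditionally, for example only when elem contains large lists of elements, so that the time spent building the dispatch table is justified.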
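
A minimal sketch of that idea follows. GiveCached and giveD are hypothetical names, the reference list is passed as an explicit last argument instead of an option, and the error checks from the question are left out to keep the example short.

```mathematica
(* Hypothetical helper: extract parts using an already-built dispatch table *)
giveD[list_, elem_, rules_] := list[[elem /. rules]];

(* Hypothetical variant of Give: the table is built once per call, then reused by Fold *)
GiveCached[list_List, elem___, ref_List] :=
  Module[{rules = Dispatch[Thread[ref -> Range[Length[ref]]]]},
   Fold[giveD[#1, #2, rules] &, list, {elem}]
  ];

ref = {"A", "B", "C"};
set = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};

GiveCached[set, "B", ref]          (* {4, 5, 6} *)
GiveCached[set, "B", "A", ref]     (* 4 *)
GiveCached[set, {"B", "A"}, ref]   (* {{4, 5, 6}, {1, 2, 3}} *)
```

Passing rules explicitly keeps giveD free of global state; the Module-variable approach mentioned above would work as well.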

Answer 2

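Here is another solution, based on my question about indexing real numbers. It uses delayed evaluation to display an error message when needed (a trick I learned on this site! Thanks to everyone for your dedication, it is always a pleasure to learn new things here!)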

ListToIndexFunction[list_List,precision_:0.00001]:=
   Module[{numbersToIndexFunction},

      numbersToIndexFunction::indexNotFound="Index of `1` not found.";

      MapThread[(numbersToIndexFunction[#1]=#2)&,{Round[list,precision],Range[Length@list]}];
      numbersToIndexFunction[x_]/;(Message[numbersToIndexFunction::indexNotFound,x];False):=Null;

      numbersToIndexFunction[Round[#,precision]]&
   ];

Test: 
f=ListToIndexFunction[{1.23,2.45666666666,3}]
f[2.456666]
f[2.456665]

Answer 3

This is similar to Leonid's answer, but in my own style.

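I use the same Dispatch table, and I recommend keeping it external wherever possible. To that end, I suggest a new symbol, $Rules, that is updated whenever $Reference changes. For example: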

$Reference = RandomSample["A"~CharacterRange~"Z"];

$Rules = Dispatch@Thread[$Reference -> Range@Length@$Reference];

If this is done often, it could be automated for convenience (ask if you want that).
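
One possible way to automate the update, as a sketch only: a small setter that assigns both symbols together. setReference is a hypothetical helper and is not part of this answer's code.

```mathematica
(* Hypothetical helper: update $Reference and rebuild $Rules in one step *)
setReference[new_List] := (
   $Reference = new;
   $Rules = Dispatch@Thread[new -> Range@Length@new];
   new
  );

setReference[{"A", "B", "C"}];
{"C", "A"} /. $Rules   (* {3, 1} *)
```

Because the ClearAll in the full code resets both $Reference and $Rules, a helper like this would have to be called after those definitions have been evaluated.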

Beyond that, here is my full code:

ClearAll[Give, $Reference, Reference, $Rules];

Give::noref = "No, non-list or empty $Reference was defined to refer to by Give.";
Give::noelem = "Element (or some of the elements in) `1` is not part of the reference set `2`.";
Give::nodepth = "Give cannot return all the elements corresponding to `1` as the list only has depth `2`.";

Options[Give] = {Reference :> $Reference};

Give[list_List, elem___, opts : OptionsPattern[]] := 
  Module[{ref, pos, rls},
   ref = OptionValue[Reference];
   rls = If[{opts} == {}, $Rules, Dispatch@Thread[ref -> Range@Length@ref]];
   Which[
    ref === {} || Head@ref =!= List,
        Message[Give::noref]; {},
    Complement[Union@Flatten@{elem}, ref] =!= {},
        Message[Give::noelem, elem, ref]; {},
    Length@{elem} > Depth@list - 1, 
        Message[Give::nodepth, {elem}, Depth@list]; {},
    True,
        list[[##]] & @@ ({elem} /. rls)
   ]
  ];

Answer 4

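This is what I ended up with after letting this code rest for 2 years. It memoizes the dispatch table for a given reference set and uses Part-like syntax. I removed all the error messages and dropped the global $Reference symbol. It was very un-Mathematica-like, and I never liked it.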

dispatch[ref_] := dispatch@ref = (Dispatch@Thread[ref -> Range@Length@ref]);
give[list_, elem__, ref_] := list[[Sequence @@ ({elem} /. dispatch@ref)]];

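Memoization ensures that the dispatch table for a given ref is computed only once. Keeping several dispatch tables in memory is not a problem, because they are usually small.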
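
To see what the memoization stores, the following sketch repeats the same definition under a hypothetical name, dispatchDemo, so that it does not disturb the cache of dispatch itself.

```mathematica
ClearAll[dispatchDemo];
dispatchDemo[ref_] := dispatchDemo@ref = (Dispatch@Thread[ref -> Range@Length@ref]);

dispatchDemo[{"A", "B", "C"}];
dispatchDemo[{"x", "y"}];
dispatchDemo[{"A", "B", "C"}];   (* second call hits the stored value, nothing is rebuilt *)

(* Two cached tables plus the general rule *)
Length[DownValues[dispatchDemo]]   (* 3 *)

(* A single cached table can be dropped with Unset if a reference set is retired *)
dispatchDemo[{"x", "y"}] =.
Length[DownValues[dispatchDemo]]   (* 2 *)
```

Each distinct reference set adds one stored definition, which is why the approach stays cheap as long as the number of reference sets is small.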

ref = $Reference = {"A", "B", "C"}; (* $Reference is set too, so the original Give can be timed below *)
set = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};

give[set, "B", ref]          (* ==> {4, 5, 6}              *)
give[set, "B", "A", ref]     (* ==> 4                      *)
give[set, {"B", "A"}, ref]   (* ==> {{4, 5, 6}, {1, 2, 3}} *)

Timings:

n = 20000;
{
First@Timing[give[set, #, ref] & /@ RandomChoice[ref, n]],
First@Timing[give[set, RandomChoice[ref, n], ref]],
First@Timing[Table[give[set, Sequence @@ RandomChoice[ref, 2], ref], {n}]]
}

{0.140401, 0., 0.202801}

Compare these with the timings of the original function:

{
First@Timing[Give[set, #] & /@ RandomChoice[ref, n]],
First@Timing[Give[set, RandomChoice[ref, n]]],
First@Timing[Table[Give[set, Sequence @@ RandomChoice[ref, 2]], {n}]]
}

{0.780005, 0.015600, 1.029607}

Optimizing part extraction
