3D data mining performance

Date: 2014-08-29 07:40:58

Tags: python performance

I have some data consisting of 3D positions.

# max size of grid (x, y, z)
grid_size = (999, 999, 99)

class MyObject(object):
  def __init__(self, id):
    self.id = id
    self.trace = []

[...]
# objects have some positions in their "trace"
print(myobject1.trace)
[(65, 128, 12), (66, 128, 12), (66, 129, 12)]
print(myobject2.trace)
[(456, 255, 75), (456, 254, 75), (456, 254, 74)]

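I need to build a map containing the positions of all these objects. The goal is to find the fastest way to look up objects in this map. For example, given a list of coordinates: which objects are at those coordinates?

So I came up with four strategies:

One-dimensional dict with string keys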

{'65.128.12':myobject1, '66.128.12':myobject1, '66.129.12':myobject1, 
 '456.255.75':myobject2, '456.254.75':myobject2, '456.254.74':myobject2}

def find_in_str_map(search_points, map_str):
  # NOTE: myobject (the object doing the search) is not a parameter here;
  # it is assumed to be defined in the enclosing scope.
  found_objects = []
  for trace_point in search_points:
    key = str(trace_point[0])+'.'+str(trace_point[1])+'.'+str(trace_point[2])
    if key in map_str:
      if map_str[key].id != myobject.id:
        found_objects.append(map_str[key])
  return found_objects

One-dimensional dict with int keys

{6512812:myobject1, 6612812:myobject1, 6612912:myobject1, 
 45625575:myobject2, 45625475:myobject2, 45625474:myobject2}

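def find_in_int_map(search_points, map_int):
  # NOTE: myobject is assumed to be defined in the enclosing scope.
  found_myobjects = []
  for trace_point in search_points:
    # Pack (x, y, z) into one int: y <= 999 and z <= 99, so the digits never overlap
    key = trace_point[0]*100000+trace_point[1]*100+trace_point[2]
    if key in map_int:
      if map_int[key].id != myobject.id:
        found_myobjects.append(map_int[key])
  return found_myobjects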
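
The integer key works because the grid bounds leave each coordinate its own decimal digits: z fits in two digits, and y*100 + z always stays below 100000. A minimal sketch of that packing and its inverse, assuming the grid_size of (999, 999, 99) given above; encode_point and decode_key are hypothetical helper names, not part of the original code:

```python
# Assumed grid bounds from the question: x <= 999, y <= 999, z <= 99.
Y_FACTOR = 100       # z needs two decimal digits
X_FACTOR = 100000    # y * 100 + z needs five decimal digits


def encode_point(point):
    """Pack an (x, y, z) tuple into a single int key."""
    x, y, z = point
    return x * X_FACTOR + y * Y_FACTOR + z


def decode_key(key):
    """Recover the (x, y, z) tuple from a key made by encode_point."""
    x, rest = divmod(key, X_FACTOR)
    y, z = divmod(rest, Y_FACTOR)
    return (x, y, z)


key = encode_point((65, 128, 12))
point = decode_key(key)
```

If the grid ever grows beyond these bounds, the factors would have to grow with it, otherwise two different points could collide on the same key.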

One-dimensional dict with tuple (coordinate) keys

{(65, 128, 12):myobject1, (66, 128, 12):myobject1, (66, 129, 12):myobject1, 
 (456, 255, 75):myobject2, (456, 254, 75):myobject2, (456, 254, 74):myobject2}

def find_in_tuple_map(search_points, map_tuple):
  # NOTE: myobject is assumed to be defined in the enclosing scope.
  found_myobjects = []
  for trace_point in search_points:
    if trace_point in map_tuple:
      if map_tuple[trace_point].id != myobject.id:
        found_myobjects.append(map_tuple[trace_point])
  return found_myobjects
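
The lookup functions above take an already built map and compare against an object that is not passed in. As a hedged sketch of the whole round trip with tuple keys, the following builds the map from the objects' traces and passes the searching object explicitly; build_tuple_map, find_others_at and the searcher parameter are hypothetical names introduced for illustration, and the sketch assumes each position holds at most one object:

```python
class MyObject(object):
    def __init__(self, id):
        self.id = id
        self.trace = []


def build_tuple_map(objects):
    """Map every (x, y, z) position to the object whose trace contains it."""
    position_map = {}
    for obj in objects:
        for point in obj.trace:
            # Assumption: one object per position; a later object would
            # silently overwrite an earlier one at the same point.
            position_map[point] = obj
    return position_map


def find_others_at(search_points, position_map, searcher):
    """Return the objects found at search_points, skipping the searcher."""
    found = []
    for point in search_points:
        obj = position_map.get(point)  # single hash lookup per point
        if obj is not None and obj.id != searcher.id:
            found.append(obj)
    return found


myobject1 = MyObject(1)
myobject1.trace = [(65, 128, 12), (66, 128, 12), (66, 129, 12)]
myobject2 = MyObject(2)
myobject2.trace = [(456, 255, 75), (456, 254, 75), (456, 254, 74)]

position_map = build_tuple_map([myobject1, myobject2])
others = find_others_at([(66, 128, 12), (456, 254, 75), (0, 0, 0)],
                        position_map, myobject1)
```

Here others contains only myobject2: the first point belongs to the searcher itself and the last point is empty.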

Three-dimensional (nested) dict

{456: {254: {74: myobject2, 75: myobject2}, 255: {75: myobject2}}, 65: {128: {12: myobject1}}, 66: {128: {12: myobject1}, 129: {12: myobject1}}}

def find_in_3d_map(search_points, map_3d):
  founds_myobjects = []
  for trace_point in search_points:
    x = trace_point[0]
    y = trace_point[1]
    z = trace_point[2]
    if x in map_3d:
      if y in map_3d[x]:
        if z in map_3d[x][y]:
          founds_myobjects.append(map_3d[x][y][z])
  return founds_myobjects
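
The nested layout has to be built level by level before it can be searched. A minimal sketch of one way to do that with dict.setdefault, plus a lookup that chains dict.get instead of three membership tests; build_3d_map and lookup_3d are hypothetical helpers, not taken from the original benchmark, and no claim is made here about their relative speed:

```python
class MyObject(object):
    def __init__(self, id):
        self.id = id
        self.trace = []


def build_3d_map(objects):
    """Build {x: {y: {z: obj}}} from the objects' traces."""
    nested = {}
    for obj in objects:
        for x, y, z in obj.trace:
            # setdefault creates the inner dicts only when they are missing
            nested.setdefault(x, {}).setdefault(y, {})[z] = obj
    return nested


def lookup_3d(nested, point):
    """Return the object at point, or None if the position is empty."""
    x, y, z = point
    return nested.get(x, {}).get(y, {}).get(z)


myobject1 = MyObject(1)
myobject1.trace = [(65, 128, 12), (66, 128, 12), (66, 129, 12)]
myobject2 = MyObject(2)
myobject2.trace = [(456, 255, 75), (456, 254, 75), (456, 254, 74)]

nested = build_3d_map([myobject1, myobject2])
hit = lookup_3d(nested, (456, 254, 74))
miss = lookup_3d(nested, (1, 2, 3))
```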

So I used timeit (with a large number of objects) to test the performance of these strategies:

print('str', timeit.timeit('find_in_str_map(bugs, map_str)', number=10, [...]
print('int', timeit.timeit('find_in_int_map(bugs, map_int)', number=10, [...]
print('3d ', timeit.timeit('find_in_3d_map(bugs, map_3d)', number=10, [...]
print('tup', timeit.timeit('find_in_tuple_map(bugs, map_tuple)', number=10, [...]
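
The timeit calls above are truncated, so their setup arguments are unknown. As a hedged, self-contained sketch of how such a comparison could be run, the following passes a callable to timeit.timeit instead of a statement string. The random test data, the seed, and the names find_tuple and find_str are assumptions for illustration only; this is not the original benchmark and its timings are not comparable to the results reported here.

```python
import random
import timeit

random.seed(0)  # fixed seed so repeated runs use the same points

# Assumed test data: random points inside the (999, 999, 99) grid.
points = [(random.randrange(1000), random.randrange(1000), random.randrange(100))
          for _ in range(10000)]

# Only the first half of the points is stored, so lookups include misses.
map_tuple = {p: i for i, p in enumerate(points[:5000])}
map_str = {'%d.%d.%d' % p: i for i, p in enumerate(points[:5000])}


def find_tuple(search_points, mapping):
    return [mapping[p] for p in search_points if p in mapping]


def find_str(search_points, mapping):
    found = []
    for p in search_points:
        key = '%d.%d.%d' % p  # string key built on every lookup
        if key in mapping:
            found.append(mapping[key])
    return found


t_tup = timeit.timeit(lambda: find_tuple(points, map_tuple), number=10)
t_str = timeit.timeit(lambda: find_str(points, map_str), number=10)
print('tup', t_tup)
print('str', t_str)
```

Passing a callable avoids having to import the functions and data through a setup string, at the cost of a small extra function call per repetition.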

(Testable code: http://pastebin.com/FfkeEw9U)

The results are:

python2.7

('str', 8.213999032974243)
('int', 5.6337010860443115)
('3d ', 6.18729305267334)
('tup', 5.0934319496154785)

python3.3

str 10.11169655699996
int 5.984578157000215
3d  6.448565245998907
tup 5.139268291999542

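Are there other strategies for storing and searching a map of 3D coordinates? Can the strategies I proposed be optimized?

1 answer:

Answer 0 (score: 0)

The simplest approach is to use your coordinate tuples as the keys of the map.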

{(65,128,12):myobject1, (66,128,12):myobject1, (66,129,12):myobject1, 
 (456,255,75):myobject2, (456,254,75):myobject2, (456,254,74):myobject2}    

def find_collisions_tuple_map(bugs, map_tuple):
  collisions_bugs = []
  for bug in bugs:
    for trace_point in bug.get_possibles_future_trace_point():
      if trace_point in map_tuple:
        collisions_bugs.append(map_tuple[trace_point])
  return collisions_bugs
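
The function above relies on a get_possibles_future_trace_point method that is not shown. A runnable sketch under a loud assumption: the Bug class below and its neighbour logic (the six axis-aligned neighbours of the current position) are invented purely for illustration and are not the answerer's actual implementation.

```python
class Bug(object):
    def __init__(self, id, position):
        self.id = id
        self.position = position

    def get_possibles_future_trace_point(self):
        # Hypothetical: the six axis-aligned neighbours of the current position.
        x, y, z = self.position
        return [(x + 1, y, z), (x - 1, y, z),
                (x, y + 1, z), (x, y - 1, z),
                (x, y, z + 1), (x, y, z - 1)]


def find_collisions(bugs, position_map):
    """Collect whatever is stored at any candidate point of any bug."""
    collisions = []
    for bug in bugs:
        for point in bug.get_possibles_future_trace_point():
            other = position_map.get(point)
            if other is not None:
                collisions.append(other)
    return collisions


# Strings stand in for the stored objects to keep the sketch short.
position_map = {(66, 128, 12): 'myobject1',
                (66, 129, 12): 'myobject1',
                (456, 254, 75): 'myobject2'}

collisions = find_collisions([Bug(1, (65, 128, 12))], position_map)
```

With this data, only the neighbour (66, 128, 12) is occupied, so collisions holds a single entry for myobject1.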

On my computer, it is slightly faster:

('str', 10.188277582443057)
('int', 7.133011876243648)
('3d ', 7.486879201843017)
('tuple ', 6.406966607422291)