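I have two point cloud files in .txt format (Scene and Green). The scene point cloud typically contains more than 100,000 lines, and the green one about 20,000 lines. The green points appear as identical rows in both files, except for the last number, which is each point's label.
Scene: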
0.805309, -3.43696, 6.85463, 0, 0, 0, 5
0.811636, -3.42248, 6.82576, 0, 0, 0, 5
-1.00663, 0.0985967, 3.02769, 42, 134, 83, 5
-1.00182, 0.098547, 3.02617, 43, 133, 83, 5
-0.997052, 0.0985018, 3.02478, 41, 133, 82, 5
0.811636, -3.42248, 6.82576, 0, 0, 0, 5
Green:
-1.00663, 0.0985967, 3.02769, 42, 134, 83, 3
-1.00182, 0.098547, 3.02617, 43, 133, 83, 3
-0.997052, 0.0985018, 3.02478, 41, 133, 82, 3
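I want to replace each green point's whole row in Scene with the matching row from the Green file, or simply change the label from 5 to 3 wherever the two rows are otherwise equal. The final result would look like this:
Scene: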
0.805309, -3.43696, 6.85463, 0, 0, 0, 5
0.811636, -3.42248, 6.82576, 0, 0, 0, 5
-1.00663, 0.0985967, 3.02769, 42, 134, 83, 3
-1.00182, 0.098547, 3.02617, 43, 133, 83, 3
-0.997052, 0.0985018, 3.02478, 41, 133, 82, 3
0.811636, -3.42248, 6.82576, 0, 0, 0, 5
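I have written two versions of code to do this, but because there are many files to modify, both take a very long time to run, which is not acceptable.
First code: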
import os
import fileinput

def main(scene, others):
    for file in others:
        other = open(file, "r+")
        for line in other:
            line1 = line[:-3]
            f = scene
            for sceneLine in fileinput.input(f, inplace=True):
                new = sceneLine
                sceneLine1 = sceneLine[:-3]
                if sceneLine1 == line1:
                    print(sceneLine.replace(new, line), end='')
                else:
                    print(sceneLine.replace(line, line), end='')
            fileinput.close()

others = []
for file in os.listdir("./"):
    if file.endswith(".txt"):
        if file.startswith("pointCloudScene9863Cl"):
            scene = file
        else:
            others.append(file)
main(scene, others)
Second code:
import os
import fileinput
import numpy

def main(scene1, others):
    pointcloud = []
    scene1 = open(scene1, "r+")
    scene = []
    for each_point in scene1:
        scene.append(each_point)
    for file in others:
        other = open(file, "r+")
        for line in other:
            pointcloud = []
            line1 = line[:-3]
            for sceneLine in scene:
                sceneLine1 = sceneLine[:-3]
                if sceneLine1 == line1:
                    pointcloud.append(line)
                else:
                    pointcloud.append(sceneLine)
            scene = pointcloud
    with open('pointcloud.txt', 'w') as points:
        for item in scene:
            points.write("%s" % item)

others = []
for file in os.listdir("./"):
    if file.endswith(".txt"):
        if file.startswith("pointCloudScene9863Cl"):
            scene = file
        else:
            others.append(file)
main(scene, others)
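Both approaches work perfectly with a small number of points, but with the original point cloud files they take 30 minutes or more to finish. I can see that the problem is in the for loops: because I am using nested loops, I end up with roughly 100,000 * 20,000 iterations just to relabel the green points.
Is there an efficient way to do this, using numpy arrays or any other method?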
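Answer 0: (score: 3)
I have what should be a suitable solution, but first a disclaimer: without more information from you, it is impossible to find a truly proper solution. We need the context of this problem, and more precise and detailed information about the data format and what you are trying to do.
For example, comparing floating-point numbers for equality does not feel right, and arithmetic on floats always carries some risk with respect to precision. Since these points appear to come from the same source, it would help if each one had some kind of unique ID that could be used to check for equality.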
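To illustrate the concern about float equality, here is a minimal sketch of building a comparison key that tolerates formatting differences between files. The helper name `point_key` and the choice of 6 decimals are assumptions made for illustration only; they are not part of the answer's code, and the right precision depends on how the files were written.

```python
def point_key(line, decimals=6):
    """Build a hashable key from the first six fields of one point line.

    Hypothetical helper: coordinates are rounded to `decimals` places so that
    the same point written with different formatting still compares equal.
    """
    fields = [f.strip() for f in line.split(",")]
    xyz = tuple(round(float(v), decimals) for v in fields[:3])
    rgb = tuple(int(v) for v in fields[3:6])
    return xyz + rgb  # the label (7th field) is deliberately left out


# The same point, written with different trailing zeros and a different label.
scene_line = "-1.00663, 0.0985967, 3.02769, 42, 134, 83, 5\n"
green_line = "-1.006630, 0.09859670, 3.02769, 42, 134, 83, 3\n"

same_point = point_key(scene_line) == point_key(green_line)
print(same_point)  # True
```

Comparing the raw text of the two lines would miss this match, while the rounded key finds it. If the files are guaranteed to be written by the same tool with the same formatting, plain text or exact float comparison is enough.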
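Like the others here, my first reaction was to reach for NumPy and pandas. In my case that was a mistake, because this task does not really involve much data manipulation or transformation.
So here is the simplest implementation I can think of right now: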
def point_parse(line):
    # Split one "x, y, z, r, g, b, label" line into typed fields.
    line_point = line.split(", ")
    line_point[0] = float(line_point[0])
    line_point[1] = float(line_point[1])
    line_point[2] = float(line_point[2])
    line_point[3] = int(line_point[3])
    line_point[4] = int(line_point[4])
    line_point[5] = int(line_point[5])
    line_point[6] = int(line_point[6])  # int() ignores the trailing newline
    return tuple(line_point)

green_points_set: frozenset
black_points_set: frozenset

# Store every labelled point without its label, so membership tests are O(1).
with open("../resources/Green_long.txt", "r") as green_file:
    green_points_set = frozenset((point_parse(line)[:-1] for line in green_file))
with open("../resources/Black_long.txt", "r") as black_file:
    black_points_set = frozenset((point_parse(line)[:-1] for line in black_file))

def set_point_label(point):
    # Compare on everything except the label, then attach the new label.
    point_comp = point[:-1]
    if point_comp in green_points_set:
        point_comp += (3,)
    elif point_comp in black_points_set:
        point_comp += (4,)
    else:
        point_comp = point
    return point_comp

# Both generators are lazy, so the output must be written while the scene
# file is still open.
with open("../resources/Scene_long.txt", "r") as scene_file:
    scene_points_new = (set_point_label(point_parse(line)) for line in scene_file)
    form_lines = ((f"{res_line[0]}, {res_line[1]}, {res_line[2]}, {res_line[3]}, "
                   f"{res_line[4]}, {res_line[5]}, {res_line[6]}\n") for res_line in scene_points_new)
    with open("../out/Scene_out.txt", "w") as scene_out:
        scene_out.writelines(form_lines)
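The code is very simple. Sets are created for the green and black points, each scene point is tested for membership, and the label is changed accordingly.
I created some test data for myself: a scene with 1,000,000 points in total, 125,000 green points, and 125,000 black points. The run took under 7 seconds (I hope I did not make any serious mistakes!), and memory usage should be low.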
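The same membership idea can also be sketched without parsing any numbers, by using the text before the last comma as a dictionary key. This is a variation for illustration, not the code above: the function name `relabel` is hypothetical, and it assumes matching points are written with byte-identical text in both files, as in the sample data from the question.

```python
def relabel(scene_lines, labelled_lines):
    """Return scene lines with labels taken from labelled_lines where rows match.

    Hypothetical helper: rows match when the text before the last comma is
    identical, so no float parsing or float comparison is involved.
    """
    lookup = {}
    for line in labelled_lines:
        key, _, label = line.rstrip("\n").rpartition(",")
        lookup[key] = label.strip()

    result = []
    for line in scene_lines:
        key, _, label = line.rstrip("\n").rpartition(",")
        new_label = lookup.get(key, label.strip())  # keep old label if no match
        result.append(f"{key}, {new_label}\n")
    return result


scene = [
    "0.805309, -3.43696, 6.85463, 0, 0, 0, 5\n",
    "-1.00663, 0.0985967, 3.02769, 42, 134, 83, 5\n",
    "-1.00182, 0.098547, 3.02617, 43, 133, 83, 5\n",
]
green = [
    "-1.00663, 0.0985967, 3.02769, 42, 134, 83, 3\n",
    "-1.00182, 0.098547, 3.02617, 43, 133, 83, 3\n",
]

relabelled = relabel(scene, green)
print("".join(relabelled), end="")
```

Each scene line costs one dictionary lookup, so the work grows with the sum of the two file sizes rather than their product. Lines from several label files can be passed in together; if the same point appears more than once, the last label seen wins.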
Answer 1: (score: 2)
I think you should ask yourself some basic questions about your data:
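Answer 2: (score: 2)
A "brute force" solution, jit-compiled with numba. This is just for fun; the frozenset approach is the better choice. The most expensive operation appears to be the memory I/O during mod_arr[j,:] = mod[i,:].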
import timeit
import numpy as np
from numba import njit

### numba njit-ed version of nested loops
@njit
def modify(arr, mod, tol=0.000000001):
    mod_arr = arr.copy()  # copy, so the input array is not modified in place
    mask = np.ones(arr.shape[0]).astype(np.bool_)  # True = row not matched yet
    idx = np.arange(0, arr.shape[0], 1)
    for i in range(mod.shape[0]):
        for j in idx[mask]:
            # sum of absolute differences, so opposite-sign differences cannot cancel
            if np.sum(np.absolute(arr[j,:-1]-mod[i,:-1])) < tol:
                mod_arr[j,:] = mod[i,:]
                mask[j] = False
    return mod_arr

# "scene":
a = np.array([[0.805309, -3.43696, 6.85463, 0, 0, 0, 5],
              [0.811636, -3.42248, 6.82576, 0, 0, 0, 5],
              [-1.00663, 0.0985967, 3.02769, 42, 134, 83, 5],
              [-1.00182, 0.098547, 3.02617, 43, 133, 83, 5],
              [-0.997052, 0.0985018, 3.02478, 41, 133, 82, 5],
              [0.811636, -3.42248, 6.82576, 0, 0, 0, 5]])

# "green":
m = np.array([[-1.00663, 0.0985967, 3.02769, 42, 134, 83, 3],
              [-1.00182, 0.098547, 3.02617, 43, 133, 83, 3],
              [-0.997052, 0.0985018, 3.02478, 41, 133, 82, 3]])

# desired output:
mod_arr_test = np.array([[0.805309, -3.43696, 6.85463, 0, 0, 0, 5],
                         [0.811636, -3.42248, 6.82576, 0, 0, 0, 5],
                         [-1.00663, 0.0985967, 3.02769, 42, 134, 83, 3],
                         [-1.00182, 0.098547, 3.02617, 43, 133, 83, 3],
                         [-0.997052, 0.0985018, 3.02478, 41, 133, 82, 3],
                         [0.811636, -3.42248, 6.82576, 0, 0, 0, 5]])

# check:
mod_arr = modify(a, m)
print([np.isclose(np.sum(mod_arr[i] - l), 0.) for i, l in enumerate(mod_arr_test)])
# --> [True, True, True, True, True, True]

# now let's make the arrays big...
a = np.tile(a, (17000, 1))  # a.shape is (102000, 7)
m = np.tile(m, (7000, 1))   # m.shape is (21000, 7)

### performance check (IPython magic):
%timeit modify(a, m)
# --> 2min 55s ± 4.07 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
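Since the question asks about numpy specifically, a vectorized alternative to the nested loops can be sketched with `np.unique(..., axis=0)`. This is not the approach of the answer above and no timing is claimed for it. The function name `relabel_rows` is hypothetical, and the sketch assumes rows match exactly in their first six columns (no tolerance).

```python
import numpy as np


def relabel_rows(scene, labelled):
    """Copy labels from `labelled` onto exactly matching rows of `scene`.

    Hypothetical helper: rows are matched on every column except the last.
    """
    n = labelled.shape[0]
    # Stack the label-free rows of both arrays and group identical rows.
    stacked = np.vstack([labelled[:, :-1], scene[:, :-1]])
    uniq, inverse = np.unique(stacked, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # group id of every stacked row

    # Record the label of each group that contains a labelled row.
    group_label = np.full(uniq.shape[0], np.nan)
    group_label[inverse[:n]] = labelled[:, -1]

    # Look up the group label for each scene row; NaN means "no match".
    new_labels = group_label[inverse[n:]]
    hit = ~np.isnan(new_labels)

    result = scene.copy()
    result[hit, -1] = new_labels[hit]
    return result


scene_arr = np.array([[0.805309, -3.43696, 6.85463, 0, 0, 0, 5],
                      [-1.00663, 0.0985967, 3.02769, 42, 134, 83, 5],
                      [-1.00182, 0.098547, 3.02617, 43, 133, 83, 5],
                      [0.811636, -3.42248, 6.82576, 0, 0, 0, 5]])
green_arr = np.array([[-1.00663, 0.0985967, 3.02769, 42, 134, 83, 3],
                      [-1.00182, 0.098547, 3.02617, 43, 133, 83, 3]])

relabelled_arr = relabel_rows(scene_arr, green_arr)
print(relabelled_arr[:, -1])
```

Grouping is done by sorting the stacked rows, so the cost grows roughly as (n + m) log(n + m) instead of n * m. The set-based approach is still likely to be simpler to maintain for plain text files.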