Question

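I wrote a small parsing function for .off files. In this file format, the first line should be the letters "OFF" and the second line should be three numbers indicating the sizes of the rest of the file.

I have thousands of these files. However, in a small, random percentage of them, the first two lines are incorrectly joined (I'm not sure why). I can't seem to find a way to handle this while reading, short of iterating with readline() instead of readlines().

Please also assume that changing all the files is impractical (I considered a bash script, but this is a public dataset, and I will probably keep using it in the future).

Any suggestions on how to deal with these broken header lines?

Here is my current parsing function: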

import numpy as np

def off_vertex_parser(self, path_to_off_file):
    print(path_to_off_file)
    # Read the OFF file
    with open(path_to_off_file, 'r') as f:
        contents = f.readlines()

    # Find the number of vertices contained
    num_vertices = int(contents[1].strip().split(' ')[0])

    # Convert all the vertex lines to a list of lists
    # (list(...) is needed on Python 3, where map returns an iterator)
    vertex_list = [list(map(float, contents[i].strip().split(' ')))
                   for i in range(2, 2 + num_vertices)]

    # Return the vertices as a 3 x N numpy array
    return np.array(vertex_list).transpose(1, 0)

Here are two examples of .off files. The first is correctly formatted:

OFF
5 0 0
-12.280500 26.701300 10.653150
-12.575700 26.313400 11.003550
-12.569100 26.309300 10.653150
-13.208100 25.441200 10.653150
-12.569100 26.309300 10.653150

and the second is incorrectly formatted:

OFF5 0 0
-12.280500 26.701300 10.653150
-12.575700 26.313400 11.003550
-12.569100 26.309300 10.653150
-13.208100 25.441200 10.653150
-12.569100 26.309300 10.653150

Answer 1

You can parse the vertices from either format. If the first line has anything after "OFF", that remainder is the size line and the vertices start on the second line; otherwise the size line is the second line and the vertices start on the third.
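One way to handle both header layouts from the question while still using readlines() is to normalize the header before parsing. The sketch below is only an illustration: split_fused_header is a hypothetical helper name, and it assumes the magic string is exactly "OFF" (variants such as "COFF" or "NOFF" are not handled).

```python
def split_fused_header(lines):
    """Return a copy of `lines` with a fused "OFF5 0 0" header split in two.

    `lines` is the list returned by f.readlines(). A header that is
    already a standalone "OFF" line is left as it is.
    """
    if not lines:
        return list(lines)
    first = lines[0].strip()
    if first.startswith("OFF") and first != "OFF":
        # Everything after the "OFF" magic is the size line.
        return ["OFF\n", first[3:].strip() + "\n"] + list(lines[1:])
    return list(lines)


# Hypothetical sample data mirroring the two files in the question.
good = ["OFF\n", "5 0 0\n", "-12.280500 26.701300 10.653150\n"]
bad = ["OFF5 0 0\n", "-12.280500 26.701300 10.653150\n"]
fixed = split_fused_header(bad)
```

With this in place, the existing parser would only need contents = split_fused_header(f.readlines()); the indexing with contents[1] and range(2, 2 + num_vertices) can stay as it is, and none of the files on disk are modified.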
