Python: checking a tab-delimited file for the proper number of columns

Date: 2013-05-03 18:07:45

Tags: python file input

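I need to check that:

a) every line is 4 columns long

b) the program does not fail if there is a trailing newline ('\n') at the end of the file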

def ask_for_filename():
    filename=raw_input("Please enter file name: ")
    return filename

def read_data(filename):
        with open(filename) as f:
           data = f.readlines()

        i = 0
        for line in data:
            lineContains = line.split('\t')
            lineLength = len(lineContains)  #calculate elements


            i = i+1

            if lineLength < 3 and i < len(data):        
                print "File is invalid format."

        f.close()
        return data

Please point out where I went wrong, because this part of the code doesn't work.

        i = 0
        for line in data:
            lineContains = line.split('\t')
            lineLength = len(lineContains)  #calculate elements


            i = i+1

            if lineLength < 3 and i < len(data):        
                print "File is invalid format."

Sample file contents:

Complete file:

AUTHOR(S)   YEAR    TITLE   JOURNAL/CONFERENCE

Accot;Zhai  2001    Scale effects in steering law tasks Proc. ACM CHI

Acredolo    1977    Developmental Changes in the Ability to Coordinate Perspectives of a Large-Scale Space  Developmental Psychology

Aginsky;Harris;Rensink;Beusmans 1997    Two strategies for learning a route in a driving simulator  Journal of Environmental Psychology

Incomplete file (the code above is meant for this kind of file):

AUTHOR(S)   YEAR    TITLE   JOURNAL/CONFERENCE

Accot;Zhai  2001    Scale effects in steering law tasks Proc. ACM CHI

Acredolo    Developmental Changes in the Ability to Coordinate Perspectives of a Large-Scale Space  Developmental Psychology

Aginsky;Harris;Rensink;Beusmans 1997    Two strategies for learning a route in a driving simulator  Journal of Environmental Psychology

Agrawala;Beers;Frohlich;Hanrahan;McDowall;Bolas 1997    The two-user responsive workbench: Support for collaboration through individual views of a shared space Proc. ACM SIGGRAPH

Ahmadabadi;Eiji 1996    Cooperation strategy for a group of object lifting robots   Proc. of IROS

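2 answers:

Answer 0 (score: 1)

You complain that your code "doesn't affect the rest of the program in any way".

Since nothing in the code in question modifies any data or changes any control flow, of course it doesn't affect the rest of the program. So read_data always returns all of the lines in the file, valid or invalid.

Since you haven't explained how you want it to affect the rest of the program, it's hard to show you how to do what you want... but I can show you how to do something.

For example, instead of returning all of the lines, return only the valid ones: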

i = 0
result = []
for line in data:
    lineContains = line.split('\t')
    lineLength = len(lineContains)  #calculate elements

    i = i+1

    if lineLength < 3 and i < len(data):
        print "File is invalid format."
    else:
        result.append(line)

return result

Or, raise an exception instead of returning anything:

i = 0
for line in data:
    lineContains = line.split('\t')
    lineLength = len(lineContains)  #calculate elements

    i = i+1

    if lineLength < 3 and i < len(data):
        raise ValueError("File is invalid format.")

return data
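Whoever calls the function then decides what to do about a bad file. A minimal sketch in Python 3 syntax, assuming a hypothetical validate_lines helper that is a simplified stand-in for the loop above (it takes a list of lines rather than a file name), and a hypothetical load_or_empty caller:

```python
def validate_lines(data):
    """Hypothetical stand-in for the loop above: raise on the first short line."""
    for line in data:
        if len(line.split('\t')) < 3:
            raise ValueError("File is invalid format.")
    return data


def load_or_empty(data):
    """Return the validated lines, or an empty list if validation fails."""
    try:
        return validate_lines(data)
    except ValueError as err:
        # The caller, not the validator, chooses how to report the problem.
        print("Rejected input:", err)
        return []
```

Raising keeps the validation logic free of printing, so the same check can be reused by callers that want to abort, log, or fall back to something else.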

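Meanwhile, there are a few other problems with your code.

You shouldn't call f.close() after using f in a with block. Usually you'll get lucky and it'll be harmless, but "usually harmless and never useful" is not the kind of code you want.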
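To see why the extra call is unnecessary, here is a small sketch in Python 3 syntax. The temporary file and the name closed_after_with are assumptions made only so the example runs on its own:

```python
import os
import tempfile

# Write a throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".tsv")
with os.fdopen(fd, "w") as out:
    out.write("a\tb\tc\td\n")

with open(path) as f:
    data = f.readlines()

# Leaving the with block has already closed the file; no f.close() is needed.
closed_after_with = f.closed
os.remove(path)
```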

If you want to count the lines in something, don't put an explicit i = i+1 in the loop; just use enumerate.
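As a quick illustration in Python 3 syntax (the two sample lines and the name short_lines are made up for the sketch), enumerate hands back the counter together with each item:

```python
lines = ["AUTHOR(S)\tYEAR\tTITLE\tJOURNAL/CONFERENCE\n",
         "Acredolo\t1977\n"]

# enumerate yields (index, item) pairs; start=1 gives 1-based line numbers.
short_lines = []
for i, line in enumerate(lines, start=1):
    if len(line.split('\t')) < 3:
        short_lines.append(i)
```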

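Also, I'm not sure what i < len(data) is supposed to do. Because i is incremented before the test, it is true for every line except the last one, so all it does is exempt the final line from the check. So I'm going to leave it out. (That means I could also drop i entirely, since that's the only place you use it... but I'll keep it so I can show you enumerate.)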

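There is almost never a good reason to call readlines(). A file is already an iterable of lines, just like the list that readlines returns. All you're doing is making your code slower and more memory-hungry by reading the whole file at once instead of on demand.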
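A small sketch of the difference, in Python 3 syntax. It uses io.StringIO as a stand-in for an open text file, which is an assumption made so the example needs no file on disk:

```python
import io

text = "a\tb\tc\td\ne\tf\tg\th\n"

# readlines() builds the complete list of lines up front.
eager = io.StringIO(text).readlines()

# Iterating the file-like object yields one line at a time instead.
lazy = []
for line in io.StringIO(text):
    lazy.append(line)
```

Both approaches see exactly the same lines; the loop simply never needs the whole file in memory unless you choose to collect it.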

So, here's the version that skips bad lines:

def read_data(filename):
    result = []
    with open(filename) as f:
        for i, line in enumerate(f):
            lineContains = line.split('\t')
            lineLength = len(lineContains)  #calculate elements
            if lineLength < 3:        
                print "File is invalid format."
            else:
                result.append(line)
    return result

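Meanwhile, do you really want to print a warning for every single invalid line, when there could be 100000 of them? If not, you can make it even simpler: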

def read_data(filename):
    def bad_line(line):
        lineContains = line.split('\t')
        lineLength = len(lineContains)  #calculate elements
        return lineLength < 3
    with open(filename) as f:
        return [line for line in f if not bad_line(line)]
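Note that the versions above keep the question's lineLength < 3 test, which lets a three-column line through. If the goal is the one stated in the question (exactly 4 columns, with a trailing blank line tolerated), something along these lines might work. This is only a sketch in Python 3 syntax; find_bad_lines and expected_columns are hypothetical names, and treating every empty line as ignorable is an assumption:

```python
def find_bad_lines(lines, expected_columns=4):
    """Return 1-based numbers of lines whose column count is wrong.

    Empty lines (such as one produced by a trailing newline) are skipped.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip('\r\n')
        if not stripped:
            continue  # ignore empty lines instead of flagging them
        if len(stripped.split('\t')) != expected_columns:
            bad.append(number)
    return bad
```

Because it accepts any iterable of lines, it can be given an open file directly, for example find_bad_lines(f) inside a with open(filename) as f: block.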

Answer 1 (score: 0)

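def is_data_valid(filename):
    # Read every line; the with block closes the file when done.
    with open(filename) as f:
        data = f.readlines()
    lines = [x.split('\t') for x in data]
    # Drop lines that contain no tab at all, such as a trailing blank line.
    no_newlines = [line for line in lines if len(line) > 1]
    # Every remaining line must have exactly 4 tab-separated columns.
    return all(len(line) == 4 for line in no_newlines)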
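A possible way to try this out, sketched in Python 3 syntax. The function is repeated so the block runs on its own, and check_text is a hypothetical helper that writes a string to a temporary file before validating it:

```python
import os
import tempfile


def is_data_valid(filename):
    # Same logic as the function above, repeated so the sketch is self-contained.
    with open(filename) as f:
        lines = [x.split('\t') for x in f]
    no_newlines = [line for line in lines if len(line) > 1]
    return all(len(line) == 4 for line in no_newlines)


def check_text(text):
    """Hypothetical helper: write text to a temporary file and validate it."""
    fd, path = tempfile.mkstemp(suffix=".tsv")
    try:
        with os.fdopen(fd, "w") as out:
            out.write(text)
        return is_data_valid(path)
    finally:
        os.remove(path)
```

With this helper, a four-column row followed by an empty line passes, while a file containing a three-column row does not.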