Question

I'm trying to read a file, but it is awkward to parse because the number of spaces between columns varies. This is what I have so far:

with open('sextractordata1488.csv') as f:
    #getting rid of title, aka unusable lines:
    for _ in xrange(15):
        next(f)
    for line in f:
        cols = line.split(' ')
        #9 because it's 9 spaces before the first column with real data
        print cols[10]

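I looked up how to do this and came across the tr and sed commands, but they give syntax errors when I try them, and I'm not sure where in the code they should go (inside the for loop or before it?). I want to reduce all the whitespace between columns to a single space so that I can reliably pull out one column. (Right now, because it is a counter column running from 1 to 101, I only get 10 through 99, plus a bunch of spaces and fragments of other columns, since 1 and 101 have a different number of characters and therefore a different number of spaces from the start of the line.)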

Answer 1

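Just use str.split() without arguments. The string is then split on whitespace runs of arbitrary width, so it doesn't matter how many spaces there are between the non-whitespace content: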

>>> '   this   is rather     \t\t hard            to parse  without\thelp\n'.split()
['this', 'is', 'rather', 'hard', 'to', 'parse', 'without', 'help']

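Note that leading and trailing whitespace is removed as well. Tabs, spaces, newlines, and carriage returns are all considered whitespace.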

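For completeness, the first argument can also be set to None for the same effect. This is helpful to know when you need to limit the number of splits with the second argument: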

>>> '   this   is rather     \t\t hard            to parse  without\thelp\n'.split(None)
['this', 'is', 'rather', 'hard', 'to', 'parse', 'without', 'help']
>>> '   this   is rather     \t\t hard            to parse  without\thelp\n'.split(None, 3)
['this', 'is', 'rather', 'hard            to parse  without\thelp\n']
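
Applied to the loop in the question, this could look roughly like the sketch below. It is written for Python 3 (the question uses Python 2), and the helper name read_column, its parameters, and the sample rows are made up for illustration, since the real file layout is not shown:

```python
from itertools import islice

def read_column(lines, header_lines, column):
    """Yield one whitespace-delimited column from an iterable of lines."""
    # Skip the unusable header lines at the top of the input.
    for line in islice(lines, header_lines, None):
        cols = line.split()  # any run of whitespace counts as one separator
        if len(cols) > column:  # ignore blank or too-short lines
            yield cols[column]

# Hypothetical sample rows standing in for the real file contents.
sample = [
    "# header\n",
    "   1   12.5  0.3\n",
    "  10   13.1  0.4\n",
    " 101    9.8  0.2\n",
]
print(list(read_column(sample, 1, 0)))  # ['1', '10', '101']
```

An open file object can be passed in place of the list, for example read_column(f, 15, 0) inside the with block from the question, assuming the counter is the first column of each data row.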

Answer 2

cols = line.split() should be enough:

>>> "a     b".split()
['a', 'b']
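
To see why line.split(' ') in the question behaves so differently, here is a small comparison. The sample line is invented for illustration; the point is that an explicit ' ' separator produces an empty string for every extra space, so the index of a given column shifts with the padding:

```python
# Invented sample line with two leading spaces and three spaces between fields.
line = "  1   12.5\n"

by_space = line.split(' ')    # splits at every single space character
by_whitespace = line.split()  # collapses runs of whitespace, strips the ends

print(by_space)       # ['', '', '1', '', '', '12.5\n']
print(by_whitespace)  # ['1', '12.5']
```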
