Question

I am new to Python. I am trying to write a script that uses the numeric columns from a file that also contains a header. Here is an example of the file:

@File_Version: 4
PROJECTED_COORDINATE_SYSTEM
#File_Version____________-> 4
#Master_Project_______->
#Coordinate_type_________-> 1
#Horizon_name____________-> 
sb+
#Horizon_attribute_______-> STRUCTURE
474457.83994    6761013.11978  
474482.83750    6761012.77069   
474507.83506    6761012.42160   
474532.83262    6761012.07251   
474557.83018    6761011.72342   
474582.82774    6761011.37433   
474607.82530    6761011.02524

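I want to skip the header. Here is what I tried. It works, of course, if I know which characters will appear in the header, such as "#" and "@". But how can I skip all lines that contain any alphabetic character?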

import re

in_file1 = open(input_file1_short, 'r')
out_file1 = open(output_file1_short, "w")
lines = in_file1.readlines()
for line in lines:
    if "#" not in line and "@" not in line:
        strip_line = line.strip()
        replace_split = re.split(r'[ ,|;"\t]+', strip_line)
        x = replace_split[0]
        y = replace_split[1]
        out_file1.write("%s\t%s\n" % (str(x), str(y)))
in_file1.close()
out_file1.close()

Thanks a lot!

Answer 1

I think you could use the built-ins like this:

import string
for line in lines:
    if any(letter in line for letter in string.ascii_letters):
        print("there is an ascii letter somewhere in this line")

However, this only looks for ASCII letters.

You could also do:

import unicodedata
for line in lines:
    # In Python 3, str is already Unicode, so no unicode() conversion is needed
    if any(unicodedata.category(letter).startswith('L') for letter in line):
        print("there is a unicode letter somewhere in this line")

But that is only if I understand my Unicode categories correctly....
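For what it's worth, the category assumption is easy to check. Here is a minimal Python 3 sketch; the helper name `has_unicode_letter` is made up for illustration and is not part of `unicodedata`:

```python
import unicodedata

def has_unicode_letter(line):
    """Return True if any character's Unicode category starts with 'L' (a letter)."""
    return any(unicodedata.category(ch).startswith('L') for ch in line)

# Letter categories all begin with 'L' (Lu, Ll, Lt, Lm, Lo).
print(unicodedata.category('A'))   # Lu: uppercase letter
print(unicodedata.category('a'))   # Ll: lowercase letter
print(unicodedata.category('4'))   # Nd: decimal digit, not a letter
print(unicodedata.category('#'))   # Po: punctuation, not a letter

print(has_unicode_letter("#Horizon_attribute_______-> STRUCTURE"))   # True
print(has_unicode_letter("474457.83994    6761013.11978"))           # False
```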

Cleaner (using a suggestion from the other answers; this works for both unicode lines and strings):

for line in lines:
    if any(letter.isalpha() for letter in line):
        print("there is a letter somewhere in this line")

Interestingly, though, if you do:

In [57]: u'\u2161'.isdecimal()
Out[57]: False

In [58]: u'\u2161'.isdigit()
Out[58]: False

In [59]: u'\u2161'.isalpha()
Out[59]: False

The Unicode character for the Roman numeral "Two" is none of those, but unicodedata.category(u'\u2161') does return 'Nl', which indicates a numeric character (and u'\u2161'.isnumeric() is True).
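Putting the isalpha() check together with the splitting code from the question might look like the sketch below (Python 3). The function name `numeric_rows` and the inline `sample` list are only for illustration; in the real script the lines would come from the input file.

```python
import re

def numeric_rows(lines):
    """Yield (x, y) string pairs from lines that contain no alphabetic characters."""
    for line in lines:
        if any(ch.isalpha() for ch in line):
            continue  # header line: it contains at least one letter
        fields = re.split(r'[ ,|;"\t]+', line.strip())
        if len(fields) >= 2:  # this also skips blank lines
            yield fields[0], fields[1]

# A few lines copied from the example file in the question
sample = [
    "@File_Version: 4\n",
    "#Horizon_name____________-> \n",
    "sb+\n",
    "474457.83994    6761013.11978  \n",
    "474482.83750    6761012.77069   \n",
]

for x, y in numeric_rows(sample):
    print("%s\t%s" % (x, y))
```

One caveat: a value written in scientific notation such as 1e5 contains a letter, so this test would treat that line as a header too.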

Answer 2

This checks the first character of each line and skips every line that does not start with a digit:

for line in lines:
    if line[0].isdigit():
        # we've got a line starting with a digit
        pass  # process the numeric line here
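Note that the first-character test would also skip data lines that begin with a minus sign or with leading whitespace. If that matters, a different technique (not the one in this answer) is to try converting every field with float(). A rough Python 3 sketch, with `is_data_line` being a made-up helper name:

```python
def is_data_line(line):
    """Return True if the line has at least one field and every field parses as a float."""
    fields = line.split()
    if not fields:
        return False  # blank line
    try:
        for field in fields:
            float(field)
    except ValueError:
        return False  # at least one field is not a number
    return True

print(is_data_line("474457.83994    6761013.11978  \n"))   # True
print(is_data_line("  -12.5\t3e4\n"))                      # True
print(is_data_line("#Coordinate_type_________-> 1\n"))     # False
print(is_data_line("sb+\n"))                               # False
```

Keep in mind that float() also accepts strings such as "nan" and "inf", so a header line made up only of those words would slip through.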

Answer 3

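Use a generator pipeline to filter your input stream. It takes lines from your original input, but first checks that there are no letters anywhere in the line.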

from functools import reduce  # reduce is not a builtin in Python 3

input_stream = (line for line in lines if
                reduce((lambda x, y: (not y.isalpha()) and x), line, True))

for line in input_stream:
    strip_line = ...
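The same pipeline idea can be written with any() instead of reduce(), which some people find easier to read. A small Python 3 sketch; the stage names `no_headers`, `rows` and `pairs` and the inline sample lines are assumptions for illustration only:

```python
# A few lines copied from the example file in the question
lines = [
    "PROJECTED_COORDINATE_SYSTEM\n",
    "#Coordinate_type_________-> 1\n",
    "sb+\n",
    "474557.83018    6761011.72342   \n",
    "474582.82774    6761011.37433   \n",
    "\n",
]

# Stage 1: drop every line that contains a letter
no_headers = (line for line in lines if not any(ch.isalpha() for ch in line))
# Stage 2: split the remaining lines on whitespace
rows = (line.split() for line in no_headers)
# Stage 3: keep only rows that really have two columns (drops blank lines)
pairs = [(row[0], row[1]) for row in rows if len(row) >= 2]

for x, y in pairs:
    print("%s\t%s" % (x, y))
```

Because the first two stages are generators, `lines` could just as well be an open file object, in which case the file is read lazily, one line at a time.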

How to skip multiple header lines using Python
