Importing specific text from Word

Time: 2017-12-08 04:17:46

Tags: python

I have a few hundred Word documents, and my code needs some of the numeric measurements in them. Naturally, I don't want to copy and paste the measurements into Python every time. Here is what I tried:

r = []
with open('NGC1705_rotmod.dat') as fo:
    for rec in fo:
        r.append(rec[0:4])

This returns:

['# Di', '# Ra', '# kp', '0.22', '0.66', '1.11', '1.55', '2.00', '2.45', '2.89', '3.34', '3.78', '4.22', '4.66', '5.11', '5.56', '6.00']

However, the first three elements ('# Di', '# Ra', '# kp') are just the header lines of the data file, not part of the data I need. Is there a way to cut off the first 3 lines?

3 Answers:

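Answer 0: (score: 0)

Yes, try this. There may be a cleaner way, but this should work: each next(fo) call consumes one header line before the loop starts.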

r = []
with open('NGC1705_rotmod.dat') as fo:
    next(fo)
    next(fo)
    next(fo)
    for rec in fo:
        r.append(rec[0:4])
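
A slightly more compact variant of the same idea, as a sketch only: call next() in a loop and give it a default, so a file with fewer than three lines does not raise StopIteration. The helper name skip_lines and the sample text are made up for illustration, and io.StringIO stands in for the real file.

```python
import io

def skip_lines(fo, n):
    """Advance a file iterator past its first n lines (hypothetical helper)."""
    for _ in range(n):
        # The default None keeps next() from raising StopIteration on short files.
        next(fo, None)

# Made-up sample standing in for NGC1705_rotmod.dat.
sample = "# Di\n# Ra\n# kp\n0.22\t1.0\n0.66\t2.0\n"

r = []
with io.StringIO(sample) as fo:
    skip_lines(fo, 3)
    for rec in fo:
        r.append(rec[0:4])

print(r)
```

With the real data, io.StringIO(sample) would be replaced by open('NGC1705_rotmod.dat').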

Answer 1: (score: 0)

You can use itertools.islice to start iterating at the fourth line:

from itertools import islice

r = []
with open('NGC1705_rotmod.dat') as fo:
    for rec in islice(fo, 3, None):
        r.append(rec[0:4])

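Answer 2: (score: 0)

You can keep track of the line number with enumerate and skip every line whose number is below 3, since the header occupies lines 0 through 2. Something like:

r = []
with open('NGC1705_rotmod.dat') as fo:
    for line_no, rec in enumerate(fo):
        # Lines 0, 1 and 2 are the header; keep everything after them.
        if line_no >= 3:
            r.append(rec[0:4])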
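
If the header lines can always be recognized by a leading '#', as the '# Di', '# Ra', '# kp' entries in the question suggest, another option is to filter on that prefix instead of counting lines. This is a rough sketch under that assumption. The sample text is invented, and splitting on whitespace and converting the first column to a float is a guess at what is wanted, not something the question states.

```python
import io

# Invented sample; the real file's layout beyond the '#' headers is an assumption.
sample = (
    "# Distance = 5.1 Mpc\n"
    "# Rad\tVobs\n"
    "# kpc\tkm/s\n"
    "0.22\t1.83\n"
    "0.66\t5.40\n"
    "\n"
    "1.11\t9.12\n"
)

r = []
with io.StringIO(sample) as fo:
    for rec in fo:
        rec = rec.strip()
        # Skip blank lines and header/comment lines marked with '#'.
        if not rec or rec.startswith('#'):
            continue
        # Take the first whitespace-separated field and convert it to a number.
        r.append(float(rec.split()[0]))

print(r)
```

This keeps working if the number of header lines changes, at the cost of relying on the '#' convention.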