Parsing specific lines of a tab-delimited file in Python

Date: 2017-11-15 11:07:48

Tags: python string-parsing

I have a tab-delimited file with no extension; its file type is listed only as "FILE". When I open it in a text editor, it looks like this:

Job Wanted_VERB "_. 2000    1   1
Job Wanted_VERB "_. 2001    1   1
Job Wanted_VERB "_. 2002    5   5
Job Wanted_VERB "_. 2004    2   2
Job Wanted_VERB "_. 2005    2   2
Job Wanted_VERB "_. 2006    2   2
Job Wanted_VERB "_. 2007    1   1
Job Well Done   1917    1   1
Job Well Done   1930    3   2
Job Well Done   1937    1   1
Job Well Done   1940    5   4
Job Well Done   1941    3   3
Job Well Done   1942    1   1
Job Well Done   1943    2   2
Job Well Done   1944    1   1
Job Well Done   1945    1   1
Job Well Done   1946    3   3
Job Well Done   1948    1   1
Job Well Done   1949    4   4
Job Well Done   1950    1   1
Job Well Done   1951    3   2
Job Well Done   1952    6   4
Job Well Done   1953    9   5
Job Well Done   1954    6   4
Job Well Done   1955    5   5
....
....

The first three words form the 3-gram, and the remaining columns are frequency counts for it.
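A minimal sketch of splitting one line into its parts. It assumes, judging from how the tabs line up in the editor view, that the 3-gram is a single tab-separated field whose words are separated by spaces, followed by three integer columns. The names NgramRow, parse_line, year, count_a and count_b are hypothetical; the question doesn't say what the two frequency columns mean.

```python
from typing import NamedTuple


class NgramRow(NamedTuple):
    """One parsed line; the field names are hypothetical labels."""
    ngram: str    # the 3-gram, e.g. "Job Well Done"
    year: int
    count_a: int  # first frequency column (meaning not stated in the question)
    count_b: int  # second frequency column (meaning not stated in the question)


def parse_line(line: str) -> NgramRow:
    """Split one tab-delimited line into a typed row.

    Assumes exactly four tab-separated fields: n-gram, year, count, count.
    """
    ngram, year, count_a, count_b = line.rstrip("\r\n").split("\t")
    return NgramRow(ngram, int(year), int(count_a), int(count_b))


row = parse_line("Job Well Done\t1930\t3\t2\n")
print(row)  # NgramRow(ngram='Job Well Done', year=1930, count_a=3, count_b=2)
```

If a line doesn't have exactly four fields, the tuple unpacking raises ValueError, which is a cheap way to find out whether the assumption about the layout is wrong.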

It's a huge file, so I only want to parse the lines for the 3-gram I'm looking for. For example, from the table above I only want the Job Well Done section:

Job Well Done   1917    1   1
Job Well Done   1930    3   2
Job Well Done   1937    1   1
Job Well Done   1940    5   4
Job Well Done   1941    3   3
Job Well Done   1942    1   1
Job Well Done   1943    2   2
Job Well Done   1944    1   1
Job Well Done   1945    1   1
Job Well Done   1946    3   3
Job Well Done   1948    1   1
Job Well Done   1949    4   4
Job Well Done   1950    1   1
Job Well Done   1951    3   2
Job Well Done   1952    6   4
Job Well Done   1953    9   5
Job Well Done   1954    6   4
Job Well Done   1955    5   5

Right now I'm parsing the whole file into a list like this:

with open(file, 'rt', encoding='UTF8') as f:  # 'f' avoids shadowing the built-in input()
    z = [line.strip().split('\t') for line in f]

Any help?

1 answer

Answer 0 (score: 0)

Yes, add a startswith check as the if condition of the list comprehension, like this:

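with open(file, 'rt', encoding='UTF8') as f:
    # Keep only lines whose first field is exactly the wanted 3-gram; the trailing
    # tab stops "Job Well Done" from also matching longer phrases that start with it.
    z = [line.strip().split("\t") for line in f if line.startswith("Job Well Done\t")]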
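The comprehension still reads every line of the file. If the file is sorted so that all lines for a given 3-gram are contiguous, as the excerpt in the question suggests (an assumption; the question doesn't say so), reading can stop as soon as the block has been passed. A minimal sketch using itertools.dropwhile and itertools.takewhile; read_ngram_block is a hypothetical helper name, and the sample lines other than those from the question are made up for illustration. If the file is not sorted, this would miss matches that appear later, so the plain comprehension is the safe choice in that case.

```python
import io
from itertools import dropwhile, takewhile
from typing import Iterable, List


def read_ngram_block(lines: Iterable[str], ngram: str) -> List[List[str]]:
    """Return the split rows for one n-gram, stopping once its block ends.

    Assumes every line for `ngram` is contiguous (e.g. the file is sorted)
    and that the n-gram is the first tab-separated field.
    """
    prefix = ngram + "\t"

    def matches(line: str) -> bool:
        return line.startswith(prefix)

    # Skip lines until the first match, then keep lines only while they match;
    # takewhile stops at the first non-matching line, so the rest is never read.
    block = takewhile(matches, dropwhile(lambda line: not matches(line), lines))
    return [line.rstrip("\r\n").split("\t") for line in block]


# Illustrative input; "Job Worth Doing" is a hypothetical next entry.
sample = io.StringIO(
    'Job Wanted_VERB "_.\t2007\t1\t1\n'
    "Job Well Done\t1917\t1\t1\n"
    "Job Well Done\t1930\t3\t2\n"
    "Job Worth Doing\t1950\t1\t1\n"
)
rows = read_ngram_block(sample, "Job Well Done")
print(rows)  # [['Job Well Done', '1917', '1', '1'], ['Job Well Done', '1930', '3', '2']]

# With the real file (path held in `file`, as in the question):
# with open(file, "rt", encoding="UTF8") as f:
#     z = read_ngram_block(f, "Job Well Done")
```

Because a file object is an iterator over its lines, passing it straight to read_ngram_block never loads the whole file into memory.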