Parsing an unusual table

Time: 2016-04-25 18:59:24

Tags: python parsing

How can I parse a table of this type?

https://primes.utm.edu/lists/small/10000.txt

                     The First 10,000 Primes
                    (the 10,000th is 104,729)
     For more information on primes see http://primes.utm.edu/

  2      3      5      7     11     13     17     19     23     29 
 31     37     41     43     47     53     59     61     67     71 
 73     79     83     89     97    101    103    107    109    113 

The numbers are neither comma-separated nor structured as XML. Do you know how to read them into a list?

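1 answer:

Answer 0 (score: 3)

You can parse the table just by knowing its layout: the data starts after the four header lines (three lines of text and one blank line) and stops one line before the end of the file. In addition, every field in the table is an integer. For example: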

    # Fetch the file with the requests HTTP client library
    import requests

    data = requests.get("http://primes.utm.edu/lists/small/10000.txt").text

    # Split the text into lines, skip the four header lines and the last two
    # entries of the split, then split each remaining line on whitespace and
    # convert every field to int. The result is one list of ints per row.
    rows = [[int(e) for e in line.split()] for line in data.split('\n')[4:-2]]

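Voilà.

This works because split() called with no arguments splits on any run of whitespace (tabs, groups of spaces, and so on) and ignores leading and trailing whitespace.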
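Since the question asks for a single list, here is a minimal offline sketch of a variant that flattens the rows and does not rely on fixed line offsets. It keeps only lines made up entirely of whitespace-separated digits. The `SAMPLE` text and the `parse_primes` name are assumptions for illustration: the sample is a shortened stand-in for the downloaded file, and its last line stands in for any non-numeric trailing line.

```python
# Hypothetical excerpt standing in for the downloaded text; the final line
# is an assumed non-numeric trailer, included only to show it gets skipped.
SAMPLE = """\
                     The First 10,000 Primes
                    (the 10,000th is 104,729)
     For more information on primes see http://primes.utm.edu/

      2      3      5      7     11
     13     17     19     23     29
end.
"""


def parse_primes(text):
    """Return all integers from lines that contain only digit fields."""
    primes = []
    for line in text.splitlines():
        # No argument: split on any run of whitespace, ignoring the ends.
        tokens = line.split()
        # Skip blank lines and any line with a non-numeric field (headers).
        if tokens and all(tok.isdecimal() for tok in tokens):
            primes.extend(int(tok) for tok in tokens)
    return primes


print(parse_primes(SAMPLE))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Filtering on content instead of slicing with `[4:-2]` means the parser should keep working if the number of header or footer lines changes, at the cost of silently skipping any data line that contains a stray non-digit character.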