How do I parse this kind of table?
https://primes.utm.edu/lists/small/10000.txt
The First 10,000 Primes
(the 10,000th is 104,729)
For more information on primes see http://primes.utm.edu/
2 3 5 7 11 13 17 19 23 29
31 37 41 43 47 53 59 61 67 71
73 79 83 89 97 101 103 107 109 113
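These numbers are neither comma-separated nor structured as XML. Do you know how to read them into a list?
Answer 0 (score: 3)
You can parse the table just by knowing that the data starts at the fourth line and ends one line before the end of the file. In addition, every entry in the table is an integer. For example: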
# Using the requests HTTP client library
import requests
# Get data from HTTP request
data = requests.get("http://primes.utm.edu/lists/small/10000.txt").text
# Nested list comprehension: split the data into lines, take the slice [4:-2]
# (skip the first four entries, drop the last two), then split each line on
# whitespace and convert every column to int. Adjust the slice bounds if the
# file's header or footer differs.
rows = [[int(e) for e in l.strip().split()] for l in data.split('\n')[4:-2]]
Voilà.
This works because split(), when called without an argument, splits on any run of whitespace, such as tabs or groups of spaces.
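If you would rather not depend on fixed line offsets, here is a minimal sketch of a different technique: keep only the rows whose tokens are all digits, and flatten them into a single list. The names `parse_primes` and `sample` are hypothetical, and the sample text is a shortened stand-in for the real file (the blank line and the trailing `end.` line are assumptions about its layout), mixing tabs and runs of spaces to show the whitespace splitting.

```python
def parse_primes(text):
    """Return a flat list of ints from whitespace-separated numeric rows."""
    primes = []
    for line in text.splitlines():
        # split() with no argument splits on any run of whitespace
        tokens = line.split()
        # Keep only non-empty rows made up entirely of digit tokens;
        # header lines such as "The First 10,000 Primes" are skipped
        if tokens and all(tok.isdigit() for tok in tokens):
            primes.extend(int(tok) for tok in tokens)
    return primes


# Shortened, hypothetical sample with mixed tabs and spaces
sample = (
    "The First 10,000 Primes\n"
    "(the 10,000th is 104,729)\n"
    "For more information on primes see http://primes.utm.edu/\n"
    "\n"
    "   2      3\t5   7  11\n"
    "  13     17\t19  23  29\n"
    "end.\n"
)

print(parse_primes(sample))
```

To use it on the real file, pass the downloaded text instead of `sample`, for example `parse_primes(data)` with `data` from the `requests.get(...)` call above.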