I have a file that contains many sections like the following:
[40.742742,-73.993847]
[40.739389,-73.985667]
[40.74715499999999,-73.97992]
[40.750573,-73.988415]
[40.742742,-73.993847]

[40.734706,-73.991915]
[40.736917,-73.990263]
[40.736104,-73.98846]
[40.740315,-73.985263]
[40.74364800000001,-73.993353]
[40.73729099999999,-73.997988]
[40.734706,-73.991915]

[40.729226,-74.003463]
[40.7214529,-74.006038]
[40.717745,-74.000389]
[40.722299,-73.996634]
[40.725291,-73.994413]
[40.729226,-74.003463]
[40.754604,-74.007836]
[40.751289,-74.000649]
[40.7547179,-73.9983309]
[40.75779,-74.0054339]
[40.754604,-74.007836]
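I need to read the list of coordinate pairs in each section (the sections are separated by an extra \n).
For a similar file (identical except that it has no extra newlines), I draw a single polygon from the whole file. I can read the coordinates and plot them in matplotlib with the following code: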
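import matplotlib.pyplot as plt

mVerts = []
with open('Manhattan_Coords.txt') as f:
    for line in f:
        # Drop the surrounding brackets; float() ignores any space after the comma.
        pair = [float(s) for s in line.strip()[1:-1].split(",")]
        mVerts.append(pair)
plt.plot(*zip(*mVerts))
plt.show()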
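How can I do the same thing for multiple polygons, where each polygon in my file is separated from the next by an extra newline?
Answer 0 (score: 4)
This is my personal favorite way to "chunk" a file into groups of lines separated by blank lines: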
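from itertools import groupby

def chunk_groups(it):
    # Blank lines strip to '' (falsy), so bool() splits the lines into
    # alternating runs of content and separators.
    stripped_lines = (x.strip() for x in it)
    for k, group in groupby(stripped_lines, bool):
        if k:
            yield list(group)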
I'd suggest ast.literal_eval to convert the string representation of each list into an actual Python list:
from ast import literal_eval

with open(filename) as f:
    result = [[literal_eval(li) for li in chunk] for chunk in chunk_groups(f)]
This gives:
result
Out[66]:
[[[40.742742, -73.993847],
[40.739389, -73.985667],
[40.74715499999999, -73.97992],
[40.750573, -73.988415],
[40.742742, -73.993847]],
[[40.734706, -73.991915],
[40.736917, -73.990263],
[40.736104, -73.98846],
[40.740315, -73.985263],
[40.74364800000001, -73.993353],
[40.73729099999999, -73.997988],
[40.734706, -73.991915]],
[[40.729226, -74.003463],
[40.7214529, -74.006038],
[40.717745, -74.000389],
[40.722299, -73.996634],
[40.725291, -73.994413],
[40.729226, -74.003463],
[40.754604, -74.007836],
[40.751289, -74.000649],
[40.7547179, -73.9983309],
[40.75779, -74.0054339],
[40.754604, -74.007836]]]
Answer 1 (score: 2)
A slight variation on roippi's idea, using json instead of ast:
import json
from itertools import groupby
from pprint import pprint

with open(FILE, "r") as coordinates_file:
    # The key is True for whitespace-only lines, so keep only the groups where it is False.
    grouped = groupby(coordinates_file, lambda line: line.isspace())
    groups = (group for empty, group in grouped if not empty)
    polygons = [[json.loads(line) for line in group] for group in groups]

pprint(polygons)
#>>> [[[40.742742, -73.993847],
#>>> [40.739389, -73.985667],
#>>> [40.74715499999999, -73.97992],
#>>> [40.750573, -73.988415],
#>>> [40.742742, -73.993847]],
#>>> [[40.734706, -73.991915],
#>>> [40.736917, -73.990263],
#>>> [40.736104, -73.98846],
#>>> [40.740315, -73.985263],
#>>> [40.74364800000001, -73.993353],
#>>> [40.73729099999999, -73.997988],
#>>> [40.734706, -73.991915]],
#>>> [[40.729226, -74.003463],
#>>> [40.7214529, -74.006038],
#>>> [40.717745, -74.000389],
#>>> [40.722299, -73.996634],
#>>> [40.725291, -73.994413],
#>>> [40.729226, -74.003463],
#>>> [40.754604, -74.007836],
#>>> [40.751289, -74.000649],
#>>> [40.7547179, -73.9983309],
#>>> [40.75779, -74.0054339],
#>>> [40.754604, -74.007836]]]
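For lines shaped like the ones in the question, the two parsers are interchangeable. The short sketch below compares them on a single sample line and on one input where they differ; the variable names are only for illustration.

```python
import json
from ast import literal_eval

line = "[40.742742,-73.993847]\n"

# json.loads ignores surrounding whitespace; the line is stripped for
# literal_eval to stay on the safe side.
pair_json = json.loads(line)
pair_ast = literal_eval(line.strip())

# literal_eval accepts any Python literal, for example a tuple ...
as_tuple = literal_eval("(40.7, -73.9)")

# ... while json.loads only accepts JSON and rejects the same text.
try:
    json.loads("(40.7, -73.9)")
    json_rejected = False
except json.JSONDecodeError:
    json_rejected = True
```

So json is the stricter choice when every line is known to be a JSON array, and literal_eval is the more permissive one if the file might contain tuples or other Python literals.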
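Answer 2 (score: 2)
The answers already posted use a number of slick approaches. There's nothing wrong with any of them.
However, there's also nothing wrong with the obvious but readable approach.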
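On a side note, you appear to be working with geographic data. This kind of format is something you'll run into all the time, and the segment delimiter is often not as obvious as an extra newline. (There are a lot of fairly bad ad-hoc "ascii export" formats out there, particularly in obscure proprietary software. For example, one common format uses an F at the end of the last line in a segment as the delimiter (i.e. 1.0 2.0F). Many others don't use a delimiter at all and require you to start a new segment/polygon whenever a point is more than "x" distance away from the previous point.) Furthermore, these things often turn into multi-GB ascii files, so reading the entire thing into memory can be impractical.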
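To make those two cases concrete, here is a rough sketch of how each could be streamed one segment at a time. The function names, the space-separated "x y" layout, and the plain Euclidean distance test are assumptions made for illustration; real export formats will differ in the details.

```python
import io
import math

def segments_by_marker(lines, marker="F"):
    """Yield one list of (x, y) tuples per segment.

    Assumes space-separated pairs, with `marker` appended to the last
    line of each segment (e.g. "1.0 2.0F").
    """
    segment = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        is_last = line.endswith(marker)
        if is_last:
            line = line[:-len(marker)]
        x, y = map(float, line.split())
        segment.append((x, y))
        if is_last:
            yield segment
            segment = []
    if segment:  # the input ended without a closing marker
        yield segment

def segments_by_distance(points, max_gap):
    """Start a new segment whenever a point is farther than max_gap
    (plain Euclidean distance) from the previous point."""
    segment = []
    for point in points:
        if segment:
            prev = segment[-1]
            if math.hypot(point[0] - prev[0], point[1] - prev[1]) > max_gap:
                yield segment
                segment = []
        segment.append(point)
    if segment:
        yield segment

# Small in-memory samples standing in for a real file.
sample = io.StringIO("1.0 2.0\n1.5 2.5\n2.0 2.0F\n10.0 10.0\n10.5 10.5F\n")
marked = list(segments_by_marker(sample))

flat = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (50.0, 50.0), (51.0, 50.0)]
split = list(segments_by_distance(flat, max_gap=5.0))
```

Both functions are generators that hold only the current segment, so they can be pointed at an open file and consumed in a loop without loading the whole file into memory.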
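My point is: regardless of which approach you choose, make sure you understand it. You're going to be doing this again, and next time it will be just different enough to be hard to generalize. You should definitely learn libraries like itertools well, but make sure you fully understand the functions you're calling.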
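As one way to check that understanding, the sketch below shows what itertools.groupby actually produces when the key is bool and the input is a list of stripped lines. The sample lines are made up for illustration.

```python
from itertools import groupby

lines = ["a\n", "b\n", "\n", "c\n", "\n", "\n", "d\n"]
stripped = [line.strip() for line in lines]

# groupby only merges *consecutive* items with the same key, so the result
# alternates between runs of content (True) and runs of blank lines (False).
# Each group is copied into a list because a group iterator is no longer
# usable once groupby has advanced to the next group.
runs = [(key, list(group)) for key, group in groupby(stripped, bool)]
# runs == [(True, ['a', 'b']), (False, ['']), (True, ['c']),
#          (False, ['', '']), (True, ['d'])]

# Keeping only the truthy runs gives one chunk per block of content.
chunks = [items for key, items in runs if key]
# chunks == [['a', 'b'], ['c'], ['d']]
```

Note that two blank lines in a row collapse into a single False run, which is why repeated separators do not produce empty chunks.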
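Here's one version of the "obvious but readable" approach. It's more verbose, but no one will be left scratching their head over what it does. (You could write this same logic in a few slightly different ways. Use whatever makes the most sense to you.)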
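import matplotlib.pyplot as plt

def polygons(infile):
    group = []
    for line in infile:
        line = line.strip()
        if line:
            coords = line[1:-1].split(',')
            # list() so each pair is a real list under Python 3, not a one-shot map object.
            group.append(list(map(float, coords)))
        elif group:
            # A blank line ends the current polygon; repeated blank lines are skipped.
            yield group
            group = []
    # Yield the final polygon if the file does not end with a blank line.
    if group:
        yield group

fig, ax = plt.subplots()
ax.ticklabel_format(useOffset=False)
with open('data.txt', 'r') as infile:
    for poly in polygons(infile):
        ax.plot(*zip(*poly))
plt.show()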