What I want to do is use pprint(dict(str_types)), but print only the first 10 entries.
Here is my code:
import re
from collections import defaultdict

# Matches the last whitespace-free token of a street name, e.g. "St." or "Ave"
str_type_re = re.compile(r'\b\S+\.?$', re.IGNORECASE)
expected = ["Street", "Avenue", "Boulevard", "Drive", "Court", "Place", "Square", "Lane", "Road",
            "Trail", "Parkway", "Commons"]

def audit_str_type(str_types, str_name, rex):
    stn = rex.search(str_name)
    if stn:
        str_type = stn.group()
        # Collect only street types that are not in the expected list
        if str_type not in expected:
            str_types[str_type].add(str_name)
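To illustrate what this function collects, here is a minimal sketch run on a handful of street names. The sample names are made up for illustration and are not taken from my OSM data; the regex and the expected list are the same as above.

```python
import re
from collections import defaultdict

# Same pattern as above: the last whitespace-free token of the name.
str_type_re = re.compile(r'\b\S+\.?$', re.IGNORECASE)
expected = ["Street", "Avenue", "Boulevard", "Drive", "Court", "Place",
            "Square", "Lane", "Road", "Trail", "Parkway", "Commons"]

def audit_str_type(str_types, str_name, rex):
    stn = rex.search(str_name)
    if stn:
        str_type = stn.group()
        if str_type not in expected:
            str_types[str_type].add(str_name)

# Hypothetical sample names (assumption: not from the real data set).
sample_names = ["Main Street", "Oak Ave", "Pine St.", "Elm St.", "Sunset Boulevard"]

str_types = defaultdict(set)
for name in sample_names:
    audit_str_type(str_types, name, str_type_re)

# Sort each set so the result is stable when printed.
result = {k: sorted(v) for k, v in str_types.items()}
print(result)
```

Names ending in an expected type ("Main Street", "Sunset Boulevard") are skipped, while the unexpected endings are grouped under their street type.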
I defined a function that audits tag elements where k="addr:street", that is, any tag element that the is_str_name function matches.
import xml.etree.ElementTree as ET

def audit(osmfile, rex):
    osm_file = open(osmfile, "r", encoding="utf8")
    str_types = defaultdict(set)
    for event, elem in ET.iterparse(osm_file, events=("start",)):
        if elem.tag == "node" or elem.tag == "way":
            for tag in elem.iter("tag"):
                if is_str_name(tag):
                    audit_str_type(str_types, tag.attrib['v'], rex)
    osm_file.close()
    return str_types
In the code above, I use the is_str_name function to filter the tags when the audit function audits street names.
def is_str_name(elem):
    return elem.attrib['k'] == "addr:street"

import pprint

str_types = audit(mydata, rex=str_type_re)
pprint.pprint(dict(str_types[:10]))
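Answer 0 (score: 0)
Use pprint.pformat to get back the object's string representation instead of printing it directly; you can then split it into lines and print only the first few: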
whole_repr = pprint.pformat(dict(str_types))
for line in whole_repr.splitlines()[:10]:
    print(line)
Note that since you didn't provide an MCVE, I can't test this against your data, but I did verify it with a simpler example:
>>> import pprint
>>> thing = pprint.pformat({i:str(i) for i in range(10000)})
>>> type(thing), len(thing)
(<class 'str'>, 147779)
>>> for line in thing.splitlines()[:10]:
...     print(line)
...
{0: '0',
 1: '1',
 2: '2',
 3: '3',
 4: '4',
 5: '5',
 6: '6',
 7: '7',
 8: '8',
 9: '9',
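One caveat: slicing the formatted text limits the output to 10 lines, not 10 entries, so an entry whose value wraps over several lines may be cut off partway. If the goal is the first 10 entries, a possible alternative is to take the first items with itertools.islice before pretty-printing. This is only a sketch: it assumes str_types is a defaultdict(set) as in the question, and the sample data below is made up as a stand-in for the result of audit().

```python
import pprint
from collections import defaultdict
from itertools import islice

# Made-up stand-in for the result of audit(); the real data comes from the OSM file.
str_types = defaultdict(set)
for i in range(25):
    str_types["Type%d" % i].add("Sample Name %d Type%d" % (i, i))

# Take the first 10 (key, value) pairs in insertion order, then pretty-print them.
first_ten = dict(islice(str_types.items(), 10))
pprint.pprint(first_ten)
```

Dictionaries keep insertion order in Python 3.7 and later, so "first 10" here means the first 10 street types encountered during the audit. pprint sorts the keys for display by default.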