Question

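My task is to mine census data at the block level. After learning how to navigate the data and find what I was looking for, I hit a snag. The tabblock polygons (block-level polygons) have an ID that is a 15-character string,

e.g. '471570001022022'

but the census data is labeled with titles like:

'Block 2022, Block Group 2, Census Tract 1, Shelby County, Tennessee'

The block ID is formatted state-county-tract-group-block, with leading zeros to pad it to 15 characters: sscccttttggbbbb

Does anyone know a quick way to turn the title into that format? I thought I'd ask before spending time writing a Python script.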

Thanks.

Answer 1

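OK, I figured it out.

ex = 'Block 2022, Block Group 2, Census Tract 1, Shelby County, Tennessee'

new_id = '47157' + ex[40:len(ex)-26].zfill(4) + '0' + ex[24] + ex[6:10]

The state and county values are constant here, and the block group only ever goes to a single digit (afaik).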
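The fixed slice offsets above only work while the block number has four digits and the title ends in the same county and state. As a rough sketch of a less position-dependent variant, the title can be split on its commas instead. This assumes every title has the same five comma-separated parts as the example in the question and that the state and county codes stay hard-coded; the function name `title_to_id` and its parameters are hypothetical.

```python
def title_to_id(title, state_fips="47", county_fips="157"):
    """Build the 15-character ID from a census title string.

    Assumes the sscccttttggbbbb layout described in the question.
    """
    # Expected shape:
    # 'Block 2022, Block Group 2, Census Tract 1, Shelby County, Tennessee'
    block_part, group_part, tract_part, _county, _state = title.split(", ")
    # The number is always the last space-separated token of each part.
    block = block_part.rsplit(" ", 1)[1]
    group = group_part.rsplit(" ", 1)[1]
    tract = tract_part.rsplit(" ", 1)[1]
    # Zero-pad each piece to its fixed width and concatenate.
    return state_fips + county_fips + tract.zfill(4) + group.zfill(2) + block.zfill(4)


ex = "Block 2022, Block Group 2, Census Tract 1, Shelby County, Tennessee"
print(title_to_id(ex))  # 471570001022022
```

Because the pieces are zero-padded rather than sliced by position, this also copes with shorter block or tract numbers.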

Answer 2

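Using struct might be neater for splitting the ID into its parts (in Python 3, struct works on bytes):

>>> import struct
>>> r = b'471570001022022'
>>> f = '2s3s4s2s4s'
>>> struct.unpack(f, r)
(b'47', b'157', b'0001', b'02', b'2022')
>>> s, c, t, g, b = (part.decode() for part in struct.unpack(f, r))
>>> print(s)
47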
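If the ID is already a text string, plain slicing gives the same split without going through bytes. The following is only a sketch under the sscccttttggbbbb layout stated in the question; `BlockId`, `WIDTHS`, and `split_block_id` are names made up for this example.

```python
from collections import namedtuple

# Field widths follow the sscccttttggbbbb layout given in the question.
BlockId = namedtuple("BlockId", "state county tract group block")
WIDTHS = (2, 3, 4, 2, 4)


def split_block_id(block_id):
    """Split a 15-character ID string into its fixed-width fields."""
    if len(block_id) != sum(WIDTHS):
        raise ValueError("expected a 15-character ID")
    parts, pos = [], 0
    for width in WIDTHS:
        parts.append(block_id[pos:pos + width])
        pos += width
    return BlockId(*parts)


fields = split_block_id("471570001022022")
print(fields.state, fields.county, fields.tract, fields.group, fields.block)
# 47 157 0001 02 2022
```

The named fields make it easier to rebuild a title or join against other tables than unpacking into single-letter variables.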

Answer 3

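Assuming this data is correct, and that you have parsed it into two dictionaries, state_ids and county_ids, where the keys are the string representation of the entity and the values are the numeric code as a string:

import re

def get_tabblock_id(tabblock_string):
    # Extract the numbers and the county/state names, then zero-pad each part.
    block, block_group, tract, county, state = re.match(r'Block (\d+), Block Group (\d+), Census Tract (\d+), (.+), (.+)', tabblock_string).groups()
    return state_ids[state].zfill(2) + county_ids[county].zfill(3) + tract.zfill(4) + block_group.zfill(2) + block.zfill(4)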
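To see the function above end to end, here is a self-contained sketch with one-entry lookup tables. The dictionaries are hypothetical stand-ins for the parsed data, filled only with the codes that appear in the question's example ID, and the explicit error for non-matching titles is an addition for illustration.

```python
import re

# Hypothetical lookup tables with one entry each, for illustration only.
state_ids = {"Tennessee": "47"}
county_ids = {"Shelby County": "157"}

TITLE_RE = re.compile(
    r"Block (\d+), Block Group (\d+), Census Tract (\d+), (.+), (.+)"
)


def get_tabblock_id(tabblock_string):
    match = TITLE_RE.match(tabblock_string)
    if match is None:
        # Fail loudly instead of raising AttributeError on None.
        raise ValueError("unrecognised title: %r" % tabblock_string)
    block, block_group, tract, county, state = match.groups()
    return (state_ids[state].zfill(2) + county_ids[county].zfill(3)
            + tract.zfill(4) + block_group.zfill(2) + block.zfill(4))


title = "Block 2022, Block Group 2, Census Tract 1, Shelby County, Tennessee"
print(get_tabblock_id(title))  # 471570001022022
```

Note that keying county_ids by county name alone would collide once several states are loaded; a key such as (state, county) may be safer in that case.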

Reformatting census titles
