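I am reading a large text document and trying to split it into several lists. I'm having a really hard time actually splitting the strings.
An example of the text: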
Youngstown, OH[4110,8065]115436
Yankton, SD[4288,9739]12011
966
Yakima, WA[4660,12051]49826
1513 2410
This data contains four pieces of information in the following format:
City[coordinates]Population Distances_to_previous
My goal is to split this data into a list:
Data = [[City] , [Coordinates] , [Population] , [Distances]]
As far as I understand, I need to use .split, but I've gotten lost trying to implement it.
I'd really appreciate some ideas to get started!
Answer 0 (score: 1)
I would do this in stages.
I'd start with something like the following:
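# `lines` is assumed to be a list of the input lines with newlines stripped,
# for example the result of str.splitlines() on the file contents
numCities = 0
Data = []
i = 0
while i < len(lines):
    split = lines[i].partition('[')
    if split[1]:  # found a '[', so this is a city line
        city = split[0]
        split = split[2].partition(']')
        if split[1]:
            coords = split[0]  # if you want this as a list, use coords.split(',')
            population = split[2]
            distances = []
            if i > 0:
                # every city after the first is followed by a line of distances
                i += 1
                distances = lines[i].split()
            Data.append([city, coords, population, distances])
            numCities += 1
    i += 1
for data in Data:
    print(data)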
This prints:
['Youngstown, OH', '4110,8065', '115436', []]
['Yankton, SD', '4288,9739', '12011', ['966']]
['Yakima, WA', '4660,12051', '49826', ['1513', '2410']]
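A variation on the same staged idea, as a minimal sketch rather than a definitive implementation: instead of relying on the `i > 0` check, any line without a `[` is treated as the distance line for the most recently parsed city. The function name `parse_cities` and the `sample` list are hypothetical names introduced for illustration, and splitting the coordinates into a list is an optional choice.

```python
def parse_cities(lines):
    """Parse 'City[x,y]population' lines, each optionally followed by a distance line."""
    data = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        city, bracket, rest = line.partition('[')
        if bracket:
            coords, close, population = rest.partition(']')
            if not close:
                continue  # malformed line: no closing bracket
            data.append([city, coords.split(','), population, []])
        elif data:
            # No '[' on this line: assume it holds distances for the last city.
            data[-1][3].extend(line.split())
    return data


# Hypothetical sample input, copied from the question.
sample = [
    "Youngstown, OH[4110,8065]115436",
    "Yankton, SD[4288,9739]12011",
    "966",
    "Yakima, WA[4660,12051]49826",
    "1513 2410",
]
for row in parse_cities(sample):
    print(row)
```

Because distance lines are attached to whichever city came last, this version should also cope with a city that has no distance line at all, at the cost of assuming that city lines always contain a `[`.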
Answer 1 (score: 1)
The simplest approach is to use a regular expression.
lines = """Youngstown, OH[4110,8065]115436
Yankton, SD[4288,9739]12011
966
Yakima, WA[4660,12051]49826
1513 2410"""
import re
pat = re.compile(r"""
    (?P<City>.+?)                   # all characters up to the first [
    \[(?P<Coordinates>\d+,\d+)\]    # grabs [(digits,here)]
    (?P<Population>\d+)             # population digits here
    \s                              # a space or a newline?
    (?P<Distances>[\d ]+)?          # Everything else is distances""", re.M | re.X)
groups = pat.finditer(lines)
results = [[[g.group("City")],
            [g.group("Coordinates")],
            [g.group("Population")],
            g.group("Distances").split() if
                g.group("Distances") else [None]]
           for g in groups]
Sample output:
In[50]: results
Out[50]:
[[['Youngstown, OH'], ['4110,8065'], ['115436'], [None]],
[['Yankton, SD'], ['4288,9739'], ['12011'], ['966']],
[['Yakima, WA'], ['4660,12051'], ['49826'], ['1513', '2410']]]
That said, if you can, it is better to do this as a list of dictionaries.
groups = pat.finditer(lines)
# build one dict per match, holding all four named groups
results = [{key: g.group(key) for key in
            ["City", "Coordinates", "Population", "Distances"]}
           for g in groups]
# then modify later
for d in results:
    try:
        d['Distances'] = d['Distances'].split()
    except AttributeError:
        # distances is None -- that's okay
        pass
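As a possible follow-up to the dictionary approach, match objects also provide `groupdict()`, which returns all named groups as a dict in one call. The sketch below reuses the same pattern and additionally converts the numeric fields to `int`. The helper name `to_record` and the variable names `text` and `records` are hypothetical, and the integer conversion is an assumption about how the data will be used.

```python
import re

text = """Youngstown, OH[4110,8065]115436
Yankton, SD[4288,9739]12011
966
Yakima, WA[4660,12051]49826
1513 2410"""

pat = re.compile(r"""
    (?P<City>.+?)                   # all characters up to the first [
    \[(?P<Coordinates>\d+,\d+)\]    # the two numbers inside the brackets
    (?P<Population>\d+)             # population digits
    \s                              # a space or a newline
    (?P<Distances>[\d ]+)?          # optional line of distances
    """, re.M | re.X)


def to_record(match):
    """Turn one regex match into a dict with numeric fields converted to int."""
    record = match.groupdict()
    record["Coordinates"] = [int(n) for n in record["Coordinates"].split(",")]
    record["Population"] = int(record["Population"])
    # Distances is None when a city has no distance line; use an empty list then.
    record["Distances"] = [int(n) for n in (record["Distances"] or "").split()]
    return record


records = [to_record(m) for m in pat.finditer(text)]
for record in records:
    print(record)
```

Note that the pattern requires one whitespace character after the population, so a final city line with neither a trailing newline nor a distance line would not match.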