I have a text file that looks like this:
1 a more or less orderly pile or heap: a precariously balanced stack of books; a neat stack of papers.
2 a large, usually conical, circular, or rectangular pile of hay, straw, or the like.
3 Often, stacks. a set of shelves for books or other materials ranged compactly one above the other, as in a library.
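Messy, to say the least. I want to grab the text between each pair of numbers and store it in a variable: the text between 1 and 2 would go into var1, and the text between 2 and 3 into var2. Here is where it gets tricky: sometimes the numbers go up to 24, and sometimes they only go up to 1. I am inexperienced with parsing in Python and don't know what to write to make this work. How can I parse this data? TIA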
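Answer 0: (score: 0)
I changed the input file so that there is no newline between the second and third non-blank lines. (You said the newlines won't always be there.)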
1 a more or less orderly pile or heap: a precariously balanced stack of books; a neat stack of papers.
2 a large, usually conical, circular, or rectangular pile of hay, straw, or the like.3 Often, stacks. a set of shelves for books or other materials ranged compactly one above the other, as in a library.
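This code does what you want.
The if line check skips blank lines. re.split splits each line wherever a number is followed by whitespace. split_lines.split() breaks each piece into words so that the following line can join them back together with single spaces, which collapses any extra whitespace.
import re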
result = []
with open('jake.txt') as jake:
    for line in jake:
        line = line.strip()
        if line:  # skip blank lines
            # split wherever a number is followed by whitespace
            for split_lines in re.split(r'[0-9]+\s+', line):
                items = split_lines.split()
                new_line = ' '.join(items).strip()
                if new_line:  # drop the empty piece before the first number
                    result.append(new_line)
for r in result:
    print(r)
Here is the output.
a more or less orderly pile or heap: a precariously balanced stack of books; a neat stack of papers.
a large, usually conical, circular, or rectangular pile of hay, straw, or the like.
Often, stacks. a set of shelves for books or other materials ranged compactly one above the other, as in a library.
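The question asked for each piece of text to stay associated with its number (var1, var2, and so on). One way to do that, sketched here as an assumption rather than as part of the answer above, is to put the number in a capturing group so that re.split keeps it, then build a dict keyed by that number. The function name parse_numbered and the shortened sample text are made up for illustration. Like the code above, this splits on any number followed by whitespace, so a number appearing inside a definition would also start a new entry.

```python
import re

def parse_numbered(text):
    """Map each number to the text that follows it (hypothetical helper)."""
    # With a capturing group, re.split keeps the numbers in the result:
    # ['', '1', 'text...', '2', 'text...', ...]
    parts = re.split(r'(\d+)\s+', text)
    entries = {}
    # parts[0] is whatever precedes the first number, so skip it and
    # pair each number with the piece of text that comes right after it.
    for number, body in zip(parts[1::2], parts[2::2]):
        entries[int(number)] = ' '.join(body.split())
    return entries

# Shortened sample; entries 2 and 3 share a line, as in the modified input.
sample = (
    "1 a more or less orderly pile or heap.\n"
    "2 a large pile of hay, straw, or the like.3 Often, stacks. a set of shelves."
)
entries = parse_numbered(sample)
for number, body in entries.items():
    print(number, body)
```

A dict avoids having to create a variable number of names such as var1 through var24: entries[2] plays the role of var2, and it works whether the file has one entry or twenty-four.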
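Answer 1: (score: 0)
OK, let's give it a try. We'll use a regular expression to match the number at the start of each line, then take everything after it. We store that text in a named group called var, so that we can iterate over the matches and pull it out.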
import re
regex = r"^\d+\s+(?P<var>.*)"
test_str = """1 a more or less orderly pile or heap: a precariously balanced stack of books; a neat stack of papers.
2 a large, usually conical, circular, or rectangular pile of hay, straw, or the like.
3 Often, stacks. a set of shelves for books or other materials ranged compactly"""
# Use the iterator to move across each row/match
matches = re.finditer(regex, test_str, re.MULTILINE)  # use MULTILINE so that ^ matches at the start of each row
for match in matches:
    print(match.groupdict()['var'])  # pull out the var we are looking for
    print('----')
Here is the output:
a more or less orderly pile or heap: a precariously balanced stack of books; a neat stack of papers.
----
a large, usually conical, circular, or rectangular pile of hay, straw, or the like.
----
Often, stacks. a set of shelves for books or other materials ranged compactly
----
Does this get you close?
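To keep each number together with its text, as the question asked, the same line-anchored pattern can capture the number in a second named group and collect the matches into a dict. This is only a sketch: the group name num, the function name definitions_by_number, and the shortened test string are assumptions added for illustration. Because the pattern is anchored with ^, it assumes every entry starts on its own line.

```python
import re

# Same pattern as above, plus a named group for the leading number.
# "num" is a name chosen for this sketch.
pattern = re.compile(r"^(?P<num>\d+)\s+(?P<var>.*)$", re.MULTILINE)

def definitions_by_number(text):
    """Return {number: text} for every line that starts with a number."""
    return {
        int(m.group("num")): m.group("var").strip()
        for m in pattern.finditer(text)
    }

# Shortened stand-in for the real file contents.
test_str = """1 a neat stack of papers.
2 a large pile of hay.
3 Often, stacks."""

defs = definitions_by_number(test_str)
print(defs[2])
```

Looking up defs[2] then replaces a separate var2 variable, and the dict simply has fewer keys when the file stops at 1.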