Question

I have a text file that looks like this:

1 a more or less orderly pile or heap:                 a precariously       balanced stack of books; a neat stack of papers.

2 a large, usually conical, circular, or rectangular pile of hay, straw, or the like.

3 Often, stacks. a set of shelves for books or other materials ranged     compactly one above the other, as in a library.

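Messy, to say the least. I want to grab the text between each pair of numbers and store it in a variable: the text between 1 and 2 would go in var1, and the text between 2 and 3 in var2. Here is where it gets tricky: sometimes the numbers go up to 24, and sometimes they only go up to 1. I'm very inexperienced with parsing in Python and don't know what to write to make this work. How can I parse this data? TIA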

Answer 1

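I changed the input file so that there is no newline between the second and third non-empty lines. (You said the newlines won't always be there.)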

1 a more or less orderly pile or heap:                 a precariously       balanced stack of books; a neat stack of papers.

2 a large, usually conical, circular, or rectangular pile of hay, straw, or the like.3 Often, stacks. a set of shelves for books or other materials ranged     compactly one above the other, as in a library.

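This code does what you want.

if line skips empty lines.
re.split splits the line wherever a number is followed by whitespace.
split_lines.split() splits each piece on whitespace so that the next line can join the words back together with single spaces, collapsing the extra spacing.
If the resulting line is non-empty, it is appended to the result list.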

import re

result = []
with open('jake.txt') as jake:
    for line in jake:
        line = line.strip()
        if line:
            for split_lines in re.split(r'[0-9]+\s+', line):
                items = split_lines.split()
                new_line = ' '.join(items).strip()
                if new_line:
                    result.append(new_line)
for r in result:
    print(r)

Here is the output.

a more or less orderly pile or heap: a precariously balanced stack of books; a neat stack of papers.
a large, usually conical, circular, or rectangular pile of hay, straw, or the like.
Often, stacks. a set of shelves for books or other materials ranged compactly one above the other, as in a library.
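
The question asks for the text to end up in var1, var2, and so on. A dictionary keyed by the entry number is usually easier to work with than a variable count of separate names. The following is only a minimal sketch: parse_numbered and the sample string are made-up names for illustration, and it assumes, like the pattern above, that any number followed by whitespace starts a new entry, so a number inside a definition would split it too.

```python
import re

def parse_numbered(text):
    """Return {number: definition} for text such as '1 foo 2 bar'."""
    # The capturing group makes re.split keep the matched numbers, so the
    # result alternates: [prefix, number, body, number, body, ...]
    parts = re.split(r'(\d+)\s+', text)
    entries = {}
    for number, body in zip(parts[1::2], parts[2::2]):
        # split() + join() collapses runs of spaces and newlines
        entries[int(number)] = ' '.join(body.split())
    return entries

sample = "1 a   neat stack of papers.\n\n2 a pile of hay.3 Often, stacks."
print(parse_numbered(sample))
# {1: 'a neat stack of papers.', 2: 'a pile of hay.', 3: 'Often, stacks.'}
```

With this, entries[1] plays the role of var1, and it does not matter whether the file stops at 1 or runs to 24.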

Answer 2

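OK, let's give it a try. We'll use a regular expression that matches the number at the start of each line and then captures everything after it. We store that in a named group, var, so that we can iterate over the matches and pull it out.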

import re

regex = r"^\d+\s+(?P<var>.*)"

test_str = """1 a more or less orderly pile or heap:                 a precariously       balanced stack of books; a neat stack of papers.
2 a large, usually conical, circular, or rectangular pile of hay, straw, or the like.
3 Often, stacks. a set of shelves for books or other materials ranged     compactly"""

# Use the iterator to move across each row/match
matches = re.finditer(regex, test_str, re.MULTILINE)  # use MULTILINE so that ^ matches at the start of each line

for match in matches:
    print(match.groupdict()['var'])  # pull out the named group we are looking for
    print('----')

Here is the output:

a more or less orderly pile or heap:                 a precariously       balanced stack of books; a neat stack of papers.
----
a large, usually conical, circular, or rectangular pile of hay, straw, or the like.
----
Often, stacks. a set of shelves for books or other materials ranged     compactly
----
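
Note that the var group keeps whatever spacing was in the file. If you also want the runs of spaces collapsed, one option is to split and re-join each captured string. This is a rough sketch rather than a definitive version: collapsed_definitions and the sample text are hypothetical names, and it still assumes every entry starts on its own line.

```python
import re

# Same pattern as above, compiled once with MULTILINE so ^ matches per line
pattern = re.compile(r"^\d+\s+(?P<var>.*)", re.MULTILINE)

def collapsed_definitions(text):
    """Return each captured definition with whitespace runs squeezed to one space."""
    return [" ".join(m.group("var").split()) for m in pattern.finditer(text)]

sample = "1 a pile:      a   stack.\n2 hay, straw.\n"
for definition in collapsed_definitions(sample):
    print(definition)
```

If two entries can share a line, as in "...the like.3 Often...", this pattern will not separate them, and you would need to split on the numbers instead.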

Does this get you close?

Splitting text between numbers into variables in Python
