Splitting text between numbers into variables in Python

Date: 2018-01-12 00:05:00

Tags: python python-2.7 parsing

I have a text file that looks like this:

1 a more or less orderly pile or heap:                 a precariously       balanced stack of books; a neat stack of papers.

2 a large, usually conical, circular, or rectangular pile of hay, straw, or the like.

3 Often, stacks. a set of shelves for books or other materials ranged     compactly one above the other, as in a library.

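Messy, to say the least. I want to grab the text between each number and store it in a variable. The text between numbers 1 and 2 would go in var1, and the text between 2 and 3 would go in var2. Here is where it gets tricky: sometimes the numbers go up to 24, and sometimes they stop at 1. I'm very inexperienced with parsing in Python and don't know what to write to make this work. How do I parse this data? TIA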

2 answers:

Answer 0: (score: 0)

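I changed the input file so that there is no newline between the second and third non-empty lines. (You said the newlines won't always be there.)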

1 a more or less orderly pile or heap:                 a precariously       balanced stack of books; a neat stack of papers.

2 a large, usually conical, circular, or rectangular pile of hay, straw, or the like.3 Often, stacks. a set of shelves for books or other materials ranged     compactly one above the other, as in a library.

This code does what you want.

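  • if line skips empty lines.
  • re.split splits the line wherever a run of digits is followed by whitespace.
  • split_lines.split() breaks each piece on whitespace so that the next line can join the words back together with single spaces, collapsing the extra whitespace.
  • If the resulting line is non-empty, it is appended to the result list.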
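
import re

result = []
with open('jake.txt') as jake:
    for line in jake:
        line = line.strip()
        if line:  # skip blank lines
            # Split wherever a run of digits is followed by whitespace.
            for split_lines in re.split(r'[0-9]+\s+', line):
                # Collapse runs of whitespace into single spaces.
                items = split_lines.split()
                new_line = ' '.join(items)
                if new_line:  # drop the empty piece before the first number
                    result.append(new_line)
for r in result:
    print(r)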

Here is the output.

a more or less orderly pile or heap: a precariously balanced stack of books; a neat stack of papers.
a large, usually conical, circular, or rectangular pile of hay, straw, or the like.
Often, stacks. a set of shelves for books or other materials ranged compactly one above the other, as in a library.
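Since the question asks for the text to be stored per number (var1, var2, ...), here is a minimal sketch of one way to keep the numbers alongside the text. It is not part of the answer above: the function name split_numbered_entries is made up for illustration, and it assumes, like the re.split pattern above, that any run of digits followed by whitespace starts a new entry (so a number inside a definition would also cause a split). Putting the digits in a capturing group makes re.split return them in the result list.

```python
import re

def split_numbered_entries(text):
    """Return {entry_number: cleaned_text} for text such as '1 foo 2 bar'.

    Hypothetical helper; assumes digits followed by whitespace always
    mark the start of a new entry.
    """
    # With a capturing group, re.split keeps the matched numbers:
    # '1 foo 2 bar' -> ['', '1', 'foo ', '2', 'bar']
    parts = re.split(r'(\d+)\s+', text)
    numbers = parts[1::2]   # the captured entry numbers
    bodies = parts[2::2]    # the text following each number
    entries = {}
    for number, body in zip(numbers, bodies):
        # Collapse runs of whitespace (including newlines) to one space.
        entries[int(number)] = ' '.join(body.split())
    return entries
```

A dictionary keyed by entry number stands in for separate var1, var2, ... names, which matters when the count varies between 1 and 24: entries[1] plays the role of var1, entries[2] of var2, and so on.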

Answer 1: (score: 0)

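OK, let's give it a try. We'll use a regular expression to match the number at the start of each line, then take everything after it. We store that in a named group, var, so that we can iterate over the matches and pull it out.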

import re

regex = r"^\d+\s+(?P<var>.*)"

test_str = """1 a more or less orderly pile or heap:                 a precariously       balanced stack of books; a neat stack of papers.
2 a large, usually conical, circular, or rectangular pile of hay, straw, or the like.
3 Often, stacks. a set of shelves for books or other materials ranged     compactly"""

# Use the iterator to move across each row/match
matches = re.finditer(regex, test_str, re.MULTILINE)  # use MULTILINE so that ^ matches at the start of each row

for match in matches:
    print(match.groupdict()['var'])  # pull out the var we are looking for
    print('----')

Here is the output:

a more or less orderly pile or heap:                 a precariously       balanced stack of books; a neat stack of papers.
----
a large, usually conical, circular, or rectangular pile of hay, straw, or the like.
----
Often, stacks. a set of shelves for books or other materials ranged     compactly
----

Does this get you close?
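The captured text above still contains the runs of spaces from the input. As a rough sketch (the function name entries_by_line is invented here, not something from the answer), the same regex can be combined with split/join to normalize the whitespace and collect the entries in a list. Note the assumption that each entry starts on its own line: with ^ and re.MULTILINE, a number glued to the end of the previous sentence is not treated as a new entry.

```python
import re

# Same pattern as above, compiled once; ^ matches at each line start.
ENTRY = re.compile(r"^\d+\s+(?P<var>.*)", re.MULTILINE)

def entries_by_line(text):
    """Return the text after each leading number, whitespace-normalized.

    Hypothetical helper; assumes every entry begins on its own line.
    """
    return [" ".join(m.group("var").split()) for m in ENTRY.finditer(text)]
```

If the line breaks between entries cannot be relied on, splitting on the numbers themselves (as in the first answer) is probably the safer route.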