Retrieving a multi-level data structure

Date: 2017-12-18 12:24:33

Tags: regex

Suppose I have text like this:

In [1]: import re
In [2]: with open('text.md', 'r') as f:
   ...:     cont = f.read()
In [3]: cont
Out[3]: '- ## First steps[¶](https://docs.djangoproject.com/en/2.0/#first-steps)\n\n  Are you new to Django or to programming? This is the place to start!\n\n  - **From scratch:** [Overview](https://docs.djangoproject.com/en/2.0/intro/overview/) | [Installation](https://docs.djangoproject.com/en/2.0/intro/install/)\n  - **Tutorial:** [Part 1: Requests and responses](https://docs.djangoproject.com/en/2.0/intro/tutorial01/) | [Part 2: Models and the admin site](https://docs.djangoproject.com/en/2.0/intro/tutorial02/) | [Part 3: Views and templates](https://docs.djangoproject.com/en/2.0/intro/tutorial03/) | [Part 4: Forms and generic views](https://docs.djangoproject.com/en/2.0/intro/tutorial04/) | [Part 5: Testing](https://docs.djangoproject.com/en/2.0/intro/tutorial05/) | [Part 6: Static files](https://docs.djangoproject.com/en/2.0/intro/tutorial06/) | [Part 7: Customizing the admin site](https://docs.djangoproject.com/en/2.0/intro/tutorial07/)\n  - **Advanced Tutorials:** [How to write reusable apps](https://docs.djangoproject.com/en/2.0/intro/reusable-apps/) | [Writing your first patch for Django](https://docs.djangoproject.com/en/2.0/intro/contributing/)\n\n  ## The model layer[¶](https://docs.djangoproject.com/en/2.0/#the-model-layer)\n\n  Django provides an abstraction layer (the “models”) for structuring and manipulating the data of your Web application. 
Learn more about it below:\n\n  - **Models:** [Introduction to models](https://docs.djangoproject.com/en/2.0/topics/db/models/) | [Field types](https://docs.djangoproject.com/en/2.0/ref/models/fields/) | [Indexes](https://docs.djangoproject.com/en/2.0/ref/models/indexes/) | [Meta options](https://docs.djangoproject.com/en/2.0/ref/models/options/) | [Model class](https://docs.djangoproject.com/en/2.0/ref/models/class/)\n  - **QuerySets:** [Making queries](https://docs.djangoproject.com/en/2.0/topics/db/queries/) | [QuerySet method reference](https://docs.djangoproject.com/en/2.0/ref/models/querysets/) | [Lookup expressions](https://docs.djangoproject.com/en/2.0/ref/models/lookups/)\n  - **Model instances:** [Instance methods](https://docs.djangoproject.com/en/2.0/ref/models/instances/) | [Accessing related objects](https://docs.djangoproject.com/en/2.0/ref/models/relations/)\n  - **Migrations:** [Introduction to Migrations](https://docs.djangoproject.com/en/2.0/topics/migrations/) | [Operations reference](https://docs.djangoproject.com/en/2.0/ref/migration-operations/) | [SchemaEditor](https://docs.djangoproject.com/en/2.0/ref/schema-editor/) | [Writing migrations](https://docs.djangoproject.com/en/2.0/howto/writing-migrations/)\n  - **Advanced:** [Managers](https://docs.djangoproject.com/en/2.0/topics/db/managers/) | [Raw SQL](https://docs.djangoproject.com/en/2.0/topics/db/sql/) | [Transactions](https://docs.djangoproject.com/en/2.0/topics/db/transactions/) | [Aggregation](https://docs.djangoproject.com/en/2.0/topics/db/aggregation/) | [Search](https://docs.djangoproject.com/en/2.0/topics/db/search/) | [Custom fields](https://docs.djangoproject.com/en/2.0/howto/custom-model-fields/) | [Multiple databases](https://docs.djangoproject.com/en/2.0/topics/db/multi-db/) | [Custom lookups](https://docs.djangoproject.com/en/2.0/howto/custom-lookups/) |[Query Expressions](https://docs.djangoproject.com/en/2.0/ref/models/expressions/) | [Conditional 
Expressions](https://docs.djangoproject.com/en/2.0/ref/models/conditional-expressions/) | [Database Functions](https://docs.djangoproject.com/en/2.0/ref/models/database-functions/)\n  - **Other:** [Supported databases](https://docs.djangoproject.com/en/2.0/ref/databases/) | [Legacy databases](https://docs.djangoproject.com/en/2.0/howto/legacy-databases/) | [Providing initial data](https://docs.djangoproject.com/en/2.0/howto/initial-data/) | [Optimize database access](https://docs.djangoproject.com/en/2.0/topics/db/optimization/) | [PostgreSQL specific features](https://docs.djangoproject.com/en/2.0/ref/contrib/postgres/)'

Its chapters can be retrieved with:
In [9]: chapters = re.findall(r'## (.+)\[', cont)
In [10]: chapters
Out[10]: ['First steps', 'The model layer']

Its sections can be obtained with:
In [21]: sections = re.findall(r'- \*\*(.+)\*\*', cont)
In [23]: sections
Out[23]:
['From scratch:',
 'Tutorial:',
 'Advanced Tutorials:',
 'Models:',
 'QuerySets:',
 'Model instances:',
 'Migrations:',
 'Advanced:',
 'Other:']

I want to produce the following data structure:

['First steps',['From scratch:',
                'Tutorial:',
                'Advanced Tutorials:'],
'The model layer',['Models:',
                 'QuerySets:',
                 'Model instances:',
                 'Migrations:',
                 'Advanced:',
                 'Other:']]

How can I accomplish this?

1 answer:

Answer 0 (score: 0)

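Find chapters and sections in a single pass. Because the pattern has two groups joined by |, each match comes back as a (chapter, section) pair in which exactly one side is non-empty: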

>>> results = re.findall(r'## (.+)\[|- \*\*(.+)\*\*', cont)
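
To make the shape of that result concrete, here is a minimal sketch run on a shortened excerpt of the question's text. The excerpt and the variable names sample and pairs are illustrative and not part of the original answer.

```python
import re

# Shortened excerpt of the text from the question (illustrative only).
sample = (
    "- ## First steps[¶](https://docs.djangoproject.com/en/2.0/#first-steps)\n\n"
    "  - **From scratch:** [Overview](https://docs.djangoproject.com/en/2.0/intro/overview/)\n"
    "  - **Tutorial:** [Part 1: Requests and responses]"
    "(https://docs.djangoproject.com/en/2.0/intro/tutorial01/)\n"
)

# Two alternatives, one capture group each: group 1 is a chapter heading,
# group 2 is a bold section label. The group that did not take part in a
# match is returned as an empty string.
pairs = re.findall(r'## (.+)\[|- \*\*(.+)\*\*', sample)

for chapter, section in pairs:
    kind = "chapter" if chapter else "section"
    print(kind, repr(chapter or section))
```

Since matches are returned in document order, every section pair follows the chapter pair it belongs to, which is what makes a single loop over the pairs sufficient.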

Then put them into the structure you want:

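>>> structure = []
>>> for c, s in results:
...     if c:
...         # New chapter: add its title and an empty list for its sections.
...         structure.extend([c, []])
...     elif s:
...         # Section: append to the list of the most recent chapter.
...         structure[-1].append(s)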

This results in:

>>> structure
['First steps', ['From scratch:', 'Tutorial:', 'Advanced Tutorials:'], 'The model layer', ['Models:', 'QuerySets:', 'Model instances:', 'Migrations:', 'Advanced:', 'Other:']]
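
For reference, the two steps above can be combined into one self-contained script. This is a sketch under a few assumptions: the function names build_structure and build_mapping are hypothetical, the sample is a shortened excerpt of the question's text, and the dictionary variant is an alternative layout that the question did not ask for. The sketch also skips any section that appears before the first chapter, a case the loop above would fail on with an IndexError.

```python
import re

# Group 1 captures a chapter heading, group 2 a bold section label.
PATTERN = re.compile(r'## (.+)\[|- \*\*(.+)\*\*')


def build_structure(text):
    """Return [chapter, [sections...], chapter, [sections...], ...]."""
    structure = []
    for chapter, section in PATTERN.findall(text):
        if chapter:
            # Start a new chapter with an empty section list.
            structure.extend([chapter, []])
        elif section and structure:
            # Attach the section to the most recent chapter;
            # sections seen before any chapter are ignored.
            structure[-1].append(section)
    return structure


def build_mapping(text):
    """Alternative layout: {chapter: [sections...]}."""
    mapping = {}
    current = None
    for chapter, section in PATTERN.findall(text):
        if chapter:
            current = mapping.setdefault(chapter, [])
        elif section and current is not None:
            current.append(section)
    return mapping


# Shortened excerpt of the text from the question (illustrative only).
sample = (
    "- ## First steps[¶](https://docs.djangoproject.com/en/2.0/#first-steps)\n\n"
    "  - **From scratch:** [Overview](https://docs.djangoproject.com/en/2.0/intro/overview/)\n"
    "  - **Tutorial:** [Part 1: Requests and responses]"
    "(https://docs.djangoproject.com/en/2.0/intro/tutorial01/)\n\n"
    "  ## The model layer[¶](https://docs.djangoproject.com/en/2.0/#the-model-layer)\n\n"
    "  - **Models:** [Introduction to models]"
    "(https://docs.djangoproject.com/en/2.0/topics/db/models/)\n"
)

structure = build_structure(sample)
mapping = build_mapping(sample)
print(structure)
print(mapping)
```

The flat list matches the layout requested in the question. The dictionary form may be easier to work with if chapters need to be looked up by title, though it merges chapters that share the same title.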