Question

Problem description

There is a set of ~4000 Python files, each containing the following structure:

@ScriptInfo(number=3254,
            attibute=some_value,
            title="crawler for my website",
            some_other_key=some_value)

scenario_name = entity.get_script_by_title(title)

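Goal

The goal is to get the value of the title from the ScriptInfo decorator (in this case, "crawler for my website"), but there are a few problems:

1) There is no rule for naming the variable that holds the title, so it can be title_name, my_title, and so on. See the example: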

@ScriptInfo(number=3254,
            attibute=some_value,
            my_title="crawler for my website",
            some_other_key=some_value)

scenario_name = entity.get_script_by_title(my_title)

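2) The @ScriptInfo decorator can have more than two arguments, so taking the contents of the parentheses and reading the value of the second parameter is not an option.

My (very naive) solution

The one piece of code that never changes is the line scenario_name = entity.get_script_by_title(my_title). With that in mind, I came up with this solution: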

import re
# Step 1: find the name of the variable passed to get_script_by_title().
title_variable_re = r"scenario_name\s?=\s?entity\.get_script_by_title\((.*)\)"
with open("python_file.py") as file:
    for line in file:
        if re.match(title_variable_re, line):
            title_variable = re.match(title_variable_re, line).group(1)
# Step 2: find the line that assigns a string to that variable.
title_re = title_variable  + r"\s?=\s\"(.*)\"?"
with open("python_file.py") as file:
    for line in file:
        if re.match(title_re, line):
            title_value = re.match(title_re, line).group(1)
print title_value

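This code does the following:

1) Iterates over the script file (see the first with open) and finds the variable that holds the title value, since its name is chosen by the programmer.
2) Iterates over the script file again (see the second with open) and gets the value of the title.

Question for the Stack Overflow community

Is there a better, more efficient way to get the value of the title (my_title, title_name, etc.) than iterating over the script file twice?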

Answer 1

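If you open the file only once and save all of its lines to fileContent, add break in the appropriate places, and reuse the match objects to access the captured group, you get something like this (the parentheses after print are for 3.x rather than 2.7):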

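import re

title_value = None

title_variable_re = r"scenario_name\s?=\s?entity\.get_script_by_title\((.*)\)"
with open("scenarioName.txt") as file:
    # Read the file once and keep its lines in memory.
    fileContent = file.read().split('\n')
    title_variable = None
    # First pass: find the name of the variable that holds the title.
    for line in fileContent:
        m1 = re.match(title_variable_re, line)
        if m1:
            title_variable = m1.group(1)
            break
    # Only build the second pattern if the first pass found a name.
    if title_variable is not None:
        title_re = r'\s*' + title_variable  + r'\s*=\s*"([^"]*)"[,)]?\s*'
        # Second pass: find the line that assigns a string to that variable.
        for line in fileContent:
            m2 = re.match(title_re, line)
            if m2:
                title_value = m2.group(1)
                break
print(title_value)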

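Here is an unordered list of the changes to the regular expressions:

Allow whitespace before title_variable; this is what r'\s*' + is for
Allow whitespace around =
Allow a comma or a closing round bracket at the end of the line in title_re; this is what [,)]? is for
Allow some whitespace at the end of the line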
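
To see what these changes buy, here is a small check that compares the pattern from the question with the relaxed one. It assumes the variable name my_title from the example, and the sample lines are made up for illustration.

```python
import re

name = "my_title"  # assumed variable name, taken from the example

# Pattern from the question: no leading whitespace is allowed and exactly
# one whitespace character is required after "=".
old_re = name + r"\s?=\s\"(.*)\"?"
# Relaxed pattern from the answer.
new_re = r'\s*' + name + r'\s*=\s*"([^"]*)"[,)]?\s*'

# Hypothetical lines covering indentation, spacing and line endings.
samples = [
    '        my_title="crawler for my website",',
    'my_title = "crawler for my website")',
    '  my_title= "crawler for my website"  ',
]

# re.match anchors at the start of the string, so indentation matters.
old_hits = [bool(re.match(old_re, s)) for s in samples]
new_hits = [re.match(new_re, s).group(1) for s in samples]
# Where the old pattern does match, the greedy (.*) also swallows the
# closing quote and bracket.
old_capture = re.match(old_re, samples[1]).group(1)

print(old_hits)     # [False, True, False]
print(new_hits)     # the bare title, three times
print(old_capture)  # crawler for my website")
```

The old pattern only matches the unindented line, and even there the captured group keeps the trailing `")`. The relaxed pattern returns the bare title in all three cases.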

When tested on the following input file:

@ScriptInfo(number=3254,
        attibute=some_value,
        my_title="crawler for my website",
        some_other_key=some_value)

scenario_name = entity.get_script_by_title(my_title)

it produces the following output:

crawler for my website
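
If the aim is to avoid looping over the lines twice, one possible variation is to read each file into a single string and run both searches on that string. The sketch below is not the code from the answer: the names extract_title and titles_in are hypothetical, and it assumes the title is a double-quoted literal written on one line and that the variable name consists of word characters only.

```python
import re
from pathlib import Path

# Finds the variable name passed to get_script_by_title(); MULTILINE lets
# "^" match at the start of every line in the file's text.
CALL_RE = re.compile(
    r"^scenario_name\s*=\s*entity\.get_script_by_title\((\w+)\)",
    re.MULTILINE,
)


def extract_title(source):
    """Return the title string found in one script's source, or None."""
    call = CALL_RE.search(source)
    if call is None:
        return None
    # Look for <name> = "<value>" anywhere in the text; \b stops the name
    # from matching as the tail of a longer identifier.
    value_re = r"\b" + re.escape(call.group(1)) + r'\s*=\s*"([^"]*)"'
    value = re.search(value_re, source)
    return value.group(1) if value else None


def titles_in(directory):
    """Map each .py file name in a (hypothetical) directory to its title."""
    return {p.name: extract_title(p.read_text())
            for p in Path(directory).glob("*.py")}


sample = '''@ScriptInfo(number=3254,
        attibute=some_value,
        my_title="crawler for my website",
        some_other_key=some_value)

scenario_name = entity.get_script_by_title(my_title)
'''
print(extract_title(sample))  # crawler for my website
```

Each file is still searched twice, but only in memory, and the disk is read once per file. For ~4000 small scripts the I/O is likely to dominate either way.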

Parsing ~4k files for a string (complex conditions)
