Question

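Using Python, I want to read a text file, search for a string, and print all lines between that matching string and another string.

The text file looks like this: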

Text=variables.Job_SalesDispatch.CaptionNew
    Tab=0
    TabAlign=0
    }
   }
  }
[UserVariables]
 User1=@StJid;IF(fields.Fieldtype="Artikel.Gerät"  , STR$(fields.id,0,0)  , @StJid)
[Parameters]
 [@Parameters]
  {
  [Parameters]
   {
   LL.ProjectDescription=? (default)
   LL.SortOrderID=
   }
  }
[PageLayouts]
 [@PageLayouts]
  {
  [PageLayouts]
   {
   [PageLayout]
    {
    DisplayName=
    Condition=Page() = 1
    SourceTray=0

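Now I want to print all the "UserVariables", that is, only the lines between [UserVariables] and the next line that starts with a square bracket. In this example, that next line is [Parameters].

What I have done so far is: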

with open("path/testfile.lst", encoding="utf8", errors="ignore") as file:

  for line in file:
    uservars = re.findall('\b(\w*UserVariables\w*)\b', line)
    print (uservars)

This only gives me [].

Answer 1

If you don't have to use a regular expression, you can use something like this:

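with open("path/testfile.lst", encoding="utf8", errors="ignore") as file:
  inside_uservars = False  # True while we are inside the [UserVariables] section
  for line in file:
    if inside_uservars:
      if line.strip().startswith('['):
        # The next section header ends the block
        inside_uservars = False
      else:
        # line already ends with a newline, so suppress print's own
        print(line, end='')
    if line.strip() == '[UserVariables]':
      inside_uservars = True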
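
If you would rather collect the lines than print them, the same state-machine idea can be wrapped in a generator. This is only a sketch: the function name `lines_between`, its `start_tag` parameter, and the inline sample are made up for illustration, and `io.StringIO` stands in for the real file.

```python
import io

def lines_between(lines, start_tag="[UserVariables]"):
    """Yield the lines that follow start_tag, up to the next line starting with '['."""
    inside = False
    for line in lines:
        stripped = line.strip()
        if stripped == start_tag:
            inside = True            # header found; start collecting after it
        elif stripped.startswith("["):
            inside = False           # any other section header ends the block
        elif inside:
            yield line.rstrip("\n")  # drop the trailing newline

# Hypothetical sample standing in for the real file
sample = io.StringIO(
    "   }\n"
    "[UserVariables]\n"
    " User1=@StJid\n"
    "[Parameters]\n"
    " [@Parameters]\n"
)
result = list(lines_between(sample))
print(result)
```

Because it accepts any iterable of lines, you can pass it the open file object from the `with` block directly.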

Answer 2

We can try using re.findall with the following regex pattern:

\[UserVariables\]\n((?:(?!\[.*?\]).)*)

This says to match the [UserVariables] tag, followed by a slightly more complicated expression:

((?:(?!\[.*?\]).)*)

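This expression is a tempered dot trick: it matches any one character at a time, as long as what immediately follows is not another tag enclosed in square brackets.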

import re

# text is assumed to hold the entire file contents as a single string
matches = re.findall(r'\[UserVariables\]\n((?:(?!\[.*?\]).)*)', text, re.DOTALL)
print(matches)

[' User1=@StJid;IF(fields.Fieldtype="Artikel.Ger\xc3\xa4t"  , STR$(fields.id,0,0)  , @StJid)\n']
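
To see the tempered dot in isolation, here is a small self-contained sketch that runs the same pattern on a short inline string. The sample text and the names `sample_text` and `user_lines` are assumptions for illustration only, not part of the original file.

```python
import re

# Hypothetical, shortened sample in the same shape as the file from the question
sample_text = (
    "   }\n"
    "[UserVariables]\n"
    " User1=@StJid\n"
    " User2=abc\n"
    "[Parameters]\n"
    " [@Parameters]\n"
)

# Same pattern as above: consume one character at a time until a [tag] starts
pattern = re.compile(r'\[UserVariables\]\n((?:(?!\[.*?\]).)*)', re.DOTALL)
matches = pattern.findall(sample_text)
print(matches)

# The capture is a single string; split it to get individual lines
user_lines = matches[0].splitlines() if matches else []
print(user_lines)
```

One caveat: with re.DOTALL the lookahead fires at any `[` that is followed by a `]` anywhere later in the text, so a `[` inside a value would cut the match short. For the sample in the question that does not happen.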

Edit:

My answer assumes that the entire file content is held in memory as a single Python string. You can read the whole file like this:

import re

with open('Path/to/your/file.txt', 'r', encoding='utf8') as content_file:
    text = content_file.read()  # whole file as one string
matches = re.findall(r'\[UserVariables\]\n((?:(?!\[.*?\]).)*)', text, re.DOTALL)
print(matches)
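
If the values themselves might contain square brackets, the tempered dot pattern can stop in the middle of a line. A possible variant, which is a different pattern from the one in this answer, anchors the end of the section to the start of a line using re.MULTILINE. The names `SECTION_RE` and `user_variable_lines` and the sample text are hypothetical.

```python
import re

# Alternative, line-anchored pattern (not the tempered dot from the answer)
SECTION_RE = re.compile(
    r'^\[UserVariables\][ \t]*\n'  # the section header on its own line
    r'(.*?)'                       # section body, as short as possible
    r'(?=^[ \t]*\[|\Z)',           # stop before a line starting with '[' or at end of text
    re.MULTILINE | re.DOTALL,
)

def user_variable_lines(text):
    """Return the lines of the first [UserVariables] section, or [] if there is none."""
    match = SECTION_RE.search(text)
    return match.group(1).splitlines() if match else []

# Hypothetical sample with a '[' inside the value
sample_text = (
    "   }\n"
    "[UserVariables]\n"
    " User1=IF(a[0]=1, b, c)\n"
    "[Parameters]\n"
)
lines = user_variable_lines(sample_text)
print(lines)
```

Here a `[` only ends the section when it is the first non-blank character on a line, which matches how the question describes the section boundary.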

How to extract a section of text from a file using Python and regex
