Question

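I know this question has been asked before, but I can't get any of the answers I've looked at to work. I have a JSON file that is several thousand lines long, and I want to simply extract the text between two strings every time they appear (which is a lot).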

As a simple example, my JSON looks like this:

    "customfield_11300": null,
    "customfield_11301": [
      {
        "self": "xxxxxxxx",
        "value": "xxxxxxxxx",
        "id": "10467"
      }
    ],
    "customfield_10730": null,
    "customfield_11302": null,
    "customfield_10720": 0.0,
    "customfield_11300": null,
    "customfield_11301": [
      {
        "self": "zzzzzzzzzzzzz",
        "value": "zzzzzzzzzzz",
        "id": "10467"
      }
    ],
    "customfield_10730": null,
    "customfield_11302": null,
    "customfield_10720": 0.0,

So I want to output everything between "customfield_11301" and "customfield_10730":

      {
        "self": "xxxxxxxx",
        "value": "xxxxxxxxx",
        "id": "10467"
      }
    ],
      {
        "self": "zzzzzzzzzzzzz",
        "value": "zzzzzzzzzzz",
        "id": "10467"
      }
    ],

I'm trying to keep this as simple as possible, so I don't care about brackets showing up in the output.

My current attempt uses the pattern customfield_11301(.*)customfield_10730, but it outputs way more than I want.

Answer 1

You need to make your regex lazy:

customfield_11301(.*?)customfield_10730


Your regex is greedy. That means it finds customfield_11301 and then keeps going until it reaches the last customfield_10730 in the input, not the first one.

Here is a simple example of greedy vs. lazy regex:

# Regex (Greedy): \[(.*)\]
# Input:          [foo]and[bar]
# Output:         foo]and[bar

# Regex (Lazy):   \[(.*?)\]
# Input:          [foo]and[bar]
# Output:         "foo" and "bar" separately

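Your regex behaves like the first one and captures too much. The new, lazy regex captures as little as possible for each match, so it works as intended.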
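The greedy/lazy difference above can be checked directly in PowerShell, which uses the .NET regex engine. A minimal sketch using the same sample string as the example; the variable names are only for illustration:

```powershell
# Sample input from the greedy vs. lazy example
$text = '[foo]and[bar]'

# Greedy: .* runs on to the LAST ']' in the input, so there is only one match
$greedy = @([regex]::Matches($text, '\[(.*)\]') | ForEach-Object { $_.Groups[1].Value })

# Lazy: .*? stops at the FIRST ']' it reaches, so each bracket pair is its own match
$lazy = @([regex]::Matches($text, '\[(.*?)\]') | ForEach-Object { $_.Groups[1].Value })

$greedy   # foo]and[bar
$lazy     # foo, then bar on the next line
```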

Answer 2

Here is a PowerShell function that returns the text between two strings (the first occurrence only).

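function GetStringBetweenTwoStrings($firstString, $secondString, $importPath){

    # Read the whole file as one string (-Raw) so a match can span lines
    $file = Get-Content -Path $importPath -Raw

    # Escape both delimiters so characters like . or [ are matched literally;
    # (?s) lets . match newlines, and .*? stops at the first $secondString
    $pattern = "(?s)$([regex]::Escape($firstString))(.*?)$([regex]::Escape($secondString))"

    # Perform the operation (first match only)
    $result = [regex]::Match($file, $pattern).Groups[1].Value

    # Return the result
    return $result

}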

You can then run the function like this:

GetStringBetweenTwoStrings -firstString "Lorem" -secondString "is" -importPath "C:\Temp\test.txt"

My test.txt file contains the following text:

Lorem Ipsum is simply dummy text of the printing and typesetting industry.

So my result is:

Ipsum
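Because the question asks for every occurrence, and [regex]::Match() only returns the first one, a variant built on [regex]::Matches() may fit better. This is a sketch, not part of the original answer: the function name Get-AllStringsBetween and its parameters are hypothetical, and it takes the text itself rather than a path, so you can pass it the output of Get-Content -Raw.

```powershell
# Hypothetical helper: returns EVERY span of text between $FirstString and $SecondString
function Get-AllStringsBetween {
    param(
        [string]$FirstString,
        [string]$SecondString,
        [string]$Text
    )

    # Escape the delimiters so they match literally; (?s) lets . cross line breaks
    $pattern = '(?s)' + [regex]::Escape($FirstString) + '(.*?)' + [regex]::Escape($SecondString)

    # Matches() finds all non-overlapping matches; Groups[1] holds the captured text
    [regex]::Matches($Text, $pattern) | ForEach-Object { $_.Groups[1].Value }
}

# Usage (file name taken from the question's thread; adjust the path as needed):
# Get-AllStringsBetween 'customfield_11301' 'customfield_10730' (Get-Content .\todays_changes.txt -Raw)
```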

Answer 3

The quick answer: change the greedy capture (.*) to a non-greedy one, (.*?). That should do it.

customfield_11301(.*?)customfield_10730

Otherwise the capture eats as much as it can, so it carries on past the first customfield_10730 all the way to the last one.
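One caveat when applying this pattern to a multi-line file read as a single string (for example with Get-Content -Raw or Get-Content | Out-String): by default . does not match a newline, so the lazy pattern alone finds nothing when the two field names sit on different lines. The inline (?s) (Singleline) option changes that. A minimal sketch with a made-up two-record sample shaped like the question's JSON:

```powershell
# Made-up sample shaped like the question's JSON (field names from the question)
$json = @'
"customfield_11301": [
  { "id": "1" }
],
"customfield_10730": null,
"customfield_11301": [
  { "id": "2" }
],
"customfield_10730": null,
'@

# Without (?s), . stops at line breaks, so the lazy pattern finds no match here
$noS = [regex]::Matches($json, 'customfield_11301(.*?)customfield_10730').Count

# With (?s), . also matches newlines and each record is captured separately
$withS = [regex]::Matches($json, '(?s)customfield_11301(.*?)customfield_10730').Count

$noS     # 0
$withS   # 2
```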


Answer 4

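The first problem is that Get-Content sends the file down the pipeline line by line rather than delivering the whole content at once. You can pipe Get-Content into Out-String to get the entire content as a single string, and then run the regex against that.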

A working solution to your problem is:

Get-Content .\todays_changes.txt | Out-String | % {[Regex]::Matches($_, "(?<=customfield_11301)((.|\n)*?)(?=customfield_10730)")} | % {$_.Value}

The output will be:

": [
  {
    "self": "xxxxxxxx",
    "value": "xxxxxxxxx",
    "id": "10467"
  }
],
"

": [
  {
    "self": "zzzzzzzzzzzzz",
    "value": "zzzzzzzzzzz",
    "id": "10467"
  }
],
"
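On PowerShell 3.0 and later, Get-Content -Raw is another way to get the whole file as a single string instead of piping through Out-String. The sketch below writes a small hypothetical sample file to the temp folder (the file name is an assumption), shows the difference between the two forms of Get-Content, and reuses the answer's own lookaround pattern on the raw text:

```powershell
# Hypothetical sample file in the temp folder
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'todays_changes_demo.txt'
Set-Content -Path $path -Value @(
    '"customfield_11301": [ "a" ],'
    '"customfield_10730": null,'
    '"customfield_11301": [ "b" ],'
    '"customfield_10730": null,'
)

$lines = Get-Content -Path $path        # string[]: one element per line
$raw   = Get-Content -Path $path -Raw   # one string with the line breaks kept

# Same pattern as the answer above, applied to the single raw string
$values = @([regex]::Matches($raw, '(?<=customfield_11301)((.|\n)*?)(?=customfield_10730)') |
    ForEach-Object { $_.Value })

Remove-Item -Path $path

$lines.Count    # 4
$values.Count   # 2
```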

PowerShell: extract text between two strings
