Question

I'm evaluating detect-secrets, but I'm not sure why detect-secrets scan and detect-secrets-hook give me different results.

Here is a simplified log:

$ cat docs/how-to-2.md
AZ_STORAGE_CS="DefaultEndpointsProtocol=https;AccountName=storageaccount1234;AccountKey=1OM7c6u5Ocp/zyUMWcRChowzd8czZmxPhzHZ8o45X7tAryr6JFF79+zerFFQS34KzVTK0yadoZGkvZh42A==;EndpointSuffix=core.windows.net"
$ detect-secrets scan --string $(cat docs/how-to-2.md)
AWSKeyDetector         : False
ArtifactoryDetector    : False
Base64HighEntropyString: True  (5.367)
BasicAuthDetector      : False
CloudantDetector       : False
HexHighEntropyString   : False
IbmCloudIamDetector    : False
IbmCosHmacDetector     : False
JwtTokenDetector       : False
KeywordDetector        : False
MailchimpDetector      : False
PrivateKeyDetector     : False
SlackDetector          : False
SoftlayerDetector      : False
StripeDetector         : False
TwilioKeyDetector      : False
$ detect-secrets-hook docs/how-to-2.md
$ detect-secrets-hook --baseline .secrets.baseline docs/how-to-2.md

I expected detect-secrets-hook to tell me about the high-entropy Azure storage account key.

More details about the baseline:

$ cat .secrets.baseline
{
  "custom_plugin_paths": [],
  "exclude": {
    "files": null,
    "lines": null
  },
  "generated_at": "2020-10-09T10:06:54Z",
  "plugins_used": [
    {
      "name": "AWSKeyDetector"
    },
    {
      "name": "ArtifactoryDetector"
    },
    {
      "base64_limit": 4.5,
      "name": "Base64HighEntropyString"
    },
    {
      "name": "BasicAuthDetector"
    },
    {
      "name": "CloudantDetector"
    },
    {
      "hex_limit": 3,
      "name": "HexHighEntropyString"
    },
    {
      "name": "IbmCloudIamDetector"
    },
    {
      "name": "IbmCosHmacDetector"
    },
    {
      "name": "JwtTokenDetector"
    },
    {
      "keyword_exclude": null,
      "name": "KeywordDetector"
    },
    {
      "name": "MailchimpDetector"
    },
    {
      "name": "PrivateKeyDetector"
    },
    {
      "name": "SlackDetector"
    },
    {
      "name": "SoftlayerDetector"
    },
    {
      "name": "StripeDetector"
    },
    {
      "name": "TwilioKeyDetector"
    }
  ],
  "results": {
    ".devcontainer/Dockerfile": [
      {
        ###obfuscated###
      }
    ],
    "deployment/export-sp.sh": [
      {
        ###obfuscated###
      }
    ],
    "docs/pip-install-from-artifacts-feeds.md": [
      {
        ###obfuscated###
      }
    ]
  },
  "version": "0.14.3",
  "word_list": {
    "file": null,
    "hash": null
  }
}

Answer 1

This is definitely peculiar behavior, but after some investigation, I found that you've stumbled upon an edge case in the tool.

tl;dr

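HighEntropyStringPlugin supports a limited set of characters (which does not include ;).
To reduce false positives, HighEntropyStringPlugin relies on the heuristic that, in certain contexts, secrets appear as quoted strings.
To improve the user experience, inline string scanning does not require the string to be quoted.

As a result, the two entry points behave differently. When run through detect-secrets-hook, the quoted value is treated as a single candidate, and the ; prevents it from being parsed. When run through detect-secrets scan --string, quotes are not required, so the string is broken into pieces and each piece is checked.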

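Detailed explanation

HighEntropyString checks are quite noisy unless false positives are pruned aggressively. One way the plugin does this is by applying a rather strict regular expression (source), which requires the candidate to be inside quotes. In certain cases, however, this quoted requirement is removed (for example, for YAML files and inline string scanning).

With the quoted requirement removed, we get the following breakdown: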

>>> line = 'AZ_STORAGE_CS="DefaultEndpointsProtocol=https;AccountName=storageaccount1234;AccountKey=1OM7c6u5Ocp/zyUMWcRChowzd8czZmxPhzHZ8o45X7tAryr6JFF79+zerFFQS34KzVTK0yadoZGkvZh42A==;EndpointSuffix=core.windows.net"'
>>> with self.non_quoted_string_regex(is_exact_match=False):
...    self.regex.findall(line)
['AZ_STORAGE_CS=', 'DefaultEndpointsProtocol=https', 'AccountName=storageaccount1234', 'AccountKey=1OM7c6u5Ocp/zyUMWcRChowzd8czZmxPhzHZ8o45X7tAryr6JFF79+zerFFQS34KzVTK0yadoZGkvZh42A==', 'EndpointSuffix=core', 'windows', 'net']

When this happens, we can see that AccountKey=1OM7c6u5Ocp/zyUMWcRChowzd8czZmxPhzHZ8o45X7tAryr6JFF79+zerFFQS34KzVTK0yadoZGkvZh42A== triggers the base64 plugin, as shown here:

$ detect-secrets scan --string 'AccountKey=1OM7c6u5Ocp/zyUMWcRChowzd8czZmxPhzHZ8o45X7tAryr6JFF79+zerFFQS34KzVTK0yadoZGkvZh42A=='
AWSKeyDetector         : False
ArtifactoryDetector    : False
Base64HighEntropyString: True  (5.367)
BasicAuthDetector      : False
CloudantDetector       : False
HexHighEntropyString   : False
IbmCloudIamDetector    : False
IbmCosHmacDetector     : False
JwtTokenDetector       : False
KeywordDetector        : False
MailchimpDetector      : False
PrivateKeyDetector     : False
SlackDetector          : False
SoftlayerDetector      : False
StripeDetector         : False
TwilioKeyDetector      : False
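
The number in parentheses is an entropy score, which the plugin compares against the base64_limit of 4.5 from the baseline. As a minimal sketch, assuming the score is plain Shannon entropy over the characters of the candidate (the helper name shannon_entropy is hypothetical and is not the library's API), the comparison looks roughly like this:

```python
import math


def shannon_entropy(data: str) -> float:
    """Shannon entropy in bits per character (hypothetical helper)."""
    if not data:
        return 0.0
    entropy = 0.0
    for ch in set(data):
        p = data.count(ch) / len(data)  # relative frequency of this character
        entropy -= p * math.log2(p)
    return entropy


# Threshold copied from "base64_limit" in the baseline shown in the question.
BASE64_LIMIT = 4.5

candidate = (
    "AccountKey=1OM7c6u5Ocp/zyUMWcRChowzd8czZmxPhzHZ8o45X7tAryr6JFF79"
    "+zerFFQS34KzVTK0yadoZGkvZh42A=="
)
score = shannon_entropy(candidate)
print(f"entropy={score:.3f} flagged={score > BASE64_LIMIT}")
```

A long, random-looking key uses many different characters with similar frequencies, so its score lands well above the limit, whereas ordinary words repeat a few letters and score lower.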

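However, when the quoted requirement is applied, the entire string payload is scanned as a single potential secret: DefaultEndpointsProtocol=https;AccountName=storageaccount1234;AccountKey=1OM7c6u5Ocp/zyUMWcRChowzd8czZmxPhzHZ8o45X7tAryr6JFF79+zerFFQS34KzVTK0yadoZGkvZh42A==;EndpointSuffix=core.windows.net

This is not flagged, because it fails the original base64 regex rule, which does not know how to handle ;.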

>>> self.regex.findall(line)
[]
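
The two behaviors can be reproduced outside the tool with a small standalone sketch. The character set and both patterns below are approximations inferred from the output above (for instance, AZ_STORAGE_CS= matching as one piece suggests that _ is accepted); they are assumptions, not the plugin's actual source.

```python
import re
import string

# Assumption: an approximation of the characters the base64 plugin accepts.
CHARSET = string.ascii_letters + string.digits + "+/=_-"
ESCAPED = re.escape(CHARSET)

# Quoted mode: a quote, then only charset characters, then the same quote.
QUOTED = re.compile(r"""(['"])([%s]+)\1""" % ESCAPED)
# Unquoted mode: any run of charset characters, wherever it appears.
UNQUOTED = re.compile(r"([%s]+)" % ESCAPED)

line = (
    'AZ_STORAGE_CS="DefaultEndpointsProtocol=https;'
    "AccountName=storageaccount1234;"
    "AccountKey=1OM7c6u5Ocp/zyUMWcRChowzd8czZmxPhzHZ8o45X7tAryr6JFF79"
    "+zerFFQS34KzVTK0yadoZGkvZh42A==;"
    'EndpointSuffix=core.windows.net"'
)

# The ; and . inside the quotes are outside the charset, so nothing matches.
quoted_hits = [m.group(2) for m in QUOTED.finditer(line)]
# Without the quote requirement, the line splits at every unsupported character.
unquoted_hits = UNQUOTED.findall(line)

print(quoted_hits)
for piece in unquoted_hits:
    print(piece)
```

In quoted mode the whole value between the quotes must consist of supported characters, so a single ; is enough to hide the key. In unquoted mode the same ; acts as a separator and the AccountKey=... piece becomes its own candidate.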

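So the functionality does differ, although this is not immediately obvious from the invocation patterns described.

How do I fix this?

This is a harder question, because allowing additional characters changes the entropy calculation and the likelihood that a string gets flagged. There has been some discussion about creating a plugin that covers all characters, but the team has not yet been able to settle on a default entropy limit for it.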
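
One way to see why a single default limit is hard to choose: the highest Shannon entropy a string can reach is log2 of the number of distinct characters it may use, so the ceiling moves with the character set. The sketch below uses illustrative character sets, which are assumptions and not the library's definitions, to compare those ceilings with the hex_limit of 3 and base64_limit of 4.5 from the baseline.

```python
import math
import string


def max_entropy(charset: str) -> float:
    """Upper bound on Shannon entropy (bits per character) for a charset."""
    return math.log2(len(set(charset)))


# Illustrative character sets (assumptions, not detect-secrets definitions).
charsets = {
    "hex": "0123456789abcdef",
    "base64": string.ascii_letters + string.digits + "+/=",
    "printable": string.ascii_letters + string.digits + string.punctuation,
}

for name, chars in charsets.items():
    print(f"{name:10s} size={len(set(chars)):3d} max_entropy={max_entropy(chars):.3f}")
```

A limit tuned for one character set sits at a different fraction of the ceiling for another, so widening the set to include characters such as ; would mean picking a new threshold and re-checking the false-positive rate.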
