Question

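I have a text file containing security names, $ amounts, and percentages of the portfolio. I am trying to figure out how to use a regex to separate the companies. My original solution was to .split('%') and then build the 3 variables I need, but I found that some of the securities contain a % in their name, so that solution is inadequate.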

Example string:

Pinterest, Inc. Series F, 8.00%$24,808,9320.022%ResMed,Inc.$23,495,3260.021%Eaton Corp. PLC$53,087,8430.047%

Current regex

[a-zA-Z0-9,$.\s]+[.0-9%]$

My current regex only finds the last company, i.e. Eaton Corp. PLC$53,087,8430.047%

Any help on how to find every instance of a company?

Desired result

["Pinterest, Inc. Series F, 8.00%$24,808,9320.022%","ResMed,Inc.$23,495,3260.021%","Eaton Corp. PLC$53,087,8430.047%"]

Answer 1

In Python 3:

import re
p = re.compile(r'[^$]+\$[^%]+%')
p.findall('Pinterest, Inc. Series F, 8.00%$24,808,9320.022%ResMed,Inc.$23,495,3260.021%Eaton Corp. PLC$53,087,8430.047%')

Result:

['Pinterest, Inc. Series F, 8.00%$24,808,9320.022%', 'ResMed,Inc.$23,495,3260.021%', 'Eaton Corp. PLC$53,087,8430.047%']

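Your original problem was that the $ anchor at the end of the pattern makes the regex match only at the end of the line. Simply removing the $ is not enough, though: the % after 8.00 would still split Pinterest into two entries.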

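To fix that, the regex looks for the $ first, then for the next % after it, and takes everything up to and including that % as one entry. This pattern works for the example you gave but, of course, I don't know whether it holds for all of your data.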
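As a rough illustration of the three cases described above, the sketch below runs the question's pattern with and without the trailing $ anchor, and then the pattern from this answer. The variable names (s, anchored, unanchored, fixed) are only illustrative and are not part of the original post.

```python
import re

s = ('Pinterest, Inc. Series F, 8.00%$24,808,9320.022%'
     'ResMed,Inc.$23,495,3260.021%'
     'Eaton Corp. PLC$53,087,8430.047%')

# Pattern from the question: the trailing $ anchors the match to the end
# of the string, so only the last company can match.
anchored = re.findall(r'[a-zA-Z0-9,$.\s]+[.0-9%]$', s)

# Same pattern without the anchor: every % now ends a match,
# so the Pinterest entry is cut right after "8.00%".
unanchored = re.findall(r'[a-zA-Z0-9,$.\s]+[.0-9%]', s)

# Pattern from this answer: require a $ first, then stop at the next %.
fixed = re.findall(r'[^$]+\$[^%]+%', s)

print(len(anchored), anchored)      # one entry
print(len(unanchored), unanchored)  # four entries, Pinterest is split
print(len(fixed), fixed)            # three entries
```

Note that the fixed pattern relies on the name never containing a $ and the amount/percentage part never containing an extra %, which is true for the sample but is an assumption about the rest of the data.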

Edit: The regex works like this:

r'               Use a raw string so you don't have to double the backslashes
  [^$]+          Look for anything up to the next $
       \$        Match the $ itself (\$ because $ alone means end-of-line)
         [^%]+   Now anything up to the next %
              %  And the % itself
               ' End of the string
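If you want to keep that breakdown next to the pattern in your code, one option is re.VERBOSE, which lets whitespace and # comments sit inside the pattern. This is just a sketch of the same regex written that way; the name entry is illustrative.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# Whitespace and "#" comments outside character classes are ignored.
entry = re.compile(r'''
    [^$]+   # anything up to the next dollar sign (the security name)
    \$      # the dollar sign itself
    [^%]+   # anything up to the next percent sign
    %       # the percent sign that closes the entry
''', re.VERBOSE)

s = ('Pinterest, Inc. Series F, 8.00%$24,808,9320.022%'
     'ResMed,Inc.$23,495,3260.021%'
     'Eaton Corp. PLC$53,087,8430.047%')

for item in entry.findall(s):
    print(item)
```

It should behave the same as the one-line version; the only difference is readability.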

Answer 2

A working solution for Python, with named groups: https://regex101.com/r/sqkFaN/2

(?P<item>(?P<name>.*?)\$(?P<usd>[\d,\.]*?%))

In the link I provided you can see changes take effect live, and the sidebar explains the syntax being used.
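For completeness, here is a minimal sketch of using that pattern from Python with re.finditer. Note that the usd group captures the dollar amount and the percentage fused together (for example 24,808,9320.022%). The second pattern, SPLIT, is not from the original answer: it is a hypothetical refinement that assumes amounts are whole dollars written with comma thousands separators, so the amount can be told apart from the percentage that follows it. The names ITEM, SPLIT, and rows are illustrative.

```python
import re

s = ('Pinterest, Inc. Series F, 8.00%$24,808,9320.022%'
     'ResMed,Inc.$23,495,3260.021%'
     'Eaton Corp. PLC$53,087,8430.047%')

# Pattern from the answer, unchanged.
ITEM = re.compile(r'(?P<item>(?P<name>.*?)\$(?P<usd>[\d,\.]*?%))')

for m in ITEM.finditer(s):
    # "usd" still holds amount and percentage glued together.
    print(m.group('name'), '|', m.group('usd'))

# Hypothetical refinement (an assumption, not part of the answer):
# the amount is 1-3 digits followed by ",ddd" groups, and whatever
# digits remain before the final % are the portfolio percentage.
SPLIT = re.compile(
    r'(?P<name>.+?)\$'
    r'(?P<amount>\d{1,3}(?:,\d{3})*)'
    r'(?P<pct>\d+(?:\.\d+)?)%'
)

rows = [
    (m.group('name'),
     int(m.group('amount').replace(',', '')),
     float(m.group('pct')))
    for m in SPLIT.finditer(s)
]

for name, amount, pct in rows:
    print(name, amount, pct)
```

The split only works if the comma-grouping assumption holds. An amount without separators would be ambiguous, because nothing in the text marks where the amount ends and the percentage begins.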
