I would like to ask how to extract the substrings that follow certain keywords.
For example, I have the following text:
mystring = "Commission 0,0000 Packaging 0,0426 Discount 0,0120 Transport 0,0690 F YEB 0,0000 Commission 0,0000 Payment discount 0,0000 % Other discount 0,0000 YEB 4,0700 % Industrial 0,3856"
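I want to extract the numeric values that follow certain keywords, for example "Discount" and "Other discount". I am trying the following code: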
test = re.compile(r"""(
(Discount\s\d*)
(Other\sdiscount\s\d*)
)""", re.VERBOSE)
pr = test.findall(mystring)
In this case I would like to get a pair --> Discount: 0,0120 and Other discount: 0,0000, but a list like the following would also be enough:
["Discount 0,0120", "Other discount 0,0000"]
Thank you very much for your help.
Answer 0: (score: 0)
You can also do it with a simple loop:
mystring = "Commission 0,0000 Packaging 0,0426 Discount 0,0120 Transport 0,0690 F YEB 0,0000 Commission 0,0000 Payment discount 0,0000 % Other discount 0,0000 YEB 4,0700 % Industrial 0,3856"
list_mystring = mystring.split()
discount_value = ""  # initialize var
other_discount = ""  # initialize var
# assumes every keyword is immediately followed by its value
for i in range(len(list_mystring)):
    if list_mystring[i] == "Discount":
        discount_value = list_mystring[i+1]
    if (list_mystring[i] == "Other") and (list_mystring[i+1] == "discount"):
        other_discount = list_mystring[i+2]
my_pair = (discount_value, other_discount)
print(my_pair)
The output here is: ('0,0120', '0,0000').
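If more keywords are needed later, the same token-based idea can be wrapped in a small function. This is only a sketch: `value_after` is a hypothetical helper name, and it assumes that each keyword (one or more words) is immediately followed by its value and that matching is case-sensitive.

```python
mystring = "Commission 0,0000 Packaging 0,0426 Discount 0,0120 Transport 0,0690 F YEB 0,0000 Commission 0,0000 Payment discount 0,0000 % Other discount 0,0000 YEB 4,0700 % Industrial 0,3856"


def value_after(text, keyword):
    """Return the token right after the first occurrence of keyword, or None."""
    tokens = text.split()
    key = keyword.split()  # a keyword may span several words
    n = len(key)
    # stop early enough that tokens[i + n] always exists
    for i in range(len(tokens) - n):
        if tokens[i:i + n] == key:
            return tokens[i + n]
    return None


my_pair = (value_after(mystring, "Discount"), value_after(mystring, "Other discount"))
print(my_pair)
```

Because the comparison is case-sensitive, "Discount" does not match the lowercase "discount" in "Payment discount".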
Answer 1: (score: 0)
I had better luck with re.search. You were also missing \d,\d to capture the digits before and after the comma.
import re
mystring = "Commission 0,0000 Packaging 0,0426 Discount 0,0120 Transport 0,0690 F YEB 0,0000 Commission 0,0000 Payment discount 0,0000 % Other discount 0,0000 YEB 4,0700 % Industrial 0,3856"
pattern = r"(Discount\s\d+,\d+)(.*)(Other\sdiscount\s\d+,\d+)"
p = re.search(pattern, mystring)
p.groups()
>> ('Discount 0,0120',
' Transport 0,0690 F YEB 0,0000 Commission 0,0000 Payment discount 0,0000 % ',
'Other discount 0,0000')
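If the goal is the list from the question rather than three groups, `re.findall` with an alternation might be closer to it. The following is a sketch, assuming the keywords are matched case-sensitively (so the lowercase "discount" in "Payment discount" is not picked up by "Discount"); the variable names `matches` and `pairs` are illustrative only.

```python
import re

mystring = "Commission 0,0000 Packaging 0,0426 Discount 0,0120 Transport 0,0690 F YEB 0,0000 Commission 0,0000 Payment discount 0,0000 % Other discount 0,0000 YEB 4,0700 % Industrial 0,3856"

# (?:...) is a non-capturing group, so findall returns each whole match
pattern = r"\b(?:Other\sdiscount|Discount)\s\d+,\d+"
matches = re.findall(pattern, mystring)
print(matches)  # ['Discount 0,0120', 'Other discount 0,0000']

# split each match at its last space into (keyword, value)
pairs = [tuple(m.rsplit(" ", 1)) for m in matches]
print(pairs)
```

Unlike the `(.*)` version, this does not depend on the order in which the two keywords appear in the string.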
Answer 2: (score: 0)
Admittedly, this is not the most efficient approach, but it works:
mystring = "Commission 0,0000 Packaging 0,0426 Discount 0,0120 Transport 0,0690 F YEB 0,0000 Commission 0,0000 " \
"Payment discount 0,0000 % Other discount 0,0000 YEB 4,0700 % Industrial 0,3856"
items_list = []
i = 0  # start index of the current "label value" item
for j, char in enumerate(mystring[:-2]):
    # an item ends at a digit that is followed, two characters later, by a letter or '%'
    if char.isdigit() and (mystring[j + 2].isalpha() or mystring[j + 2] == '%'):
        if mystring[j + 2] == '%':
            items_list.append(mystring[i:j + 1])
            i = j + 4  # skip " % "
        else:
            items_list.append(mystring[i:j + 1])
            i = j + 2  # skip the single space
items_list.append(mystring[i:])  # last item

def itemDigit_sep(item):
    # split "label 0,1234" into ("label", 0.1234)
    item = item.replace(',', '.')
    for i, char in enumerate(item[:-2]):
        if char.isalpha() and item[i + 2].isdigit():
            return (item[:i + 1], float(item[i + 2:]))

item_value_list = [itemDigit_sep(item) for item in items_list]
[('Commission', 0.0), ('Packaging', 0.0426), ('Discount', 0.012), ('Transport', 0.069), ('F YEB', 0.0), ('Commission', 0.0), ('Payment discount', 0.0), ('Other discount', 0.0), ('YEB', 4.07), ('Industrial', 0.3856)]
Or you can even map every item to its corresponding value:
item2value_dict = {}
for item_value in items_list:
    item, value = itemDigit_sep(item_value)  # reuse the helper defined earlier
    item2value_dict[item] = value
{'Commission': 0.0, 'Packaging': 0.0426, 'Discount': 0.012, 'Transport': 0.069, 'F YEB': 0.0, 'Payment discount': 0.0, 'Other discount': 0.0, 'YEB': 4.07, 'Industrial': 0.3856}
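The same label-to-value mapping can probably be built more compactly with one regular expression. This is a sketch under stated assumptions: labels consist only of ASCII letters separated by single spaces, values always have the form digits, comma, digits, and the names `pairs` and `item2value` are illustrative.

```python
import re

mystring = "Commission 0,0000 Packaging 0,0426 Discount 0,0120 Transport 0,0690 F YEB 0,0000 Commission 0,0000 Payment discount 0,0000 % Other discount 0,0000 YEB 4,0700 % Industrial 0,3856"

# group 1: one or more words (the label); group 2: the decimal-comma value
pairs = re.findall(r"([A-Za-z]+(?: [A-Za-z]+)*) (\d+,\d+)", mystring)

# convert "0,0120" to 0.012; a repeated label keeps its last value
item2value = {label: float(value.replace(",", ".")) for label, value in pairs}

print(item2value["Discount"], item2value["Other discount"])
```

The stray "%" tokens are skipped automatically because they cannot be part of a label or a value in this pattern.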