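How do I find the number of times a substring occurs in a string? I have a molecular formula. A single uppercase letter is an element (e.g. H); an uppercase letter followed by a lowercase letter is also a single element (e.g. Ba); and if a number follows an element, I have to add that number to the element's count.
Example input: Ba4H2Ba5Li3
If I search for Ba it should print 9 (I have Ba4 and Ba5, which sum to 9); if I search for H it should print 2 (a single letter H, followed by the number 2); and if I search for Li it should print 3.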
Answer 0 (score: 3)
You can use a regular expression, for example:
import re
data = "Ba4H2Ba5Li3"
result = {}
for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", data):
    # An empty count means the element appears once at this position
    result[element] = result.get(element, 0) + (int(count) if count else 1)
print(result)
# {'H': 2, 'Ba': 9, 'Li': 3}
Now you can get the count for each element from result, like this:
print(result.get("Ba", 0))
# 9
print(result.get("H", 0))
# 2
print(result.get("Li", 0))
# 3
print(result.get("Sa", 0))
# 0
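If only one element's total is needed, the same regex can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: `count_element` and `ELEMENT_RE` are hypothetical names, and it assumes a flat formula without parentheses.

```python
import re

# Same pattern as above: one uppercase letter, an optional lowercase
# letter, then an optional run of digits.
ELEMENT_RE = re.compile(r"([A-Z][a-z]?)(\d*)")

def count_element(formula, symbol):
    """Return the total count of `symbol` in a flat formula (no parentheses)."""
    total = 0
    for element, count in ELEMENT_RE.findall(formula):
        if element == symbol:
            # A missing number means the element appears once here
            total += int(count) if count else 1
    return total

print(count_element("Ba4H2Ba5Li3", "Ba"))  # 9
print(count_element("Ba4H2Ba5Li3", "H"))   # 2
print(count_element("Ba4H2Ba5Li3", "B"))   # 0
```

Because the regex consumes whole symbols, searching for B does not match the B inside Ba, which a plain substring search would get wrong.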
Answer 1 (score: 2)
I would parse the whole input string into a dictionary; a regular expression helps here:
import re
from collections import defaultdict
molecule = re.compile(r'([A-Z][a-z]?)(\d*)')
def parse_formula(f):
    counts = defaultdict(int)
    for name, count in molecule.findall(f):
        # An empty count string falls back to 1
        counts[name] += int(count or 1)
    return counts
This counts an element with no number after its symbol as 1, so 'H3O' is still counted correctly.
Now you can simply look up elements:
counts = parse_formula('Ba4H2Ba5Li3')
print(counts['Ba'])
print(counts['H'])
Demo:
>>> counts = parse_formula('Ba4H2Ba5Li3')
>>> counts
defaultdict(<type 'int'>, {'H': 2, 'Ba': 9, 'Li': 3})
>>> counts['H']
2
>>> counts['Ba']
9
>>> parse_formula('H3O')
defaultdict(<type 'int'>, {'H': 3, 'O': 1})
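One caveat worth checking before relying on this: the flat regex knows nothing about parentheses. The sketch below repeats the function so that it runs on its own and shows what happens with a grouped formula. Na(OH)2 is used purely as an illustrative input.

```python
import re
from collections import defaultdict

molecule = re.compile(r'([A-Z][a-z]?)(\d*)')

def parse_formula(f):
    counts = defaultdict(int)
    for name, count in molecule.findall(f):
        counts[name] += int(count or 1)
    return counts

grouped = parse_formula('Na(OH)2')
# The "2" follows ")", not an element symbol, so the regex never sees it:
# O and H are each counted once instead of twice.
print(grouped['Na'], grouped['O'], grouped['H'])  # 1 1 1
```

For formulas without parentheses, such as the one in the question, this limitation does not matter.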
Answer 2 (score: 1)
Here is a more robust approach that correctly handles formulas with nested subexpressions, such as Na(OH)2 or Al(NO3)3:
# Loosely based on example code from
# http://pyparsing.wikispaces.com/file/detail/chemicalFormulas.py
from pyparsing import Group, Forward, Literal, nums, oneOf, OneOrMore, Optional, Word
# from http://pyparsing-public.wikispaces.com/Helpful+Expressions
# element("He") => "He"
element = oneOf(
"""H He Li Be B C N O F Ne Na Mg Al Si P S Cl
Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge
As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag
Cd In Sn Sb Te I Xe Cs Ba Lu Hf Ta W Re Os
Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Lr Rf
Db Sg Bh Hs Mt Ds Rg Uub Uut Uuq Uup Uuh Uus
Uuo La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm
Yb Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No"""
)
# integer("123") => 123
to_int = lambda tokens: int(tokens[0])
integer = Word(nums).setParseAction(to_int)
# item("He") => {"He": 1}
# item("O2") => {"O": 2}
item_to_dict = lambda tokens: {a:b for a,b in tokens}
item = Group(element + Optional(integer, default=1)).setParseAction(item_to_dict)
# allow recursive definition of formula
Formula = Forward()
# expr("(OH)2") => {"O": 2, "H": 2}
lpar = Literal("(").suppress()
rpar = Literal(")").suppress()
expr_to_dict = lambda tokens: {el: num*tokens[1] for el,num in tokens[0].items()}
expr = (lpar + Formula + rpar + integer).setParseAction(expr_to_dict)
# ... complete the recursive definition
def formula_to_dict(tokens):
    # Merge the per-item and per-group dicts into one total
    total = {}
    for part in tokens:
        for el, num in part.items():
            total[el] = total.get(el, 0) + num
    return total
Formula <<= OneOrMore(item | expr).setParseAction(formula_to_dict)
# Finally, wrap it in an easy-to-use function:
def get_elements(s):
    return Formula.parseString(s)[0]
You can use it like this:
>>> get_elements("Na(OH)2")
{'H': 2, 'Na': 1, 'O': 2}
>>> get_elements("Al(NO3)3")
{'Al': 1, 'N': 3, 'O': 9}
>>> get_elements("Ba4H2Ba5Li3")
{'Ba': 9, 'H': 2, 'Li': 3}
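If installing pyparsing is not an option, nested groups can also be handled with the standard library alone by keeping a stack of counters, one per open parenthesis. The following is a rough sketch of that alternative technique, not the pyparsing grammar above. `count_atoms` and `TOKEN_RE` are hypothetical names. Unlike the grammar above, it assumes one- or two-letter symbols only and treats a missing multiplier after ")" as 1.

```python
import re
from collections import Counter

# One token per match: an element with optional count, "(",
# or ")" with an optional multiplier.
TOKEN_RE = re.compile(r"([A-Z][a-z]?)(\d*)|(\()|(\))(\d*)")

def count_atoms(formula):
    """Count atoms in a formula that may contain nested parentheses."""
    stack = [Counter()]  # stack[-1] collects atoms of the innermost open group
    pos = 0
    while pos < len(formula):
        m = TOKEN_RE.match(formula, pos)
        if m is None:
            raise ValueError("unexpected character at position %d" % pos)
        symbol, count, lpar, rpar, mult = m.groups()
        if symbol:
            stack[-1][symbol] += int(count or 1)
        elif lpar:
            stack.append(Counter())
        else:
            # Closing parenthesis: scale the group and fold it into its parent
            if len(stack) == 1:
                raise ValueError("unbalanced ')' at position %d" % pos)
            group = stack.pop()
            factor = int(mult or 1)
            for el, num in group.items():
                stack[-1][el] += num * factor
        pos = m.end()
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return dict(stack[0])

print(count_atoms("Al(NO3)3")["O"])      # 9
print(count_atoms("Ca(Al(OH)4)2")["H"])  # 8
```

The pyparsing version validates symbols against a list of real elements, whereas this sketch accepts any capitalized token, so it trades strictness for having no dependencies.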