Python: counting substrings

Time: 2014-03-16 02:00:23

Tags: python string count numbers substring

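How can I count how many times a substring occurs in a string? I have a molecular formula. A single uppercase letter is an element (e.g. H), and an uppercase letter followed by a lowercase letter is also a single element (e.g. Ba). If a number follows an element, I have to add that number to the element's count.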

Example input: Ba4H2Ba5Li3

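If I search for Ba it should print 9 (I have Ba4 and Ba5, which add up to 9); if I search for H it should print 2 (a single H followed by the number 2); and if I search for Li it should print 3.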

3 answers:

Answer 0 (score: 3)

You can use a regular expression, for example

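import re

data = "Ba4H2Ba5Li3"
result = {}
# Each match is a pair: (element symbol, digits after it); the digits may be empty.
for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", data):
    # A symbol with no number after it counts as 1.
    result[element] = result.get(element, 0) + (int(count) if count else 1)
print(result)
# {'Ba': 9, 'H': 2, 'Li': 3}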

Now you can get the count for each element from result, like this

print(result.get("Ba", 0))
# 9
print(result.get("H", 0))
# 2
print(result.get("Li", 0))
# 3
print(result.get("Sa", 0))
# 0
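
The lookups above can be folded into one helper that answers the question directly for a single element. This is only a minimal sketch: the function name count_element is hypothetical and not part of the answer, and it assumes a flat formula without parentheses.

```python
import re

# Hypothetical helper; the name count_element is an assumption for illustration.
def count_element(formula, symbol):
    """Return the total count of `symbol` in a flat formula such as 'Ba4H2Ba5Li3'."""
    total = 0
    # Each match is (element symbol, digits); empty digits mean a count of 1.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element == symbol:
            total += int(count) if count else 1
    return total

print(count_element("Ba4H2Ba5Li3", "Ba"))  # 9
print(count_element("Ba4H2Ba5Li3", "H"))   # 2
print(count_element("Ba4H2Ba5Li3", "Sa"))  # 0
```

Building the full dictionary once is the better choice when several elements are looked up in the same formula; the helper re-scans the string on every call.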

Answer 1 (score: 2)

I would parse the whole input string into a dictionary; a regular expression helps here:

import re
from collections import defaultdict

molecule = re.compile(r'([A-Z][a-z]?)(\d*)')

def parse_formula(f):
    counts = defaultdict(int)
    for name, count in molecule.findall(f):
        counts[name] += int(count or 1)
    return counts

This counts an element that has no digits after its symbol as 1, so 'H3O' is still counted correctly.
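
To see why the fallback to 1 works, it may help to look at the raw pairs the pattern produces. A small sketch, reusing the same pattern as above:

```python
import re

molecule = re.compile(r'([A-Z][a-z]?)(\d*)')

# findall returns one (symbol, digits) tuple per match; digits is '' when absent.
pairs = molecule.findall('H3O')
print(pairs)  # [('H', '3'), ('O', '')]

# '' is falsy, so `count or 1` evaluates to 1; a non-empty digit string is kept as is.
print([int(count or 1) for _, count in pairs])  # [3, 1]
```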

Now you can simply look up elements:

counts = parse_formula('Ba4H2Ba5Li3')
print(counts['Ba'])
print(counts['H'])

Demo:

>>> counts = parse_formula('Ba4H2Ba5Li3')
>>> counts
defaultdict(<type 'int'>, {'H': 2, 'Ba': 9, 'Li': 3})
>>> counts['H']
2
>>> counts['Ba']
9
>>> parse_formula('H3O')
defaultdict(<type 'int'>, {'H': 3, 'O': 1})

Answer 2 (score: 1)

Here is a more robust approach that correctly handles formulas with nested subexpressions, such as Na(OH)2 or Al(NO3)3

# Loosely based on example code from
# http://pyparsing.wikispaces.com/file/detail/chemicalFormulas.py
from pyparsing import Group, Forward, Literal, nums, oneOf, OneOrMore, Optional, Word

# from http://pyparsing-public.wikispaces.com/Helpful+Expressions
# element("He") => "He"
element = oneOf(
    """H He Li Be B C N O F Ne Na Mg Al Si P S Cl
    Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge
    As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag
    Cd In Sn Sb Te I Xe Cs Ba Lu Hf Ta W Re Os
    Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Lr Rf
    Db Sg Bh Hs Mt Ds Rg Uub Uut Uuq Uup Uuh Uus
    Uuo La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm
    Yb Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No"""
)

# integer("123") => 123
to_int = lambda tokens: int(tokens[0])
integer = Word(nums).setParseAction(to_int)

# item("He") => {"He": 1}
# item("O2") => {"O": 2}
item_to_dict = lambda tokens: {a:b for a,b in tokens}
item = Group(element + Optional(integer, default=1)).setParseAction(item_to_dict)

# allow recursive definition of formula
Formula = Forward()

# expr("(OH)2") => {"O": 2, "H": 2}
lpar    = Literal("(").suppress()
rpar    = Literal(")").suppress()
expr_to_dict = lambda tokens: {el: num*tokens[1] for el,num in tokens[0].items()}
expr = (lpar + Formula + rpar + integer).setParseAction(expr_to_dict)

# ... complete the recursive definition
def formula_to_dict(tokens):
    total = {}
    for expr in tokens:
        for el,num in expr.items():
            total[el] = total.get(el, 0) + num
    return total
Formula <<= OneOrMore(item | expr).setParseAction(formula_to_dict)

# Finally, wrap it in an easy-to-use function:
def get_elements(s):
    return Formula.parseString(s)[0]

You can use it like this:

>>> get_elements("Na(OH)2")
{'H': 2, 'Na': 1, 'O': 2}

>>> get_elements("Al(NO3)3")
{'Al': 1, 'N': 3, 'O': 9}

>>> get_elements("Ba4H2Ba5Li3")
{'Ba': 9, 'H': 2, 'Li': 3}
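
If installing pyparsing is not an option, nested groups can also be handled with the standard library alone by keeping a stack of counters, one per open parenthesis. The following is a rough sketch of that different technique, not the grammar above: the name count_atoms is hypothetical, any capital-led symbol is accepted without checking it against the element list, and a closing parenthesis without a number is treated as a multiplier of 1.

```python
import re
from collections import Counter

# One token is: an element with optional count, "(", or ")" with optional multiplier.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)|(\()|(\))(\d*)")

# Hypothetical function name, chosen for illustration only.
def count_atoms(formula):
    """Count atoms in a formula that may contain nested groups, e.g. 'Al(NO3)3'."""
    stack = [Counter()]  # stack[-1] collects atoms of the innermost open group
    pos = 0
    while pos < len(formula):
        m = TOKEN.match(formula, pos)
        if m is None:
            raise ValueError("unexpected character at position %d" % pos)
        pos = m.end()
        if m.group(1):                      # element, e.g. "Ba4" or "H"
            stack[-1][m.group(1)] += int(m.group(2) or 1)
        elif m.group(3):                    # "(" opens a new group
            stack.append(Counter())
        else:                               # ")" closes the innermost group
            if len(stack) == 1:
                raise ValueError("unbalanced ')' at position %d" % (pos - 1))
            group = stack.pop()
            multiplier = int(m.group(5) or 1)
            for element, n in group.items():
                stack[-1][element] += n * multiplier
    if len(stack) != 1:
        raise ValueError("unbalanced '(' in formula")
    return dict(stack[0])

print(count_atoms("Na(OH)2"))      # {'Na': 1, 'O': 2, 'H': 2}
print(count_atoms("Al(NO3)3"))     # {'Al': 1, 'N': 3, 'O': 9}
print(count_atoms("Ba4H2Ba5Li3"))  # {'Ba': 9, 'H': 2, 'Li': 3}
```

The printed key order assumes Python 3.7 or later, where dictionaries preserve insertion order.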