Parsing a chemical compound in Python

Asked: 2019-01-02 00:19:49

Tags: python python-3.x string-parsing

I'm having trouble handling user input.

Input: C6H12O6

Expected output: ["C",6, "H", 12, "O", 6]

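I want to check whether each symbol corresponds to a valid atomic element; the elements are already stored in my database. But I'm struggling to get the output as an array like the one above.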

def createcompound(self, query):
    validatom = False
    query = 'C6H12O6'
    result = []

    firstcompound = query[0:query.find(" ")]        
    for char in firstcompound:
        for atom in self.atoms:
            if char == atom.symbol:
                validatom = True

    symbolcount = 0
    if validatom:
        for char in firstcompound:
           if not (char.isdigit()):
               symbolcount += 1

    print (firstcompound[0:symbolcount])
    print (firstcompound[symbolcount::])

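More importantly, this needs to work for other chemical formulas as well, but so far my O(n^2) approach only works in some cases.

How can I do this in native Python 3.6?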

3 Answers:

Answer 0: (score: 1)

This splits the digits and letters into separate elements of a list.

import re
str1="C6H12O6"
match = re.findall(r"([A-Za-z]+)([0-9]*)", str1)  # a run of letters, then an optional count
lst=[]
for item in match:
    x,y=item
    lst.append(x)
    lst.append(y)
print([x for x in lst if x])

Output:

['C', '6', 'H', '12', 'O', '6']

But this is not perfect. For example, "CO2" would be split into ['CO', '2'], not ['C', 'O', '2'].
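
One possible way around this limitation, assuming every element symbol is written as one uppercase letter followed by optional lowercase letters, is to match that shape explicitly. The following is only a minimal sketch; the function name `split_formula` is made up for illustration, and it does not check that a symbol is a real element.

```python
import re

def split_formula(formula):
    """Split a formula into symbols and counts, e.g. 'CO2' -> ['C', 'O', '2'].

    Assumption: each symbol is one uppercase letter plus optional lowercase letters.
    """
    tokens = []
    for symbol, count in re.findall(r"([A-Z][a-z]*)(\d*)", formula):
        tokens.append(symbol)
        if count:  # skip the empty string when no count follows the symbol
            tokens.append(count)
    return tokens

print(split_formula("CO2"))      # ['C', 'O', '2']
print(split_formula("C6H12O6"))  # ['C', '6', 'H', '12', 'O', '6']
```

As in the code above, the counts stay strings; wrap them in int() if numbers are needed.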

Answer 1: (score: 1)

You can use itertools.groupby with str.isdigit() as the grouping key, combined with integer parsing inside a list comprehension or generator expression, to get the output:

from itertools import groupby

def tryParseInt(x):
    """Tries to parse and return x as integer, on error returns x as is"""
    try:
        return int(x)
    except ValueError:  # x is not an integer literal, so return it unchanged
        return x

def split_groupby(text):
    """Splits a text at digit vs. character borders, returns list of characters
    and integers it detects. Uses str.isdigit to differentiate groups:
        'H2SeO4'-> ['H',2,'SeO',4]"""
    groupings = groupby(text,str.isdigit)

    # return it as list or generator - I prefer generator
    # return [ tryParseInt(''.join(grp[1])) for grp in groupings ]
    yield from (tryParseInt(''.join(grp[1])) for grp in groupings ) 


text = "C6H12O6"     
print(list(split_groupby(text)))  

Output:

['C', 6, 'H', 12, 'O', 6]

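This works by splitting the string into groups for which str.isdigit() == True and groups for which str.isdigit() == False, and parsing each group into an integer where possible.

For this to work properly, elements that occur only once also need an explicit count: 'C1H3C1H2O1H1' is split correctly into "chemical" elements. Without the counts, the formula would be split into ['CH', 3, 'CH', 2, 'OH'].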
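
The difference between the two spellings can be checked directly. This is a small self-contained sketch of the same grouping idea; the helper name `split_runs` is hypothetical and only used here for illustration.

```python
from itertools import groupby

def split_runs(text):
    """Split text into alternating runs of letters and digits; digit runs become ints."""
    runs = []
    for is_digit, group in groupby(text, str.isdigit):
        run = "".join(group)
        runs.append(int(run) if is_digit else run)
    return runs

# Every element carries an explicit count, so each symbol ends up on its own.
print(split_runs("C1H3C1H2O1H1"))  # ['C', 1, 'H', 3, 'C', 1, 'H', 2, 'O', 1, 'H', 1]

# Implicit counts of 1 let neighbouring symbols merge into one letter run.
print(split_runs("CH3CH2OH"))      # ['CH', 3, 'CH', 2, 'OH']
```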


To separate correctly capitalized elements from each other (e.g. "H2SeO4"), you can post-process the result:

def split_elems(formula):
    """Takes a list and splits strings inside it into capitalized pieces.
    Returns a new list in which each such string is replaced by its pieces:
        ['H',2,'SeO',4] -> ['H',2,'Se','O',4]"""
    result = []
    for name in formula:
        if isinstance(name, str):
            tmp = []
            for c in name:
                # an uppercase letter (or the very first character) starts a new piece
                if c.isupper() or not tmp:
                    tmp.append(c)
                else:
                    tmp[-1] += c
            result.extend(tmp)
        else:
            # counts are passed through unchanged
            result.append(name)
    return result

text = "H2SeO4"     
print(list(split_groupby(text)))                 # ['H', 2, 'SeO', 4]
print(split_elems(list(split_groupby(text))))    # ['H', 2, 'Se', 'O', 4]

You could also use a regular expression; a solution using re.split() can be found in the question "Split digit and text by regexp".
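
For reference, a rough sketch of what a re.split() based variant might look like. This is not necessarily the approach taken in the linked question, and the function name `split_digits` is an assumption made for this example.

```python
import re

def split_digits(text):
    """Split text at digit runs; the capturing group keeps the digits in the result."""
    parts = re.split(r"(\d+)", text)
    # re.split yields an empty string at either end when the text
    # starts or ends with digits, so empty parts are dropped
    return [int(p) if p.isdigit() else p for p in parts if p]

print(split_digits("C6H12O6"))  # ['C', 6, 'H', 12, 'O', 6]
print(split_digits("H2SeO4"))   # ['H', 2, 'SeO', 4]
```

Like the groupby version, this only separates letters from digits, so a run such as 'SeO' still needs the post-processing step to be broken into individual symbols.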

Answer 2: (score: 0)

Here is another solution using itertools.groupby and operator.itemgetter:

from itertools import groupby
from operator import itemgetter

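def key_func(x):
    """Groups runs of digits; each letter forms its own group"""
    _, char = x
    # all digits share one key, so runs such as "10" or "22" stay together;
    # a letter keeps its unique (index, char) pair as key
    return "digits" if char.isdigit() else x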


def map_int(x):
    """Maps integers"""
    return int(x) if x.isdigit() else x


def group_chemicals(x):
    """Groups chemicals using groupby"""
    return (
        "".join(map(itemgetter(1), g)) for _, g in groupby(enumerate(x), key=key_func)
    )

s = "C6H12O6"
print(list(map(map_int, group_chemicals(s))))
# ['C', 6, 'H', 12, 'O', 6]