Question

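I want to improve my Python code. I'm looking for help with the logic, so that I get the same result with less code.

My program takes a string of atoms as an argument, "learns" them, and returns a list of the atoms it has learned.

I'd like to know if there is any way to optimize my code.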

def mol_term(molecule):
    upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    list_of_atoms = []

    for i in range(len(molecule) - 1):  # walk the string up to the second-to-last character
        if molecule[i] in upper:  # lowercase letters are skipped here
            if molecule[i+1] not in upper:
                temp = molecule[i] + molecule[i+1]  # the atom has two letters
            else:
                temp = molecule[i]  # the atom has one letter

            if temp not in list_of_atoms:
                list_of_atoms.append(temp)  # append the atom only if it is not already in the list

    if molecule[-1] in upper and molecule[-1] not in list_of_atoms:
        list_of_atoms.append(molecule[-1])  # check the last letter

    return list_of_atoms

Thanks a lot.

Answer 1

You are looking for a regular expression that captures an uppercase character followed by an optional lowercase character:

import re
list(set(re.findall(r'[A-Z][a-z]?', 'CuBDa')))
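Note that set discards the order in which the atoms first appear. If, like the original function, you want the list in first-seen order, one option (a sketch, not part of this answer; the name unique_atoms is hypothetical) is to deduplicate with dict.fromkeys, whose keys keep insertion order in Python 3.7+:

```python
import re

def unique_atoms(molecule):
    # An uppercase letter optionally followed by one lowercase letter.
    symbols = re.findall(r'[A-Z][a-z]?', molecule)
    # dict keys keep insertion order (Python 3.7+), so duplicates are
    # dropped while each symbol stays where it first appeared.
    return list(dict.fromkeys(symbols))

print(unique_atoms('CuBDa'))    # ['Cu', 'B', 'Da']
print(unique_atoms('CH3COOH'))  # ['C', 'H', 'O']
```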

But you might also need the numbers, as in CO2, in which case this works:

re.findall(r'[A-Z][a-z]?[0-9]*', 'C4H10FO2P')  # ['C4', 'H10', 'F', 'O2', 'P']

If you just want to ignore the numbers, the first expression will work.
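If you want the counts as numbers rather than glued to the symbols, a sketch (an extension of the second pattern, not from this answer; the name atom_counts is hypothetical) splits each match into two capture groups and sums them in a collections.Counter, treating a missing count as 1. It does not handle parentheses such as Ca(OH)2.

```python
import re
from collections import Counter

def atom_counts(molecule):
    counts = Counter()
    # Group 1: the element symbol; group 2: the digits after it (may be empty).
    for symbol, digits in re.findall(r'([A-Z][a-z]?)(\d*)', molecule):
        counts[symbol] += int(digits) if digits else 1
    return counts

print(dict(atom_counts('CH3COOH')))  # {'C': 2, 'H': 4, 'O': 2}
```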

Answer 2

This should do the trick:

import re
molecule = 'CH3COOH'
print(set(re.findall(r'[A-Z][a-z]?', molecule)))

This prints (the order of a set's elements may vary):

{'H', 'C', 'O'}
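Because a set has no defined order, the printed result can come out differently between runs. If you need a stable result, for example in tests, sorting the set is a simple option (a small variation, not part of this answer):

```python
import re

molecule = 'CH3COOH'
# sorted() turns the unordered set into an alphabetically ordered list.
atoms = sorted(set(re.findall(r'[A-Z][a-z]?', molecule)))
print(atoms)  # ['C', 'H', 'O']
```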

Answer 3

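I would suggest looking at the Python PLY documentation and at Andrew Dalke's example of parsing with PLY (http://www.dalkescientific.com/writings/NBN/parsing_with_ply.html).

You can define tokens for the atomic symbols and for the number of times an atom/symbol appears in the molecule, e.g. for a molecule like CH3COOH (acetic acid):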

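import ply.lex as lex

tokens = (
    "SYMBOL",
    "COUNT",
)

# Element symbols. Alternatives are tried left to right and the optional
# second letter is greedy, so "Cl" is matched before plain "C".
t_SYMBOL = (
    r"C[laroudsemf]?|Os?|N[eaibdpos]?|S[icernbmg]?|P[drmtboau]?|"
    r"H[eofgas]?|A[lrsgutcm]|B[eraik]?|Dy|E[urs]|F[erm]?|G[aed]|"
    r"I[nr]?|Kr?|L[iaur]|M[gnodt]|R[buhenaf]|T[icebmalh]|"
    r"U|V|W|Xe|Yb?|Z[nr]"
)

def t_COUNT(t):
    r"\d+"
    t.value = int(t.value)
    return t

def t_error(t):
    # Report a character that is neither an element symbol nor a digit, then skip it.
    print("Illegal character %r" % t.value[0])
    t.lexer.skip(1)

lexer = lex.lex()

lexer.input("CH3COOH")
for tok in iter(lexer.token, None):
    print(repr(tok.type), repr(tok.value))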

When I run the code, I get the following:

'SYMBOL' 'C'
'SYMBOL' 'H'
'COUNT' 3
'SYMBOL' 'C'
'SYMBOL' 'O'
'SYMBOL' 'O'
'SYMBOL' 'H'
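To turn that token stream into element counts, each COUNT can be applied to the SYMBOL just before it. This is a sketch, assuming the tokens are available as (type, value) pairs such as (tok.type, tok.value) from the loop above; the helper name tally is hypothetical, and the example uses a hand-written list mirroring the output above so it runs without PLY installed.

```python
from collections import Counter

def tally(tokens):
    """Fold (type, value) pairs from the lexer into per-element counts.

    A SYMBOL adds 1 to its element; a COUNT right after it adds the
    remaining (count - 1), so 'H' followed by 3 totals 3.
    """
    counts = Counter()
    last_symbol = None
    for kind, value in tokens:
        if kind == "SYMBOL":
            counts[value] += 1
            last_symbol = value
        elif kind == "COUNT":
            if last_symbol is None:
                raise ValueError("count %r has no preceding symbol" % value)
            counts[last_symbol] += value - 1
            last_symbol = None  # a count must directly follow a symbol
    return counts

# Same token sequence the lexer printed for CH3COOH.
tokens = [("SYMBOL", "C"), ("SYMBOL", "H"), ("COUNT", 3), ("SYMBOL", "C"),
          ("SYMBOL", "O"), ("SYMBOL", "O"), ("SYMBOL", "H")]
print(dict(tally(tokens)))  # {'C': 2, 'H': 4, 'O': 2}
```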

More information is available here: http://www.dabeaz.com/ply/

Python: a better logical approach to "atom learning"
