Question

Without any third-party libraries (such as Beautiful Soup), what is the cleanest way to parse a string in Python?

Given the following text, I'd like the contents of "uber_token" to be parsed out, i.e. "123456789".

...

<form id="blah" action="/p-submi.html" method="post"><input type="hidden" id="" name="uber_token" value="123456789"/><div class="container-info">

...

Thanks!

Answer 1

A regular expression is the solution.

Use the built-in re module:

>>> import re
>>> s = '<form id="blah" action="/p-submi.html" method="post"><input type="hidden" id="" name="uber_token" value="123456789"/><div class="container-info">'
>>> regex = re.search(r'name="uber_token" value="([0-9]+)"', s)
>>> print(regex.group(1))
123456789
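The pattern above only accepts digits, and regex.group(1) raises AttributeError when nothing matches, because re.search returns None in that case. A minimal sketch of a slightly more forgiving variant, still assuming double-quoted attributes with name appearing before value as in the question's snippet; extract_uber_token is a hypothetical helper name:

```python
import re

def extract_uber_token(html):
    """Return the value of the input named "uber_token", or None if absent.

    Assumes double-quoted attributes, with name="uber_token" immediately
    followed (after whitespace) by value="...", as in the question.
    """
    match = re.search(r'name="uber_token"\s+value="([^"]*)"', html)
    return match.group(1) if match else None

s = ('<form id="blah" action="/p-submi.html" method="post">'
     '<input type="hidden" id="" name="uber_token" value="123456789"/>'
     '<div class="container-info">')

print(extract_uber_token(s))                      # 123456789
print(extract_uber_token('<p>no token here</p>'))  # None
```

Capturing [^"]* instead of [0-9]+ means non-numeric tokens are also returned; if the attribute order can vary, a regex quickly becomes fragile and one of the parsers below is a better fit.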

Answer 2

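Disclaimer: this answer is meant for quick-and-dirty scripts and may lack robustness and efficiency. The suggestions here probably shouldn't be used in code that needs to live for more than a few hours.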

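If you're unwilling to learn regular expressions (and you should be willing to learn them!), you can split on value=". It's probably very inefficient, but it's simple and easy to debug.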

values = []

with open('myfile.txt') as infile:
    for line in infile:
        candidates = line.split('value="')
        for s in candidates[1:]:  # the first token comes before any value="
            try:  # keep only values that parse as integers
                val = int(s.split('"')[0])
            except ValueError:
                continue
            values.append(val)
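The same splitting idea also works on a string already in memory and can target uber_token specifically. A rough sketch using str.partition, assuming the value=" you want is the first one after name="uber_token" (if that tag had no value attribute, this would wrongly pick up a later tag's value); token_by_split is a hypothetical name:

```python
def token_by_split(text, name="uber_token"):
    """Quick-and-dirty: find name="<name>", then return whatever sits between
    the next value=" and the following double quote. None if a marker is missing."""
    _, found, rest = text.partition(f'name="{name}"')
    if not found:
        return None
    _, found, rest = rest.partition('value="')
    if not found:
        return None
    return rest.partition('"')[0]

s = ('<form id="blah" action="/p-submi.html" method="post">'
     '<input type="hidden" id="" name="uber_token" value="123456789"/>'
     '<div class="container-info">')

print(token_by_split(s))  # 123456789
```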

If you're specifically working with HTML or XML, Python has two libraries for that:

HTMLParser: https://docs.python.org/2/library/htmlparser.html
ElementTree: https://docs.python.org/2/library/xml.etree.elementtree.html

Then, for example, you could write code that searches the tree for nodes whose "name" attribute has the value "uber_token", and read the "value" attribute from them.

A very dumb Python 2 example that doesn't require learning much about ElementTree (it may need simple corrections):

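import xml.etree.ElementTree as ET
tree = ET.parse('myfile.xml')
root = tree.getroot()

values = []

# iter() walks every descendant, not just root's direct children;
# get() returns None instead of raising KeyError when "name" is missing
for element in root.iter():
    if element.get('name') == 'uber_token':
        values.append(element.get('value'))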
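Note that ElementTree only accepts well-formed XML, and HTML such as the question's snippet (where the <div> and <form> are never closed) will typically make it raise ParseError. HTMLParser, linked above, is more tolerant; in Python 3 it lives in the html.parser module. A minimal Python 3 sketch using it; TokenFinder is a hypothetical class name:

```python
from html.parser import HTMLParser

class TokenFinder(HTMLParser):
    """Collects the value attribute of every tag whose name attribute matches."""

    def __init__(self, wanted_name):
        super().__init__()
        self.wanted_name = wanted_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs. Self-closing tags such as
        # <input .../> reach this method too, via handle_startendtag's default.
        attributes = dict(attrs)
        if attributes.get("name") == self.wanted_name:
            self.values.append(attributes.get("value"))

html_text = ('<form id="blah" action="/p-submi.html" method="post">'
             '<input type="hidden" id="" name="uber_token" value="123456789"/>'
             '<div class="container-info">')

finder = TokenFinder("uber_token")
finder.feed(html_text)
finder.close()
print(finder.values)  # ['123456789']
```

Unlike the regex and split approaches, this does not care about attribute order or quoting style.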

Answer 3

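Python ships with its own XML parsing modules: https://docs.python.org/3.2/library/xml.html?highlight=xml#xml so you don't have to use any third-party parsing library. If you're unwilling or not allowed to use them... you can always use regular expressions, but I would steer clear of that when parsing XML.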
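When the input really is well-formed XML, ElementTree's limited XPath support can locate the node directly. A sketch, assuming a properly closed version of the question's fragment (the closing </div> and </form> below were added for this example and are not part of the original text):

```python
import xml.etree.ElementTree as ET

# Well-formed version of the question's fragment; the original would be
# rejected by ElementTree because its <div> and <form> are never closed.
fragment = ('<form id="blah" action="/p-submi.html" method="post">'
            '<input type="hidden" id="" name="uber_token" value="123456789"/>'
            '<div class="container-info"></div>'
            '</form>')

root = ET.fromstring(fragment)  # root is the <form> element
# ElementTree supports [@attr='value'] predicates in find()/findall()
node = root.find(".//input[@name='uber_token']")
token = node.get("value") if node is not None else None
print(token)  # 123456789
```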

How to parse a string in Python
