How do I parse a user agent string in Python?

Asked: 2012-02-09 10:44:14

Tags: python user-agent

<field name="http.user_agent" showname="User-Agent: CORE/6.506.4.1 OpenCORE/2.02 (Linux;Android 2.2)\r\n" size="62" pos="542" show="CORE/6.506.4.1 OpenCORE/2.02 (Linux;Android 2.2)" value="557365722d4167656e743a20434f52452f362e3530362e342e31204f70656e434f52452f322e303220284c696e75783b416e64726f696420322e32290d0a"/>

<field name="http.user_agent" showname="User-Agent: HTC Streaming Player htc_wwe / 1.0 / htc_vivo / 2.3.5\r\n" size="67" pos="570" show="HTC Streaming Player htc_wwe / 1.0 / htc_vivo / 2.3.5" value="557365722d4167656e743a204854432053747265616d696e6720506c61796572206874635f777765202f20312e30202f206874635f7669766f202f20322e332e350d0a"/>

<field name="http.user_agent" showname="User-Agent: AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; sv_se)\r\n" size="85" pos="639" show="AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; sv_se)" value="557365722d4167656e743a204170706c65436f72654d656469612f312e302e302e38433134382028695061643b20553b20435055204f5320345f325f31206c696b65204d6163204f5320583b2073765f7365290d0a"/>

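The samples I captured are listed above. I would like to know whether there is a Python module for parsing user agents. For these samples, I want output such as: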

Android
HTC Streaming player
ipad

If the user is on a PC, I want to get the web browser type.

4 Answers:

Answer 0 (score: 13)

There is a library called httpagentparser:

>>> import httpagentparser
>>> s = "Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/532.9 (KHTML, like Gecko) Chrome/5.0.307.11 Safari/532.9"
>>> print(httpagentparser.simple_detect(s))
('Linux', 'Chrome 5.0.307.11')
>>> print(httpagentparser.detect(s))
{'os': {'name': 'Linux'},
 'browser': {'version': '5.0.307.11', 'name': 'Chrome'}}
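
To run the question's samples through a parser like this, the User-Agent strings first have to be pulled out of the `<field>` elements. Below is a minimal sketch using only the standard library. It assumes each element is available as a standalone XML string, and the helper name `extract_user_agent` is hypothetical. It reads the `show` attribute and cross-checks it against the hex-encoded `value` attribute.

```python
import xml.etree.ElementTree as ET

# One <field> element from the question (showname, size and pos omitted).
FIELD_XML = (
    '<field name="http.user_agent" '
    'show="CORE/6.506.4.1 OpenCORE/2.02 (Linux;Android 2.2)" '
    'value="557365722d4167656e743a20434f52452f362e3530362e342e31204f70656e434f52452f322e303220284c696e75783b416e64726f696420322e32290d0a"/>'
)


def extract_user_agent(field_xml):
    """Return the User-Agent string from an http.user_agent <field> element.

    Hypothetical helper for illustration; not part of httpagentparser.
    """
    field = ET.fromstring(field_xml)
    if field.get("name") != "http.user_agent":
        raise ValueError("not an http.user_agent field")
    user_agent = field.get("show")
    # The value attribute holds the raw header line in hex:
    # "User-Agent: <string>\r\n". Decode it as a consistency check.
    raw_header = bytes.fromhex(field.get("value")).decode("ascii")
    if raw_header != "User-Agent: " + user_agent + "\r\n":
        raise ValueError("show and value attributes disagree")
    return user_agent


print(extract_user_agent(FIELD_XML))
```

The returned string can then be passed to whichever parser you choose.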

Answer 1 (score: 3)

Answer 2 (score: 0)

You could try writing your own code with regular expressions: http://docs.python.org/library/re.html. Or take a look at this: http://pypi.python.org/pypi/httpagentparser
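
A minimal sketch of the regular-expression approach, using only the standard library. The rule table and the function name `classify_user_agent` are illustrative assumptions, not a complete detector. Rules are tried in order, so more specific patterns have to come before more general ones.

```python
import re

# Ordered (label, pattern) rules: the first match wins.
# This table is an illustrative assumption, not an exhaustive list.
RULES = [
    ("HTC Streaming Player", re.compile(r"HTC Streaming Player", re.IGNORECASE)),
    ("iPad", re.compile(r"\biPad\b")),
    ("iPhone", re.compile(r"\biPhone\b")),
    ("Android", re.compile(r"\bAndroid\b")),
    # Desktop browsers: Chrome is checked before Safari because
    # Chrome's user agent string also contains "Safari".
    ("Chrome", re.compile(r"\bChrome/[\d.]+")),
    ("Firefox", re.compile(r"\bFirefox/[\d.]+")),
    ("Internet Explorer", re.compile(r"\bMSIE [\d.]+")),
    ("Safari", re.compile(r"\bSafari/[\d.]+")),
]


def classify_user_agent(user_agent):
    """Return the label of the first matching rule, or "Unknown"."""
    for label, pattern in RULES:
        if pattern.search(user_agent):
            return label
    return "Unknown"


SAMPLES = [
    "CORE/6.506.4.1 OpenCORE/2.02 (Linux;Android 2.2)",
    "HTC Streaming Player htc_wwe / 1.0 / htc_vivo / 2.3.5",
    "AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; sv_se)",
]

for sample in SAMPLES:
    print(classify_user_agent(sample))
```

A hand-written table like this needs ongoing maintenance as new devices and browsers appear, which is the main argument for using a maintained library instead.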

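Answer 3 (score: 0)

The answer I am about to give does not involve an open-source project, but it does provide information that anyone looking into how to parse HTTP User-Agent strings to obtain {{3}} will want to know.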

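WURFL is a long-established tool for analyzing the User-Agent (and, more generally, the whole HTTP request) and obtaining device and browser information that is easy to consume. It is the de facto standard in the ad-tech industry, thanks to a proprietary database that can squeeze the last drop of information out of an HTTP request. In practice, a lookup yields results like the following: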

device id = samsung_sm_g981u_ver1_subuau1
get_capability('model_name') = SM-G981U1
get_capabilities(static_capabilities) = {'model_name': 'SM-G981U1', 'brand_name': 'Samsung', 'device_os': 'Android'}
get_virtual_capability('complete_device_name') = Samsung SM-G981U1 (Galaxy S20 5G)
get_virtual_capabilities(virtual_capabilities) = {'complete_device_name': 'Samsung SM-G981U1 (Galaxy S20 5G)', 'form_factor': 'Smartphone'}

A lookup with the wmclient Python package looks like this:

from wmclient import *

try:
    # Connect to a locally running WURFL Microservice server
    client = WmClient.create("http", "localhost", 8080, "")
    # ...
    ua = "Mozilla/5.0 (Linux; Android 7.1.1; ONEPLUS A5000 Build/NMF26X) AppleWebKit/537.36 (KHTML, like Gecko) " \
         "Chrome/56.0.2924.87 Mobile Safari/537.36 "

    # Request only the capabilities that are printed below
    client.set_requested_static_capabilities(["brand_name", "model_name"])
    client.set_requested_virtual_capabilities(["is_smartphone", "form_factor"])
    print()
    print("Detecting device for user-agent: " + ua)

    # Perform a device detection calling WM server API
    device = client.lookup_useragent(ua)
    # ...
    # Let's get the device capabilities and print some of them
    capabilities = device.capabilities
    print("Detected device WURFL ID: " + capabilities["wurfl_id"])
    print("Device brand & model: " + capabilities["brand_name"] + " " + capabilities["model_name"])
    print("Detected device form factor: " + capabilities["form_factor"])
    if capabilities["is_smartphone"] == "true":
        # Assumed body (not shown in the snippet)
        print("This is a smartphone")
except Exception as exc:
    # Assumed handler (not shown in the snippet)
    print("Lookup failed: " + str(exc))

More information: device intelligence

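For those who want to try WURFL (PyWURFL in particular) without obtaining a trial license from ScientiaMobile, my company recently released a version of WURFL, called WURFL Microservice, that is available from {{3}} and AWS (and, of course, from ScientiaMobile itself). Python is fully supported for this product as well, although the syntax differs slightly, as the wmclient code above shows, because the product relies on a server-side component in the cloud for updates.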

A complete example and the client code can be found on GitHub.

Disclosure: I work for the company that provides the libraries described here.