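An elegant way to split XPath output into two variables for insertion into a database

Date: 2012-08-03 16:20:06

Tags: python database xpath lxml libxml2

We have a script that pulls the list of services from our Tomcat server and inserts it into a database. The script receives a large HTML response (100+ results) with the following structure: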

<TABLE bgcolor=#dddddd border=1>
<TR>
<TD valign="top"><B>name</B></TD>
<TD>Loader</TD>
</TR>
<TR>
<TD valign="top"><B>enabled</B></TD>
<TD>true</TD>
</TR>
<TR>
<TD valign="top"><B>loadok</B></TD>
<TD>13</TD>
</TR>
</TABLE>
<TABLE bgcolor=#dddddd border=1>
<TR>
<TD valign="top"><B>name</B></TD>
<TD>tester</TD>
</TR>
<TR>
<TD valign="top"><B>enabled</B></TD>
<TD>false</TD>
</TR>
<TR>
<TD valign="top"><B>loadok</B></TD>
<TD>13</TD>
</TR>
</TABLE>

Below is the actual code, which works:

#!/usr/bin/env python

import psycopg2
import urllib2
import base64
import sys
import re
import lxml.html as LH

con = None

try:

    con = psycopg2.connect(database='xx', user='xx',password='xx',host='vxx')
    cur = con.cursor()
    qry ="select iservers.env,iservers.family,iservers.prefix,iservers.iserver,iservers.login,iservers.password,services.service,"   +\
    "proxy_access.proxy_server,proxy_access.proxy_user,proxy_access.proxy_pass "        +\
    "from services,iservers,proxy_access where iservers.env='TEST' and services.id='2' "
    cur.execute(qry)
    data = cur.fetchall()

    for result in data:

        env      = result[0]
        family   = result[1]
        prefix   = result[2]
        iserver  = result[3]
        login    = result[4]
        password = result[5]
        service  = result[6]
        proxyHost = result[7]
        proxyUser = result[8]
        proxyPass = result[9]

        proxy_auth = "http://"+proxyUser+":"+proxyPass+"@"+proxyHost
        proxy_handler = urllib2.ProxyHandler({"http": proxy_auth})

        opener = urllib2.build_opener(proxy_handler)
        urllib2.install_opener(opener)
        request = urllib2.Request("http://"+iserver+service)
        base64string = base64.encodestring('%s:%s' % (login, password)).replace('\n', '')
        request.add_header("Authorization", "Basic %s" % base64string)
        response = urllib2.urlopen(request)
        html = response.read()

        ###################### CHANGE THIS TO USE A HTML PARSER
        regex = r"name</B></TD>\s<TD>(.*?)</TD>\s</TR>\s<TR>\s(.*)enabled</B></TD>\s<TD>(.*)</TD>"
        for m in re.finditer(regex,html):
            print "inserting\t"+iserver+"\t"+m.group(1)
            cur.execute("INSERT INTO pereirtc.package_status (env,family,iserver,prefix,package,status) values (%s,%s,%s,%s,%s,%s)",(env,family,iserver,prefix,m.group(1),m.group(3)))
            con.commit()
        ###################### END
except psycopg2.DatabaseError, e:
    print 'Error %s' % e
    sys.exit(1)

finally:

    if con:
        con.close()

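After some help on Stack Overflow, I was advised to replace the block marked "CHANGE THIS ..." with an HTML parser (lxml, which is built on libxml2). So I ended up with the following block: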

doc = LH.fromstring(html)
tds = (td.text_content() for td in doc.xpath("//td"))
for td, val in zip(*[tds]*2):
    if td in ("name","enabled"):
        print (td,val)

With the example above, I get this result:

('name', 'Loader')
('enabled', 'true')
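
For readers unfamiliar with the zip(*[tds]*2) idiom used above, here is a minimal sketch of why it pairs each label cell with the value cell that follows it. The list `cells` is a hand-written stand-in (an assumption, not real server output) for what td.text_content() would yield for one table; no lxml is needed to run it.

```python
# Stand-in for the flattened <td> texts of one table (assumed sample data).
cells = ["name", "Loader", "enabled", "true", "loadok", "13"]

it = iter(cells)
# [it] * 2 is a list holding the SAME iterator twice, so zip() pulls two
# consecutive items on every step: (label, value), (label, value), ...
pairs = list(zip(*[it] * 2))

# Keep only the labels we care about, as the block above does.
wanted = [(label, value) for label, value in pairs if label in ("name", "enabled")]
print(wanted)
```

The trick depends on `tds` being a single iterator (here, a generator expression); with a plain list, each element would be paired with itself instead.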

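What I want is to take the values extracted with XPath and insert them into the database. Since I'm just starting out with Python, I'm stuck on how to do that with xpath / lxml.

Regards!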

2 answers:

Answer 0 (score: 0)

My guess:

blah = [('name', 'Loader'), ('enabled', 'true')]
blahd = dict(blah)
cursor.execute('insert into blah (name, enabled) values(%(name)s, %(enabled)s)', blahd)

Or, without named values (note that psycopg2 expects %s placeholders, not ?):

cursor.execute('insert into blah (name, enabled) values(%s, %s)', [i[1] for i in blah])
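
To try both variants without a PostgreSQL server, here is a minimal sketch using the standard-library sqlite3 module with an in-memory database. The table `blah` and its columns follow the snippet above; the CREATE TABLE statement is an assumption added for illustration. Placeholder syntax differs per driver: sqlite3 uses :name and ?, whereas psycopg2 uses %(name)s and %s.

```python
import sqlite3

blah = [('name', 'Loader'), ('enabled', 'true')]
blahd = dict(blah)

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Illustrative schema only; the real table layout is not shown in the thread.
cur.execute("CREATE TABLE blah (name TEXT, enabled TEXT)")

# Named placeholders: each value is looked up by key in the dict.
cur.execute("INSERT INTO blah (name, enabled) VALUES (:name, :enabled)", blahd)

# Positional placeholders: values are taken in order from the sequence.
cur.execute("INSERT INTO blah (name, enabled) VALUES (?, ?)",
            [value for _, value in blah])

con.commit()
rows = cur.execute("SELECT name, enabled FROM blah ORDER BY rowid").fetchall()
print(rows)
con.close()
```

In both cases the driver does the quoting, so the values never need to be spliced into the SQL string by hand.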

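Answer 1 (score: 0)

I'm not sure whether you mean something as simple as the following, but here is one way to split the results into two variables: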

doc = LH.fromstring(html)
tds = (td.text_content() for td in doc.xpath("//td"))
for td, val in zip(*[tds]*2):
    if td == "name":
        name = val
    elif td == "enabled":
        enabled = val

print name
print enabled
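
When the page contains more than one table, the loop above overwrites name and enabled on every pass, so only the last service is left at the end. One possible way around that, sketched here under the assumption that every service block starts with a "name" row (as in the sample HTML in the question), is to open a new record each time "name" appears. The helper `group_records` is hypothetical, and `pairs` is hand-written sample data standing in for the output of zip(*[tds]*2).

```python
def group_records(pairs, wanted=("name", "enabled")):
    """Group flat (label, value) pairs into one dict per service."""
    records = []
    for label, value in pairs:
        if label == "name":
            # A "name" row marks the start of the next service.
            records.append({})
        if label in wanted and records:
            records[-1][label] = value
    return records


# Sample pairs matching the two tables shown in the question.
pairs = [
    ("name", "Loader"), ("enabled", "true"), ("loadok", "13"),
    ("name", "tester"), ("enabled", "false"), ("loadok", "13"),
]

records = group_records(pairs)
for rec in records:
    print(rec["name"], rec["enabled"])
```

Each record's "name" and "enabled" values could then replace m.group(1) and m.group(3) in the existing INSERT statement, one execute call per record.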