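This question is a follow-up to an earlier one. If you need more background, you can see the original question here:
Populating Python list using data obtained from lxml xpath command
I have incorporated @ihor-kaharlichenko's excellent suggestion (from my original question) into the revised code, shown here: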
from lxml import etree as ET
from datetime import datetime
xmlDoc = ET.parse('http://192.168.1.198/Bench_read_scalar.xml')
response = xmlDoc.getroot()
tags = (
    'address',
    'status',
    'flow',
    'dp',
    'inPressure',
    'actVal',
    'temp',
    'valveOnPercent',
)
dmtVal = []
for dmt in response.iter('dmt'):
    val = [str(dmt.xpath('./%s/text()' % tag)) for tag in tags]
    val.insert(0, str(datetime.now()))  # Add timestamp at beginning of each record
    dmtVal.append(val)
for item in dmtVal:
    str(item).strip('[')
    str(item).strip(']')
    str(item).strip('"')
This last block is where I am having trouble. The data I get for dmtVal looks like this:
[['2012-08-16 12:38:45.152222', "['0x46']", "['0x32']", "['1.234']", "['5.678']", "['9.123']", "['4.567']", "['0x98']", "['0x97']"], ['2012-08-16 12:38:45.152519', "['0x47']", "['0x33']", "['8.901']", "['2.345']", "['6.789']", "['0.123']", "['0x96']", "['0x95']"]]
However, I really want the data to look like this:
[['2012-08-16 12:38:45.152222', '0x46', '0x32', '1.234', '5.678', '9.123', '4.567', '0x98', '0x97'], ['2012-08-16 12:38:45.152519', '0x47', '0x33', '8.901', '2.345', '6.789', '0.123', '0x96', '0x95']]
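I thought this would be a fairly simple string-stripping job. I first tried the stripping inside the original iteration (where dmtVal is initially populated), but that did not work, so I moved the strip operations outside the loop, as listed above, and it still does not work. I suspect I am making some noob error, but I can't find it. Any suggestions are welcome!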
Thank you all for the prompt and useful replies. Here is the corrected code:
from lxml import etree as ET
from datetime import datetime
xmlDoc = ET.parse('http://192.168.1.198/Bench_read_scalar.xml')
print('...Starting to parse XML nodes')
response = xmlDoc.getroot()
tags = (
    'address',
    'status',
    'flow',
    'dp',
    'inPressure',
    'actVal',
    'temp',
    'valveOnPercent',
)
dmtVal = []
for dmt in response.iter('dmt'):
    # Join the text nodes instead of taking str() of the list xpath() returns
    val = [' '.join(dmt.xpath('./%s/text()' % tag)) for tag in tags]
    val.insert(0, str(datetime.now()))  # Add timestamp at beginning of each record
    dmtVal.append(val)
print(dmtVal)
print('...Done')
Which yields:
...Starting to parse XML nodes
[['2012-08-16 14:41:10.442776', '0x46', '0x32', '1.234', '5.678', '9.123', '4.567', '0x98', '0x97'], ['2012-08-16 14:41:10.443052', '0x47', '0x33', '8.901', '2.345', '6.789', '0.123', '0x96', '0x95']]
...Done
Thanks, everyone!
Answer 0 (score: 2)
Let your current data be grps.
Solution 1 - ast.literal_eval
import ast
grps = [['2012-08-16 12:38:45.152222', "['0x46']", "['0x32']", "['1.234']", "['5.678']", "['9.123']", "['4.567']", "['0x98']", "['0x97']"], ['2012-08-16 12:38:45.152519', "['0x47']", "['0x33']", "['8.901']", "['2.345']", "['6.789']", "['0.123']", "['0x96']", "['0x95']"]]
desired_output = [[grp[0]] + [ast.literal_eval(item)[0] for item in grp[1:]] for grp in grps]
print(desired_output)
Output
[['2012-08-16 12:38:45.152222', '0x46', '0x32', '1.234', '5.678', '9.123', '4.567', '0x98', '0x97'], ['2012-08-16 12:38:45.152519', '0x47', '0x33', '8.901', '2.345', '6.789', '0.123', '0x96', '0x95']]
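Explanation
ast.literal_eval is a safe way to eval. It only evaluates literal data types (strings, numbers, tuples, lists, dicts, booleans, and None). In your case, it evaluates "['1.0']" to a list of length 1, namely ['1.0']. You may also want to read up on list comprehensions and make sure you understand them.
Another way to write this is: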
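To make the behaviour concrete, here is a small sketch of what ast.literal_eval does with one of the stringified lists, and how it refuses input that is not a plain literal. The variable names are only for illustration and are not from the original code.

```python
import ast

item = "['1.234']"                 # a list that str() turned into a string
parsed = ast.literal_eval(item)    # a real list again, holding one string
value = parsed[0]                  # the string inside that list

# Anything that is not a plain literal is rejected instead of being executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```

This is why literal_eval is generally preferred over eval for data of this kind: a function call in the input raises an error rather than running.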
desired_output = []
for grp in grps:  # loop through each group
    new_grp = [grp[0]]  # start the new group with the first element (the timestamp)
    for item in grp[1:]:  # loop over every item from index 1 to the end
        evaluated_item = ast.literal_eval(item)  # get the evaluated data
        new_grp.append(evaluated_item[0])  # append the single item of that list to new_grp
    desired_output.append(new_grp)  # append new_grp to the desired_output list
Solution 2 - regular expressions
import re
stripper = re.compile(r"[\[\]']")  # matches '[', ']' and the single quote
grps = [['2012-08-16 12:38:45.152222', "['0x46']", "['0x32']", "['1.234']", "['5.678']", "['9.123']", "['4.567']", "['0x98']", "['0x97']"], ['2012-08-16 12:38:45.152519', "['0x47']", "['0x33']", "['8.901']", "['2.345']", "['6.789']", "['0.123']", "['0x96']", "['0x95']"]]
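desired_output = [[grp[0]] + [stripper.sub('', item) for item in grp[1:]] for grp in grps]
The problem with your solution is that the items iterated over in a for loop are not passed by reference, and strip returns a new string rather than modifying the existing one, so the stripped results are discarded and the original data is unaffected.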
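The point above can be checked with a short sketch. The row below is a shortened, hypothetical record in the same shape as the question's data. Note also that in the question's loop, item is a whole inner list, so str(item) is the representation of the entire record rather than of a single field.

```python
row = ['2012-08-16 12:38:45.152222', "['0x46']", "['1.234']"]

# Attempt 1: each stripped string is computed and then thrown away.
for item in row:
    item.strip("[]'")
unchanged = list(row)

# Attempt 2: keep the results by building a new list.
cleaned = [item.strip("[]'") for item in row]
```

After the first loop the row is exactly as it was; only the second form produces cleaned values, because it stores what strip returns.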
Solution 3 - fixing the original code
To fix your solution, you could do:
for i, grp in enumerate(dmtVal):  # loop over the inner lists
    for j, item in enumerate(grp):
        dmtVal[i][j] = item.strip(']')
        dmtVal[i][j] = dmtVal[i][j].lstrip('[')
        dmtVal[i][j] = dmtVal[i][j].strip("'")
Alternatively, rather than assigning the value back to dmtVal[i][j] after every strip, you can work on the local name item, manipulate it, and assign it back to dmtVal[i][j] once at the end:
for i, grp in enumerate(dmtVal):  # loop over the inner lists
    for j, item in enumerate(grp):
        item = item.strip(']')
        item = item.lstrip('[')
        item = item.strip("'")
        dmtVal[i][j] = item  # write the cleaned value back once
Or, a better solution (imho):
for i, grp in enumerate(dmtVal):  # loop over the inner lists
    for j, item in enumerate(grp):
        dmtVal[i][j] = item.replace('[', '').replace(']', '').replace("'", '')
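Answer 1 (score: 1)
This will do what you need, though perhaps not in the best way:
new_dmt_val = []
for sublist in dmtVal:
    new_dmt_val.append([elem.strip('[\'').strip('\']') for elem in sublist])
I tried to keep it readable; it could be shorter, but it would also be more confusing.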
Answer 2 (score: 1)
The answer is: don't create the strings in the first place.
Your problem is in this part of the code:
for dmt in response.iter('dmt'):
    val = [str(dmt.xpath('./%s/text()' % tag)) for tag in tags]
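I guess you are using str() here to try to extract the string from the list that xpath() returns.
However, that is not what you get; str() just gives you the string representation of the list.
You have a few options for doing what you want, but given that you are parsing XML and therefore can't be sure how many elements the list will contain, your best bet is probably to use ''.join():
for dmt in response.iter('dmt'):
    val = [''.join(dmt.xpath('./%s/text()' % tag)) for tag in tags]
Edit: If you use this code, you don't need the final loop.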
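The difference between the two calls can be seen without any XML at all. In this sketch, result is a hand-written stand-in for the list that an xpath() text query returns; it is an assumption for illustration, not output from the real document.

```python
result = ['0x46']            # stand-in for what dmt.xpath('./address/text()') returns

as_repr = str(result)        # the list's representation, brackets and quotes included
as_text = ''.join(result)    # just the text
empty = ''.join([])          # an element with no text nodes gives an empty string
```

Because ''.join() also copes with an empty list, a missing element becomes '' instead of raising an IndexError, which indexing with [0] would do.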
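Answer 3 (score: 1)
str.strip only removes leading and trailing characters. You probably want str.replace instead. Also note that str.strip (and str.replace) return a copy of the string; they do not modify it in place.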
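A quick sketch of those two points, using made-up strings: the argument to strip is treated as a set of characters to remove from both ends, characters in the middle are left alone, and the original string is never changed.

```python
s = "['0x46']"

stripped = s.strip("[]'")        # removes any of [ ] ' from both ends
inner = "a[b]c".strip("[]")      # brackets in the middle are untouched
replaced = "a[b]c".replace("[", "").replace("]", "")  # removes them everywhere
```

Here s still holds its original value afterwards; only the names on the left refer to the new strings.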
Or just use ''.join() instead of str and drop the whole stripping business entirely:
val = [''.join(dmt.xpath('./%s/text()' % tag)) for tag in tags]
As a side note, you may also want to use datetime.isoformat instead of str:
val.insert(0, datetime.now().isoformat())  # Add timestamp at beginning of each record
See help(datetime) for more options.
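Putting both suggestions together, here is a minimal sketch that runs offline. It makes two assumptions that are not in the original post: the sample XML is invented, since the real document is not shown, and the standard library's xml.etree.ElementTree is swapped in for lxml so that nothing needs to be installed. With lxml, the xpath() call from the question would take the place of findall().

```python
import xml.etree.ElementTree as ET   # stdlib stand-in for lxml in this sketch
from datetime import datetime

# Hypothetical sample; the second record deliberately lacks a <flow> element.
SAMPLE = """<response>
  <dmt><address>0x46</address><status>0x32</status><flow>1.234</flow></dmt>
  <dmt><address>0x47</address><status>0x33</status></dmt>
</response>"""

tags = ('address', 'status', 'flow')
root = ET.fromstring(SAMPLE)

records = []
for dmt in root.iter('dmt'):
    # Join the text of every matching child; a missing tag gives ''.
    val = [''.join(el.text or '' for el in dmt.findall(tag)) for tag in tags]
    val.insert(0, datetime.now().isoformat())  # timestamp first, ISO 8601 style
    records.append(val)

# str() and isoformat() differ only in the date/time separator.
stamp = datetime(2012, 8, 16, 12, 38, 45, 152222)
as_str = str(stamp)
as_iso = stamp.isoformat()
```

The fixed stamp at the end shows the difference: str() separates date and time with a space, while isoformat() uses a "T".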
Answer 4 (score: 1)
xml is the string from the original post... (I think this covers it both ways...)
from lxml import etree
from datetime import datetime
from ast import literal_eval
tree = etree.fromstring(xml).getroottree()
dmts = []
for dmt in tree.iterfind('dmt'):
    to_add = {'datetime': datetime.now()}
    to_add.update({n.tag: literal_eval(n.text) for n in dmt})
    dmts.append(to_add)
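You can still explicitly order the nodes later, though I find this approach cleaner, since you can work with names rather than indexes (which one you prefer depends on whether introducing or removing a node should be an error).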
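As a rough illustration of working with names instead of indexes, the sketch below uses hand-written (tag, text) pairs as a stand-in for the children of one dmt node; the pairs and the order tuple are assumptions, not data from the real document. Note that literal_eval turns the hex strings into integers, which differs from the all-strings output the question asked for.

```python
from ast import literal_eval

# Hypothetical (tag, text) pairs standing in for the children of one <dmt> node.
children = [('address', '0x46'), ('flow', '1.234'), ('temp', '0x98')]

record = {tag: literal_eval(text) for tag, text in children}

# Rebuild an ordered row later from whichever names are needed.
order = ('address', 'flow', 'temp')
row = [record[name] for name in order]
```

Indexing with record[name] raises a KeyError when a node is missing; record.get(name) would tolerate it instead, which is the design choice the answer alludes to.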