Question

I have an HTML page. I want to find the XPath of every text node in it and store the results in an Excel file.

Code

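import json
from unidecode import unidecode

# Assumes `hxs` (a Scrapy selector for the page) and `driver` (a Selenium
# WebDriver with the same page loaded) already exist.
start_path = './/tr|.//div[not(ancestor::div)][not(descendant::tr)]'
row_data_points = hxs.select(start_path)
for i, r in enumerate(row_data_points, 1):
    # XPath positions are 1-based, so enumerate from 1
    path_prefix = '(' + start_path + ')[' + str(i) + ']'
    row = r.select('.//text()').extract()
    row = [x.replace('\n', '').replace('\t', '') for x in row]
    row = [x for x in row if x.strip() != '']
    d = {}
    for j, r1 in enumerate(row):
        # Look for an element whose text node equals the cleaned string
        path = path_prefix + "//*[text()='" + r1 + "']"
        # json.dumps quotes the XPath safely as a JavaScript string literal
        stg = ("var element = document.evaluate(" + json.dumps(path) + ", document, null, "
               "XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;"
               "return element ? element.getBoundingClientRect() : null;")
        print("trying", stg)
        d[j] = {'value': unidecode(r1), 'loc': driver.execute_script(stg)}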

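The problem arises when I fetch all the text nodes with .//text() and then build an XPath from each node's text: if the text contains unwanted characters, the XPath fails to find the element. Is there another way to do this, or an existing library for it?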

Creating a mapping between all nodes with text and their XPaths
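One possible way to build such a mapping without putting the text itself into the XPath is to address each text node by position, e.g. /div[1]/p[2]/text()[1], so quotes and stray whitespace in the text no longer matter. The following is only a minimal sketch in Python 3 using the standard library; the names TextXPathCollector, collect_text_xpaths and write_csv are made up for illustration and are not part of Scrapy or Selenium. It assumes reasonably well-formed HTML. A browser may build a slightly different DOM than the raw source suggests (for example by inserting tbody into tables, or by splitting text that contains a literal "<"), so paths produced this way are not guaranteed to resolve in document.evaluate. CSV is used here as a stand-in for Excel, which can open it directly.

```python
import csv
import io
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are counted but not pushed.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextXPathCollector(HTMLParser):
    """Hypothetical helper: collects (xpath, text) for each non-blank text node."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        # One frame per open element; the first frame stands for the document.
        self.stack = [{"tag": None, "step": "", "children": {}, "texts": 0}]
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1]
        # Position of this element among same-named siblings (1-based).
        index = parent["children"].get(tag, 0) + 1
        parent["children"][tag] = index
        if tag not in VOID_TAGS:
            self.stack.append({"tag": tag, "step": "%s[%d]" % (tag, index),
                               "children": {}, "texts": 0})

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        frame = self.stack[-1]
        # Whitespace-only text nodes still count towards text()[n].
        frame["texts"] += 1
        if data.strip():
            path = "/".join(f["step"] for f in self.stack)
            xpath = "%s/text()[%d]" % (path, frame["texts"])
            self.pairs.append((xpath, data.strip()))


def collect_text_xpaths(html):
    """Return a list of (positional xpath, stripped text) pairs."""
    parser = TextXPathCollector()
    parser.feed(html)
    parser.close()  # flush any text buffered at the end of the input
    return parser.pairs


def write_csv(pairs, fileobj):
    """Write the mapping as CSV with a header row."""
    writer = csv.writer(fileobj)
    writer.writerow(["xpath", "text"])
    writer.writerows(pairs)


if __name__ == "__main__":
    sample = "<div><p>Hello</p><br><p>It's <b>bold</b> text</p></div>"
    out = io.StringIO()
    write_csv(collect_text_xpaths(sample), out)
    print(out.getvalue())
```

To write a real file, the io.StringIO object could be replaced with open("text_nodes.csv", "w", newline="", encoding="utf-8"). If the lxml package is available, its ElementTree.getpath() method produces similar positional paths for elements and may be a more robust choice for messy HTML.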
