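I am trying to find a relative XPath (not an absolute one) that lets me extract data from this URL: https://www.sec.gov/Archives/edgar/data/1000228/000100022810000006/the10k_2009.htm
My code is below. SalesB returns a value ('233,715'), but SalesA returns an empty result. What am I doing wrong?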
SalesB returns the value shown below, which can be found via the SEC_pageA variable (see https://www.sec.gov/Archives/edgar/data/320193/000119312515356351/d17062d10k.htm).
I want SalesA to return the "Net sales" figure shown below (i.e. 6,538,336), which can be found here: https://www.sec.gov/Archives/edgar/data/1000228/000100022810000006/the10k_2009.htm
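Answer 0 (score: 0)
Some of the text you are matching spans more than one line in the page source, so an XPath that searches for it as a single-line string does not find what you actually want.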
from lxml import html
import requests

# The line breaks inside the quoted strings are intentional: they match the
# line breaks in the page source.
xpath_a = """
//*[text()[contains(., "CONSOLIDATED
STATEMENTS OF INCOME")]]/following::td[contains(., "Net
sales")][1]/following-sibling::td[@valign="bottom"][3]//text()
"""

SEC_pageA = requests.get('https://www.sec.gov/Archives/edgar/data/1000228/000100022810000006/the10k_2009.htm')
SEC_treeA = html.fromstring(SEC_pageA.content)
SalesA = SEC_treeA.xpath(xpath_a)
print(SalesA)
This prints:
['6,538,336']
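As a variation on the same idea, the line-break problem can also be handled with XPath's normalize-space(), which collapses runs of whitespace (including newlines) into single spaces before matching. The sketch below is not the answer's exact method, and it runs against a small made-up HTML fragment (SAMPLE is hypothetical and only imitates the filing's layout), so the cell positions may need adjusting for the real page.

```python
from lxml import html

# Hypothetical miniature of the filing's markup: the heading and the row
# label each contain a line break, as described in the answer.
SAMPLE = """
<html><body>
<p><b>CONSOLIDATED
STATEMENTS OF INCOME</b></p>
<table>
<tr>
  <td valign="bottom">Net
sales</td>
  <td valign="bottom">$</td>
  <td valign="bottom"> </td>
  <td valign="bottom">6,538,336</td>
</tr>
</table>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# A single-line search string fails because the cell text is "Net\nsales".
naive = tree.xpath('//td[contains(., "Net sales")]')

# normalize-space() collapses the newline to a space, so the same
# single-line search strings now match.
xpath_normalized = (
    '//*[contains(normalize-space(.), "CONSOLIDATED STATEMENTS OF INCOME")]'
    '/following::td[contains(normalize-space(.), "Net sales")][1]'
    '/following-sibling::td[@valign="bottom"][3]//text()'
)
sales = tree.xpath(xpath_normalized)

print(naive)
print(sales)
```

With this form the search strings can stay on one line, which is easier to read and does not depend on exactly where the source HTML happens to wrap.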