Parsing nested XML with XPath using lxml or another library

Date: 2016-10-30 01:31:34

Tags: python xml xpath lxml

I have XML with the following structure:

<SEC-DOCUMENT>0001209191-16-145281.txt : 20161006
<SEC-HEADER>0001209191-16-145281.hdr.sgml : 20161006
<ACCEPTANCE-DATETIME>20161006175047
ACCESSION NUMBER:       0001209191-16-145281
CONFORMED SUBMISSION TYPE:  4
PUBLIC DOCUMENT COUNT:      1
CONFORMED PERIOD OF REPORT: 20161006
FILED AS OF DATE:       20161006
DATE AS OF CHANGE:      20161006

ISSUER:     

    COMPANY DATA:   
        COMPANY CONFORMED NAME:         TEJON RANCH CO
        CENTRAL INDEX KEY:          0000096869
        STANDARD INDUSTRIAL CLASSIFICATION: REAL ESTATE [6500]
        IRS NUMBER:             770196136
        STATE OF INCORPORATION:         DE
        FISCAL YEAR END:            1231

    BUSINESS ADDRESS:   
        STREET 1:       4436 LEBEC ROAD
        STREET 2:       PO BOX 1000
        CITY:           LEBEC
        STATE:          CA
        ZIP:            93243
        BUSINESS PHONE:     6612483000

    MAIL ADDRESS:   
        STREET 1:       4436 LEBEC RD
        STREET 2:       PO BOX 1000
        CITY:           LEBEC
        STATE:          CA
        ZIP:            93243

REPORTING-OWNER:    

    OWNER DATA: 
        COMPANY CONFORMED NAME:         Bielli Gregory S.
        CENTRAL INDEX KEY:          0001597159

    FILING VALUES:
        FORM TYPE:      4
        SEC ACT:        1934 Act
        SEC FILE NUMBER:    001-07183
        FILM NUMBER:        161925684

    MAIL ADDRESS:   
        STREET 1:       P.O. BOX 1000
        CITY:           LEBEC
        STATE:          CA
        ZIP:            93243
</SEC-HEADER>
<DOCUMENT>
<TYPE>4
<SEQUENCE>1
<FILENAME>doc4.xml
<DESCRIPTION>FORM 4 SUBMISSION
<TEXT>
<XML>
<?xml version="1.0"?>
<ownershipDocument>

    <schemaVersion>X0306</schemaVersion>

    <documentType>4</documentType>

    <periodOfReport>2016-10-06</periodOfReport>

    <notSubjectToSection16>0</notSubjectToSection16>

    <issuer>
        <issuerCik>0000096869</issuerCik>
        <issuerName>TEJON RANCH CO</issuerName>
        <issuerTradingSymbol>TRC</issuerTradingSymbol>
    </issuer>

    <reportingOwner>
        <reportingOwnerId>
            <rptOwnerCik>0001597159</rptOwnerCik>
            <rptOwnerName>Bielli Gregory S.</rptOwnerName>
        </reportingOwnerId>
        <reportingOwnerAddress>
            <rptOwnerStreet1>P.O. BOX 1000</rptOwnerStreet1>
            <rptOwnerStreet2></rptOwnerStreet2>
            <rptOwnerCity>TEJON RANCH</rptOwnerCity>
            <rptOwnerState>CA</rptOwnerState>
            <rptOwnerZipCode>93243</rptOwnerZipCode>
            <rptOwnerStateDescription></rptOwnerStateDescription>
        </reportingOwnerAddress>
        <reportingOwnerRelationship>
            <isDirector>1</isDirector>
            <isOfficer>1</isOfficer>
            <isTenPercentOwner>0</isTenPercentOwner>
            <isOther>0</isOther>
            <officerTitle>President/ CEO</officerTitle>
        </reportingOwnerRelationship>
    </reportingOwner>

    <nonDerivativeTable>
        <nonDerivativeTransaction>
            <securityTitle>
                <value>Tejon Ranch Co. Common Stock</value>
            </securityTitle>
            <transactionDate>
                <value>2016-10-06</value>
            </transactionDate>
            <deemedExecutionDate></deemedExecutionDate>
            <transactionCoding>
                <transactionFormType>4</transactionFormType>
                <transactionCode>A</transactionCode>
                <equitySwapInvolved>0</equitySwapInvolved>
            </transactionCoding>
            <transactionTimeliness>
                <value></value>
            </transactionTimeliness>
            <transactionAmounts>
                <transactionShares>
                    <value>28122</value>
                    <footnoteId id="F1"/>
                </transactionShares>
                <transactionPricePerShare>
                    <value>24.32</value>
                </transactionPricePerShare>
                <transactionAcquiredDisposedCode>
                    <value>A</value>
                </transactionAcquiredDisposedCode>
            </transactionAmounts>
            <postTransactionAmounts>
                <sharesOwnedFollowingTransaction>
                    <value>55806</value>
                    <footnoteId id="F1"/>
                </sharesOwnedFollowingTransaction>
            </postTransactionAmounts>
            <ownershipNature>
                <directOrIndirectOwnership>
                    <value>D</value>
                </directOrIndirectOwnership>
            </ownershipNature>
        </nonDerivativeTransaction>
        <nonDerivativeTransaction>
            <securityTitle>
                <value>Tejon Ranch Co. Common Stock</value>
            </securityTitle>
            <transactionDate>
                <value>2016-10-06</value>
            </transactionDate>
            <deemedExecutionDate></deemedExecutionDate>
            <transactionCoding>
                <transactionFormType>4</transactionFormType>
                <transactionCode>F</transactionCode>
                <equitySwapInvolved>0</equitySwapInvolved>
            </transactionCoding>
            <transactionTimeliness>
                <value></value>
            </transactionTimeliness>
            <transactionAmounts>
                <transactionShares>
                    <value>12753</value>
                    <footnoteId id="F1"/>
                    <footnoteId id="F2"/>
                </transactionShares>
                <transactionPricePerShare>
                    <value>24.32</value>
                </transactionPricePerShare>
                <transactionAcquiredDisposedCode>
                    <value>D</value>
                </transactionAcquiredDisposedCode>
            </transactionAmounts>
            <postTransactionAmounts>
                <sharesOwnedFollowingTransaction>
                    <value>43053</value>
                    <footnoteId id="F1"/>
                </sharesOwnedFollowingTransaction>
            </postTransactionAmounts>
            <ownershipNature>
                <directOrIndirectOwnership>
                    <value>D</value>
                </directOrIndirectOwnership>
            </ownershipNature>
        </nonDerivativeTransaction>
    </nonDerivativeTable>

    <footnotes>
        <footnote id="F1">Shares are held in the Bielli Family Trust</footnote>
        <footnote id="F2">Shares used for taxes</footnote>
    </footnotes>

    <remarks></remarks>

    <ownerSignature>
        <signatureName>/s/ Gregory S. Bielli</signatureName>
        <signatureDate>2016-10-06</signatureDate>
    </ownerSignature>
</ownershipDocument>
</XML>
</TEXT>
</DOCUMENT>
</SEC-DOCUMENT>

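If I want to extract something like the nonDerivativeTransaction elements, I would normally expect to do something like xpath('//nonDerivativeTransaction'), but I can't find a query string that works.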

If I were using selenium, I would write:

driver.find_elements_by_xpath('//nonDerivativeTransaction')

I have tried something like this:

import urllib2
from lxml import etree
from lxml import html
response = urllib2.urlopen('ftp://ftp.sec.gov/edgar/data/96869/0001209191-16-145281.txt')
html_doc = response.read()
root = html.fromstring(html_doc)
tree = root.getroottree()
x=tree.xpath("nonderivativetable")

But no luck.

How can I solve this?

1 answer:

Answer 0 (score: 1)

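Unlike lxml.etree, lxml.html converts all element names to lowercase. You can see this by printing the root element, for example with html.tostring(root), to the console or to a file. That said, the correct XPath for getting the nonDerivativeTransaction elements would be: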

tree.xpath("//nonderivativetransaction")

When I tested it, running the XPath above after the code block posted at the end of the question returned 2 elements.
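The lowercasing can be reproduced without downloading the filing. The following is a minimal sketch, assuming Python 3 and a shortened, hypothetical stand-in for the document (`SAMPLE`); the variable names are illustrative only. The second half shows an alternative that is not part of the answer above: slicing out the embedded `<XML>` block and parsing it with lxml.etree, which keeps the original element-name case.

```python
from lxml import etree, html

# Hypothetical, heavily shortened stand-in for the filing in the question.
SAMPLE = """<SEC-DOCUMENT>0001209191-16-145281.txt : 20161006
<DOCUMENT>
<TYPE>4
<TEXT>
<XML>
<?xml version="1.0"?>
<ownershipDocument>
    <nonDerivativeTable>
        <nonDerivativeTransaction>
            <transactionCoding><transactionCode>A</transactionCode></transactionCoding>
        </nonDerivativeTransaction>
        <nonDerivativeTransaction>
            <transactionCoding><transactionCode>F</transactionCode></transactionCoding>
        </nonDerivativeTransaction>
    </nonDerivativeTable>
</ownershipDocument>
</XML>
</TEXT>
</DOCUMENT>
</SEC-DOCUMENT>
"""

# 1) lxml.html: the HTML parser lowercases tag names, so only the
#    lowercase XPath matches anything.
html_root = html.fromstring(SAMPLE)
html_hits = html_root.xpath("//nonderivativetransaction")
html_misses = html_root.xpath("//nonDerivativeTransaction")
print(len(html_hits), len(html_misses))

# 2) Alternative (an assumption, not from the answer): cut out the text
#    between <XML> and </XML> and parse it as real XML, which keeps case.
start = SAMPLE.index("<XML>") + len("<XML>")
end = SAMPLE.index("</XML>")
xml_bytes = SAMPLE[start:end].strip().encode("utf-8")
xml_root = etree.fromstring(xml_bytes)

transactions = xml_root.xpath("//nonDerivativeTransaction")
codes = [t.findtext("transactionCoding/transactionCode") for t in transactions]
print(codes)
```

The second approach may be preferable when the mixed-case names matter downstream, since the wrapper around the embedded XML is SGML-like and is not itself well-formed XML.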