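I am trying to parse txt files from EDGAR, but the filings come in different formats even though they are all .txt files. I can parse the XML reports with BeautifulSoup without trouble, but I have run into this type of report: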
<SEC-DOCUMENT>0001047469-13-001017.txt : 20130214
<SEC-HEADER>0001047469-13-001017.hdr.sgml : 20130214
<ACCEPTANCE-DATETIME>20130214060031
ACCESSION NUMBER: 0001047469-13-001017
CONFORMED SUBMISSION TYPE: 13F-HR
PUBLIC DOCUMENT COUNT: 1
CONFORMED PERIOD OF REPORT: 20121231
FILED AS OF DATE: 20130214
DATE AS OF CHANGE: 20130214
EFFECTIVENESS DATE: 20130214
FILER:
COMPANY DATA:
COMPANY CONFORMED NAME: BILL & MELINDA GATES FOUNDATION TRUST
CENTRAL INDEX KEY: 0001166559
IRS NUMBER: 911663695
STATE OF INCORPORATION: WA
FISCAL YEAR END: 1231
FILING VALUES:
FORM TYPE: 13F-HR
SEC ACT: 1934 Act
SEC FILE NUMBER: 028-10098
FILM NUMBER: 13605999
BUSINESS ADDRESS:
STREET 1: 2365 CARILLON POINT
CITY: KIRKLAND
STATE: WA
ZIP: 98033
BUSINESS PHONE: 4258897900
MAIL ADDRESS:
STREET 1: 2365 CARILLON POINT
CITY: KIRKLAND
STATE: WA
ZIP: 98033
FORMER COMPANY:
FORMER CONFORMED NAME: GATES BILL & MELINDA FOUNDATION
DATE OF NAME CHANGE: 20020205
</SEC-HEADER>
<DOCUMENT>
<TYPE>13F-HR
<SEQUENCE>1
<FILENAME>a2212666z13f-hr.txt
<DESCRIPTION>13F-HR
<TEXT>
<Page>
UNITED STATES
SECURITIES AND EXCHANGE COMMISSION
WASHINGTON, D.C. 20549
FORM 13F
FORM 13F COVER PAGE
Report for the Calendar Year or Quarter Ended: December 31, 2012
-----------------------
Check Here if Amendment / /; Amendment Number:
---------
This Amendment (Check only one.): / / is a restatement.
/ / adds new holdings entries.
Institutional Investment Manager Filing this Report:
Name: Bill & Melinda Gates Foundation Trust
-------------------------------------
Address: 2365 Carillon Point
-------------------------------------
Kirkland, WA 98033
-------------------------------------
Form 13F File Number: 28-10098
---------------------
The institutional investment manager filing this report and the person by whom
it is signed hereby represent that the person signing the report is authorized
to submit it, that all information contained herein is true, correct and
complete, and that it is understood that all required items, statements,
schedules, lists, and tables, are considered integral parts of this form.
Person Signing this Report on Behalf of Reporting Manager:
Name: Michael Larson
-------------------------------
Title: Authorized Agent
-------------------------------
Phone: (425) 889-7900
-------------------------------
Signature, Place, and Date of Signing:
/s/ Michael Larson Kirkland, Washington February 14, 2013
------------------------------- -------------------- -----------------
[Signature] [City, State] [Date]
Report Type (Check only one.):
/X/ 13F HOLDINGS REPORT. (Check here if all holdings of this reporting
manager are reported in this report.)
/ / 13F NOTICE. (Check here if no holdings reported are in this report,
and all holdings are reported by other reporting manager(s).)
/ / 13F COMBINATION REPORT. (Check here if a portion of the holdings for this
reporting manager are reported in this report and a portion are reported by
other reporting manager(s).)
<Page>
FORM 13F SUMMARY PAGE
Report Summary:
Number of Other Included Managers: 0
--------------------
Form 13F Information Table Entry Total: 26
--------------------
Form 13F Information Table Value Total: $ 16,788,719
--------------------
(thousands)
List of Other Included Managers:
Provide a numbered list of the name(s) and Form 13F file number(s) of all
institutional investment managers with respect to which this report is filed,
other than the manager filing this report.
NONE
2
<Page>
FORM 13 INFORMATION TABLE
As of December 31, 2012
<Table>
<Caption>
VOTING AUTHORITY
VALUE SHRS OR SH/ PUT/ INVESTMENT OTHER ----------------------
NAME OF ISSUER TITLE OF CLASS CUSIP (x$1000) PRN AMOUNT PRN CALL DISCRETION MANAGERS SOLE SHARED NONE
---------------------------- ---------------- --------- ---------- ------------ --- ---- ---------- -------- ---------- ------ ----
<S> <C> <C> <C> <C> <C> <C> <C> <C> <C> <C> <C>
AUTOLIV INC COM 052800109 8,329 123,600 SH SOLE 123,600
AUTONATION INC COM 05329W102 75,379 1,898,716 SH SOLE 1,898,716
BERKSHIRE HATHAWAY INC DEL CL B NEW 084670702 7,811,199 87,081,373 SH SOLE 87,081,373
BP PLC SPONSORED ADR 055622104 297,018 7,133,000 SH SOLE 7,133,000
CANADIAN NATL RY CO COM 136375102 779,358 8,563,437 SH SOLE 8,563,437
CATERPILLAR INC DEL COM 149123101 919,168 10,260,857 SH SOLE 10,260,857
COCA COLA CO COM 191216100 1,232,573 34,002,000 SH SOLE 34,002,000
COCA COLA FEMSA SAB DE CV SPON ADR REP L 191241108 926,242 6,214,719 SH SOLE 6,214,719
CROWN CASTLE INTL CORP COM 228227104 384,822 5,332,900 SH SOLE 5,332,900
DIAMOND FOODS INC COM 252603105 6,031 441,163 SH SOLE 441,163
ECOLAB INC COM 278865100 313,946 4,366,425 SH SOLE 4,366,425
EXXON MOBIL CORP COM 30231G102 661,576 7,643,858 SH SOLE 7,643,858
FEDEX CORP COM 31428X106 277,453 3,024,999 SH SOLE 3,024,999
FOMENTO ECONOMICO MEXICANO SPON ADR UNITS 344419106 21,953 218,000 SH SOLE 218,000
GRUPO TELEVISA SA SPON ADR REP ORD 40049J206 448,647 16,879,103 SH SOLE 16,879,103
LIBERTY GLOBAL INC COM SER A 530555101 133,508 2,119,515 SH SOLE 2,119,515
LIBERTY GLOBAL INC COM SER C 530555309 41,507 706,507 SH SOLE 706,507
MCDONALDS CORP COM 580135101 870,853 9,872,500 SH SOLE 9,872,500
ORBOTECH LTD ORD M75253100 6,973 823,300 SH SOLE 823,300
PROCTER & GAMBLE CO COM 742718109 101,835 1,500,000 SH SOLE 1,500,000
REPUBLIC SVCS INC COM 760759100 39,596 1,350,000 SH SOLE 1,350,000
SIGNET JEWELERS LIMITED SHS G81276100 9,993 187,130 SH SOLE 187,130
TOYOTA MOTOR CORP SP ADR REP2COM 892331307 14,295 153,300 SH SOLE 153,300
WAL-MART STORES INC COM 931142103 757,558 11,103,000 SH SOLE 11,103,000
WASTE MGMT INC COM 94106L109 628,700 18,633,672 SH SOLE 18,633,672
WILLIS GROUP HOLDINGS PUBLIC SHS G96666105 20,209 602,700 SH SOLE 602,700
---------- ------------
16,788,719 240,235,774
</Table>
</TEXT>
</DOCUMENT>
</SEC-DOCUMENT>
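As you can see, this file is just plain text with custom tags.
My question: how do I locate the text inside a specific tag? For example, I only need the text inside the TEXT tag of the txt file above.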
Answer 0 (score: 0)
You can select the text tag and then process its content:
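from bs4 import BeautifulSoup

# Parse the filing; html.parser lowercases tag names, so <TEXT> becomes "text"
with open("/yourfile.html") as f:
    soup = BeautifulSoup(f, "html.parser")
text_tag = soup.find("text")  # first <TEXT> element, or None if absent
if text_tag is not None:
    print(text_tag.get_text())
    # work from here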
Note: I used html.parser, which does return the text tag. You may want to switch to an XML parser if that suits your needs better.
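Since the filing is plain text with SGML-style tags, many of them never closed, another option is to skip the parser and cut out the TEXT block with a regular expression. The following is only a minimal sketch under that assumption: the helper name `extract_text_blocks` and the tiny sample string are made up for illustration, and a real filing may contain several DOCUMENT sections, each with its own TEXT block.

```python
import re

# Matches each <TEXT>...</TEXT> block; DOTALL lets "." span newlines,
# IGNORECASE tolerates <text> as well as <TEXT>.
TEXT_BLOCK = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL | re.IGNORECASE)


def extract_text_blocks(raw):
    """Return the stripped contents of every <TEXT> block in a filing string.

    Hypothetical helper for illustration, not part of any library.
    """
    return [m.strip() for m in TEXT_BLOCK.findall(raw)]


# Made-up miniature sample standing in for a real filing.
sample = """<DOCUMENT>
<TYPE>13F-HR
<TEXT>
FORM 13F COVER PAGE
Name: Example Manager
</TEXT>
</DOCUMENT>"""

blocks = extract_text_blocks(sample)
print(blocks[0])
```

For a real file, read its contents into a string first and pass that to the helper. The non-greedy `.*?` keeps each match from running past the first closing tag when a filing holds more than one TEXT block.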