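I am trying to use Python to "read" an HTML document and write the output to an Excel spreadsheet. The HTML file is a table of CUs (cost units, identified by being written in all uppercase letters) and their descriptions. I want to list the CUs in one column and the corresponding descriptions in another. I have a global that accumulates text until the parser reaches a CU and then writes that text to the proper column, but for some reason the code does not get through the whole list of CUs, and it does not put the descriptions in the right place (each one lands a row below the CU it belongs to). Can anyone help me figure out what I am doing wrong? This is my code so far: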
from HTMLParser import HTMLParser
import xlwt
global wb
global ws
global cucounter
global textcounter
global tempcu
textstore = ""
cucounter = 0
textcounter = 0
wb = xlwt.Workbook()
ws = wb.add_sheet('A Test Sheet')
filename = 'C:\\Python27\\ArcGIS10.3\\Doc\\Page.html'
f = open(filename, "r").read()
class MyHTMLParser(HTMLParser):
    def handle_data(self, data):
        if data.isupper():
            try:
                global cucounter
                ws.write(cucounter, 1, data)
                cucounter = cucounter + 1
                wb.save('ElecTest.xls')
            except UnicodeDecodeError:
                pass
        if data.isspace():
            pass
        else:
            try:
                global textstore
                textstore += str(data)
                if data.isupper():
                    global textstore
                    global textcounter
                    ws.write(textcounter, 2, textstore)
                    textcounter = textcounter + 1
                    textstore = ""
                    wb.save('ElectTest.xls')
            except UnicodeDecodeError:
                pass
parser = MyHTMLParser()
parser.feed(f)
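For comparison, here is a minimal sketch of the intended pairing, kept separate from the spreadsheet writing. It assumes each CU arrives as its own all-uppercase text node with no spaces and that every other non-blank text node belongs to the most recent CU; Word-generated HTML may split text differently, so that assumption needs checking against the real page. `CUParser` and `extract_rows` are hypothetical names used only for illustration.

```python
try:
    from HTMLParser import HTMLParser      # Python 2, as in the code above
except ImportError:
    from html.parser import HTMLParser     # Python 3


class CUParser(HTMLParser):
    """Collect [CU, description] pairs from the text nodes of a page."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.rows = []        # every pair seen so far
        self.current = None   # the pair still being filled in

    def handle_data(self, data):
        text = " ".join(data.split())   # collapse newlines and extra spaces
        if not text:
            return                      # whitespace-only node, ignore it
        if text.isupper() and " " not in text:
            # Assumption: an all-uppercase token with no spaces is a CU.
            self.current = [text, ""]
            self.rows.append(self.current)
        elif self.current is not None:
            # Any other text is appended to the latest CU's description.
            self.current[1] = (self.current[1] + " " + text).strip()


def extract_rows(html):
    """Return a list of (cu, description) tuples found in html."""
    parser = CUParser()
    parser.feed(html)
    parser.close()
    return [tuple(row) for row in parser.rows]


# Made-up markup for the demonstration; the real page may be laid out differently.
sample = (
    "<table>"
    "<tr><td>EULBCOMP</td><td>Compaction Test</td></tr>"
    "<tr><td>EULBPDLCK</td>"
    "<td><span>Padlock open</span> <span>and close.</span></td></tr>"
    "</table>"
)
rows = extract_rows(sample)
for cu, description in rows:
    print(cu + " | " + description)
```

Because the CU and its description sit in the same pair, they share one row index. Each pair could then be written with `ws.write(i, 0, cu)` and `ws.write(i, 1, description)`, followed by a single `wb.save(...)` to one filename after the loop.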
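Unfortunately, I was not able to add my HTML file in the correct format (if I could, the UnicodeDecodeError handling would make more sense), but this is what I was able to copy:
Page C/U Descriptions: M-M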
EULBPIT Excavate, backfill & tamp auger pit or primary splice hole. Qty "1" per occurrence. 4'X4'X5' pit.
EULBCOMPWHEEL Wheel Compaction - Tamping with wheel, where initial lift is rolled, trench filled & crowned and rolled again and where additional traffic is expected in location assists with tamping.
EULBCOMP85STD 85% Std. Proctor Compaction - Trench where subsidence is unsettled and probable due to nature of area, needing compaction equipment w/ 12” lifts, use in parking lots, adjacent to roadways & front lot line URD.
EULBCOMP85MOD 85% Modified Proctor Compaction - Trenches under hard surfaces of roadway, more rigid than std, requiring compaction equipment w/ maximum 12” lifts, minimum12” lift from cable, soil and moisture content critical, hand test required also.
EULBCOMP95STD 95% Std. Proctor Compaction - Used by most local jurisdictions, close to, but more than, 85% but needing more moisture, 12” lifts should be used and hand test for adequate moisture.
EULBCOMP95MOD 95% Modified Proctor Compaction - Trenches under hard surfaces of roadway, more rigid than std, requiring compaction equipment w/ maximum 12” lifts, minimum12” lift from cable, soil and moisture content critical, hand test required.
EULBCOMP Compaction Test
EULBSHORE Shoring, 5’ high, 2-sided per ft per day
EULBTHAWU Thaw master/UG work: Specify "1" in install column only. Includes install, remove, lighting, & setting (2) burners with propane tank.
EULBJACKHAMMER Jackhammer: Specify per sq ft X 4" deep. Install column only.
EULBHANDIKRETE Handikrete. Install only-Specify "1" per cu ft (1-bag).
EUCDJACK4STPIPE Jack 4" galvanized steel pipe - includes pipe & coupling. Set up and dismantle jacking equipment. Specify "1" per ft.
EUCDJACK5STPIPE Jack 5" galvanized steel pipe - includes pipe & coupling. Set up and dismantle jacking equipment. Specify "1" per ft.
EUCDJACK6STPIPE Jack 6" galvanized steel pipe - includes pipe & coupling. Set up and dismantle jacking equipment. Specify "1" per ft.
EUCDIN-OUTJACK Setting up & dismantle jacking equipment. Includes digging & filling of pits. Specify "1" per occurrence in the install column only.
EUCDCASE24 Jack 24" casing--Specify "1" per ft - does not include pipe.
EULBYSNOW Snow removal. Install column only; specify “1” for every 2 man-hours.
EULBCLEANADJUST Clean or adjust switchgear. Install only; specify “1” per occurrence.
EULBUGLC Install or remove line covers. Specify # of covers and occurrences.
EULBLOWRCBL Lowering cable - specify per linear ft. Install column only.
EULBGROUNDCBL Install, remove or test for ground on cable. Specify “1” per occurrence.
EULBMOVECBL Place terminator on stand-off or energized bushing. Specify “1” per occurrence. Install column only.
EULBPHASE-U Phase-in UG conductor. Install only; specify “1” per occurrence.
EULBTRANSRISER Transfer riser cable. Specify # of cables; install only: specify “1” per occurrence.
EULBLOCATEFAULT Find UG cable fault - Install column only; specify “1” per occurrence.
EULBCBLIDTESTER Identify cable with impulse phaser. Specify “1” per occurrence
EULBPIERCECBL Ground pierce cable - Install column only. Specify “1” per occurrence.
EULBSWITCH Switch URD 600 A PMH gear. Specify “1” per occurrence.
EULBSWOIL Switch-open & close OCR & Leads. Specify “1” per occurrence.
EULBPDLCK Padlock open and close. Specify “1” per occurrence.
EULBCOVERHOLE Plywood to cover construction hole. Specify “1’ per occurrence.
EULBSCRTYFENCNG Remove/replace/install security fencing (orange) around splice pit. Specify “1” per occurrence.
EULBDRTPKUP Dirt pick-up: Load & haul excess dirt on site, per cu yd.
EULBDRTPKPD Dirt pick-up: Load & haul excess dirt off site, per cu yd.
EULBROADBASE Road base, labor only to install; specify "1" per cu yd.
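Several descriptions above contain curly quotes (for example “1”), which are non-ASCII characters, and the page's `<meta>` tag declares `charset=windows-1252`. That may be related to the UnicodeDecodeError being caught in the code. The following is a minimal sketch, assuming the file really is encoded as windows-1252, of decoding it to unicode text before handing it to the parser. `read_html` is a hypothetical helper, and the temporary file only stands in for the real page.

```python
import io
import os
import tempfile


def read_html(path, encoding="cp1252"):
    """Return the file's contents as unicode text.

    The default encoding is an assumption taken from the page's
    charset=windows-1252 declaration.
    """
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()


# Throwaway file containing Word-style curly quotes
# (bytes 0x93 and 0x94 in windows-1252).
fd, demo_path = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "wb") as handle:
    handle.write(b"<p>Specify \x931\x94 per occurrence.</p>")

text = read_html(demo_path)
os.remove(demo_path)
```

With the text already decoded, there should be no need to call `str(data)` inside `handle_data`, which is one place where non-ASCII text can fail under Python 2.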
<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns:m="http://schemas.microsoft.com/office/2004/12/omml"
xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 15">
<meta name=Originator content="Microsoft Word 15">
<link rel=File-List href="Page_files/filelist.xml">
<!--[if gte mso 9]><xml>
<o:DocumentProperties>
<o:Author>John Swordy</o:Author>
<o:LastAuthor>John Swordy</o:LastAuthor>
<o:Revision>1</o:Revision>
<o:TotalTime>1</o:TotalTime>
<o:Created>2017-02-15T16:44:00Z</o:Created>
<o:LastSaved>2017-02-15T16:45:00Z</o:LastSaved>
<o:Pages>2</o:Pages>
<o:Words>600</o:Words>
<o:Characters>3426</o:Characters>
<o:Company>En Engineering</o:Company>
<o:Lines>28</o:Lines>
<o:Paragraphs>8</o:Paragraphs>
<o:CharactersWithSpaces>4018</o:CharactersWithSpaces>
<o:Version>16.00</o:Version>
</o:DocumentProperties>
<o:OfficeDocumentSettings>
<o:AllowPNG/>
</o:OfficeDocumentSettings>
</xml><![endif]-->
<link rel=themeData href="Page_files/themedata.thmx">
<link rel=colorSchemeMapping href="Page_files/colorschememapping.xml">
<!--[if gte mso 9]><xml>
<w:WordDocument>
<w:TrackMoves>false</w:TrackMoves>
<w:TrackFormatting/>
<w:PunctuationKerning/>
<w:ValidateAgainstSchemas/>
<w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid>
<w:IgnoreMixedContent>false</w:IgnoreMixedContent>
<w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText>
<w:DoNotPromoteQF/>
<w:LidThemeOther>EN-US</w:LidThemeOther>
<w:LidThemeAsian>X-NONE</w:LidThemeAsian>
<w:LidThemeComplexScript>X-NONE</w:LidThemeComplexScript>
<w:Compatibility>
<w:BreakWrappedTables/>
<w:SnapToGridInCell/>
<w:WrapTextWithPunct/>
<w:UseAsianBreakRules/>
<w:DontGrowAutofit/>
<w:SplitPgBreakAndParaMark/>
<w:EnableOpenTypeKerning/>
<w:DontFlipMirrorIndents/>
<w:OverrideTableStyleHps/>
</w:Compatibility>
<m:mathPr>
<m:mathFont m:val="Cambria Math"/>
<m:brkBin m:val="before"/>
<m:brkBinSub m:val="--"/>
<m:smallFrac m:val="off"/>
<m:dispDef/>
<m:lMargin m:val="0"/>
<m:rMargin m:val="0"/>
<m:defJc m:val="centerGroup"/>
<m:wrapIndent m:val="1440"/>
<m:intLim m:val="subSup"/>
<m:naryLim m:val="undOvr"/>
</m:mathPr></w:WordDocument>
</xml><![endif]--><!--[if gte mso 9]><xml>
<w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="false"
DefSemiHidden="false" DefQFormat="false" DefPriority="99"
LatentStyleCount="371">
<w:LsdException Locked="false" Priority="0" QFormat="true" Name="Normal"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 1"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 2"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 3"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 4"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 5"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 6"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 7"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 8"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 9"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index 3"/>
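If anyone can help me I would really appreciate it. Thank you for your time! Note: I am self-taught and relatively new to Python, so I apologize in advance for code that may not be very pretty.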