This script almost works. However, it never matches when it should, and when it does report a match, the values are wrong. Example output:
no match
Lower 117, $331.50, F, 8, 193
Upper 218, $155.00, AA, 8, 195
match
Floor 6, $273.00, N, 2, 195
SECTION,PRICE,ROW,QTY,DYSLSTED
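So I'm not sure why it isn't working. Once all the values from the HTML file have been loaded the first time, the program should output only match for every listing, because they are all already in the CSV. But when I run it in its current configuration, I get the opposite result.
The HTML file, eagles.html, is here.
Here is my script: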
import os
import sys
from bs4 import BeautifulSoup
import lxml.html as lh
import csv

soup = BeautifulSoup(open("eagles.html"), "lxml")
###################################################################
variable = 'test_csv_1' ########DELETE
dir_path = os.path.dirname(os.path.realpath(__file__))
file_path = (dir_path+'\\Sheets')
try:
    os.makedirs(file_path)
except OSError:
    pass  # the Sheets directory already exists
#######################
for mytable in soup.find_all('table'):
    for trs in mytable.find_all('tr'):
        tds = trs.find_all('td')
        row1 = [elem.text.strip() for elem in tds]
        row = str(row1)
        cool = row.replace("[", "")
        coolp = cool.replace("]", "")
        cool2 = coolp.replace("'", "")
        cool3 = cool2.replace(" , ", "")
        row = cool3
        rowtest = (row.split(','))
        if len(rowtest) != 5:
            rowtest = ['NULL', 'NULL', 'NULL', 'NULL', 'NULL']
        ###TABLE STUFF###
        rowtest0 = rowtest[:4] # LISTING WITHOUT DAYS LISTED
        rowtest1 = rowtest[0:1] # SECTION LOCATION
        rowtest2 = rowtest[1:2] # TICKET PRICE
        rowtest3 = rowtest[2:3] # ROW
        rowtest4 = rowtest[3:4] # TICKET QTY
        rowtest5 = rowtest[4:5] # DAYS LISTED
        ###TABLE STUFF#
        ###CREATE CSV HEADER###
        with open(file_path+'\\'+variable+'.csv', 'a+') as headercsv:
            if os.stat(file_path+'\\'+variable+'.csv').st_size == 0:
                writer = csv.writer(headercsv)
                writer.writerow(["SECTION", "PRICE", "ROW", "QTY", "DYSLSTED"])
                print('CREATED HEADERS FOR NEW FILE')
            else:
                pass
        ###WRITE TO CSV###
        with open(file_path+'\\'+variable+'.csv', 'r') as rowin:
            if rowtest == ['NULL', 'NULL', 'NULL', 'NULL', 'NULL']:
                continue
            else:
                pass
            for boogie in rowin:
                if row in boogie:
                    print(row)
                    print(boogie)
                    print('match')
                    break
                else:
                    print(row)
                    print(boogie)
                    print('no match')
                    with open(file_path+'\\'+variable+'.csv', 'a+') as ruts:
                        writer = csv.writer(ruts)
                        writer.writerow(rowtest)
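
For comparison, here is a minimal sketch of the behaviour described above: a listing is appended only if no existing line of the CSV matches it, and is reported as a match otherwise. The helper names `load_existing_rows` and `append_if_new` are hypothetical and not part of the script. The sketch assumes each listing is passed as a plain list of cell strings (such as `row1` in the script) rather than the string rebuilt with `str()`, `replace()` and `split(',')`, since splitting on `,` alone leaves a leading space on every field after the first and would break on a price that contains a comma.

```python
import csv
import os


def load_existing_rows(csv_path):
    """Return the set of data rows already in the CSV, skipping the header."""
    if not os.path.exists(csv_path):
        return set()
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row if the file has one
        return {tuple(r) for r in reader if r}  # ignore blank lines


def append_if_new(csv_path, row, header):
    """Append row unless it is already in the file.

    Returns 'match' if the row was already present, 'no match' if it
    had to be appended. The whole file is checked before deciding.
    """
    if tuple(row) in load_existing_rows(csv_path):
        return "match"
    new_file = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    # newline="" lets the csv module control line endings itself
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(header)
        writer.writerow(row)
    return "no match"
```

The difference from the loop in the script is that the decision to append is made once, after every existing line has been compared, instead of on each line that happens not to match. Calling `append_if_new` twice with the same listing should give `no match` the first time and `match` the second.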