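I have it set to "append", but when it pulls the information from the table, it adds the header for every record. I've read several threads here but haven't found anything that works. Here are the 4 URLs I'm using and the code I have.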
http://www.wvlabor.com/new_searches/contractor_RESULTS.cfm?wvnumber=WV057804&contractor_name=&dba=&city_name=&County=&Submit3=Search+Contractors
http://www.wvlabor.com/new_searches/contractor_RESULTS.cfm?wvnumber=WV057805&contractor_name=&dba=&city_name=&County=&Submit3=Search+Contractors
http://www.wvlabor.com/new_searches/contractor_RESULTS.cfm?wvnumber=WV057806&contractor_name=&dba=&city_name=&County=&Submit3=Search+Contractors
http://www.wvlabor.com/new_searches/contractor_RESULTS.cfm?wvnumber=WV057807&contractor_name=&dba=&city_name=&County=&Submit3=Search+Contractors
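The four URLs differ only in the wvnumber parameter. As a sketch, they could be generated with the standard library's urllib.parse.urlencode instead of being maintained by hand in WV_urls.csv. The helper name contractor_url is hypothetical; the other query parameters are copied from the URLs above.

```python
from urllib.parse import urlencode

BASE = "http://www.wvlabor.com/new_searches/contractor_RESULTS.cfm"


def contractor_url(wv_number: str) -> str:
    """Build one search-results URL for a WV number (hypothetical helper)."""
    params = {
        "wvnumber": wv_number,
        "contractor_name": "",
        "dba": "",
        "city_name": "",
        "County": "",
        "Submit3": "Search Contractors",  # urlencode encodes the space as '+'
    }
    return BASE + "?" + urlencode(params)


urls = [contractor_url(n) for n in ["WV057804", "WV057805", "WV057806", "WV057807"]]
for u in urls:
    print(u)
```

Each string in urls could then be passed straight to urlopen(u), rather than urlopen(url[0]) on a CSV row.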
import csv
from urllib.request import urlopen
import pandas as pd

contents = []
with open('WV_urls.csv','r') as csvf: # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        contents.append(url) # Add each url to list contents

for url in contents: # Parse through each url in the list.
    page = urlopen(url[0]).read()
    df, header = pd.read_html(page)
    df.to_csv('WV_Licenses_Daily.csv', index=False, header=None, mode='a')
However, if I do it by hand with only 2 URLs, it prints the header and appends the second file:
calls_df, header = pd.read_html('http://www.wvlabor.com/new_searches/contractor_RESULTS.cfm?wvnumber=WV057728&contractor_name=&dba=&city_name=&County=&Submit3=Search+Contractors', header=0)
calls_df1, header = pd.read_html('http://www.wvlabor.com/new_searches/contractor_RESULTS.cfm?wvnumber=WV057729&contractor_name=&dba=&city_name=&County=&Submit3=Search+Contractors', header=0)
calls_df.to_csv('WV_Licenses_Daily.csv', index=False, header=None, mode='w')
calls_df1.to_csv('WV_Licenses_Daily.csv', index=False, header=None, mode='a')
Answer (score: 1)
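Change df, header = pd.read_html(page) to df, header = pd.read_html(page, header=0), so that the first row of the table is used as the column names instead of being read (and then written to the CSV) as an ordinary data row. That is the difference between your loop and your 2-URL version: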
import csv
from urllib.request import urlopen
import pandas as pd

contents = []
with open('WV_urls.csv','r') as csvf: # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        contents.append(url) # Add each url to list contents

for url in contents: # Parse through each url in the list.
    page = urlopen(url[0]).read()
    # read_html returns a list of DataFrames; this unpacking expects exactly two tables on the page.
    # header=0 turns the first table row into column names instead of a data row.
    df, header = pd.read_html(page, header=0)
    df.to_csv('WV_Licenses_Daily.csv', index=False, header=None, mode='a')
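Why header=0 matters: when a table's column titles sit in ordinary <td> cells rather than in <th>/<thead> (an assumption about these result pages, consistent with the symptom in the question), read_html roughly returns them as row 0 with integer column labels, and to_csv(header=None) then writes that row out as data for every URL. The sketch below builds such a frame by hand (all values are made up) so it runs without an HTML parser, and performs the same header promotion that header=0 asks read_html to do.

```python
import pandas as pd

# Hypothetical frame shaped like read_html output when the titles are in plain
# <td> cells: integer column labels, with the titles sitting in row 0.
raw = pd.DataFrame([
    ["WV Number", "Company", "City"],
    ["WV000001", "Example Co", "Example City"],
])

# Without a header, the title row is still written out as a data row.
print(raw.to_csv(index=False, header=False).splitlines()[0])  # WV Number,Company,City

# Promote row 0 to column names (what header=0 does inside read_html).
fixed = raw.iloc[1:].reset_index(drop=True)
fixed.columns = raw.iloc[0].tolist()

print(list(raw.columns))    # [0, 1, 2]
print(list(fixed.columns))  # ['WV Number', 'Company', 'City']
```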
Alternatively, collect all pages into a single DataFrame and write the CSV once, so the header appears exactly once:
import csv
from urllib.request import urlopen
import pandas as pd

contents = []
frames = []  # One DataFrame per URL
with open('WV_urls.csv','r') as csvf: # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        contents.append(url) # Add each url to list contents

for url in contents: # Parse through each url in the list.
    page = urlopen(url[0]).read()
    df1, header = pd.read_html(page, header=0)  # Read with the first row as the header
    frames.append(df1)

# DataFrame.append was removed in pandas 2.0; concatenate all pages once instead.
# Expected columns: 'WV Number', 'Company', 'DBA', 'Address', 'City', 'State', 'Zip', 'County', 'Phone', 'Classification*', 'Expires'
df = pd.concat(frames, ignore_index=True)
df.to_csv('WV_Licenses_Daily.csv', index=False)  # Written once, after the loop, so there is a single header row
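If you would rather keep the original append-per-URL loop, another option is to write the header only on the first write. This is a minimal sketch, assuming each frame was already read with header=0. The two frames below are made-up stand-ins for real results, and a temporary directory replaces the real output path. Checking os.path.exists means a rerun that appends to an existing file will not add a second header.

```python
import os
import tempfile

import pandas as pd

# Hypothetical per-URL results, each already carrying real column names.
frames = [
    pd.DataFrame({"WV Number": ["WV000001"], "Company": ["Example Co A"]}),
    pd.DataFrame({"WV Number": ["WV000002"], "Company": ["Example Co B"]}),
]

out_path = os.path.join(tempfile.mkdtemp(), "WV_Licenses_Daily.csv")

for df in frames:
    # Header only when the file does not exist yet; later calls append data rows only.
    write_header = not os.path.exists(out_path)
    df.to_csv(out_path, index=False, header=write_header, mode="a")

with open(out_path) as f:
    lines = f.read().splitlines()
print(lines)
# ['WV Number,Company', 'WV000001,Example Co A', 'WV000002,Example Co B']
```

To start a fresh file on every run instead, delete the old file before the loop, or pass mode="w" and header=True on the first iteration only.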