Question

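I have it set to append, but when it pulls the information from the table it adds the header for every record. I've read multiple threads here but haven't found anything that works. Here are the 4 URLs I'm using and the code I have.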

http://www.wvlabor.com/new_searches/contractor_RESULTS.cfm?wvnumber=WV057804&contractor_name=&dba=&city_name=&County=&Submit3=Search+Contractors
http://www.wvlabor.com/new_searches/contractor_RESULTS.cfm?wvnumber=WV057805&contractor_name=&dba=&city_name=&County=&Submit3=Search+Contractors
http://www.wvlabor.com/new_searches/contractor_RESULTS.cfm?wvnumber=WV057806&contractor_name=&dba=&city_name=&County=&Submit3=Search+Contractors
http://www.wvlabor.com/new_searches/contractor_RESULTS.cfm?wvnumber=WV057807&contractor_name=&dba=&city_name=&County=&Submit3=Search+Contractors

    import csv
    from urllib.request import urlopen
    import pandas as pd

    contents = []
    with open('WV_urls.csv','r') as csvf: # Open file in read mode
        urls = csv.reader(csvf)
        for url in urls:
            contents.append(url) # Add each url to list contents

        for url in contents:  # Parse through each url in the list.
            page = urlopen(url[0]).read()
            df, header = pd.read_html(page)
            df.to_csv('WV_Licenses_Daily.csv', index=False, header=None, mode='a')

However, if I use just 2 URLs on their own, it prints the header and appends the second file.

    calls_df, header = pd.read_html('http://www.wvlabor.com/new_searches/contractor_RESULTS.cfm?wvnumber=WV057728&contractor_name=&dba=&city_name=&County=&Submit3=Search+Contractors', header=0)
    calls_df1, header = pd.read_html('http://www.wvlabor.com/new_searches/contractor_RESULTS.cfm?wvnumber=WV057729&contractor_name=&dba=&city_name=&County=&Submit3=Search+Contractors', header=0)

    calls_df.to_csv('WV_Licenses_Daily.csv', index=False, header=None, mode='w')
    calls_df1.to_csv('WV_Licenses_Daily.csv', index=False, header=None, mode='a')

Answer 1

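You left out the header parameter in the code above, which is why the header shows up with every append to the file: df, header = pd.read_html(page)
If you set header to 0, you get the desired result, but the header at the top of the file will be missing, because every write, including the first one into the still-empty file, is an append without column names.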



    import csv
    from urllib.request import urlopen
    import pandas as pd

    contents = []
    with open('WV_urls.csv','r') as csvf: # Open file in read mode
        urls = csv.reader(csvf)
        for url in urls:
            contents.append(url) # Add each url to list contents
        for url in contents:  # Parse through each url in the list.
            page = urlopen(url[0]).read()
            df, header = pd.read_html(page,header=0)
            df.to_csv('WV_Licenses_Daily.csv', index=False, header=None, mode='a')
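
One way to keep append mode and still get a single header row is to write the column names only when the output file does not exist yet or is empty. The following is a minimal sketch of that idea, not part of the original answer: the helper name append_frame, the sample rows, and the temporary output path are all hypothetical stand-ins for the tables returned by pd.read_html.

```python
import os
import tempfile

import pandas as pd


def append_frame(df: pd.DataFrame, path: str) -> None:
    """Append df to path, writing column names only if the file is new or empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    df.to_csv(path, index=False, header=needs_header, mode='a')


# Hypothetical stand-ins for the tables parsed from each URL.
frames = [
    pd.DataFrame({'WV Number': ['WV057804'], 'Company': ['Example Co']}),
    pd.DataFrame({'WV Number': ['WV057805'], 'Company': ['Another Co']}),
]

# Write into a fresh temporary directory so the sketch leaves no files behind
# in the working directory.
out_path = os.path.join(tempfile.mkdtemp(), 'WV_Licenses_Daily.csv')
for frame in frames:
    append_frame(frame, out_path)

with open(out_path) as f:
    lines = f.read().splitlines()
```

In the loop over URLs, the same check would replace the fixed header=None argument, so the first write produces the header and later writes only add rows.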

Alternatively, you can create an empty data frame with the required columns and then append each data frame you read to it.



    import csv
    from urllib.request import urlopen
    import pandas as pd

    contents = []
    df  = pd.DataFrame(columns=['WV Number', 'Company', 'DBA', 'Address', 'City', 'State', 'Zip','County', 'Phone', 'Classification*', 'Expires']) #initialize the data frame with columns
    with open('WV_urls.csv','r') as csvf: # Open file in read mode
        urls = csv.reader(csvf)
        for url in urls:
            contents.append(url) # Add each url to list contents
        for url in contents:  # Parse through each url in the list.
            page = urlopen(url[0]).read()
            df1, header = pd.read_html(page,header=0)#reading with header
            df = pd.concat([df, df1], ignore_index=True) # append to dataframe (DataFrame.append was removed in pandas 2.0)

    df.to_csv('WV_Licenses_Daily.csv', index=False)
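
A variation on this approach is to collect the per-URL tables in a plain list and combine them once at the end, which avoids copying the growing data frame on every iteration. This is only a sketch under assumptions: the two sample rows below are hypothetical stand-ins for what pd.read_html would return, and the result is written to an in-memory buffer instead of WV_Licenses_Daily.csv.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the table parsed from each URL.
frames = []
for number, company in [('WV057804', 'Example Co'), ('WV057805', 'Another Co')]:
    frames.append(pd.DataFrame({'WV Number': [number], 'Company': [company]}))

# Combine everything in one step; ignore_index renumbers the rows 0..n-1.
combined = pd.concat(frames, ignore_index=True)

# A single to_csv call writes the header exactly once.
buffer = io.StringIO()
combined.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Because the file is written in one call, there is no need for mode='a' and no risk of repeating the header.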
