TypeError: a bytes-like object is required, not 'str' when porting from 2.7 to 3.6

Time: 2019-04-17 13:38:30

Tags: python python-3.x csv

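I'm a Python learner and new to Stack Overflow. The code below was written for Python 2.7, and when I try to run it with Python 3.6 I get the following error. I've read many earlier posts about this error but still can't fix my code. Please point out which lines need to be fixed and how.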

TypeError                                 Traceback (most recent call last)
<ipython-input-52-db1423a8bf7b> in <module>
     71 
     72 if __name__ == "__main__":
---> 73     main()

<ipython-input-52-db1423a8bf7b> in main()
     54     csvWriter = csv.writer(csvOutput, quoting = csv.QUOTE_NONNUMERIC)
     55 
---> 56     csvWriter.writerow(["Ticker", "DocIndex","IndexLink", "Description", "FilingDate","NewFilingDate"])
     57     csvOutput.close()
     58 

TypeError: a bytes-like object is required, not 'str'
import os,sys,csv,time # "time" helps to break for the url visiting 
from bs4 import BeautifulSoup   # Need to install this package manually using pip
                                # We only import part of the Beautifulsoup4
import urllib.request
from urllib.request import urlopen

os.chdir('E:\Python\python_exercise') # The location of your file "LongCompanyList.csv"
companyListFile = "CompanyList.csv" # a csv file with the list of company ticker symbols and names (the file has a line with headers)
IndexLinksFile = "IndexLinks.csv" # a csv file (output of the current script) with the list of index links for each firm (the file has a line with headers)

def getIndexLink(tickerCode,FormType):
    csvOutput = open(IndexLinksFile,"a+b") # "a+b" indicates that we are adding lines rather than replacing lines
    csvWriter = csv.writer(csvOutput, quoting = csv.QUOTE_NONNUMERIC)

    urlLink = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK="+tickerCode+"&type="+FormType+"&dateb=&owner=exclude&count=100"
    pageRequest = urllib.Request(urlLink)
    pageOpen = urllib.urlopen(pageRequest)
    pageRead = pageOpen.read()

    soup = BeautifulSoup(pageRead,"html.parser")

    #Check if there is a table to extract / code exists in edgar database
    try:
        table = soup.find("table", { "class" : "tableFile2" })
    except:
        print ("No tables found or no matching ticker symbol for ticker symbol for"+tickerCode)
        return -1

    docIndex = 1
    for row in table.findAll("tr"):
        cells = row.findAll("td")
        if len(cells)==5:
            if cells[0].text.strip() == FormType:
                link = cells[1].find("a",{"id": "documentsbutton"})
                docLink = "https://www.sec.gov"+link['href']
                description = cells[2].text.encode('utf8').strip() #strip take care of the space in the beginning and the end
                filingDate = cells[3].text.encode('utf8').strip()
                newfilingDate = filingDate.replace("-","_")  ### <=== Change date format from 2012-1-1 to 2012_1_1 so it can be used as part of 10-K file names
                csvWriter.writerow([tickerCode, docIndex, docLink, description, filingDate,newfilingDate])
                docIndex = docIndex + 1
    csvOutput.close()


def main():  
    FormType = "10-K"   ### <=== Type your document type here
    nbDocPause = 10 ### <=== Type your number of documents to download in one batch
    nbSecPause = 0 ### <=== Type your pausing time in seconds between each batch

    csvFile = open(companyListFile,"r") #<===open and read from a csv file with the list of company ticker symbols (the file has a line with headers)
    csvReader = csv.reader(csvFile,delimiter=",")
    csvData = list(csvReader)

    csvOutput = open(IndexLinksFile,"a+b") #<===open and write to a csv file which will include the list of index links. New rows will be appended.
    csvWriter = csv.writer(csvOutput, quoting = csv.QUOTE_NONNUMERIC)

    csvWriter.writerow(["Ticker", "DocIndex","IndexLink", "Description", "FilingDate","NewFilingDate"])
    csvOutput.close()

    i = 1
    for rowData in csvData[1:]:
        ticker = rowData[0]
        getIndexLink(ticker,FormType)
        if i%nbDocPause == 0:
            print (i)
            print ("Pause for "+str(nbSecPause)+" second .... ")
            time.sleep(float(nbSecPause))
        i=i+1

    csvFile.close()
    print ("done!")

if __name__ == "__main__":
    main()

2 answers:

Answer 0 (score: 1)

In Python 3 you'll want to work with Unicode strings, not binary ("b") data.

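  • Change the "a+b" file open mode to "a+" to get a file you can write strings to; they will be encoded with the platform's default encoding unless you pass the encoding parameter to open() (for example encoding="utf-8").
  • Remove your .encode() calls. BeautifulSoup works natively with Unicode strings, and, as mentioned above, once the file is opened in text mode the encoding is done for you.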
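A minimal Python 3 sketch of the question's CSV-writing code with both changes applied. It assumes a temporary IndexLinks.csv and hypothetical sample values (in the real script they come from BeautifulSoup's .text); append_rows and make_row are illustrative helper names, and newline="" plus encoding="utf-8" are choices made here (the csv module documentation recommends newline="" for files handed to csv.writer).

```python
import csv
import os
import tempfile

HEADER = ["Ticker", "DocIndex", "IndexLink", "Description", "FilingDate", "NewFilingDate"]


def append_rows(path, rows):
    # Python 3: open in text mode ("a", not "a+b") so csv.writer can write str.
    # newline="" follows the csv docs; encoding="utf-8" is an explicit choice.
    with open(path, "a", newline="", encoding="utf-8") as csvOutput:
        csvWriter = csv.writer(csvOutput, quoting=csv.QUOTE_NONNUMERIC)
        csvWriter.writerows(rows)


def make_row(ticker, docIndex, docLink, description, filingDate):
    # BeautifulSoup's .text is already str in Python 3, so no .encode('utf8').
    # With .encode(), filingDate would be bytes and bytes.replace("-", "_")
    # would raise the same "bytes-like object is required" TypeError.
    filingDate = filingDate.strip()
    return [ticker, docIndex, docLink, description.strip(),
            filingDate, filingDate.replace("-", "_")]


if __name__ == "__main__":
    # Hypothetical sample data written to a temporary file
    path = os.path.join(tempfile.mkdtemp(), "IndexLinks.csv")
    append_rows(path, [HEADER])
    append_rows(path, [make_row("XYZ", 1, "https://www.sec.gov/example",
                                " Annual report ", "2018-11-05")])
    with open(path, newline="", encoding="utf-8") as f:
        print(f.read())
```

In the original script, main() would call append_rows once for the header and getIndexLink would call it with the rows it scrapes.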

Answer 1 (score: 1)

You open the file in binary mode:

csvOutput = open(IndexLinksFile,"a+b") 

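If you open it in binary mode, you have to write binary data. You're only writing "normal" text, so the simplest fix is probably to open it in text mode and write strings: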

csvOutput = open(IndexLinksFile,"a")  # text mode; encoding is platform-dependent unless encoding= is given

A file opened in binary mode only accepts bytes, and csv.writer writes str, hence the error.
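A small, self-contained reproduction of the error and the fix, assuming a temporary file in place of IndexLinksFile; try_write is a hypothetical helper. csv.writer always hands str to the file's write(), so a file opened with "a+b" rejects it, while the same call on a text-mode file succeeds.

```python
import csv
import os
import tempfile


def try_write(mode):
    """Write one CSV row to a fresh temp file opened with `mode`; return the error or None."""
    path = os.path.join(tempfile.mkdtemp(), "IndexLinks.csv")
    # newline="" is only allowed in text mode, so pass it only when "b" is absent
    kwargs = {} if "b" in mode else {"newline": ""}
    with open(path, mode, **kwargs) as csvOutput:
        csvWriter = csv.writer(csvOutput, quoting=csv.QUOTE_NONNUMERIC)
        try:
            csvWriter.writerow(["Ticker", "DocIndex"])
        except TypeError as exc:
            return exc
    return None


print(try_write("a+b"))  # a bytes-like object is required, not 'str'
print(try_write("a"))    # None: text mode accepts str
```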

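Changing only a+b to a may break other things, either in your code or in applications that read the file it creates, so test that everything works before putting it into production.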

You can't simply copy 2.7 code into 3.x and expect it to run.
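One more 2.7 leftover in the question's getIndexLink: urllib.Request and urllib.urlopen do not exist in Python 3, where they live in urllib.request, so that function would fail with an AttributeError even after the file mode is fixed. A hedged sketch of the Python 3 equivalent; build_edgar_request and fetch_edgar_page are hypothetical helper names, and the URL format is copied from the question.

```python
import urllib.request


def build_edgar_request(tickerCode, FormType):
    """Build the EDGAR company-browse request used by getIndexLink."""
    urlLink = ("https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK="
               + tickerCode + "&type=" + FormType
               + "&dateb=&owner=exclude&count=100")
    # Python 2's urllib.Request is urllib.request.Request in Python 3
    return urllib.request.Request(urlLink)


def fetch_edgar_page(tickerCode, FormType):
    """Download the page; read() returns bytes, which BeautifulSoup accepts."""
    pageRequest = build_edgar_request(tickerCode, FormType)
    # Python 2's urllib.urlopen is urllib.request.urlopen in Python 3
    with urllib.request.urlopen(pageRequest) as pageOpen:
        return pageOpen.read()
```

In getIndexLink, the three request lines would then collapse to pageRead = fetch_edgar_page(tickerCode, FormType), and BeautifulSoup(pageRead, "html.parser") works on the bytes as before.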

Docs (Python tutorial, "Reading and Writing Files"):

  

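[...] The second argument is another string containing a few characters describing the way in which the file will be used. mode can be 'r' when the file will only be read, 'w' for only writing (an existing file with the same name will be erased), and 'a' opens the file for appending; any data written to the file is automatically added to the end. 'r+' opens the file for both reading and writing. The mode argument is optional; 'r' will be assumed if it's omitted.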

     

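Normally, files are opened in text mode, that means, you read and write strings from and to the file, which are encoded in a specific encoding. If encoding is not specified, the default is platform dependent (see open()). 'b' appended to the mode opens the file in binary mode: now the data is read and written in the form of bytes objects. This mode should be used for all files that don't contain text.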
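To make the documentation's text/binary distinction concrete, here is a sketch (the temporary files, the sample string and the helper name write_both_ways are assumptions) showing that writing a str in text mode with an explicit encoding produces the same bytes as encoding it yourself and writing it in binary mode. newline="" is passed so that no newline translation happens on Windows.

```python
import os
import tempfile


def write_both_ways(text):
    """Write `text` via text mode and via binary mode; return the raw bytes of each file."""
    folder = tempfile.mkdtemp()
    text_path = os.path.join(folder, "text_mode.txt")
    binary_path = os.path.join(folder, "binary_mode.txt")

    # Text mode: pass str and let the file object encode it. The encoding is
    # given explicitly; otherwise the platform-dependent default would apply.
    with open(text_path, "w", encoding="utf-8", newline="") as f:
        f.write(text)

    # Binary mode: only bytes are accepted, so encoding is your job.
    with open(binary_path, "wb") as f:
        f.write(text.encode("utf-8"))

    with open(text_path, "rb") as f1, open(binary_path, "rb") as f2:
        return f1.read(), f2.read()


text_bytes, binary_bytes = write_both_ways("Société Générale,2012-01-01\n")
print(text_bytes == binary_bytes)  # True
```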