My goal is to scrape some links and use threads to do it faster.
When I try to create the threads, it raises TypeError: 'int' object is not iterable.
Here is our script:
import requests
import pandas
import json
import concurrent.futures
from collections.abc import Iterable

# our profiles that we will scrape
profile = ['kaid_329989584305166460858587','kaid_896965538702696832878421','kaid_1016087245179855929335360','kaid_107978685698667673890057','kaid_797178279095652336786972','kaid_1071597544417993409487377','kaid_635504323514339937071278','kaid_415838303653268882671828','kaid_176050803424226087137783']

# lists of the data that we are going to fill up with each profile
total_project_votes=[]

def scraper(kaid):
    data = requests.get('https://www.khanacademy.org/api/internal/user/scratchpads?casing=camel&kaid={}&sort=1&page=0&limit=40000&subject=all&lang=en&_=190425-1456-9243a2c09af3_1556290764747'.format(kaid))
    sum_votes=[]
    try:
        data=data.json()
        for item in data['scratchpads']:
            try :
                sum_votes=item['sumVotesIncremented']
            except KeyError:
                pass
        sum_votes=map(int,sum_votes) # change all items of the list in integers
        print(isinstance(sum_votes, Iterable)) #to check if it is an iterable element
        print(isinstance(sum_votes, int)) # to check if it is a int element
        sum_votes=list(sum_votes) # transform into a list
        sum_votes=map(abs,sum_votes) # change all items in absolute value
        sum_votes=list(sum_votes) # transform into a list
        sum_votes=sum(sum_votes) # sum all items in the list
        sum_votes=str(sum_votes) # transform into a string
        total_project_votes=sum_votes
    except json.decoder.JSONDecodeError:
        total_project_votes='NA'
    return total_project_votes

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    future_kaid = {executor.submit(scraper, kaid): kaid for kaid in profile}
    for future in concurrent.futures.as_completed(future_kaid):
        kaid = future_kaid[future]
        results = future.result()
        # print(results) why printing only one of them and then stops?
        total_project_votes.append(results[0])

# write into a dataframe and print it:
d = {'total_project_votes':total_project_votes}
dataframe = pandas.DataFrame(data=d)
print(dataframe)
I expect to get the following output:
total_project_votes
0 0
1 2353
2 41
3 0
4 0
5 12
6 5529
7 NA
8 2
But I get this error instead:
TypeError: 'int' object is not iterable
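I don't really understand what this error means. What is wrong with my script, and how can I fix it?
When I look at the traceback, the problem seems to be here:
sum_votes=map(int,sum_votes)
Some additional information follows.
Traceback: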
Traceback (most recent call last):
  File "toz.py", line 91, in <module>
    results = future.result()
  File "C:\Users\*\AppData\Local\Programs\Python\Python37-32\lib\concurrent\futures\_base.py", line 425, in result
    return self.__get_result()
  File "C:\Users\*\AppData\Local\Programs\Python\Python37-32\lib\concurrent\futures\_base.py", line 384, in __get_result
    raise self._exception
  File "C:\Users\*\AppData\Local\Programs\Python\Python37-32\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "my_scrap.py", line 71, in scraper
    sum_votes=map(int,sum_votes) # change all items of the list in integers
TypeError: 'int' object is not iterable
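Answer 0 (score: 0)
I found my mistake. I should have written:
sum_votes.append(item['sumVotesIncremented'])
instead of:
sum_votes=item['sumVotesIncremented']
The assignment rebinds sum_votes to a single int on every iteration, so map(int, sum_votes) receives an int rather than a list and raises the TypeError.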
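The difference between the two lines can be reproduced without any network access. The sketch below is only an illustration: the payload, its values, and the helper name collect_votes are made up for the example and stand in for the real data.json() result.

```python
# Hypothetical sample payload standing in for data.json(); the values are made up.
data = {'scratchpads': [{'sumVotesIncremented': 3},
                        {'sumVotesIncremented': -2},
                        {'title': 'no votes key'}]}

def collect_votes(payload, use_append):
    """Walk the scratchpads the way scraper() does, with or without the fix."""
    sum_votes = []
    for item in payload['scratchpads']:
        try:
            if use_append:
                # Fixed version: the list grows by one vote count per item.
                sum_votes.append(item['sumVotesIncremented'])
            else:
                # Original version: the list is replaced by a single int.
                sum_votes = item['sumVotesIncremented']
        except KeyError:
            pass
    return sum_votes

fixed = collect_votes(data, use_append=True)    # a list of ints
broken = collect_votes(data, use_append=False)  # a single int

# The rest of the pipeline works on the list.
total = str(sum(map(abs, map(int, fixed))))

# The same pipeline fails on the int, with the error from the question.
try:
    list(map(int, broken))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(fixed, broken, total, error_message)
```

With the append version, sum_votes stays a list, so the later map, abs, and sum steps have something to iterate over.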
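Also, scraper returns only one item, total_project_votes, so results is that value itself (a string) rather than a tuple holding it.
This can cause problems, because results[0] then does not behave like indexing a list or tuple.
Instead of giving the whole total_project_votes, it gives the first character of the string (for example, "Hello" becomes "H").
If total_project_votes were an int rather than a string, results[0] would raise another error, because an int cannot be indexed.
To fix this, I need to return another object alongside it, so that results really is a tuple and results[0] is the whole value. Appending results itself, without the index, also works.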
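The indexing behavior described above can be checked with a small sketch that swaps the real scraper for stand-ins. Everything here is an assumption made for illustration: FAKE_TOTALS, fake_scraper, fake_scraper_tuple, and run are hypothetical names, and the totals are invented.

```python
import concurrent.futures

# Hypothetical stand-in data: no network, made-up totals already converted to strings.
FAKE_TOTALS = {'kaid_a': '2353', 'kaid_b': 'NA', 'kaid_c': '41'}

def fake_scraper(kaid):
    # Returns a single string, like the scraper() in the question.
    return FAKE_TOTALS[kaid]

def fake_scraper_tuple(kaid):
    # Returns two objects, so the caller receives a real tuple.
    return kaid, FAKE_TOTALS[kaid]

def run(worker, pick):
    """Submit one task per kaid and store pick(result) under that kaid."""
    collected = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        future_kaid = {executor.submit(worker, kaid): kaid for kaid in FAKE_TOTALS}
        for future in concurrent.futures.as_completed(future_kaid):
            collected[future_kaid[future]] = pick(future.result())
    return collected

# Indexing the string keeps only its first character.
first_chars = run(fake_scraper, lambda result: result[0])
# Using the result as-is keeps the whole string.
whole = run(fake_scraper, lambda result: result)
# With a tuple return, indexing selects a whole element.
from_tuple = run(fake_scraper_tuple, lambda result: result[1])

print(first_chars)
print(whole)
print(from_tuple)
```

Storing each result under its kaid is a deliberate choice in this sketch: as_completed yields futures in the order they finish, which is not necessarily the order of the profile list.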