Web scraping with requests on Google Colab is very slow

Asked: 2019-10-28 17:35:37

Tags: python pandas web-scraping python-requests google-colaboratory

I am trying to scrape the e-commerce websites in my database in order to segment users based on the occurrence of certain keywords.

I am using Google Colab with the pandas library and the requests library.

However, it is far too slow: scraping 100 websites takes 293 seconds.

Is there a way to improve this?

Here is my code:

import re
import timeit
import requests

start = timeit.default_timer()

for url in Account["url"][:100]:
  try:
    url = "https://" + url
    page = requests.get(url)
    contents = page.text  # decoded text, so the str regex patterns can match it

    if len(re.findall(key4, contents)) < 1 and len(re.findall(key3, contents)) > 0:
      if len(re.findall(key1, contents)) > 50 or len(re.findall(key2, contents)) > 50:
        products_found = len(re.findall(key1, contents))
        collection_found = len(re.findall(key2, contents))
        shopping_stores_df = shopping_stores_df.append({'url': url, 'products': products_found, 'collections': collection_found}, ignore_index=True)
        shopping_stores_df.loc[shopping_stores_df['url'] == url, ['ranking', 'people', 'emails', 'tel']] = df.loc[df['Location on Site'] == url[8:], ['Alexa', 'People', 'Emails', 'Telephones']].values
  except: pass

stop = timeit.default_timer()
print('Execution time:', stop - start)

shopping_stores_df
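The loop above fetches each site sequentially, so almost all of the 293 seconds is spent waiting on network I/O. A common way to speed this up is to issue the requests concurrently. The sketch below is not from the post: the names `fetch` and `scrape`, the keyword list, and the thread count are illustrative assumptions, and it uses the standard-library `urllib` in place of `requests` so it is self-contained; the same pattern works with `requests.get` inside `fetch`.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def fetch(url):
    # Download one page and return its text; errors yield an empty string,
    # mirroring the bare `except: pass` in the question.
    try:
        with urlopen("https://" + url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except Exception:
        return ""

def scrape(urls, keys, fetch_fn=fetch, workers=20):
    # Fetch all pages concurrently, then count keyword matches per URL.
    # fetch_fn is injectable so the network layer can be swapped out.
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for url, contents in zip(urls, pool.map(fetch_fn, urls)):
            results[url] = {k: len(re.findall(k, contents)) for k in keys}
    return results
```

With 20 workers, 100 pages are fetched roughly 20 at a time instead of one after another, so total time is dominated by the slowest pages rather than the sum of all of them. The per-URL counts can then be filtered and assembled into the DataFrame exactly as in the original loop.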

Thanks!

0 Answers:

There are no answers.