I am performing update operations in Snowflake using the .NET connector. To execute the updates in parallel I am using Parallel.ForEach(), but it throws "SQL execution canceled".
Here is the code:

Parallel.ForEach()
After running the code, logging in to Snowflake shows "SQL execution canceled".
If I perform the updates in a simple for loop, it takes forever to update a million rows. Is there another way to accomplish this task?
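For context, the per-row-loop pattern described above (and batching, one commonly suggested alternative to it) can be sketched as follows. This is only an illustration, not the original code: the actual question concerns C# with the Snowflake .NET connector and Parallel.ForEach, whereas this sketch uses Python with an in-memory SQLite database standing in for Snowflake, and the table and column names (`accounts`, `id`, `balance`) are invented for the example.

```python
# Illustrative sketch only: SQLite stands in for Snowflake, and the
# table/column names are invented. The original .NET code is not shown.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(i, 0.0) for i in range(1000)])

# Per-row updates: one statement per row -- the slow "simple for loop"
# pattern; against a remote warehouse each iteration is a full round trip.
for i in range(1000):
    conn.execute("UPDATE accounts SET balance = ? WHERE id = ?", (float(i), i))

# Batched update: a single parameterized statement covering all rows,
# so the per-statement overhead is paid once instead of a million times.
rows = [(float(i) * 2, i) for i in range(1000)]
conn.executemany("UPDATE accounts SET balance = ? WHERE id = ?", rows)
conn.commit()

# Count how many rows the batched statement updated.
total = conn.execute(
    "SELECT COUNT(*) FROM accounts WHERE balance = id * 2").fetchone()[0]
print(total)
```

Whether batching (or a single set-based UPDATE/MERGE) applies to the Snowflake case depends on what each per-row update computes, which the question does not show.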