I am scraping with Selenium and Python. Python 2.7, OS: macOS 10.14 Mojave.
The error is: Unquoted fields do not allow \r or \n (line 2).
That points to a line-break problem. This is what I am doing:
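import csv
import re

# Grab the description's inner HTML and drop any non-ASCII characters.
# The parentheses let the method chain continue across lines.
des = (driver.find_element_by_xpath('//*[@id="descriptiontext"]/div/div/div')
       .get_attribute('innerHTML')
       .encode('ascii', 'ignore')
       .decode('ascii'))
regex = re.compile('<a.*?a>')  # taking out <a> tags
des1 = str(re.findall(regex, des)[0])  # first <a> tag; raises IndexError if none is found
des = des.replace(des1, '')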
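Since the error complains about \r or \n, one option is to strip line breaks from the description before it is written. The following is a minimal sketch, not the original scraper: `clean_description` and `A_TAG` are hypothetical names, and it assumes collapsing whitespace is acceptable because HTML renders runs of whitespace as a single space (outside `<pre>`). It also removes every `<a>` element rather than only the first one.

```python
import re

# Hypothetical helper; the name and the whitespace policy are assumptions.
# DOTALL lets the pattern match <a> elements that span several lines.
A_TAG = re.compile(r'<a\b.*?</a>', re.DOTALL | re.IGNORECASE)


def clean_description(html):
    """Remove <a>...</a> elements and collapse line breaks into single spaces."""
    without_links = A_TAG.sub('', html)
    # Collapse every run of whitespace (including \r and \n) into one space.
    return re.sub(r'\s+', ' ', without_links).strip()


sample = '<p>Soft cotton\r\n<a href="/size">size\nchart</a> tee</p>\n'
cleaned = clean_description(sample)
```

In the snippet above this would replace the three regex lines with `des = clean_description(des)`.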
# On Python 2 the csv module expects the file to be opened in binary mode.
with open("new.csv", "ab") as myfile:
    myfilewriter = csv.writer(myfile)
    if len(menuList) == 2:
        print(des)
        type, tags = blank, blank
        published = 'TRUE'
        # menuList.items() is a list on Python 2, so it can be indexed directly.
        option1 = menuList.items()[0][0]
        option1Val = menuList.items()[0][1][0].items()[0][0]
        option2 = menuList.items()[1][0]
        option2Val = menuList.items()[1][1][0].items()[0][0]
        option3, option3Val = blank, blank
        sku = directory
        gram = '0'
        v_inventory = 'shopify'
        v_inventory_quantity = '100'
        v_inventory_policy = 'continue'
        v_fulfillment_service = 'manual'
        try:
            v_price = float(menuList.items()[1][1][0].items()[0][1]) + 10.99
        except Exception:
            # Fall back to the base price when the option price is missing or not numeric.
            v_price = 10.99 + price
        v_compare_price = blank
        v_shipping = 'TRUE'
        v_taxable = 'FALSE'
        v_barcode = blank
        v_imgsrc = blank
        img_pos = blank
        img_alt = blank
        giftCard = 'FALSE'
        (seo_title, seo_des, gShopping, gSG, gSA, gMPN, gAd, gAdL, gSC, gCP,
         gSCL, gSCL1, gSCL2, gSCL3, gSCL4, v_image) = (blank,) * 16
        v_weight_unit = 'lb'
        v_tax, cpi = blank, blank
        myfilewriter.writerow([
            handle, title, des, vendor, type, tags, published,
            option1, option1Val, option2, option2Val, option3, option3Val,
            sku, gram, v_inventory, v_inventory_quantity,
            v_inventory_policy, v_fulfillment_service, v_price, v_compare_price,
            v_shipping, v_taxable, v_barcode, v_imgsrc, img_pos, img_alt, giftCard,
            seo_title, seo_des, gShopping, gSG, gSA, gMPN, gAd, gAdL, gSC, gCP,
            gSCL, gSCL1, gSCL2, gSCL3, gSCL4, v_image,
            v_weight_unit, v_tax, cpi])
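To check whether the writer itself is producing unquoted line breaks, it can help to write a row with an embedded newline and read it back. This is a rough sketch written for Python 3 (on Python 2.7 the in-memory buffer would need to be a byte buffer instead); `rows_to_csv_text` is a hypothetical helper, and `csv.QUOTE_ALL` is a choice made here for illustration, not something the original code uses.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Serialize rows with every field quoted and explicit \\r\\n row endings."""
    buf = io.StringIO(newline='')  # newline='' disables newline translation
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator='\r\n')
    writer.writerows(rows)
    return buf.getvalue()


# Illustrative data: the second field contains a raw line break.
rows = [['handle', 'des'], ['tee', '<p>line one\nline two</p>']]
text = rows_to_csv_text(rows)

# Reading the text back should give the original rows, newline included.
parsed = list(csv.reader(io.StringIO(text, newline='')))
```

If the round trip succeeds, as it does here, the line break is safely quoted and the problem is more likely introduced after the file is written, for example by re-saving it in another program.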
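I am not sure what else I can do. The output of `des` is the HTML of the product description, so the `des` column in the CSV should render as that HTML.
Please leave your email address and I will gladly send the CSV file and the scraping code.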
Answer 0 (score: 0)
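Update: Oh my god! This literally wasted two days. The problem was caused by the Mac. When I save the file as Windows Comma Separated (.csv), it works! Crazy... Can anyone enlighten me as to what the difference is?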
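A likely explanation, not confirmed anywhere in this thread: the two export options differ in their row terminators, with the plain Mac option historically writing bare carriage returns (\r) and the Windows option writing \r\n. If that is the cause, re-parsing the file and re-emitting it with \r\n endings should have the same effect as the Windows export. The sketch below assumes Python 3, and `rewrite_with_crlf` is a hypothetical name.

```python
import csv
import io


def rewrite_with_crlf(csv_text):
    """Parse CSV text with any row ending and re-emit it with \\r\\n endings."""
    # newline='' keeps the original line endings so the csv module sees them.
    reader = csv.reader(io.StringIO(csv_text, newline=''))
    out = io.StringIO(newline='')
    csv.writer(out, lineterminator='\r\n').writerows(reader)
    return out.getvalue()


# Illustrative input: rows separated by bare carriage returns.
mac_style = 'handle,title\rtee,Cotton Tee\r'
fixed = rewrite_with_crlf(mac_style)
```

Going through `csv.reader` rather than a plain string replace keeps line breaks inside quoted fields as part of the field instead of treating them as row boundaries.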