I've scraped the site successfully and the data comes back correctly. The only problem is the export to CSV: I'm exporting the data with pandas, and the result comes out garbled. Here is my code:
while next_page is not None:
    # grab the label text for each result row
    results_element = driver.find_elements_by_xpath(
        '//*[contains(concat( " ", @class, " " ), concat( " ", "label-primary", " " ))]')
    results = [x.text for x in results_element]
    print(results)
    driver.implicitly_wait(5)

    # grab the ASIN links
    ASIN_element = driver.find_elements_by_xpath(
        '//*[contains(concat( " ", @class, " " ), concat( " ", "asin-column", " " ))]//a')
    ASIN = [x.text for x in ASIN_element]
    print(ASIN)
    driver.implicitly_wait(5)

    # the title cell sits right after the ASIN column
    Title_element = driver.find_elements_by_css_selector('.asin-column+ td')
    Title = [x.text for x in Title_element]
    print(Title)
    driver.implicitly_wait(5)

    # grab the dates
    Date_element = driver.find_elements_by_css_selector('.format-date')
    Date = [x.text for x in Date_element]
    print(Date)
    driver.implicitly_wait(5)

    # zip the four lists into rows and export (f is defined elsewhere);
    # this export runs once per page of results
    df = pd.DataFrame(list(zip(results, ASIN, Title, Date)),
                      columns=['results', 'ASIN', 'Product_Title', 'Date'])
    beach_balls_data = df.to_csv(f, index=False)

    if next_page is not None:
        driver.find_element_by_css_selector('.next a').click()
        driver.implicitly_wait(5)
    elif next_page is None:
        iterate = False
        driver.implicitly_wait(5)
    time.sleep(5)
I just need the data exported correctly without anything being overwritten. Any help would be greatly appreciated.
Answer 0 (score: 0)
Below is an approach that does not use pandas or any other library:
# assuming the scraping output is the 4 lists below
results = ['r1', 'r2', 'r3']
asin_lst = ['asin1', 'asin2', 'asin3']
title_lst = ['t1', 't2', 't3']
date_lst = ['d1', 'd2', 'd3']

with open('out.csv', 'w') as f:
    f.write('result,asin,title,date\n')
    # zip the four lists so each tuple holds one row
    for entry in zip(results, asin_lst, title_lst, date_lst):
        f.write(','.join(entry) + '\n')
Output ('out.csv'):
result,asin,title,date
r1,asin1,t1,d1
r2,asin2,t2,d2
r3,asin3,t3,d3
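
Note that the asker's loop exports once per page, and reopening the file in 'w' mode each time replaces whatever the previous pages wrote. Below is a minimal sketch of an append-mode variant of the code above; the write_page helper and the out.csv default path are assumptions for illustration, not names from the original code:

import os

def write_page(results, asin_lst, title_lst, date_lst, path='out.csv'):
    # hypothetical helper: append one page of scraped rows to the CSV
    new_file = not os.path.exists(path)
    # 'a' appends, so rows from earlier pages are kept
    with open(path, 'a') as f:
        if new_file:
            # write the header only when the file is first created
            f.write('result,asin,title,date\n')
        for entry in zip(results, asin_lst, title_lst, date_lst):
            f.write(','.join(entry) + '\n')

Calling write_page(...) once per iteration of the while loop appends each page's rows instead of rewriting the file; the pandas equivalent is df.to_csv(path, mode='a', header=not os.path.exists(path), index=False).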