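I am trying to collect data from multiple pages via web scraping. The problem is that I want to transpose columns into rows so that the scraped data ends up as a DataFrame with one row per page.
I looked at this question and applied it to my Python code, but it does not work.
Here is my code: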
browser.get('https://fortune.com/global500/2019/walmart')
data =[]
i = 1
while True:
    table = browser.find_element_by_css_selector('tbody')
    if i > 2:
        break
    try:
        print("Scraping Page no. " + str(i))
        i = i + 1
        for row in table.find_elements_by_css_selector('tr'):
            cols = [cell.text for cell in row.find_elements_by_css_selector('td.dataTable__value--3n5tL.dataTable__valueAlignLeft--3uvNx')]
            colsT = data.append(np.array(cols).T.tolist())
        try:
            WebDriverWait(browser, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a > span.singlePagination__icon--2KbZn"))).click()
            time.sleep(3)
        except TimeoutException:
            break
    except Exception as e:
        print(e)
        break
data1 = pd.DataFrame(data)
print(data1)
This is the output of the code I ran:
Scraping Page no. 1
Scraping Page no. 2
0
0 C. Douglas McMillon
1 Retailing
2 General Merchandisers
3 Bentonville, Ark.
4 -
5 25
6 2,200,000
7 Dai Houliang
8 Energy
9 Petroleum Refining
10 Beijing
11 -
12 21
13 619,151
This is what I want it to look like:
0 C. Douglas McMillon Retailing General Merchandisers Bentonville, Ark. - ...
1 Dai Houliang Energy Petroleum Refining Beijing - ...
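Any suggestions or corrections would be appreciated.
Answer 0 (score: 1)
You can add the list of values to the DataFrame directly as a row. I defined explicit columns and appended each page's list to the DataFrame so that it lines up with those columns.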
browser.get('https://fortune.com/global500/2019/walmart')
data =[]
# One named column per cell scraped from a page
df = pd.DataFrame(columns = ['c1', 'c2', 'c3', 'c4', 'c5','c6','c7'])
i = 1
while True:
    table = browser.find_element_by_css_selector('tbody')
    if i > 2:
        break
    try:
        print("Scraping Page no. " + str(i))
        i = i + 1
        # Collect the cells of the current page; each entry is a one-element list
        values =[]
        for row in table.find_elements_by_css_selector('tr'):
            value = ([cell.text for cell in row.find_elements_by_css_selector('td.dataTable__value--3n5tL.dataTable__valueAlignLeft--3uvNx')])
            values.append(value)
        print(values)
        # Turn the page into a single row aligned with the column names.
        # Note: DataFrame.append was removed in pandas 2.0.
        s = pd.Series(values,index=df.columns)
        df = df.append(s,ignore_index=True)
        try:
            # Go to the next company page
            WebDriverWait(browser, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a > span.singlePagination__icon--2KbZn"))).click()
            time.sleep(3)
        except TimeoutException:
            break
    except Exception as e:
        print(e)
        break
print(df)
browser.quit()
Output:
c1 c2 ... c6 c7
0 [C. Douglas McMillon] [Retailing] ... [25] [2,200,000]
1 [Dai Houliang] [Energy] ... [21] [619,151]
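The one-element lists in each cell appear because every table row's list of cell texts is stored as a single value. A minimal sketch of one way to get plain strings instead: flatten each page into one list and build the DataFrame once at the end. The `pages` variable is a hypothetical stand-in for the scraped `values` lists, hard-coded from the output above so the example runs without a browser; the column names `c1` to `c7` are taken from the answer.

```python
import pandas as pd

# Hypothetical stand-in for the per-page `values` lists from the scraping loop
# (hard-coded from the output above so the sketch runs without Selenium).
pages = [
    [["C. Douglas McMillon"], ["Retailing"], ["General Merchandisers"],
     ["Bentonville, Ark."], ["-"], ["25"], ["2,200,000"]],
    [["Dai Houliang"], ["Energy"], ["Petroleum Refining"],
     ["Beijing"], ["-"], ["21"], ["619,151"]],
]

columns = ["c1", "c2", "c3", "c4", "c5", "c6", "c7"]

# Flatten each page: every inner list holds the cell texts of one table row.
rows = [[text for cells in page for text in cells] for page in pages]

# Build the frame once instead of appending row by row;
# this also avoids DataFrame.append, which was removed in pandas 2.0.
df = pd.DataFrame(rows, columns=columns)
print(df)
```

Collecting plain Python lists and constructing the DataFrame a single time is usually also cheaper than growing a DataFrame inside the loop.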
Answer 1 (score: 0)
You can simply use the pandas transpose function:
df_transposed = data1.T
Output:
0 C. Douglas McMillon Retailing General Merchandisers Bentonville, Ark. - ...
1 Dai Houliang Energy Petroleum Refining Beijing - ...
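One caveat worth checking: if `data1` is the 14-row, single-column frame shown in the question, `data1.T` gives a single row with 14 columns rather than one row per company. A hedged sketch of an alternative is to reshape the column into fixed-width rows. It assumes every page contributes exactly 7 cells; the constant `CELLS_PER_PAGE` and the hard-coded `data` list are illustrative stand-ins, not part of the original code.

```python
import pandas as pd

CELLS_PER_PAGE = 7  # assumption: each company page yields exactly 7 cells

# Stand-in for the question's `data`: one single-item list per table row.
data = [
    ["C. Douglas McMillon"], ["Retailing"], ["General Merchandisers"],
    ["Bentonville, Ark."], ["-"], ["25"], ["2,200,000"],
    ["Dai Houliang"], ["Energy"], ["Petroleum Refining"],
    ["Beijing"], ["-"], ["21"], ["619,151"],
]
data1 = pd.DataFrame(data)

# A plain transpose turns the 14x1 frame into 1x14.
transposed = data1.T

# Reshaping the single column into rows of 7 gives one row per company.
reshaped = pd.DataFrame(data1[0].to_numpy().reshape(-1, CELLS_PER_PAGE))

print(transposed.shape)
print(reshaped)
```

The reshape raises a `ValueError` if the total number of cells is not a multiple of `CELLS_PER_PAGE`, which can serve as a cheap check that no page returned a different number of cells.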