I want to extract the table from https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia, but my code doesn't return any data. How can I get it?
import requests
from bs4 import BeautifulSoup as bs
url = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
html = requests.get(url).text
soup = bs(html, 'html.parser')
ta=soup.find_all('table',class_="wikitable sortable jquery-tablesorter")
print(ta)
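The search comes back empty because Wikipedia's server sends these tables with class="wikitable sortable". The extra jquery-tablesorter class is added later by JavaScript in the browser, so it is not in the HTML that requests downloads. When class_ is given a string containing spaces, BeautifulSoup compares it against the whole class attribute value, so the three-class string never matches. A minimal offline sketch, using a hypothetical HTML snippet in place of the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML that requests receives (no JavaScript has run).
html = """
<table class="wikitable sortable">
  <tr><th>Name</th><th>Industry</th></tr>
  <tr><td>Pertamina</td><td>Oil &amp; gas</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Class string copied from the browser's DOM: it never matches the raw HTML.
from_browser = soup.find_all("table", class_="wikitable sortable jquery-tablesorter")

# A single class name matches any element that has that class among its classes.
single_class = soup.find_all("table", class_="wikitable")

print(len(from_browser), len(single_class))  # 0 1
```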
Answer 0 (score: 0)
import requests
from bs4 import BeautifulSoup as bs
URL = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
html = requests.get(URL).text
soup = bs(html, 'html.parser')
ta=soup.find_all('table',{'class':'wikitable'})
print(ta)
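You can search for the table by class name using the older attribute-dictionary syntax; it still seems to work. Searching for the single class 'wikitable' matches because BeautifulSoup checks a single class name against each of the element's classes, so it also finds class="wikitable sortable".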
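The dictionary form used in this answer ({'class': 'wikitable'}) and the class_ keyword are two spellings of the same attribute filter; a CSS selector via select() is a third option and can require several classes regardless of their order. A minimal offline sketch, again with a hypothetical snippet rather than the live page:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: one sortable wikitable and one plain wikitable.
html = """
<table class="wikitable sortable"><tr><td>A</td></tr></table>
<table class="wikitable"><tr><td>B</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

by_dict = soup.find_all("table", {"class": "wikitable"})  # attrs dictionary
by_kwarg = soup.find_all("table", class_="wikitable")     # class_ keyword
by_css = soup.select("table.wikitable")                   # CSS selector

# select() can require several classes, in any order.
sortable_only = soup.select("table.sortable.wikitable")

print(len(by_dict), len(by_kwarg), len(by_css), len(sortable_only))  # 2 2 2 1
```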
Answer 1 (score: 0)
Fixes: use URL instead of url (line 4), and search for the class wikitable. So:
import requests
from bs4 import BeautifulSoup
page = requests.get("https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia")
soup = BeautifulSoup(page.content, 'html.parser')
ta = soup.find_all('table',class_="wikitable")
print(ta)
Output:
[<table class="wikitable sortable">
<tbody><tr>
<th>Rank
</th>
<th>Image
</th>
<th>Name
</th>
<th>2016 Revenues (USD $M)
</th>
<th>Employees
</th>
<th>Notes
.
.
.
Answer 2 (score: 0)
Maybe this isn't exactly what you want, but you can try this:
import requests
from bs4 import BeautifulSoup as bs
url = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
html = requests.get(url).text
soup = bs(html, 'html.parser')
for data in soup.find_all('table', {"class": "wikitable"}):
    for td in data.find_all('td'):
        for link in td.find_all('a'):
            print(link.text)
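If you want the table's rows rather than just the link texts, you can walk the tr elements and collect the text of each th/td cell. A minimal sketch, assuming the first wikitable is the one you want and using a hypothetical snippet in place of the downloaded HTML (reuse soup from the code above to run it against the live page):

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the downloaded page.
html = """
<table class="wikitable sortable">
  <tr><th>Name</th><th>Headquarters</th><th>Founded</th></tr>
  <tr><td><a href="/wiki/Pertamina">Pertamina</a></td><td>Jakarta</td><td>1957</td></tr>
  <tr><td><a href="/wiki/Pindad">Pindad</a></td><td>Bandung</td><td>1808</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table", class_="wikitable")
rows = []
for tr in table.find_all("tr"):
    # Header cells are <th>, data cells are <td>; keep them in document order.
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    if cells:
        rows.append(cells)

header, data = rows[0], rows[1:]
print(header)   # ['Name', 'Headquarters', 'Founded']
print(data[0])  # ['Pertamina', 'Jakarta', '1957']
```

This simple loop does not expand rowspan/colspan cells, so rows in tables that use them may come out shorter than the header.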
Answer 3 (score: 0)
Try the following:
import requests
from bs4 import BeautifulSoup as bs
URL = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
html = requests.get(URL).text
soup = bs(html, 'html.parser')
ta=soup.find("table",{"class":"wikitable sortable"})
print(ta)
To get all the tables:
ta=soup.find_all("table",{"class":"wikitable sortable"})
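The difference between the two calls: find() returns the first matching Tag, or None when nothing matches, while find_all() always returns a list (empty if nothing matches). Checking for None before using the result avoids an AttributeError. A minimal sketch with a hypothetical snippet:

```python
from bs4 import BeautifulSoup

html = '<table class="wikitable sortable"><tr><td>A</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

first = soup.find("table", {"class": "wikitable sortable"})               # a Tag
missing = soup.find("table", {"class": "no-such-class"})                  # None
all_tables = soup.find_all("table", {"class": "wikitable sortable"})      # list of Tags

if missing is None:
    print("no table found")  # guard before calling methods on the result

print(first.td.get_text(), len(all_tables))  # A 1
```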
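Answer 4 (score: 0)
If you want to parse the table data, you can do it with pandas, and if you then want to manipulate that data, a pandas DataFrame is an efficient way to do it. pd.read_html() returns a list with one DataFrame per <table> it finds, so table[1] below is the second table on the page.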
import pandas as pd
url = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
table = pd.read_html(url,header=0)
print(table[1])
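Once the table is in a DataFrame, filtering and grouping take one line each. A minimal sketch; to stay offline it builds a small hypothetical DataFrame from a few rows of the table (values copied from the output in the last answer) instead of calling read_html, and it assumes the columns are named Name, Industry, Headquarters and Founded as in the page's header row:

```python
import pandas as pd

# Hypothetical stand-in for table[1]: a few rows from the Wikipedia table.
df = pd.DataFrame(
    {
        "Name": ["Bank Central Asia", "Bank Mandiri", "Pindad", "Pertamina"],
        "Industry": ["Financials", "Financials", "Industrials", "Oil & gas"],
        "Headquarters": ["Jakarta", "Jakarta", "Bandung", "Jakarta"],
        "Founded": [1957, 1998, 1808, 1957],
    }
)

# Filter rows: companies headquartered in Jakarta.
jakarta = df[df["Headquarters"] == "Jakarta"]

# Group rows: number of companies per industry.
per_industry = df.groupby("Industry").size()

print(jakarta["Name"].tolist())  # ['Bank Central Asia', 'Bank Mandiri', 'Pertamina']
print(per_industry.to_dict())    # {'Financials': 2, 'Industrials': 1, 'Oil & gas': 1}
```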
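Answer 5 (score: 0)
When I want to pull a table and see <table> tags, I always try pandas' .read_html() first. It iterates over the rows for you. Most of the time you get exactly what you need, or at most have to make a few small changes to the DataFrame. In this case it gives you the complete table: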
import pandas as pd
url = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
table = pd.read_html(url)[1]
print(table.to_string())
Output:
0 1 2 3 4 5
0 Name Industry Sector Headquarters Founded Notes
1 Airfast Indonesia Consumer services Airlines Tangerang 1971 Private airline
2 Angkasa Pura Industrials Transportation services Jakarta 1962 State-owned airports
3 Astra International Conglomerates - Jakarta 1957 Automotive, financials, industrials, technology
4 Bank Central Asia Financials Banks Jakarta 1957 Bank
5 Bank Danamon Financials Banks Jakarta 1956 Bank
6 Bank Mandiri Financials Banks Jakarta 1998 Bank
7 Bank Negara Indonesia Financials Banks Jakarta 1946 Bank
8 Bank Rakyat Indonesia Financials Banks Jakarta 1895 Micro-finance bank
9 Bumi Resources Basic materials General mining Jakarta 1973 Mining
10 Djarum Consumer goods Tobacco Kudus and Jakarta 1951 Tobacco
11 Dragon Computer & Communication Technology Computer hardware Jakarta 1980 Computer hardware
12 Elex Media Komputindo Consumer services Publishing Jakarta 1985 Publisher
13 Femina Consumer services Media Jakarta 1972 Weekly magazine
14 Garuda Indonesia Consumer services Travel & leisure Tangerang 1949 State-owned airline
15 Gudang Garam Consumer goods Tobacco Kediri 1958 Tobacco
16 Gunung Agung Consumer services Specialty retailers Jakarta 1953 Bookstores
17 Indocement Tunggal Prakarsa Industrials Building materials & fixtures Jakarta 1985 Cement, part of HeidelbergCement (Germany)
18 Indofood Consumer goods Food products Jakarta 1968 Food production
19 Indonesian Aerospace Industrials Aerospace Bandung 1976 State-owned aircraft design
20 Indonesian Bureau of Logistics Consumer goods Food products Jakarta 1967 Food distribution
21 Indosat Telecommunications Fixed line telecommunications Jakarta 1967 Telecommunications network
22 Infomedia Nusantara Consumer services Publishing Jakarta 1975 Directory publisher
23 Jalur Nugraha Ekakurir (JNE) Industrials Delivery services Jakarta 1990 Express logistics
24 Kalbe Farma Health care Pharmaceuticals Jakarta 1966 Pharmaceuticals
25 Kereta Api Indonesia Industrials Railroads Bandung 1945 State-owned railway
26 Kimia Farma Health care Pharmaceuticals Jakarta 1971 State-owned pharma
27 Kompas Gramedia Group Consumer services Media agencies Jakarta 1965 Media holding
28 Krakatau Steel Basic materials Iron & steel Cilegon 1970 State-owned steel
29 Lion Air Consumer services Airlines Jakarta 2000 Low-cost airline
30 Lippo Group Financials Real estate holding & development Jakarta 1950 Development
31 Matahari Consumer services Broadline retailers Tangerang 1982 Department stores
32 MedcoEnergi Oil & gas Exploration & production Jakarta 1980 Energy, oil and gas
33 Media Nusantara Citra Consumer services Broadcasting & entertainment Jakarta 1997 Media
34 Panin Sekuritas Financials Investment services Jakarta 1989 Broker
35 Pegadaian Financials Consumer finance Jakarta 1901 State-owned financial services
36 Pelni Industrials Marine transportation Jakarta 1952 Shipping
37 Pos Indonesia Industrials Delivery services Bandung 1995 State-owned postal service
38 Pertamina Oil & gas Integrated oil & gas Jakarta 1957 State-owned oil and natural gas
39 Perusahaan Gas Negara Oil & gas Exploration & production Jakarta 1965 Gas
40 Perusahaan Gas Negara Utilities Gas distribution Jakarta 1965 State-owned natural gas transportation
41 Perusahaan Listrik Negara Utilities Conventional electricity Jakarta 1945 State-owned electrical distribution
42 Phillip Securities Indonesia, PT Financials Investment services Jakarta 1989 Financial services
43 Pindad Industrials Defense Bandung 1808 State-owned defense
44 PT Lapindo Brantas Oil & gas Exploration & production Jakarta 1996 Oil and gas
45 PT Metro Supermarket Realty Tbk Consumer services Food retailers & wholesalers Jakarta 1955 Supermarkets
46 Salim Group Conglomerates - Jakarta 1972 Industrials, financials, consumer goods
47 Sampoerna Consumer goods Tobacco Surabaya 1913 Tobacco
48 Semen Indonesia Industrials Building materials & fixtures Gresik 1957 Cement
49 Susi Air Consumer services Airlines Pangandaran 2004 Charter airline
50 Telkom Indonesia Telecommunications Fixed line telecommunications Bandung 1856 Telecommunication services
51 Telkomsel Telecommunications Mobile telecommunications Jakarta 1995 Mobile network, part of Telkom Indonesia
52 Trans Corp Conglomerates - Jakarta 2006 Media, consumer services, real estate, part of...
53 Unilever Indonesia Consumer goods Personal products Jakarta 1933 Personal care products, part of Unilever (Neth...
54 United Tractors Industrials Commercial vehicles & trucks Jakarta 1972 Heavy equipment
55 Waskita Industrials Heavy construction Jakarta 1961 State-owned construction
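In this output the column labels are 0 to 5 and the real header (Name, Industry, ...) ended up as row 0. Passing header=0 to read_html, as the previous answer does, uses the first row as the header. If you already have a DataFrame shaped like this, you can also promote the first row yourself. A minimal sketch with a small hypothetical DataFrame built from a few rows of the output above:

```python
import pandas as pd

# Hypothetical DataFrame shaped like the output above: integer column labels,
# with the real header stored in row 0.
raw = pd.DataFrame(
    [
        ["Name", "Industry", "Headquarters", "Founded"],
        ["Pindad", "Industrials", "Bandung", "1808"],
        ["Susi Air", "Consumer services", "Pangandaran", "2004"],
    ]
)

# Use row 0 as the column labels, then drop it from the data.
df = raw.iloc[1:].reset_index(drop=True)
df.columns = raw.iloc[0].tolist()

# The header text shared each column with the data, so values are strings;
# convert numeric columns explicitly.
df["Founded"] = pd.to_numeric(df["Founded"])

print(df.columns.tolist())  # ['Name', 'Industry', 'Headquarters', 'Founded']
```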