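Inspecting an element and using XPath to get the right data in Python

Time: 2016-01-07 16:20:55

Tags: python html xpath web-scraping lxml

I am trying to scrape all the coin IDs from this website.

When inspecting the element, the IDs are visible, but when I copy the XPath I get: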

//*[@id="id-bitcoin"]

I plan to use this Python code:

from lxml import html 
import requests 

page = requests.get('http://coinmarketcap.com/all/views/all/')
tree = html.fromstring(page.content)

ID = tree.xpath('')

print(ID)

But I'm not sure what to look for in the element, or what to put inside tree.xpath('').

I was hoping for something like:

//span[@class="id"]/text()

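I tried printing the tree to understand the data better, but what it printed was not the page data. What is the syntax for viewing the data, something like tree.getdata()?

Any information on how to get these coin ID names would be much appreciated, thanks.

3 Answers:

Answer 0: (score: 1)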

I guess you are trying to get the id of each tr tag. Since id is an attribute of the tag, you can get it like this:

from lxml import html
import requests

page = requests.get('http://coinmarketcap.com/all/views/all/')
tree = html.fromstring(page.content)

# Select every row of the currencies table
trs = tree.xpath('//table[@id="currencies-all"]/tbody/tr')
for tr in trs:
    # id is an attribute of the tr element
    print(tr.attrib.get('id'))

You will get output like:

id-bitcoin
id-ripple
id-litecoin
id-ethereum
id-dash
id-dogecoin
...
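
Each row id above carries an id- prefix. If only the coin name part is wanted, a small helper can strip it. This is a minimal sketch: the function name coin_name_from_row_id is hypothetical, and it assumes the prefix is always id-, which the page does not guarantee.

```python
def coin_name_from_row_id(row_id, prefix="id-"):
    """Return the row id without its prefix, or None if the row has no id."""
    if row_id is None:
        # tr.attrib.get('id') returns None for rows without an id attribute.
        return None
    if row_id.startswith(prefix):
        return row_id[len(prefix):]
    # Leave ids that do not follow the assumed pattern untouched.
    return row_id


# Sample ids taken from the output above, plus a row without an id.
row_ids = ["id-bitcoin", "id-ripple", "id-litecoin", None]
names = [coin_name_from_row_id(row_id) for row_id in row_ids]
print(names)
```

In the loop above, this would be called as coin_name_from_row_id(tr.attrib.get('id')).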

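If you want the data from each td in a row, you can find every td tag inside the tr and extract its text content:

for tr in trs:
    tds = tr.findall('td')
    # text_content() returns the text of the cell and all of its children
    data = [td.text_content().strip() for td in tds]
    print(data)

Output:

['1', 'Bitcoin', 'BTC', '$ 6,815,160,833', '$ 452.70', '15,054,475', '$ 75,535,400', '-0.21 %', '5.19 %', '5.76 %']
...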

You may need to clean up the data.
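
One possible way to do that cleanup is sketched below, assuming the cells look like the sample row above (dollar signs, thousands separators, percent signs). The helper name to_number is hypothetical, and cells that do not parse as numbers are left as text.

```python
def to_number(text):
    """Convert a scraped cell such as '$ 452.70' or '-0.21 %' to a float.

    Cells that are not numeric (names, symbols) are returned unchanged.
    """
    cleaned = text.replace('$', '').replace('%', '').replace(',', '').strip()
    try:
        return float(cleaned)
    except ValueError:
        return text


# Sample row copied from the output above.
row = ['1', 'Bitcoin', 'BTC', '$ 6,815,160,833', '$ 452.70', '15,054,475',
       '$ 75,535,400', '-0.21 %', '5.19 %', '5.76 %']
cleaned_row = [to_number(cell) for cell in row]
print(cleaned_row)
```

Note that float() also accepts strings such as 'nan' and 'inf', so a cell containing exactly that text would be converted too; a stricter check may be needed for real data.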

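Answer 1: (score: 0)

You are almost there. You need an XPath expression that selects all the currency names, and then you store them in a suitable variable (I use a list):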

from lxml import html
import requests

page = requests.get('http://coinmarketcap.com/all/views/all/')
tree = html.fromstring(page.content)

currencies = [curr for curr in tree.xpath('//td[contains(@class, "currency-name")]/a/text()')]
print(currencies)

Output:

['Bitcoin', 'Ripple', 'Litecoin', 'Ethereum', 'Dash',....]

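(I have truncated the output.)

The XPath expression //td[contains(@class, "currency-name")]/a/text() finds all td elements whose class contains currency-name, then returns the text of the anchor (a) children of those td elements.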
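
To see why contains(@class, ...) is used rather than an exact match, here is a small offline sketch. The markup is hypothetical: it assumes the td carries more than one class (here no-wrap currency-name), which may not match the real page exactly.

```python
from lxml import html

# Hypothetical markup, loosely modelled on the table discussed above.
SAMPLE = """
<table id="currencies-all">
  <tbody>
    <tr id="id-bitcoin">
      <td class="no-wrap currency-name"><a href="#">Bitcoin</a></td>
    </tr>
    <tr id="id-ripple">
      <td class="no-wrap currency-name"><a href="#">Ripple</a></td>
    </tr>
  </tbody>
</table>
"""

tree = html.fromstring(SAMPLE)

# An exact comparison fails when the attribute holds several classes.
exact = tree.xpath('//td[@class="currency-name"]/a/text()')

# contains() matches any class attribute that includes the substring.
partial = tree.xpath('//td[contains(@class, "currency-name")]/a/text()')

print(exact)
print(partial)
```

Because contains() is a plain substring test, it would also match a class such as currency-name-small, so it is worth checking the actual class names on the page.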

Answer 2: (score: 0)

from lxml import html
import requests

page = requests.get('http://coinmarketcap.com/all/views/all/')
tree = html.fromstring(page.content)
ID = tree.xpath('//img[@class="currency-logo"]/@alt')
print(ID)

Output:

['Bitcoin', 'Ripple', 'Litecoin', 'Ethereum', 'Dash', 'Dogecoin', 'Peercoin', 'Stellar', 'BitShares', 'MaidSafeCoin', 'Nxt', 'Bytecoin', 'Namecoin', 'Monero', 'Factom', 'EmerCoin', 'GridCoin', 'NuShares', 'Rubycoin', 'NEM', 'MonaCoin', 'Clams', 'BlackCoin', 'YbCoin', 'Startcoin', 'Counterparty', 'Tether', 'BitcoinDark', 'Bitcrystals', 'Synereo', 'Global Currency Reserve', 'Mastercoin (Omni)', 'Novacoin', 'GetGems', 'MUSE', 'AmberCoin', 'PayCoin', 'VeriCoin', 'Rimbit', 'CasinoCoin', 'Primecoin', 'DigiByte', 'I0Coin', 'Storjcoin X', 'Megacoin', 'ShadowCash', 'NeuCoin', 'Mintcoin', 'VPNCoin', 'Quark', 'ReddCoin', 'SuperNET', 'WorldCoin', 'SolarCoin', 'GameCredits', 'FuelCoin', 'DNotes', 'NautilusCoin', 'BoostCoin', 'Vanillacoin', 'EarthCoin', 'DigitalNote', 'Infinitecoin', 'Scotcoin', 'Diamond', 'Gulden', 'Vertcoin', 'ARCHcoin', 'Crypti', 'Feathercoin', 'FedoraCoin', 'Applecoin', 'InstantDEX', 'Electronic Gulden', 'Auroracoin', 'Unobtanium', 'BilShares', 'Zetacoin', 'Digitalcoin', 'Anoncoin', 'FairCoin', 'CureCoin', 'I/O Coin', 'AsiaCoin', 'Obits', 'NetCoin', 'Swarm', 'SysCoin', 'MaxCoin', 'Virtacoin', 'UnionCoin', 'Flycoin', 'Riecoin', 'Crypto Bullion', 'Horizon', 'SpreadCoin', 'EuropeCoin', 'Siacoin', 'CloakCoin', 'LIQUID', 'BitBay', 'TileCoin', 'TEKcoin', 'ZcCoin', 'Jinn', 'Qora', 'TagCoin', 'HyperStake', 'Navajo', 'Aeon', 'CannabisCoin', 'Xaurum', 'DogeCoinDark', 'PotCoin', 'GoldCoin', 'jl777hodl', 'Bytecent', 'SmileyCoin', 'StabilityShares', 'HoboNickels', 'bitUSD', 'XCurrency', 'NXTventure', 'Burst', 'Orbitcoin', 'Zeitcoin', 'Devcoin', 'Quatloo', 'Tickets', 'Jumbucks', 'Cannacoin', 'Memorycoin', 'AudioCoin', 'CORE', 'TrustPlus', 'Viacoin', 'OrangeCoin', 'Woodcoin', 'Stealthcoin', 'BitSwift', 'Coinomat', 'Silkcoin', 'bitCNY', 'FlorinCoin', 'MazaCoin', 'Canada eCoin', 'Xiaomicoin', 'Cryptofund', 'Joincoin', 'Boolberry', 'GeoCoin', 'VootCoin', 'Qibuck', 'Mooncoin', 'Energycoin', 'SecureCoin', 'Node', 'WhiteCoin', 'OKCash', 'GroestlCoin', 'ArtByte', 'BitShares PTS', 
'BitBean', 'Hyper', 'Capricoin', 'CoinoIndex', 'RedCoin', 'WildBeastBitcoin', 'SIBCoin', 'BitStone', 'Terracoin', 'Myriadcoin', 'CryptoEscudo', 'TransferCoin', 'RibbitRewards', 'Truckcoin', 'Expanse', 'Opal', 'OpenBTC', 'Blitz', 'MediterraneanCoin', 'LTBcoin', 'Magi', 'Cryptonite', 'DigiCube', 'Bitmark', 'NobleCoin', 'Steps', 'Synergy', 'Gambit', 'Sprouts', 'SecretCoin', 'ZiftrCOIN', 'Titcoin', 'Bitcredits', 'FoldingCoin', 'UFO Coin', 'SOILcoin', 'bitBTC', 'Adzcoin', 'CryptCoin', 'DopeCoin', 'Sling', 'Nyancoin', 'Karmacoin', 'GenesysCoin', 'MangoCoinz', 'AmsterdamCoin', 'PopularCoin', 'Einsteinium', 'LimeCoinX', 'Influxcoin', 'Prime-XI', 'Bata', '8Bit', 'PayCon', 'HunterCoin', 'ExclusiveCoin', 'Bitz', 'ReturnCoin', 'QuazarCoin', 'Blakecoin', 'Grantcoin', 'NeosCoin', 'GCoin', 'Genstake', 'bitGold', 'MonetaryUnit', 'PrimeChain', 'Axiom', 'Neutron', 'KhanCoin', 'bitSilver', 'Sapience AIFX', 'SwagBucks', 'CrownCoin', 'AntiBitcoin', 'Droidz', 'Bitzeny', 'SongCoin', 'Bantam', 'MindCoin', 'BREAKcoin', 'CryptoCircuits', 'Photon', 'Cryptographic Anomaly', 'MasterTraderCoin', 'Swing', 'Datacoin', 'GraniteCoin', 'SydPak', 'Guncoin', 'IvugeoCoin', 'Floz', 'TAGRcoin', 'IslaCoin', 'Cerium', 'UCoin', 'Unitus', 'Alexium', 'FreedomCoin', 'bitEUR', 'World Trade Funds', 'GamerholicCoin', 'RhinoCoin', '1337', 'TRMB', 'Ixcoin', 'CoinoUSD', 'BlockShares', 'SolarFarm', 'SkyNET', 'Nas', 'Pangea Poker', 'FIMKrypto', 'sharkfund0', 'CzechCrownCoin', 'Dimecoin', 'Blocknet', 'Colossuscoin V2', 'FreeMarket', 'MMNXT', 'Deutsche eMark', 'Bitstar', 'Carboncoin', 'PinkCoin', 'The Viral Exchange', 'Bottlecaps', 'Freicoin', 'Dogeparty', 'Nexus', 'Privatebet', 'NXTprivacy', 'Nxttycoin', 'Sexcoin', 'LiteDoge', 'Librexcoin', 'CryptoBuck', 'Sonic', 'NobleNXT', 'USDe', 'MMBTCD', 'CarpeDiemCoin', 'Woodshares', 'Ratecoin', 'Extremecoin', 'Yacoin', 'UltraCoin', 'FlutterCoin', 'Colossuscoin', 'BitBar', 'Sync', 'Buongiorno Caffe', '42 Coin', 'Chancecoin', 'Pandacoin', 'microCoin', 'DeBuNe', 'NeoDICE', 
'LottoCoin', 'MaryJane', 'Trollcoin', 'FlappyCoin', 'HTMLCOIN', 'Viral', 'Dashcoin', 'Fibre', 'ContinuumCoin', 'MGW', 'BattleCoin', 'Sembro Token', 'RabbitCoin', 'ECCoin', 'Coin2.1', 'NoirShares', 'KoreCoin', 'CommunityCoin', 'BBQCoin', 'Philosopher Stones', 'Fastcoin', 'Pesetacoin', 'TeslaCoin', 'Sterlingcoin', 'SuperCoin', 'Piggycoin', 'TittieCoin', 'Particle', 'ApexCoin', 'IncaKoin', 'BitcoinTX', 'Emerald Crypto', 'MetalCoin', 'KeyCoin', 'Triangles', 'GlobalCoin', 'Fantomcoin', 'Uro', 'Mineralscoin', 'ParkByte', 'SmartCoin', 'NXTInspect', 'Franko', 'LiteBar', 'Jay', 'BlueCoin', 'Ringo', 'Sphere', 'Marscoin', 'Kobocoin', 'UnbreakableCoin', 'Kittehcoin', 'Aricoin', 'ClearingHouse', 'SHACoin', 'FreshCoin', 'GAIA', 'PLNcoin', '020LondonCoin', 'GrandCoin', 'BunnyCoin', 'Murraycoin', 'Animecoin', 'GlobalBoost-Y', 'Bitcoin Plus', 'Argentum', 'Nakas', 'Neutrino', 'CoolCoin', 'SatoshiMadness', 'Elacoin', 'LeafCoin', 'Heavycoin', 'Fractalcoin', 'DayTraderCoin', 'XxXcoin', 'HempCoin', 'SPEC', 'AsicCoin', 'LitecoinDark', 'Lightspeed', 'SaffronCoin', 'MultiWalletCoin', 'Helleniccoin', 'Sativacoin', 'HamRadioCoin', 'NewYorkCoin', 'FujiCoin', 'Electron', 'DeltaCredits', 'Moin', 'Libertycoin', 'Moneta', 'AeroMe', 'Tigercoin', 'Pakcoin', 'Quicksilver', 'Phoenixcoin', 'GoldPieces', 'Luckycoin', 'X-Coin', 'ChipCoin', 'CageCoin', 'Crave', 'SpainCoin', 'CorgiCoin', 'Krugercoin', 'Copperlark', 'Quotient', 'Bitgem', 'Razor', 'StrongHands', 'Aiden', 'GiveCoin', 'KlondikeCoin', 'IcebergCoin', 'CAPTcoin', 'Saturn2Coin', 'Bitcoin Scrypt', 'Positron', 'DarkCash', 'TorCoin', 'iCash', 'GoldReserve', 'RonPaulCoin', 'Viorcoin', 'Spots', 'MonetaVerde', 'RussiaCoin', 'Vcoin', 'CacheCoin', 'GreenBacks', 'StableCoin', 'BetaCoin', 'CraigsCoin', 'RosCoin', 'Joulecoin', 'TurboStake', 'Mincoin', 'Cypher', 'DarkShibe', 'Catcoin', 'Halcyon', 'Guerillacoin', 'Acoin', 'Checkcoin', 'LimitedCoin', 'Zedcoin', 'PetroDollar', 'Greencoin', 'ShieldCoin', 'Doubloons', 'Money', 'Cashcoin', 'Lycancoin', 
'CandyCoin', 'ZimStake', 'FireFlyCoin', 'BellaCoin', 'Benjamins', 'Aliencoin', 'Conspiracycoin', 'Execoin', 'RotoCoin', 'CrackCoin', 'Gapcoin', 'Judgecoin', 'BeaverCoin', 'Lyrabar', 'Solecoin', 'Kumacoin', 'Glyph', 'BatCoin', 'SoonCoin', 'PreminePlus', 'Munne', 'Coven', 'Full Integrity Coin', 'MapCoin', 'TopCoin', 'ConcealCoin', 'Guarany', 'Universal Currency', 'Junkcoin', 'Umbrella-LTC', 'XCash', 'UtilityCoin', 'Bloodcoin', 'PseudoCash', 'ShadeCoin', 'Nimbus', 'OpenSourcecoin', 'Dibbits', 'Axron', 'CAIx', 'BitCrystal', '007Coin', 'Vidio', 'BitQuark', 'MazeCoin', 'Heisenberg', 'Coinaid', 'RipoffCoin', 'Quarkbar', 'AnarchistsPrime', 'ARbit', 'Hirocoin', 'DarkTron', 'Isracoin', 'Metal Music Coin', 'Dobbscoin', 'TakCoin', 'BitStake', 'Graffiti', 'CRTCoin', 'Paycoin', 'BowsCoin', 'UniCoin', 'Dirac', 'Solcoin', 'ParallelCoin', 'DarkCoin', 'Digital Credits', 'VegasCoin', 'Elektron', 'TenneT', 'Phalanx', 'Selfiecoin', 'Neocoin', 'Bubble', 'Noirbits', 'Quedos', 'Vibranium', 'Cryptokenz', 'FistBump', 'Evotion', 'ORObit', 'Save and Gain', 'GuccioneCoin', 'Digit', 'ProsperCoin', 'CryptoSpots', 'Lightcoin', 'DigitalPrice', 'SpaceCoin', 'ChainCoin', 'Hundredcoin', 'Crypto', 'P7Coin', 'Eurocoin', 'HazMatCoin', 'LegendaryCoin', 'HeelCoin', 'CryptBit', 'Fantom', 'Donationcoin', 'Denarius', 'TacoCoin', 'Unrealcoin', 'CleverCoin', 'OsmiumCoin', 'GoodCoin', 'Ozziecoin', 'Californium', '23 Skidoo', 'Forevercoin', 'DuckDuckCoin', 'GBCGoldCoin', 'BanxShares', 'Augur', 'CryptoByte', 'NuBits', 'NxttyACCI', 'Asset Backed Coin', 'ClubCoin', 'LEOcoin', 'Agoras Tokens', 'KolschCoin', 'Sharkcoin', 'UNCoin', 'BnB Coin', 'FutCoin', 'Kcoin', 'ShellPay', 'Faucetcoin', 'Stakerush', 'vTorrent', 'AIB', 'BitSeeds', 'DigiEuro', 'SpikesPrivateCoin', 'Nocturna', 'InvisibleCoin', 'SmartChips', 'Shift', 'Bytecoin', 'ROXcoin', 'Coinworkscoin', 'Pebblecoin', 'SkullBuzz', 'CraftCoin', 'PLAY', 'Local Family Owned', 'AmeroX', 'CHNCoin', 'IrishCoin', 'Motocoin', 'Aegis', 'Bolivarcoin', 'Nibble', 'BitCent', 
'DarkToken', 'Cthulhu Offerings', 'BitcoinFast', 'SSVCoin', 'TickCoin', 'Diggits', 'PlanetCoin', 'Flaxscript', 'FriendshipCoin 2', 'AlphaCoin', 'AvatarCoin', 'Dubstep', 'Grexit', 'EZCoin', 'DarkCypher', 'RubleBit', 'AmericanCoin', 'AdderalCoin', 'NXE', 'Dotcoin', 'NanoToken', 'Skeincoin', 'TrickyCoin', 'Graviton', 'ElephantCoin', 'LiteStarCoin', 'X2', 'BigCoin', 'StarCoin', 'Memecoin', 'QuitDough', 'UPcoin', 'WorldPay', 'Coin(O)', 'iBits', 'Cashme', 'Trinity', 'Moneta', 'GameCoin', 'PurePOS', 'DarkEther', 'XenCoin', 'Biebercoin', 'The Cypherfunks', 'Paccoin', 'Pennies']