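I am parsing a data table from the following URL: https://www.signalstart.com/search-signals
Specifically, I am trying to extract data from the table rows.
Each table row contains a series of table data cells: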
<table class="table table-striped table-bordered dataTable table-hover" id="searchSignalsTable">
<thead>
<tr>
<th class="sorting sorting_asc"><a href="javascript:void(0)" onclick="sortTable(this,1,48,'searchSignalsData', '&ps=20&ts=706&yieldType=&yieldVal=&drawType=&drawVal=&pipsType=&pipsVal=&type=&ageType=&tradesType=&tradesVal=&priceType=&priceVal=&fifoVal=&searchVal=&serversMultiSearch=', false, true, true, 'Search Signals');" order="0">Rank</a></th>
<th class="sorting "><a href="javascript:void(0)" onclick="sortTable(this,1,3,'searchSignalsData', '&ps=20&ts=706&yieldType=&yieldVal=&drawType=&drawVal=&pipsType=&pipsVal=&type=&ageType=&tradesType=&tradesVal=&priceType=&priceVal=&fifoVal=&searchVal=&serversMultiSearch=', false, true, true, 'Search Signals');" order="0">Name</a></th>
<th class="sorting "><a href="javascript:void(0)" onclick="sortTable(this,1,19,'searchSignalsData','&ps=20&ts=706&yieldType=&yieldVal=&drawType=&drawVal=&pipsType=&pipsVal=&type=&ageType=&tradesType=&tradesVal=&priceType=&priceVal=&fifoVal=&searchVal=&serversMultiSearch=', false, true, true, 'Search Signals');" order="0">Gain</a></th>
<th class="sorting "><a href="javascript:void(0)" onclick="sortTable(this,1,37,'searchSignalsData','&ps=20&ts=706&yieldType=&yieldVal=&drawType=&drawVal=&pipsType=&pipsVal=&type=&ageType=&tradesType=&tradesVal=&priceType=&priceVal=&fifoVal=&searchVal=&serversMultiSearch=', false, true, true, 'Search Signals');" order="0">Pips</a></th>
<th class="sorting "><a href="javascript:void(0)" onclick="sortTable(this,1,23,'searchSignalsData','&ps=20&ts=706&yieldType=&yieldVal=&drawType=&drawVal=&pipsType=&pipsVal=&type=&ageType=&tradesType=&tradesVal=&priceType=&priceVal=&fifoVal=&searchVal=&serversMultiSearch=', false, true, true, 'Search Signals');" order="0">DD</a></th>
<th class="sorting "><a href="javascript:void(0)" onclick="sortTable(this,1,100,'searchSignalsData','&ps=20&ts=706&yieldType=&yieldVal=&drawType=&drawVal=&pipsType=&pipsVal=&type=&ageType=&tradesType=&tradesVal=&priceType=&priceVal=&fifoVal=&searchVal=&serversMultiSearch=', false, true, true, 'Search Signals');" order="0">Trades</a></th>
<th class="sorting "><a href="javascript:void(0)" onclick="sortTable(this,1,26,'searchSignalsData','&ps=20&ts=706&yieldType=&yieldVal=&drawType=&drawVal=&pipsType=&pipsVal=&type=&ageType=&tradesType=&tradesVal=&priceType=&priceVal=&fifoVal=&searchVal=&serversMultiSearch=', false, true, true, 'Search Signals');" order="0">Type</a></th>
<th>Monthly</th>
<th>Chart</th>
<th class="sorting "><a href="javascript:void(0)" onclick="sortTable(this,1,102,'searchSignalsData','&ps=20&ts=706&yieldType=&yieldVal=&drawType=&drawVal=&pipsType=&pipsVal=&type=&ageType=&tradesType=&tradesVal=&priceType=&priceVal=&fifoVal=&searchVal=&serversMultiSearch=', false, true, true, 'Search Signals');" order="0">Price</a></th>
<th class="sorting " style="width: 40px"><a href="javascript:void(0)" onclick="sortTable(this,1,103,'searchSignalsData','&ps=20&ts=706&yieldType=&yieldVal=&drawType=&drawVal=&pipsType=&pipsVal=&type=&ageType=&tradesType=&tradesVal=&priceType=&priceVal=&fifoVal=&searchVal=&serversMultiSearch=', false, true, true, 'Search Signals');" order="0">Age</a></th>
<th class="sorting " style="width: 70px"><a href="javascript:void(0)" onclick="sortTable(this,1,1,'searchSignalsData','&ps=20&ts=706&yieldType=&yieldVal=&drawType=&drawVal=&pipsType=&pipsVal=&type=&ageType=&tradesType=&tradesVal=&priceType=&priceVal=&fifoVal=&searchVal=&serversMultiSearch=', false, true, true, 'Search Signals');" order="0">Added</a></th>
<th>Action</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center;"> - </td>
<td><a class="pointer" target="_blank" href="https://www.signalstart.com/analysis/joker-1k/110059">Joker 1k</a> </td>
<td><span class="red">-9.99%</span></td>
<td><span class="green">2,092.3</span></td>
<td>15.3%</td>
<td>108</td>
<td>Real</td>
<td><span class="monthlySparkline" id="monthlySpark110059"><canvas width="12" height="25" style="display: inline-block; vertical-align: top; width: 12px; height: 25px;"></canvas></span></td>
<td><span class="dayliSparkline" id="dayliSpark110059"><canvas width="100" height="25" style="display: inline-block; vertical-align: top; width: 100px; height: 25px;"></canvas></span></td>
<td>$30</td>
<td>
1m 24d
</td>
<td>
Mar 29, 2020
</td>
<td><a onclick="getMasterPricingData('110059');" data-toggle="modal"><button id="subscribeToMasterBtn110059" class="btn btn-circle btn-sm green" type="button">Copy</button></a>
<div style="display: none;">
<input type="hidden" class="monthlyData" oid="110059" value="-1.78,-3.68,-4.86">
<input type="hidden" class="dailyGrowthData" oid="110059" value="0.00,-0.03,-1.78,-5.69,-6.75,-5.59,-7.61,-5.31,-6.20,-3.81,-4.40,-8.00,-2.88,-3.78,-4.38,-0.20,-5.40,-10.66,-13.69,-12.51,-13.23,-9.99">
<input type="hidden" class="dailyEquityData" oid="110059" value="0.00,-0.23,-1.41,-5.02,-6.25,-4.29,-6.68,-3.91,-5.37,-4.10,-4.40,-3.59,-1.78,-1.75,-2.65,-0.21,-4.87,-10.76,-13.90,-11.58,-13.23,-10.18">
</div>
</td>
</tr>
<tr>
<td style="text-align: center;"> - </td>
<td><a class="pointer" target="_blank" href="https://www.signalstart.com/analysis/fxabakus/56043">FXabakus</a> </td>
<td><span class="red">-19.57%</span></td>
<td><span class="red">-8,615.2</span></td>
<td>42%</td>
<td>1642</td>
<td>Real</td>
<td><span class="monthlySparkline" id="monthlySpark56043"><canvas width="80" height="25" style="display: inline-block; vertical-align: top; width: 80px; height: 25px;"></canvas></span></td>
<td><span class="dayliSparkline" id="dayliSpark56043"><canvas width="100" height="25" style="display: inline-block; vertical-align: top; width: 100px; height: 25px;"></canvas></span></td>
<td>$30</td>
<td>
1y 7m
</td>
<td>
May 4, 2019
</td>
<td><a onclick="getMasterPricingData('56043');" data-toggle="modal"><button id="subscribeToMasterBtn56043" class="btn btn-circle btn-sm green" type="button">Copy</button></a>
<div style="display: none;">
<input type="hidden" class="monthlyData" oid="56043" value="1.22,1.35,3.92,1.35,-1.57,1.77,2.01,1.11,0.38,-14.89,-14.70,-5.21,5.97,7.03,-17.54,2.92,3.11,-8.94,13.38,1.77">
<input type="hidden" class="dailyGrowthData" oid="56043" value="-27.87,-29.29,-29.01,-26.76,-25.76,-25.59,-30.57,-30.13,-29.78,-29.60,-29.25,-28.34,-28.07,-27.89,-25.20,-25.08,-23.66,-23.46,-21.54,-21.02,-21.62,-20.28,-18.31,-26.97,-27.48,-27.00,-28.21,-24.20,-23.46,-30.04,-31.37,-34.62,-33.84,-32.87,-32.20,-30.99,-30.43,-30.30,-29.75,-27.64,-27.45,-24.34,-24.71,-24.09,-24.15,-21.48,-21.08,-20.97,-19.54,-19.57">
<input type="hidden" class="dailyEquityData" oid="56043" value="-27.87,-29.29,-28.89,-26.76,-25.76,-28.10,-34.47,-32.34,-31.54,-40.80,-32.76,-32.90,-33.50,-30.65,-25.37,-25.05,-22.88,-23.29,-21.54,-21.02,-21.54,-20.90,-19.11,-27.76,-35.15,-29.17,-27.79,-24.20,-26.23,-34.32,-35.95,-51.20,-33.84,-32.76,-32.71,-31.62,-30.43,-39.93,-29.75,-27.64,-28.35,-27.62,-28.41,-24.20,-24.51,-22.06,-21.08,-20.97,-18.82,-30.27">
</div>
</td>
</tr>
<tr>
<td style="text-align: center;"> - </td>
<td><a class="pointer" target="_blank" href="https://www.signalstart.com/analysis/af-investing-pro-final/122603">AF Investing Pro Final</a> </td>
<td><span class="green">56.69%</span></td>
<td><span class="green">29,812</span></td>
<td>8.6%</td>
<td>476</td>
<td>Real</td>
<td><span class="monthlySparkline" id="monthlySpark122603"><canvas width="8" height="25" style="display: inline-block; vertical-align: top; width: 8px; height: 25px;"></canvas></span></td>
<td><span class="dayliSparkline" id="dayliSpark122603"><canvas width="100" height="25" style="display: inline-block; vertical-align: top; width: 100px; height: 25px;"></canvas></span></td>
<td>$250</td>
<td>
17d 12h
</td>
<td>
Apr 30, 2020
</td>
<td><a onclick="getMasterPricingData('122603');" data-toggle="modal"><button id="subscribeToMasterBtn122603" class="btn btn-circle btn-sm green" type="button">Copy</button></a>
<div style="display: none;">
<input type="hidden" class="monthlyData" oid="122603" value="55.18,0.98">
<input type="hidden" class="dailyGrowthData" oid="122603" value="-0.02,0.04,54.78,55.02,55.18,55.82,55.86,55.99,56.06,56.25,56.69">
<input type="hidden" class="dailyEquityData" oid="122603" value="-8.60,16.85,54.86,54.11,55.44,55.85,54.38,52.15,45.00,51.07,56.25">
</div>
</td>
</tr>
<tr>
<td style="text-align: center;"> - </td>
<td><a class="pointer" target="_blank" href="https://www.signalstart.com/analysis/rapid-growth/111340">Rapid growth</a> </td>
<td><span class="green">130.78%</span></td>
<td><span class="green">1,102.9</span></td>
<td>44.3%</td>
<td>126</td>
<td>Real</td>
<td><span class="monthlySparkline" id="monthlySpark111340"><canvas width="12" height="25" style="display: inline-block; vertical-align: top; width: 12px; height: 25px;"></canvas></span></td>
<td><span class="dayliSparkline" id="dayliSpark111340"><canvas width="100" height="25" style="display: inline-block; vertical-align: top; width: 100px; height: 25px;"></canvas></span></td>
<td>$31</td>
<td>
2m 8d
</td>
<td>
Apr 1, 2020
</td>
<td><a onclick="getMasterPricingData('111340');" data-toggle="modal"><button id="subscribeToMasterBtn111340" class="btn btn-circle btn-sm green" type="button">Copy</button></a>
<div style="display: none;">
<input type="hidden" class="monthlyData" oid="111340" value="87.85,18.28,3.87">
<input type="hidden" class="dailyGrowthData" oid="111340" value="0.00,0.64,1.40,1.40,1.90,2.91,7.53,8.21,11.19,11.30,17.60,19.60,23.03,37.74,47.75,54.75,59.91,69.79,73.60,79.36,87.85,93.14,93.40,94.70,95.93,96.01,99.95,100.71,101.85,102.10,102.12,104.36,108.76,110.11,110.14,110.23,112.58,115.10,115.54,117.17,121.24,122.19,123.40,124.18,124.88,124.89,130.09,130.78">
<input type="hidden" class="dailyEquityData" oid="111340" value="-1.80,0.67,0.97,1.91,-0.64,2.58,6.82,6.72,8.65,8.46,16.29,17.71,19.96,34.10,47.24,51.91,59.07,69.79,73.58,79.26,88.01,91.03,93.43,87.85,96.19,95.80,100.29,95.63,98.94,101.71,98.33,104.12,108.26,108.46,86.24,108.42,112.83,114.51,94.42,116.29,120.16,121.93,123.05,115.67,122.81,124.45,130.47,130.14">
</div>
</td>
</tr>
<tr>
<td style="text-align: center;"> - </td>
<td><a class="pointer" target="_blank" href="https://www.signalstart.com/analysis/dream-presentation-1/66543">Dream Presentation 1</a> </td>
<td><span class="red">-99.9%</span></td>
<td><span class="red">-2,724.1</span></td>
<td>99.9%</td>
<td>1612</td>
<td>Real</td>
<td><span class="monthlySparkline" id="monthlySpark66543"><canvas width="28" height="25" style="display: inline-block; vertical-align: top; width: 28px; height: 25px;"></canvas></span></td>
<td><span class="dayliSparkline" id="dayliSpark66543"><canvas width="100" height="25" style="display: inline-block; vertical-align: top; width: 100px; height: 25px;"></canvas></span></td>
<td>$30</td>
<td>
6m 13d
</td>
<td>
Nov 8, 2019
</td>
<td><a onclick="getMasterPricingData('66543');" data-toggle="modal"><button id="subscribeToMasterBtn66543" class="btn btn-circle btn-sm green" type="button">Copy</button></a>
<div style="display: none;">
<input type="hidden" class="monthlyData" oid="66543" value="-100.14,-98.54,-98.79,-91.71,-98.23,-100.00,-88.82">
<input type="hidden" class="dailyGrowthData" oid="66543" value="24.18,-99.90,-99.89,-99.88,-99.88,-99.88,-99.87,-99.87,-99.86,-99.84,-99.83,-99.90,-99.89,-99.90,-99.90,-99.81,-99.81,-99.80,-99.90,-99.90,-99.86,-99.83,-99.79,-99.90,-99.90,-99.90,-99.88,-99.89,-99.89,-99.88,-99.82,-99.74,-99.85,-99.37,-99.88,-99.90,-99.90,-99.90,-99.90,-99.87,-99.83,-99.80,-99.75,-99.64,-99.56,-99.90,-99.90">
<input type="hidden" class="dailyEquityData" oid="66543" value="7.87,-99.90,-99.89,-99.88,-99.88,-99.88,-99.88,-99.87,-99.86,-99.84,-99.83,-99.90,-99.89,-99.90,-99.89,-99.83,-99.88,-99.88,-99.90,-99.90,-99.87,-99.83,-99.84,-99.72,-99.90,-99.90,-99.88,-99.89,-99.88,-99.92,-99.86,-99.74,-99.86,-99.39,-99.88,-99.90,-99.90,-99.90,-99.90,-99.87,-99.83,-99.79,-99.76,-99.63,-99.55,-100.16,-99.83">
</div>
</td>
</tr>
<tr>
<td style="text-align: center;"> - </td>
<td><a class="pointer" target="_blank" href="https://www.signalstart.com/analysis/limerence-ea-suite-3/93679">Limerence EA Suite 3</a> </td>
<td><span class="green">1,246.66%</span></td>
<td><span class="green">199.8</span></td>
<td>34.2%</td>
<td>8</td>
<td>Real</td>
<td><span class="monthlySparkline" id="monthlySpark93679"><canvas width="20" height="25" style="display: inline-block; vertical-align: top; width: 20px; height: 25px;"></canvas></span></td>
<td><span class="dayliSparkline" id="dayliSpark93679"><canvas width="100" height="25" style="display: inline-block; vertical-align: top; width: 100px; height: 25px;"></canvas></span></td>
<td>$75</td>
<td>
7m 11d
</td>
<td>
Feb 11, 2020
</td>
<td><a onclick="getMasterPricingData('93679');" data-toggle="modal"><button id="subscribeToMasterBtn93679" class="btn btn-circle btn-sm green" type="button">Copy</button></a>
<div style="display: none;">
<input type="hidden" class="monthlyData" oid="93679" value="95.40,82.01,94.38,87.49,3.90">
<input type="hidden" class="dailyGrowthData" oid="93679" value="0.00,95.40,255.64,591.28,552.49,1234.12,1196.10,1246.66">
<input type="hidden" class="dailyEquityData" oid="93679" value="0.00,95.40,255.64,591.28,1034.76,1234.12,1196.10,1246.66">
</div>
</td>
</tr>
<tr>
<td style="text-align: center;"> - </td>
<td><a class="pointer" target="_blank" href="https://www.signalstart.com/analysis/easy-money/31727">Easy Money</a> </td>
<td><span class="red">-99.9%</span></td>
<td><span class="green">2,430.6</span></td>
<td>100%</td>
<td>1095</td>
<td>Real</td>
<td><span class="monthlySparkline" id="monthlySpark31727"><canvas width="96" height="25" style="display: inline-block; vertical-align: top; width: 96px; height: 25px;"></canvas></span></td>
<td><span class="dayliSparkline" id="dayliSpark31727"><canvas width="100" height="25" style="display: inline-block; vertical-align: top; width: 100px; height: 25px;"></canvas></span></td>
<td>$30</td>
<td>
2y 2m
</td>
<td>
Apr 1, 2018
</td>
<td><a onclick="getMasterPricingData('31727');" data-toggle="modal"><button id="subscribeToMasterBtn31727" class="btn btn-circle btn-sm green" type="button">Copy</button></a>
<div style="display: none;">
<input type="hidden" class="monthlyData" oid="31727" value="6.22,-6.15,22.04,-5.08,0.08,12.08,-69.31,-99.82,245.26,88.44,113.73,52.29,25.38,77.72,-29.07,-24.73,-86.48,-89.27,195.77,-7.65,-99.98,278.89,-69.98,-65.48">
<input type="hidden" class="dailyGrowthData" oid="31727" value="-99.66,-99.69,-99.72,-99.73,-99.77,-99.77,-99.78,-99.81,-99.90,-99.90,-99.89,-99.84,-99.83,-99.82,-99.81,-99.75,-99.78,-99.77,-99.79,-99.78,-99.77,-99.48,-99.46,-99.36,-99.34,-99.33,-99.33,-99.31,-99.33,-99.34,-99.40,-99.45,-99.33,-99.58,-99.65,-99.73,-99.71,-99.70,-99.68,-99.68,-99.69,-99.68,-99.71,-99.68,-99.80,-99.80,-99.77,-99.81,-99.84,-99.90">
<input type="hidden" class="dailyEquityData" oid="31727" value="-99.66,-99.69,-99.73,-99.70,-99.85,-99.89,-99.95,-99.77,-99.85,-99.90,-99.88,-99.84,-99.83,-99.82,-99.79,-99.75,-99.78,-99.77,-99.70,-99.68,-99.59,-99.48,-99.46,-99.36,-99.34,-99.33,-99.32,-99.25,-99.30,-99.34,-99.37,-99.37,-99.35,-99.58,-99.61,-99.73,-99.71,-99.69,-99.68,-99.68,-99.68,-99.68,-99.71,-99.68,-99.80,-99.76,-99.73,-99.79,-99.80,-99.89">
</div>
</td>
</tr>
</tbody>
</table>
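My code successfully extracts the data from the first table data cell (rank), but the second table data cell (name) comes out blank. What is wrong with this source code?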
import scrapy
from behold import Behold


class SignalStartSpider(scrapy.Spider):
    name = 'signalstart'
    start_urls = [
        'https://www.signalstart.com/search-signals',
    ]

    def parse(self, response):
        for provider in response.xpath("//div[@class='row']//tr"):
            yield {
                'rank': provider.xpath('td[1]/text()').get(),
                'name': provider.xpath('td[2]/text()').get(),
            }
I am now looping over the td cells in each tr and collecting them, but my remaining question is: how do I get the text out of the td cells I have?
import scrapy
from behold import Behold


class SignalStartSpider(scrapy.Spider):
    name = 'signalstart'
    start_urls = [
        'https://www.signalstart.com/search-signals',
    ]

    def parse(self, response):
        cols = "rank name gain pips drawdown trades type monthly chart price age added action"
        skip = [9,13]
        td = dict()
        for i, col in enumerate(cols.split()):
            td[i] = col
        Behold().show('td')
        for provider in response.xpath("//div[@class='row']//tr"):
            data_row = dict()
            for i, datum in enumerate(provider.xpath('td')):
                if i in skip:
                    continue
                data_row[td[i]] = datum
                # Behold().show('datum')
            yield data_row
Answer 0 (score: 1)
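The correct answer was provided by gallaecio_ in the Scrapy IRC channel. The name is wrapped in an <a> element inside the cell, so td[2]/text() matches only the whitespace that is a direct child of the td; the selector has to reach the text inside the link instead.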
For more involved cases, you may want https://github.com/TeamHG-Memex/html-text