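I have a web page with various snippets of text contained in <span class="x"></span> tags. I want to generate an ordered list of every such snippet. Straightforward enough.
The wrinkle: it often happens that further <span class="x"> tags are nested inside the outer ones, and I don't care about those. Basically, I want a list of every string that is inside at least one <span class="x"> tag, but any additional nesting of such tags should be ignored and discarded.
Here is some sample HTML: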
<p>
Outer text. <span class="x">Inside a single span.</span> Back to outer text once more. <span class="x"><span class="x">Inside two spans</span> or just one</span>. Perhaps a <span class="x">single span contains <span class="x">several</span>
<span class="x">nests</span> <span class="x">within <span class="x">it</span>
</span>!</span>
</p>
<span>Maybe there's a span out here.</span><span>(Or two.)</span>
<p>
<table>
<tr>
<td>
<span class="x">Or <span class="x">in</span><span class="x">here</span></span>.
</td>
</tr>
</table>
</p>
<p>
<span>No.</span> <span>Still no, but<span class="x">yes</span>.</span>
</p>
And the output I want:
[ "Inside a single span.",
"Inside two spans or just one",
"single span contains several nests within it!",
"Maybe there's a span out here.",
"(Or two.)",
"Or inhere",
"yes" ]
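A specific feature of this example that I want to call attention to: the <span class="x"> tags contain no HTML tags other than other nested <span class="x"> tags. I would be happy with a JavaScript + jQuery solution or a Python 3 + BeautifulSoup solution, or with something else if it suits the task at hand better than either of those.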
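Answer 0 (score: 1)
Try:
$('span.x').each(function(index, el) {
  // Text of the span's first child node only
  console.log(el.childNodes[0].textContent);
});
or
$('span.x').each(function(index, el) {
  // Full text of the span, including any nested spans
  console.log($(el).text());
});
This is a jQuery example, of course. It lists the text of every span in the console.
Just use this snippet to build your ordered list.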
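Answer 1 (score: 1)
You can get the complete list of texts in JavaScript with a simple jQuery statement:
$("span.x").map(function() {
  // Returning null drops empty spans from the result
  return $(this).text() === "" ? null : $(this).text();
}).get();
What you do with it is up to you.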
Answer 2 (score: 1)
JS solution:
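This function loops through all the elements of the HTML tree. If an element has class x, it concatenates all of the inner results and adds the direct text nodes.
Note: this uses ES6. If you don't know what that is, leave a comment and I'll explain it.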
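The traversal described above can be sketched in Python using only the standard library. This is not the answer's ES6 code: instead of recursing over a DOM tree, it counts how many <span class="x"> elements are open while html.parser streams the tags, which flattens nested spans in a similar way. The names OuterSpanText and outer_span_texts are hypothetical, and collapsing runs of whitespace in the results is an added assumption.

```python
from html.parser import HTMLParser


class OuterSpanText(HTMLParser):
    """Collect the text of each outermost <span class="x">, flattening nested ones."""

    def __init__(self):
        super().__init__()
        self.results = []  # one string per outermost span.x, in document order
        self._open = []    # one flag per open <span>: True if it has class "x"
        self._depth = 0    # number of span.x elements currently open
        self._parts = []   # text pieces of the span.x being collected

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        # The class attribute may be missing or valueless, hence the `or ""`.
        classes = (dict(attrs).get("class") or "").split()
        is_x = "x" in classes
        self._open.append(is_x)
        if is_x:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag != "span" or not self._open:
            return
        if self._open.pop():
            self._depth -= 1
            if self._depth == 0:
                # The outermost span.x just closed: emit its flattened text
                # with runs of whitespace collapsed to single spaces.
                self.results.append(" ".join("".join(self._parts).split()))
                self._parts = []

    def handle_data(self, data):
        # Keep text only while at least one span.x is open.
        if self._depth > 0:
            self._parts.append(data)


def outer_span_texts(html):
    parser = OuterSpanText()
    parser.feed(html)
    parser.close()
    return parser.results
```

Calling outer_span_texts on a string of HTML returns one string per outermost <span class="x">, in document order. Text inside plain <span> tags is kept only when they sit inside a <span class="x">.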
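Answer 3 (score: 1)
Replacing the inner span tags with an empty string should do the job:
var st = [];
$("span.x").each(function() {
  // Global regexes strip every nested tag, not just the first match
  st.push($(this).html().replace(/<span class="x">/g, '').replace(/<\/span>/g, ''));
});
console.log(st);
It's a bit dirty, but you get the idea.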
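Answer 4 (score: 1)
First get the top-most spans with class x by checking that they have no parent with class x. Then get the innerText of these.
var topMost = $('span.x').filter(function() {
  // Keep only spans that have no ancestor with class x
  return !$(this).parents('.x').length;
});
var texts = topMost.map(function() {
  return this.innerText;
}).get(); // .get() turns the jQuery object into a plain array
console.log(texts);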
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<p>
Outer text. <span class="x">Inside a single span.</span> Back to outer text once more. <span class="x"><span class="x">Inside two spans</span> or just one</span>. Perhaps a <span class="x">single span contains <span class="x">several</span>
<span class="x">nests</span> <span class="x">within <span class="x">it</span>
</span>!</span>
</p>
<span>Maybe there's a span out here.</span><span>(Or two.)</span>
<p>
<table>
<tr>
<td>
<span class="x">Or <span class="x">in</span><span class="x">here</span></span>.
</td>
</tr>
</table>
</p>
<p>
<span>No.</span> <span>Still no, but<span class="x">yes</span>.</span>
</p>
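Answer 5 (score: 1)
Not as elegant as the other solutions...
from bs4 import BeautifulSoup
# `html` is assumed to hold the page source as a string
soup = BeautifulSoup(html, 'html.parser')
spans = soup.find_all('span', {'class': 'x'})
# Gather every tag nested inside any span.x
children = []
for span in spans:
    children.extend(span.findChildren())
children = [child.text for child in children]
# Keep only the spans whose text is not the text of a nested tag
results = [span.text for span in spans if span.text not in children]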
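Answer 6 (score: 0)
Inspired by the many responses, I wrote a BeautifulSoup solution of my own. It works by repeatedly finding the next <span class="x"> in the HTML and removing all tags from inside it before finding the next one.
from bs4 import BeautifulSoup
def outer_span_texts(html):
    soup = BeautifulSoup(html, "html.parser")
    # Start from the first span.x; soup.head is None when the HTML has no <head>
    current_span = soup.find("span", class_="x")
    while current_span:
        # Replace the span's contents, nested tags included, with its flattened text
        current_span.string = "".join(current_span.strings)
        current_span = current_span.find_next("span", class_="x")
    return [span.string for span in soup.find_all("span", class_="x")]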