I'm using BeautifulSoup in Python to get all the links:
links = soup.select('.cover > .card-click-target')
print(links)
But it gives me an array that contains one element and a string value.
My HTML code is:
<div class="cover">
<div class="cover-image-container">
<div class="cover-outer-align">
<div class="cover-inner-align">
<img alt="Kate Mobile Lite" class="cover-image" data-cover-large="" data-cover-small="" src="" aria-hidden="true">
</div>
</div>
</div>
<a class="card-click-target" href="/s/kate_new_6" aria-label=" Kate Mobile Lite ">
<span class="movies preordered-overlay-container id-preordered-overlay-container" style="display:none">
<span class="preordered-label">Предзаказ</span>
</span>
<span class="preview-overlay-container"> </span>
</a>
</div>
<div class="cover">
<div class="cover-image-container">
<div class="cover-outer-align">
<div class="cover-inner-align">
<img alt="Kate Mobile Lite" class="cover-image" data-cover-large="" data-cover-small="" src="" aria-hidden="true">
</div>
</div>
</div>
<a class="card-click-target" href="/s/kate_new_6" aria-label=" Kate Mobile Lite ">
<span class="movies preordered-overlay-container id-preordered-overlay-container" style="display:none">
<span class="preordered-label">Предзаказ</span>
</span>
<span class="preview-overlay-container">
</span>
</a>
</div>
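Answer 0 (score: 1)
I don't fully trust the CSS selectors in BeautifulSoup. A quick search turns up this answer, where updating BeautifulSoup fixed the problem the poster was having.
I strongly recommend that you write a function to do the job: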
links = soup.find_all(lambda tag: tag.parent.get('class', None) == ['cover']
                      and tag.get('class', None) == ['card-click-target'])
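The anonymous lambda function finds every tag whose class is card-click-target and makes sure each of those tags has a parent whose class is cover. Because it compares against the full class list, a tag or parent that carries any additional class will not match.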
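As a rough illustration of the same idea, here is a minimal sketch that runs the filter against a trimmed-down copy of the markup from the question and then pulls out the href values. The helper name is_card_link and the HTML constant are made up for this example, and the sketch uses membership tests ("cover" in the class list) instead of the exact-list comparison in the lambda above, on the assumption that extra classes on either tag should still count as a match.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the two cards from the question (assumed sample data).
HTML = """
<div class="cover">
  <div class="cover-image-container"></div>
  <a class="card-click-target" href="/s/kate_new_6" aria-label=" Kate Mobile Lite "></a>
</div>
<div class="cover">
  <div class="cover-image-container"></div>
  <a class="card-click-target" href="/s/kate_new_6" aria-label=" Kate Mobile Lite "></a>
</div>
"""


def is_card_link(tag):
    """Hypothetical helper: True when the tag has class card-click-target
    and its direct parent has class cover."""
    parent = tag.parent
    return (
        parent is not None
        and "cover" in parent.get("class", [])
        and "card-click-target" in tag.get("class", [])
    )


soup = BeautifulSoup(HTML, "html.parser")

# find_all accepts a function and keeps every tag for which it returns True.
links = soup.find_all(is_card_link)

# Each match is a Tag, so the link target is read from its href attribute.
hrefs = [a["href"] for a in links]
print(hrefs)  # ['/s/kate_new_6', '/s/kate_new_6']
```

Printing links directly shows the full tag markup, so reading a["href"] (or a.get("href") if the attribute may be missing) is probably what you want when only the URLs matter.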
Answer 1 (score: 0)
Check this example:
<?xml version="1.0" encoding="UTF-8" ?>
<Configuration status="TRACE">
<Properties>
<Property name="rotateLogsInterval">6</Property>
<Property name="log.dir">D:\\Mconnect\\LOGGER</Property>
<Property name="log.INVALIDMNO.dir">D:\\Mconnect\\LOGGER\\INVALIDMNO</Property>
<Property name="log.MOBINL1000.dir">D:\\Mconnect\\LOGGER\\MOBINL1000</Property>
<Property name="log.ECONET1000.dir">D:\\Mconnect\\LOGGER\\ECONET1000</Property>
<Property name="log.AIRTEL1000.dir">D:\\Mconnect\\LOGGER\\AIRTEL1000</Property>
</Properties>
<Appenders>
<Console name="Console" target="SYSTEM_OUT">
<PatternLayout pattern="%-5p %d [%t] %c: %m%n" />
</Console>
<File name="EIGInformation"
fileName="C:\\EIG_SOURCE_CODE\\EIG_20140901\\logs\\EIGInformation1.log">
<PatternLayout>
<Pattern>%5p | %m%n</Pattern>
</PatternLayout>
</File>
<!-- Debug logger -->
<RollingRandomAccessFile name="debugLogger"
fileName="${log.dir}/mconnectDebugLogger.log"
filePattern="${log.dir}/$${date:yyyy-MM}/mconnectDebugLogger-%d{yyyy-MM-dd-HH}-%i.log.gz">
<PatternLayout>
<Pattern>%5p | %d | %m%n</Pattern>
</PatternLayout>
<!-- <DefaultRolloverStrategy> <Delete basePath="${log.dir}" maxDepth="2">
<IfFileName glob="*/mconnectDebugLogger-*.log.gz" /> <IfLastModified age="60d"
/> </Delete> </DefaultRolloverStrategy> -->
<Policies>
<TimeBasedTriggeringPolicy interval="${rotateLogsInterval}" />
</Policies>
</RollingRandomAccessFile>
<!-- Transaction tdr file -->
<RollingRandomAccessFile name="transactionDetails"
fileName="${log.dir}/TDR.log"
filePattern="${log.dir}/$${date:yyyy-MM}/TDR-%d{yyyy-MM-dd-HH}-%i.log.gz">
<PatternLayout>
<Pattern>%5p | %d | %t:: | %m%n</Pattern>
</PatternLayout>
<!-- <DefaultRolloverStrategy> <Delete basePath="${log.dir}" maxDepth="2">
<IfFileName glob="*/TDR-*.log.gz" /> <IfLastModified age="60d" /> </Delete>
</DefaultRolloverStrategy> -->
<Policies>
<TimeBasedTriggeringPolicy interval="${rotateLogsInterval}" />
</Policies>
</RollingRandomAccessFile>
<!-- Connect Info General log. -->
<RollingRandomAccessFile name="connectInfoLogGeneral"
fileName="${log.INVALIDMNO.dir}/connectInfoLogGeneral.log"
filePattern="${log.INVALIDMNO.dir}/$${date:yyyy-MM}/connectInfoLogGeneral-%d{yyyy-MM-dd-HH}-%i.log.gz">
<PatternLayout>
<Pattern>%5p | %d | %m%n</Pattern>
</PatternLayout>
<!-- <DefaultRolloverStrategy> <Delete basePath="${log.INVALIDMNO.dir}"
maxDepth="2"> <IfFileName glob="*/connectInfoLogGeneral-*.log.gz" /> <IfLastModified
age="60d" /> </Delete> </DefaultRolloverStrategy> -->
<Policies>
<TimeBasedTriggeringPolicy interval="${rotateLogsInterval}" />
</Policies>
</RollingRandomAccessFile>
<!-- Connect Process log. -->
<RollingRandomAccessFile name="connectProcessLogGeneral"
fileName="${log.INVALIDMNO.dir}/connectProcessLogGeneral.log"
filePattern="${log.INVALIDMNO.dir}/$${date:yyyy-MM}/connectProcessLogGeneral-%d{yyyy-MM-dd-HH}-%i.log.gz">
<PatternLayout>
<Pattern>%5p | %d | %m%n</Pattern>
</PatternLayout>
<!-- <DefaultRolloverStrategy> <Delete basePath="${log.INVALIDMNO.dir}"
maxDepth="2"> <IfFileName glob="*/connectProcessLogGeneral-*.log.gz" /> <IfLastModified
age="60d" /> </Delete> </DefaultRolloverStrategy> -->
<Policies>
<TimeBasedTriggeringPolicy interval="${rotateLogsInterval}" />
</Policies>
</RollingRandomAccessFile>
<!-- Connect Info log -->
<RollingRandomAccessFile name="connectInfoLogMOBINL1000"
fileName="${log.MOBINL1000.dir}/connectInfoMOBINL1000.log"
filePattern="${log.MOBINL1000.dir}/$${date:yyyy-MM}/connectInfoMOBINL1000-%d{yyyy-MM-dd-HH}-%i.log.gz">
<PatternLayout>
<Pattern>%5p | %d | %m%n</Pattern>
</PatternLayout>
<!-- <DefaultRolloverStrategy> <Delete basePath="${log.MOBINL1000.dir}"
maxDepth="2"> <IfFileName glob="*/connectInfoMOBINL1000-*.log.gz" /> <IfLastModified
age="60d" /> </Delete> </DefaultRolloverStrategy> -->
<Policies>
<TimeBasedTriggeringPolicy interval="${rotateLogsInterval}" />
</Policies>
</RollingRandomAccessFile>
<!-- Connect Process log -->
<RollingRandomAccessFile name="connectProcessLogMOBINL1000"
fileName="${log.MOBINL1000.dir}/connectProcessMOBINL1000.log"
filePattern="${log.MOBINL1000.dir}/$${date:yyyy-MM}/connectProcessMOBINL1000-%d{yyyy-MM-dd-HH}-%i.log.gz">
<PatternLayout>
<Pattern>%5p | %d | %m%n</Pattern>
</PatternLayout>
<!-- <DefaultRolloverStrategy> <Delete basePath="${log.MOBINL1000.dir}"
maxDepth="2"> <IfFileName glob="*/connectProcessMOBINL1000-*.log.gz" /> <IfLastModified
age="60d" /> </Delete> </DefaultRolloverStrategy> -->
<Policies>
<TimeBasedTriggeringPolicy interval="${rotateLogsInterval}" />
</Policies>
</RollingRandomAccessFile>
</Appenders>
<Loggers>
<!-- CXF is used heavily by Mule for web services -->
<AsyncLogger name="org.apache.cxf" level="WARN" />
<!-- Apache Commons tend to make a lot of noise which can clutter the log -->
<AsyncLogger name="org.apache" level="INFO" />
<!-- Reduce startup noise -->
<AsyncLogger name="org.springframework.beans.factory"
level="WARN" />
<!-- Mule classes -->
<AsyncLogger name="org.mule" level="INFO" />
<AsyncLogger name="com.mulesoft" level="INFO" />
<AsyncLogger name="EIGInformation" level="INFO">
<AppenderRef ref="EIGInformation" />
</AsyncLogger>
<AsyncLogger
name="com.comviva.mconnect.webservices.impl.MConnectWebServices"
level="info">
<AppenderRef ref="debugLogger" />
</AsyncLogger>
<AsyncLogger name="transactionDetails" level="OFF">
<AppenderRef ref="debugLogger" />
</AsyncLogger>
<AsyncLogger name="connectInfoLogGeneral" level="INFO">
<AppenderRef ref="connectInfoLogGeneral" />
</AsyncLogger>
<AsyncLogger name="connectProcessLogGeneral" level="INFO">
<AppenderRef ref="connectProcessLogGeneral" />
</AsyncLogger>
<AsyncLogger name="connectInfoLogMOBINL1000" level="INFO">
<AppenderRef ref="connectInfoLogMOBINL1000" />
</AsyncLogger>
<AsyncLogger name="connectProcessLogMOBINL1000" level="INFO">
<AppenderRef ref="connectProcessLogMOBINL1000" />
</AsyncLogger>
<AsyncRoot level="INFO">
<AppenderRef ref="EIGInformation" />
</AsyncRoot>
</Loggers>
</Configuration>
log4j: Using URL [file:/home/contest/prd/muleTomcat/webapps/Connect-1.3.0/WEB-INF/classes/log4j2.xml] for automatic log4j configuration.
log4j: Preferred configurator class: org.apache.log4j.xml.DOMConfigurator
log4j: System property is :null
log4j: Standard DocumentBuilderFactory search succeded.
log4j: DocumentBuilderFactory is: org.apache.xerces.jaxp.DocumentBuilderFactoryImpl
log4j:WARN Continuable parsing error 2 and column 31
log4j:WARN Document root element "Configuration", must match DOCTYPE root "null".
log4j:WARN Document root element "Configuration", must match DOCTYPE root "null".
log4j:WARN Continuable parsing error 2 and column 31
log4j:WARN Document is invalid: no grammar found.
log4j:WARN Document is invalid: no grammar found.
log4j:ERROR DOM element is - not a <log4j:configuration> element.
log4j:WARN No appenders could be found for logger (com.mchange.v2.log.MLog).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[localhost-startStop-1] INFO org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean - Building JPA container EntityManagerFactory for persistence