Extracting a substring between quoted keys from a large string in Python

Time: 2018-12-07 09:30:18

Tags: python string request substring urlrequest

I have the following string:

{"name":"INPROCEEDINGS","__typename":"PublicationConferencePaper"},"hasPermiss
ionToLike":true,"hasPermissionToFollow":true,"publicationCategory":"researchSu
mmary","hasPublicFulltexts":false,"canClaim":false,"publicationType":"inProcee
dings","fulltextRequesterCount":0,"requests":{"__pagination__":
[{"offset":0,"limit":1,"list":[]}]},"activeFiguresCount":0,"activeFigures":
{"__pagination__":[{"offset":0,"limit":100,"list":
[]}]},"abstract":"Heterogeneous Multiprocessor System-on-Chip (MPSoC) are 
progressively becoming predominant in most modern mobile devices. These 
devices are required to perform processing of applications within thermal,
 energy and performance constraints. However, most stock power and thermal
 management mechanisms either neglect some of these constraints or rely on 
frequency scaling to achieve energy-efficiency and temperature reduction on 
the device. Although this inefficient technique can reduce temporal thermal
 gradient, but at the same time hurts the performance of the executing task.
 In this paper, we propose a thermal and energy management mechanism which 
achieves reduction in thermal gradient as well as energy-efficiency through 
resource mapping and thread-partitioning of applications with online 
optimization in heterogeneous MPSoCs. The efficacy of the proposed approach is 
experimentally appraised using different applications from Polybench benchmark 
suite on Odroid-XU4 developmental platform. Results show 28% performance 
improvement, 28.32% energy saving and reduced thermal variance of over 76%
 when compared to the existing approaches. Additionally, the method is able to
 free more than 90% in memory storage on the MPSoC, which would have been 
previously utilized to store several task-to-thread mapping 
configurations.","hasRequestedAbstract":false,"lockedFields"

I am trying to get the substring between "abstract":" and ","hasRequestedAbstract". For that, I am using the following code:

    import re
    import requests
    # some more code here ...
    to_visit_url = 'https://www.researchgate.net/publication/328749434_TEEM_Online_Thermal-_and_Energy-Efficiency_Management_on_CPU-GPU_MPSoCs'
    page = requests.get(to_visit_url)
    content = str(page.content, encoding="utf-8")
    abstract = re.search('\"abstract\":\"(.*)\",\"hasRequestedAbstract\"', content)
    print('Abstract:\n' + str(abstract))

But the abstract variable ends up holding None. What could be the problem? How can I get the substring described above?

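Note: although it may look as if I could read this as a JSON object, that is not an option, because the sample text above is only a small part of the full HTML content, and it is hard to extract the JSON object from it.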

P.S. The full content of the page, i.e. page.content, can be downloaded here: https://docs.google.com/document/d/1awprvKsLPNoV6NZRmCkktYwMwWJo5aujGyNwGhDf7cA/edit?usp=sharing

Alternatively, the source can be downloaded directly from: https://www.researchgate.net/publication/328749434_TEEM_Online_Thermal-_and_Energy-Efficiency_Management_on_CPU-GPU_MPSoCs

3 answers:

Answer 0 (score: 1)

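re.search does not return a list of parsed results. It returns an SRE_Match object (or None when nothing matches). If you want a list of matches, use the re.findall method instead.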

  1. Tested code

    import re
    import requests
    
    test_pattern = re.compile('\"abstract\":\"(.*)\",\"hasRequestedAbstract\"')
    test_requests = requests.get("https://www.researchgate.net/publication/328749434_TEEM_Online_Thermal-_and_Energy-Efficiency_Management_on_CPU-GPU_MPSoCs")
    
    print(test_pattern.findall(test_requests.text)[0])
    
  2. Result

    'Heterogeneous Multiprocessor System-on-Chip (MPSoC) are progressively becoming predominant in most modern mobile devices. These devices are required to perform processing of applications within thermal, energy and performance constraints. However, most stock power and thermal management mechanisms either neglect some of these constraints or rely on frequency scaling to achieve energy-efficiency and temperature reduction on the device. Although this inefficient technique can reduce temporal thermal gradient, but at the same time hurts the performance of the executing task. In this paper, we propose a thermal and energy management mechanism which achieves reduction in thermal gradient as well as energy-efficiency through resource mapping and thread-partitioning of applications with online optimization in heterogeneous MPSoCs. The efficacy of the proposed approach is experimentally appraised using different applications from Polybench benchmark suite on Odroid-XU4 developmental platform. Results show 28% performance improvement, 28.32% energy saving and reduced thermal variance of over 76% when compared to the existing approaches. Additionally, the method is able to free more than 90% in memory storage on the MPSoC, which would have been previously utilized to store several task-to-thread mapping configurations.'
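The difference between re.search and re.findall mentioned in this answer can be sketched without any network access. The snippet below is only an illustration: the `sample` string is a shortened, made-up stand-in for the page content, and the non-greedy `(.*?)` with `re.DOTALL` is an assumption added here (it is not part of the tested code above) so that the capture stops at the first closing marker and still works if the abstract contains line breaks.

```python
import re

# Shortened stand-in for the page content (hypothetical sample data).
sample = (
    '"canClaim":false,"abstract":"Results show 28% performance improvement.",'
    '"hasRequestedAbstract":false'
)

# Non-greedy capture; DOTALL lets "." match newlines as well (assumption).
pattern = re.compile(r'"abstract":"(.*?)","hasRequestedAbstract"', re.DOTALL)

# re.search returns a match object, or None when the pattern is not found.
match = pattern.search(sample)
abstract = match.group(1) if match else None

# re.findall returns a list with the captured group of every match.
all_matches = pattern.findall(sample)

print(abstract)
print(all_matches)
```

Checking the match object for None before calling `.group(1)` avoids an AttributeError when the page does not contain the expected markers.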
    

Answer 1 (score: 1)

This answer does not use a regular expression (regex), but it gets the job done. The code is as follows:

    import requests

    def fetch_abstract(url = "https://www.researchgate.net/publication/328749434_TEEM_Online_Thermal-_and_Energy-Efficiency_Management_on_CPU-GPU_MPSoCs"):
        test_requests = requests.get(url)
        index = 0
        inner_count = 0
        while index < len(test_requests.text):
            # look for the next occurrence of the marker that precedes an abstract
            index = test_requests.text.find('[Show full abstract]</a><span class=\"lite-page-hidden', index)
            if index == -1:
                break
            inner_count += 1
            if inner_count == 4:
                # extract the abstract from here -->
                temp = test_requests.text[index-1:]
                index2 = temp.find('</span></div><a class=\"nova-e-link nova-e-link--color-blue')
                quote_index = temp.find('\">')
                abstract = test_requests.text[index + quote_index + 2 : index - 1 + index2]
                print(abstract)
            # move past the current marker before searching again
            index += 52

    if __name__ == '__main__':
        fetch_abstract()

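Result:

Heterogeneous Multiprocessor System-on-Chip (MPSoC) are progressively becoming predominant in most modern mobile devices. These devices are required to perform processing of applications within thermal, energy and performance constraints. However, most stock power and thermal management mechanisms either neglect some of these constraints or rely on frequency scaling to achieve energy-efficiency and temperature reduction on the device. Although this inefficient technique can reduce temporal thermal gradient, but at the same time hurts the performance of the executing task. In this paper, we propose a thermal and energy management mechanism which achieves reduction in thermal gradient as well as energy-efficiency through resource mapping and thread-partitioning of applications with online optimization in heterogeneous MPSoCs. The efficacy of the proposed approach is experimentally appraised using different applications from Polybench benchmark suite on Odroid-XU4 developmental platform. Results show 28% performance improvement, 28.32% energy saving and reduced thermal variance of over 76% when compared to the existing approaches. Additionally, the method is able to free more than 90% in memory storage on the MPSoC, which would have been previously utilized to store several task-to-thread mapping configurations.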
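The same non-regex idea can also be applied directly to the two markers from the question. The following is a minimal sketch, not this answer's code: `between` is a hypothetical helper name, and `sample` is a shortened, made-up stand-in for the page content.

```python
def between(text, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker.

    Returns None if either marker is missing.
    """
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)          # skip past the opening marker
    end = text.find(end_marker, start)  # search only after the opening marker
    if end == -1:
        return None
    return text[start:end]


# Shortened stand-in for the page content (hypothetical sample data).
sample = (
    '"canClaim":false,"abstract":"Results show 28% performance improvement.",'
    '"hasRequestedAbstract":false'
)

print(between(sample, '"abstract":"', '","hasRequestedAbstract"'))
```

Because it relies only on str.find, this version does not depend on counting occurrences or on hard-coded offsets, but it does assume the two markers appear in the page exactly as written.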

Answer 2 (score: 0)

When you call requests.get(...), you should get back a Response object.

These objects are quite smart: you can use the built-in .json() method to get the string you posted in your question back as a Python dictionary.

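However, I notice that the link you posted does not point to anything like that, but to a full HTML document. If you are trying to parse a website like this, you should use BeautifulSoup instead (https://www.crummy.com/software/BeautifulSoup/).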
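Since .json() only applies when the whole response body is JSON, a possible middle ground for JSON embedded in an HTML page is to decode just the one value that follows the key. This is a sketch under stated assumptions, not something from this answer: it assumes the abstract appears as a JSON string right after `"abstract":`, `extract_json_value` is a hypothetical helper name, and `sample_html` is made-up sample data. It uses the standard library's json.JSONDecoder.raw_decode, which parses a single JSON value starting at a given index.

```python
import json


def extract_json_value(text, key):
    """Return the JSON value that directly follows '"key":' in text, or None."""
    marker = '"%s":' % key
    start = text.find(marker)
    if start == -1:
        return None
    decoder = json.JSONDecoder()
    try:
        # raw_decode parses one JSON value beginning at the given index
        # and returns (value, index_where_parsing_stopped).
        value, _end = decoder.raw_decode(text, start + len(marker))
    except json.JSONDecodeError:
        return None
    return value


# Made-up HTML fragment with embedded, truncated JSON (hypothetical sample data).
sample_html = (
    '<html><script>{"canClaim":false,"abstract":"A \\"quoted\\" word, and more.",'
    '"hasRequestedAbstract":false,"lockedFields"</script></html>'
)

abstract = extract_json_value(sample_html, "abstract")
print(abstract)
```

Unlike a plain substring search, this handles escaped quotes and other JSON escapes inside the value, and it does not require the surrounding JSON object to be complete.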