I have a string:
"contributors_enabled": false, "geo_enabled": false, "created_at": "Fri Nov 11 15:38:06 +0000 2016"}, "text": "Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket", "timestamp_ms": "1509073455803",.
I want to use a regular expression to select this text:
Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket
That is, the part that comes after "text": and before , "timestamp_ms":
Is it possible to extract this text?
Answer 0 (score: 0)
Possible? Yes.
def text_scrap(text, start, end):
    """Return the part of text between the first start and the next end."""
    _, _, rest = text.partition(start)   # drop everything up to and including start
    result, _, _ = rest.partition(end)   # keep everything before end
    return result

# Single quotes outside, because the data itself contains double quotes.
my_text = '"contributors_enabled": false, "geo_enabled": false, "created_at": "Fri Nov 11 15:38:06 +0000 2016"}, "text": "Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket", "timestamp_ms": "1509073455803",.'
# The end marker includes the closing quote and comma so they are not part of the result.
data_scrapped = text_scrap(my_text, start='"text": "', end='", "timestamp_ms"')
print(data_scrapped)
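A good idea? Probably not.
Your data looks like a dictionary (it appears to be JSON), so it is easier to parse it and read the value stored under the "text" key. Have a look at how dicts work.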
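If the complete tweet object is available as valid JSON, a minimal sketch of that approach could look like the following. The `raw_tweet` value is a hypothetical, trimmed-down record built from fields visible in the question, and the enclosing `"user"` key is an assumption; the real payload will contain more keys.

```python
import json

# Hypothetical, shortened tweet record. Assumption: the full payload is valid
# JSON and the fields before "text" sit inside a nested "user" object.
raw_tweet = (
    '{"user": {"geo_enabled": false, '
    '"created_at": "Fri Nov 11 15:38:06 +0000 2016"}, '
    '"text": "Facts On Managed Forex Trading #forex #binaryoptions", '
    '"timestamp_ms": "1509073455803"}'
)

tweet = json.loads(raw_tweet)   # parse the JSON string into a Python dict
tweet_text = tweet["text"]      # look up the value by its key
print(tweet_text)
```

Unlike string slicing, this keeps working if the order of the keys changes or if the text itself happens to contain the word timestamp_ms.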
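Answer 1 (score: 0)
Judging from the string, the whole thing could probably be parsed, since it looks like JSON. However, since you are asking for a regex-based solution, I hope the following works for you.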
import re

# The lazy group (.*?) stops at the closing quote right before "timestamp_ms",
# so the quote is not captured.
pattern = r'"text": "(.*?)", "timestamp_ms"'
# Named raw rather than str, to avoid shadowing the built-in str type.
raw = """
"contributors_enabled": false, "geo_enabled": false, "created_at": "Fri Nov 11 15:38:06 +0000 2016"}, "text": "Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket", "timestamp_ms": "1509073455803",.
"""
print(re.findall(pattern, raw)[0])

Output:
Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket
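When the input holds several records, the choice of quantifier matters: a greedy `.*` runs to the last closing delimiter, while a lazy `.*?` stops at the nearest one. A small sketch of the difference, where the two-record string `sample` and the helper `extract_texts` are hypothetical and not part of the question:

```python
import re

# Hypothetical input with two records back to back (not from the question).
sample = (
    '"text": "first tweet", "timestamp_ms": "1", '
    '"text": "second tweet", "timestamp_ms": "2"'
)

def extract_texts(raw):
    """Return every value found between '"text": "' and '", "timestamp_ms"'."""
    # .*? is lazy, so each match ends at the nearest closing delimiter.
    return re.findall(r'"text": "(.*?)", "timestamp_ms"', raw)

# For comparison: the greedy version swallows everything up to the last delimiter.
greedy = re.findall(r'"text": "(.*)", "timestamp_ms"', sample)

print(extract_texts(sample))
print(greedy)
```

With the lazy pattern each record yields its own match; the greedy pattern returns a single match spanning both records.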