Selecting text with a regular expression in Python

Time: 2017-10-27 04:27:42

Tags: python

I have a string:

"contributors_enabled": false, "geo_enabled": false, "created_at": "Fri Nov 11 15:38:06 +0000 2016"}, "text": "Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket", "timestamp_ms": "1509073455803",.

I want to use a regular expression to select this text:

Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket

It comes after "text": " and before ", "timestamp_ms":

Is it possible to extract this text?

2 answers:

Answer 0: (score: 0)

Possible? Yes.

def text_scrap(text, start, end):
    """This function returns the data between start and end."""
    _,_,rest = text.partition(start)
    result,_,_ = rest.partition(end)
    return result

my_text = '"contributors_enabled": false, "geo_enabled": false, "created_at": "Fri Nov 11 15:38:06 +0000 2016"}, "text": "Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket", "timestamp_ms": "1509073455803",.'

# Include the closing quote and comma in `end` so they are not left in the result.
data_scrapped = text_scrap(my_text, start=' "text": "', end='", "timestamp_ms"')  # use our new shiny function
print(data_scrapped)

A good idea? Probably not.

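Your data looks like a dictionary (it appears to be JSON), so once it is parsed you can read the value under the "text" key directly, which is easier than slicing the string. See the Python documentation on dicts for details.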
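The dictionary approach mentioned above could look roughly like the sketch below. It assumes the complete record is valid JSON; the question shows only a fragment, so the surrounding braces and the `user` key in `raw` are hypothetical and added purely for illustration.

```python
import json

# Hypothetical complete record: the outer braces and the "user" key are
# assumptions, since the question only shows part of the data.
raw = (
    '{"user": {"contributors_enabled": false, "geo_enabled": false, '
    '"created_at": "Fri Nov 11 15:38:06 +0000 2016"}, '
    '"text": "Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD '
    '#forex #binaryoptions #cryptocurrency #stockmarket", '
    '"timestamp_ms": "1509073455803"}'
)

record = json.loads(raw)      # parse the JSON text into a Python dict
tweet_text = record["text"]   # look the value up by its key
print(tweet_text)
```

If the input really is only a fragment like the one in the question, `json.loads` will raise an error, and a string-based approach is the fallback.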

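Answer 1: (score: 0)

Your whole string could probably be parsed, since it appears to be JSON. However, since you are looking for a regex-based solution, I hope the following works for you.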

import re

# Non-greedy group; the closing quote sits outside the group so it is not captured.
pattern = r'"text": "(.*?)", "timestamp_ms"'

# Named `text` rather than `str` to avoid shadowing the built-in type.
text = """
"contributors_enabled": false, "geo_enabled": false, "created_at": "Fri Nov 11 15:38:06 +0000 2016"}, "text": "Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket", "timestamp_ms": "1509073455803",.
"""

print(re.findall(pattern, text)[0])

Output:

Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket
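As a variation on the regex idea, here is a minimal sketch that does not rely on "timestamp_ms" being the next key. It stops at the first unescaped double quote after "text":. The pattern and the helper name `extract_text` are assumptions for illustration and do not come from either answer.

```python
import re

# Assumed pattern: capture everything after "text": " up to the first
# double quote that is not escaped with a backslash.
TEXT_VALUE = re.compile(r'"text":\s*"((?:[^"\\]|\\.)*)"')


def extract_text(fragment):
    """Return the raw value of the "text" field, or None if it is absent."""
    match = TEXT_VALUE.search(fragment)
    return match.group(1) if match else None


sample = (
    '"geo_enabled": false, "text": "Facts On Managed Forex Trading '
    'htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency '
    '#stockmarket", "timestamp_ms": "1509073455803",.'
)
print(extract_text(sample))
print(extract_text("no text field here"))  # no match, so None is returned
```

The captured value is returned raw, so any JSON escape sequences inside it (such as \" or \n) are left undecoded. Parsing the data as JSON remains the more robust choice when the full record is available.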