Question

I have a string:

"contributors_enabled": false, "geo_enabled": false, "created_at": "Fri Nov 11 15:38:06 +0000 2016"}, "text": "Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket", "timestamp_ms": "1509073455803",.

I want to use a regular expression to select this text:

Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket

It sits between "text": " and ", "timestamp_ms":

Is it possible to extract this text?

Answer 1

Possible? Yes.

def text_scrap(text, start, end):
    """This function returns the data between start and end."""
    _,_,rest = text.partition(start)
    result,_,_ = rest.partition(end)
    return result

# The sample contains double quotes, so it is wrapped in single quotes.
my_text = '"contributors_enabled": false, "geo_enabled": false, "created_at": "Fri Nov 11 15:38:06 +0000 2016"}, "text": "Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket", "timestamp_ms": "1509073455803",.'

# Include the closing quote and comma in `end` so they are not part of the result.
data_scrapped = text_scrap(my_text, start=' "text": "', end='", "timestamp_ms"') # use our new shiny function
print(data_scrapped)

A good idea? Probably not.

Your data is a dictionary, so you can get at the text more easily through its "text" key. Please check this to learn about dicts.
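A minimal sketch of the dictionary approach, assuming the full record is one complete, valid JSON object. The question only shows a fragment, so the `record` string below is a hypothetical, shortened stand-in and its outer structure is a guess:

```python
import json

# Hypothetical, shortened stand-in for one complete record; the question
# only shows a fragment, so the outer structure here is an assumption.
record = (
    '{"user": {"geo_enabled": false, '
    '"created_at": "Fri Nov 11 15:38:06 +0000 2016"}, '
    '"text": "Facts On Managed Forex Trading #forex #stockmarket", '
    '"timestamp_ms": "1509073455803"}'
)

data = json.loads(record)   # parse the JSON string into a Python dict
tweet_text = data["text"]   # look the value up by its key
print(tweet_text)
```

Parsing also takes care of escaped characters inside the string, which plain string splitting does not.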

Answer 2

Your whole string could probably be parsed, since it appears to be JSON. However, since you are looking for a regex-based solution, I hope the following works for you.

import re

# Match the closing quote outside the group so it is not captured;
# the non-greedy (.*?) stops at the first closing delimiter.
pattern = r'"text": "(.*?)", "timestamp_ms"'

# Named `text` rather than `str` to avoid shadowing the built-in.
text = """
"contributors_enabled": false, "geo_enabled": false, "created_at": "Fri Nov 11 15:38:06 +0000 2016"}, "text": "Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket", "timestamp_ms": "1509073455803",.
"""

print(re.findall(pattern, text)[0])

Output:

Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket
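Whether the group is greedy matters once the input holds more than one record: a greedy `.*` runs to the last "timestamp_ms" in the string, while `.*?` stops at the first one. A small sketch of the difference, using a hypothetical two-record string made up for illustration:

```python
import re

# Hypothetical input with two records, made up for illustration.
raw = (
    '"text": "first tweet", "timestamp_ms": "1", '
    '"text": "second tweet", "timestamp_ms": "2"'
)

# Greedy: (.*) runs to the last occurrence of the closing delimiter.
greedy = re.findall(r'"text": "(.*)", "timestamp_ms"', raw)

# Non-greedy: (.*?) stops at the first closing delimiter after each start.
lazy = re.findall(r'"text": "(.*?)", "timestamp_ms"', raw)

print(greedy)
print(lazy)
```

The greedy version returns a single match that swallows both records, while the non-greedy version returns one match per record.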

Selecting text with a regular expression in Python
