Question

I have strings in the following format:

[{'id': 35, 'name': 'Comedy'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name' : 'Drama']

and so on. I want to extract values such as Comedy, Drama, etc.

I tried the following regex, but it did not work:

('([^'])*')

I want the part of the string that comes after 'name', for every {} entry in the same list. For example: [{'id': 35, 'name': 'Comedy'}]

My data comes from a pandas DataFrame:

Answer 1

Here, use this regular expression.

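import re

# Sample input copied from the question: one list-of-dicts string per line
txt = """
[{'id': 35, 'name': 'Comedy'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name' : 'Drama']
"""
# Raw string so that \s is passed to the regex engine unchanged.
# The single capture group returns only the quoted value that follows 'name'.
results = re.findall(r"'name'\s*:\s*'([^']+)'", txt)
print(results)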

Prints:

['Comedy', 'Comedy', 'Drama']
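
For comparison, here is a small sketch of what the pattern from the question appears to return on a single row. It has two capture groups and is not tied to the 'name' key, so `re.findall` yields a tuple for every quoted token, keys included. The variable names `row`, `attempt`, and `names` are just for illustration.

```python
import re

row = "[{'id': 35, 'name': 'Comedy'}]"

# Pattern from the question: two capture groups, so findall returns tuples,
# and every quoted token matches (keys as well as values).
attempt = re.findall(r"('([^'])*')", row)
print(attempt)  # [("'id'", 'd'), ("'name'", 'e'), ("'Comedy'", 'y')]

# Anchoring on the 'name' key with one capture group keeps only the value.
names = re.findall(r"'name'\s*:\s*'([^']+)'", row)
print(names)  # ['Comedy']
```

The second element of each tuple is only the last character matched by the repeated inner group, which is why the original attempt looks like it returns noise.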

If you want unique values, just use set(results).
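
Since the question mentions that the data lives in a DataFrame, the following is a rough sketch of applying the same pattern row by row and then collecting unique genres in first-seen order (a plain set does not keep order). The list `rows` is a hypothetical stand-in for one column of strings; the actual column is not shown in the question.

```python
import re

NAME_RE = re.compile(r"'name'\s*:\s*'([^']+)'")

# Hypothetical stand-in for a DataFrame column: one string per row,
# copied from the samples in the question.
rows = [
    "[{'id': 35, 'name': 'Comedy'}]",
    "[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name' : 'Drama']",
]

# One list of genre names per row.
per_row = [NAME_RE.findall(row) for row in rows]
print(per_row)  # [['Comedy'], ['Comedy', 'Drama']]

# Flatten, then drop duplicates while keeping first-seen order
# (dict keys preserve insertion order).
all_names = [name for names in per_row for name in names]
unique_in_order = list(dict.fromkeys(all_names))
print(unique_in_order)  # ['Comedy', 'Drama']
```

With pandas itself, `Series.str.findall` accepts the same pattern and should give one list per row, assuming the column holds strings rather than already-parsed lists.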

Extract only the genres from a string
