I'm using Python and regex, and I'm trying to convert a string like the following:
(1694439,805577453641105408,'\"@Bessemerband not reverse gear simply pointing out that I didn\'t say what you claim I said. I will absolutely riot if (Brexit) is blocked.\"',2887640,NULL,NULL,NULL),(1649240,805577446758158336,'\"Ugh FFS the people you use to look up to fail to use critical thinking. Smh. He did the same thing with brexit :( \"',2911510,NULL,NULL,NULL),
into a list like this:
[
[1694439, 805577453641105408, '\"@Bessemerband not reverse gear simply pointing out that I didn\'t say what you claim I said. I will absolutely riot if (Brexit) is blocked.\"', 2887640, NULL, NULL, NULL],
[1649240, 805577446758158336, '\"Ugh FFS the people you use to look up to fail to use critical thinking. Smh. He did the same thing with brexit :(\"', 2911510, NULL, NULL, NULL]
]
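The main problem here is that, as you can see, the text itself also contains parentheses, and I don't want to split on those.
I have tried patterns such as \([^)]+\) but, obviously, that stops at the first ) it finds.
Any clue how to solve this?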
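To make the failure concrete, here is a minimal sketch on a shortened, made-up sample (not the real data) that has a parenthesis inside the quoted text. The first match ends at the ) of (Brexit) rather than at the end of the tuple:

```python
import re

# Shortened, hypothetical sample with a parenthesis inside the quoted text.
sample = r"(1,'riot if (Brexit) is blocked',2,NULL),(3,'plain text',4,NULL),"

# [^)]+ cannot cross a ")", so the match is cut short inside the quotes.
matches = re.findall(r"\([^)]+\)", sample)
print(matches[0])
```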
Answer 0 (score: 0)
Is this the output you are looking for?
big = """(1694439,805577453641105408,'\"@Bessemerband not reverse gear simply pointing out that I didn\'t say what you claim I said. I will absolutely riot if (Brexit) is blocked.\"',2887640,NULL,NULL,NULL),(1649240,805577446758158336,'\"Ugh FFS the people you use to look up to fail to use critical thinking. Smh. He did the same thing with brexit :( \"',2911510,NULL,NULL,NULL),"""
small = big.split('),')
print(small)
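What I'm doing is splitting on ), and then simply looping over the pieces and splitting each one on commas as usual. Here is a basic approach that can be optimized: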
new_list = []
for x in small:
    # Split each "(a,b,c" chunk into its comma-separated fields.
    new_list.append(x.split(','))
print(new_list)
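The downside is that there is an empty entry at the end (left over from the trailing ),), but you can discard it afterwards.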
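One possible tidy-up of this approach, as a rough sketch on a shortened, made-up sample: skip the empty trailing piece and strip the leading ( from each chunk. It assumes that neither ), nor a comma ever appears inside the quoted text; if that can happen, a quote-aware approach is needed instead.

```python
# Shortened, hypothetical sample in the same shape as the question's data.
big = "(1,2,'riot if (Brexit) is blocked',3,NULL),(4,5,'brexit :( ',6,NULL),"

rows = []
for chunk in big.split("),"):
    if not chunk:
        # Skip the empty piece left after the trailing "),".
        continue
    # Drop the opening parenthesis, then split the fields on commas.
    rows.append(chunk.lstrip("(").split(","))
print(rows)
```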
Answer 1 (score: 0)
Here is a simple regex solution that captures each comma-separated value in a separate group:
\(([^,]*),([^,]*),'((?:\\.|[^'])*)',([^,]*),([^,]*),([^,]*),([^)]*)
Usage:
input_string = r"""(1694439,805577453641105408,'\"@Bessemerband not reverse gear simply pointing out that I didn\'t say what you claim I said. I will absolutely riot if (Brexit) is blocked.\"',2887640,NULL,NULL,NULL),(1649240,805577446758158336,'\"Ugh FFS the people you use to look up to fail to use critical thinking. Smh. He did the same thing with brexit :( \"',2911510,NULL,NULL,NULL),"""
import re
# Each match is a tuple holding the seven captured fields of one record.
result = re.findall(r"\(([^,]*),([^,]*),'((?:\\.|[^'])*)',([^,]*),([^,]*),([^,]*),([^)]*)", input_string)
print(result)
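If the fields should end up as Python values rather than strings, the tuples from findall can be post-processed. The sketch below uses a shortened, made-up sample and a hypothetical helper named convert; mapping NULL to None and digit strings to int is an assumption about the desired output, not something the question spells out.

```python
import re

PATTERN = re.compile(
    r"\(([^,]*),([^,]*),'((?:\\.|[^'])*)',([^,]*),([^,]*),([^,]*),([^)]*)"
)

def convert(field):
    """Hypothetical helper: map NULL to None and digit strings to int."""
    if field == "NULL":
        return None
    return int(field) if field.isdigit() else field

# Shortened, hypothetical sample with seven fields per record.
sample = r"(1,2,'I didn\'t riot (Brexit)',3,NULL,NULL,NULL),(4,5,'brexit :( ',6,NULL,NULL,NULL),"

rows = []
for groups in PATTERN.findall(sample):
    row = [convert(g) for g in groups]
    row[2] = groups[2]  # keep the quoted text field as a plain string
    rows.append(row)
print(rows)
```

Note that the pattern is tied to exactly seven fields with the quoted text in third position, so it would need adjusting for any other layout.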
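Answer 2 (score: 0)
Nested parentheses are not a problem here, because they are enclosed in quotes. All you have to do is match the quoted parts separately: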
import re
pat = re.compile(r"[^()',]+|'[^'\\]*(?:\\.[^'\\]*)*'|(\()|(\))", re.DOTALL)
s = r'''(1694439,805577453641105408,'\"@Bessemerband not reverse gear simply pointing out that I didn\'t say what you claim I said. I will absolutely riot if (Brexit) is blocked.\"',2887640,NULL,NULL,NULL),(1649240,805577446758158336,'\"Ugh FFS the people you use to look up to fail to use critical thinking. Smh. He did the same thing with brexit :( \"',2911510,NULL,NULL,NULL),'''
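result = []
for m in pat.finditer(s):
    if m.group(1):      # opening parenthesis: start a new row
        tmplst = []
    elif m.group(2):    # closing parenthesis: the row is complete
        result.append(tmplst)
    else:               # a plain value or a whole quoted string
        tmplst.append(m.group(0))
print(result)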
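If your string can also contain nested parentheses that are not enclosed in quotes, you can solve the problem with a recursive pattern using the regex module (combining it with the csv module is a good idea) or by building a state machine.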
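The state machine option could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name parse_rows and the sample are made up, the input is assumed to be well formed (balanced parentheses outside quotes, backslash-escaped quotes inside), and fields are returned as raw strings.

```python
def parse_rows(text):
    """Hypothetical helper: split "(a,'b',c),(d,...)," into rows of raw fields."""
    rows, row, field = [], [], []
    depth = 0          # parenthesis nesting level outside quotes
    in_quote = False   # True while inside a '...' literal
    escaped = False    # True right after a backslash inside a quote

    for ch in text:
        if in_quote:
            field.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == "'":
                in_quote = False
        elif ch == "'":
            in_quote = True
            field.append(ch)
        elif ch == "(":
            depth += 1
            if depth > 1:              # nested parenthesis belongs to the field
                field.append(ch)
        elif ch == ")":
            depth -= 1
            if depth == 0:             # end of a row
                row.append("".join(field))
                rows.append(row)
                row, field = [], []
            else:
                field.append(ch)
        elif ch == "," and depth == 1:
            row.append("".join(field))  # end of a field
            field = []
        elif depth >= 1:
            field.append(ch)
        # Characters at depth 0 (the commas between rows) are ignored.
    return rows

# Made-up sample: quoted parentheses plus an unquoted nested pair, fn(2,3).
sample = r"(1,'didn\'t (Brexit) :( ',fn(2,3),NULL),(4,'x',5,NULL),"
print(parse_rows(sample))
```

Tracking the quote state separately from the nesting depth is what lets an unbalanced parenthesis inside the text, such as the :( in the question's data, pass through without affecting the row boundaries.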