How to split a Python list using a regular expression

Time: 2018-07-15 23:12:56

Tags: python regex list split

I am reading a file from the web line by line, and each line comes back as a list. Each line has three columns, visibly separated by this pattern:

+++$+++

I have tried to split each line with the code below in Python 3.6, but it does not work. Any suggestions would be appreciated.

This is my code:

import codecs
import csv
from contextlib import closing

import requests

with closing(requests.get(url, stream=True)) as r:
    reader = csv.reader(codecs.iterdecode(r.iter_lines(), 'latin-1'))
    for i, row in enumerate(reader):
        if i < 5:
            t = row[0].split('(\s\+{3}\$\+{3}\s)+')
            print(t)

This is the list:

['m0 +++$+++ 10 things i hate about you +++$+++ http://www.dailyscript.com/scripts/10Things.html']
['m1 +++$+++ 1492: conquest of paradise +++$+++ http://www.hundland.org/scripts/1492-ConquestOfParadise.txt']
['m2 +++$+++ 15 minutes +++$+++ http://www.dailyscript.com/scripts/15minutes.html']
['m3 +++$+++ 2001: a space odyssey +++$+++ http://www.scifiscripts.com/scripts/2001.txt']
['m4 +++$+++ 48 hrs. +++$+++ http://www.awesomefilm.com/script/48hours.txt']

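Each row has only one element, row[0]. This is the call I am using, with my regular expression as its argument:

row[0].split('(\s\+{3}\$\+{3}\s)+')

When I print the result, the row has not been split.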

2 answers:

Answer 0: (score: 1)

row[0].split(' +++$+++ ')

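should give you exactly what you want without a regular expression. str.split() treats its argument as a literal separator, not as a pattern, which is why the call in the question leaves the row unsplit.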
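
A minimal sketch comparing the approaches on one row from the question: the original call, str.split() with the literal separator, and re.split() with a pattern. The variable names are illustrative only.

```python
import re

# One row from the question, already taken out of its list (row[0]).
line = 'm0 +++$+++ 10 things i hate about you +++$+++ http://www.dailyscript.com/scripts/10Things.html'

# The call from the question: the pattern text never occurs literally in the line.
unsplit = line.split(r'(\s\+{3}\$\+{3}\s)+')

# Literal separator, no regular expression involved.
by_literal = line.split(' +++$+++ ')

# Regular expression without a capturing group, so the separators are not returned.
by_regex = re.split(r'\s\+{3}\$\+{3}\s', line)

print(unsplit)
print(by_literal)
print(by_regex)
```

If the pattern keeps its capturing parentheses, re.split() also returns the matched separators between the fields, which is usually not wanted here.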

Answer 1: (score: 0)

Assuming you do not want to use split(), this may help if you would rather match each line with a regular expression and get tuples back.

Input:

import re

# Sample rows, exactly as printed in the question.
text = '''['m0 +++$+++ 10 things i hate about you +++$+++ http://www.dailyscript.com/scripts/10Things.html']
['m1 +++$+++ 1492: conquest of paradise +++$+++ http://www.hundland.org/scripts/1492-ConquestOfParadise.txt']
['m2 +++$+++ 15 minutes +++$+++ http://www.dailyscript.com/scripts/15minutes.html']
['m3 +++$+++ 2001: a space odyssey +++$+++ http://www.scifiscripts.com/scripts/2001.txt']
['m4 +++$+++ 48 hrs. +++$+++ http://www.awesomefilm.com/script/48hours.txt']'''
# Capture the three fields between the +++$+++ separators; a raw string keeps the backslashes intact.
output = re.findall(r"\['([\S\s]+?)\s+\+{3}\$\+{3}\s+([\S\s]+?)\s+\+{3}\$\+{3}\s+([\S\s]+?)'\]", text)
print(output)

Output:

[('m0', '10 things i hate about you', 'http://www.dailyscript.com/scripts/10Things.html'), ('m1', '1492: conquest of paradise', 'http://www.hundland.org/scripts/1492-ConquestOfParadise.txt'), ('m2', '15 minutes', 'http://www.dailyscript.com/scripts/15minutes.html'), ('m3', '2001: a space odyssey', 'http://www.scifiscripts.com/scripts/2001.txt'), ('m4', '48 hrs.', 'http://www.awesomefilm.com/script/48hours.txt')]   
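
If positional tuples are hard to read, the same idea can be sketched with named groups. This is an illustration rather than part of the answer: the group names `movie_id`, `title`, and `url` are guesses at what the three columns mean, and the pattern is applied to a single bare line (row[0]) instead of the printed list text.

```python
import re

# Separator from the question: whitespace, +++$+++, whitespace.
SEP = r'\s\+{3}\$\+{3}\s'

# Plain concatenation is used on purpose: an f-string would misread {3} as a format field.
LINE_RE = re.compile(
    r'(?P<movie_id>.+?)' + SEP + r'(?P<title>.+?)' + SEP + r'(?P<url>.+)'
)

line = 'm3 +++$+++ 2001: a space odyssey +++$+++ http://www.scifiscripts.com/scripts/2001.txt'

match = LINE_RE.fullmatch(line)
# groupdict() maps each group name to the text it captured; None means the line did not match.
record = match.groupdict() if match else None
print(record)
```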

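I was also trying a regular expression with alternation, but for the life of me I could not get the pattern to work. I will post it later, but I hope the above helps.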
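
One more point about the reading loop in the question: csv.reader splits on commas by default, so a title that contains a comma would be cut off before row[0] is ever split. A rough sketch of decoding and splitting the lines directly is shown here. `parse_lines` is a hypothetical helper, the byte strings stand in for what `r.iter_lines()` yields, and the second sample row is invented to show the comma case.

```python
def parse_lines(raw_lines, encoding='latin-1', sep=' +++$+++ '):
    """Decode each raw line and split it on the literal separator."""
    for raw in raw_lines:
        yield raw.decode(encoding).split(sep)


# Stand-in for r.iter_lines(); the second row is made up and contains a comma.
sample = [
    b'm0 +++$+++ 10 things i hate about you +++$+++ http://www.dailyscript.com/scripts/10Things.html',
    b'm9 +++$+++ title, with a comma +++$+++ http://example.com/script.txt',
]

rows = list(parse_lines(sample))
print(rows)
```

In the original loop this would mean iterating over `r.iter_lines()` directly and dropping csv.reader and codecs.iterdecode, assuming the file really uses only the +++$+++ separator.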