Question

I am reading a file from the web row by row, and each row is a list. The list has three columns, visibly separated by this pattern:

+++$+++

This is my code:

with closing(requests.get(url, stream=True)) as r:
    reader = csv.reader(codecs.iterdecode(r.iter_lines(), 'latin-1'))
    for i, row in enumerate(reader):
        if i < 5:
            t = row[0].split('(\s\+{3}\$\+{3}\s)+')
            print(t)

This is the list:

['m0 +++$+++ 10 things i hate about you +++$+++ http://www.dailyscript.com/scripts/10Things.html']
['m1 +++$+++ 1492: conquest of paradise +++$+++ http://www.hundland.org/scripts/1492-ConquestOfParadise.txt']
['m2 +++$+++ 15 minutes +++$+++ http://www.dailyscript.com/scripts/15minutes.html']
['m3 +++$+++ 2001: a space odyssey +++$+++ http://www.scifiscripts.com/scripts/2001.txt']
['m4 +++$+++ 48 hrs. +++$+++ http://www.awesomefilm.com/script/48hours.txt']

There is only one component per row -> row[0]

When I print the result, the row is not split.

This is my regex:

row[0].split('(\s\+{3}\$\+{3}\s)+')

I have tried to split the list using this instruction in Python 3.6, but it does not work. Any suggestions would be appreciated.

Answer 1

Using

row[0].split(' +++$+++ ')

should give you exactly what you want, without a regular expression.
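
The reason the original call left the row unsplit is that str.split treats its argument as a literal separator, not as a pattern, so the regex text is simply never found in the row. A minimal sketch contrasting the two approaches on one of the sample rows (the variable names are illustrative and not from the question):

```python
import re

line = ("m0 +++$+++ 10 things i hate about you +++$+++ "
        "http://www.dailyscript.com/scripts/10Things.html")

# str.split searches for the literal text of its argument, so a regex
# passed as a string is never found and the line comes back in one piece.
unsplit = line.split(r'(\s\+{3}\$\+{3}\s)+')

# Splitting on the literal separator works without any regex.
fields = line.split(' +++$+++ ')

# re.split is the regex-aware counterpart; with no capturing group in the
# pattern, the separators are left out of the result.
regex_fields = re.split(r'\s\+{3}\$\+{3}\s', line)

print(unsplit)
print(fields)
print(regex_fields)
```

If the amount of whitespace around the separator can vary, re.split with `\s+` on either side is probably the more forgiving choice; otherwise the plain str.split call is simpler.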

Answer 2

Assuming you don't want to use split(), this may help if you would rather use a regex and get tuples back.

Input:

import re

# Sample rows as printed in the question (named `text` to avoid shadowing the built-in input)
text = '''['m0 +++$+++ 10 things i hate about you +++$+++ http://www.dailyscript.com/scripts/10Things.html']
['m1 +++$+++ 1492: conquest of paradise +++$+++ http://www.hundland.org/scripts/1492-ConquestOfParadise.txt']
['m2 +++$+++ 15 minutes +++$+++ http://www.dailyscript.com/scripts/15minutes.html']
['m3 +++$+++ 2001: a space odyssey +++$+++ http://www.scifiscripts.com/scripts/2001.txt']
['m4 +++$+++ 48 hrs. +++$+++ http://www.awesomefilm.com/script/48hours.txt']'''
# Raw string so the backslashes reach the regex engine unchanged;
# the three lazy groups capture the three columns of each row
output = re.findall(r"\['([\S\s]+?)[\s]+[\+]{3}\$[\+]{3}[\s]+([\S\s]+?)[\s][\+]{3}\$[\+]{3}[\s]+([\S\s]+?)'\]", text)
print(output)

Output:

[('m0', '10 things i hate about you', 'http://www.dailyscript.com/scripts/10Things.html'), ('m1', '1492: conquest of paradise', 'http://www.hundland.org/scripts/1492-ConquestOfParadise.txt'), ('m2', '15 minutes', 'http://www.dailyscript.com/scripts/15minutes.html'), ('m3', '2001: a space odyssey', 'http://www.scifiscripts.com/scripts/2001.txt'), ('m4', '48 hrs.', 'http://www.awesomefilm.com/script/48hours.txt')]
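
Since the question reads the file one row at a time rather than as one large string, the same idea can be applied per row. This is a minimal sketch, assuming each row holds a single string as in the question; `SEP`, `FIELD_PATTERN`, and `parse_row` are hypothetical names, and the pattern is a simplified variant rather than the exact expression used above:

```python
import re

# The separator: whitespace, three plus signs, a dollar sign, three plus
# signs, whitespace.
SEP = r"\s\+{3}\$\+{3}\s"

# Two lazy groups followed by a greedy one, joined by the separator.
FIELD_PATTERN = re.compile(rf"(.+?){SEP}(.+?){SEP}(.+)")


def parse_row(row):
    """Return the three columns of a one-element row as a tuple, or None."""
    match = FIELD_PATTERN.fullmatch(row[0])
    return match.groups() if match else None


rows = [
    ['m0 +++$+++ 10 things i hate about you +++$+++ http://www.dailyscript.com/scripts/10Things.html'],
    ['m4 +++$+++ 48 hrs. +++$+++ http://www.awesomefilm.com/script/48hours.txt'],
]
for row in rows:
    print(parse_row(row))
```

Compiling the pattern once avoids rebuilding it for every row, and returning None for rows that do not match leaves it to the caller to decide whether to skip or report them.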


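I was also trying a regex that uses alternation, but for the life of me I could not get the expression to work. I will post it later, but I hope the above helps.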

How to split a Python list by regex
