Question

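I need to use findall to grab all of the specific links on a web page and put them into an array, but only the links themselves, without the quotes. This is what I have so far. If an array is not the right thing, I would like some variable I can pass to a loop, so that I can use each link individually, either one by one or all at once.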

#!/usr/bin/env python
import re,urllib,urllib2

Url = "http://www.ihiphopmusic.com/music"
print Url
print 'test .............'
req = urllib2.Request(Url)
print "1"
response = urllib2.urlopen(req)
print "2"
#reads the webpage
the_webpage = response.read()
#grabs the title
the_list = re.findall(r'number-link" href="(.*?)#comments">0</a>',the_webpage)
print "3"
the_list = the_list.split(',')
arrlist = array('c',the_list)
print arrlist

Result

http://www.ihiphopmusic.com/music
test .............
1
2
3
Traceback (most recent call last):
  File "grub.py", line 17, in <module>
    the_list = the_list.split(',')
AttributeError: 'list' object has no attribute 'split'

Answer 1

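re.findall returns a list of non-overlapping matches. You are trying to split a list, which is why you get the AttributeError (list objects have no split method). I'm not sure what you are trying to achieve. Do you want to split the individual matches and store them in an iterable? If so, you could do something like this: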

import itertools
results = itertools.chain(*[x.split(',') for x in the_list])
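
A minimal sketch of what that line does, using made-up data in place of the real matches (the example.com URLs are hypothetical). It uses chain.from_iterable rather than unpacking with *, which yields the same items; note that the result is a lazy iterator, so it is wrapped in list() here.

```python
import itertools

# Hypothetical stand-in for the list returned by re.findall.
the_list = ["http://example.com/a,http://example.com/b", "http://example.com/c"]

# Each element is a string, so split() is called per element, not on the list.
results = itertools.chain.from_iterable(x.split(',') for x in the_list)

# chain gives a one-shot iterator; list() makes it reusable and indexable.
flat = list(results)
print(flat)
```

If the matches contain no commas, as is likely for plain URLs, this step changes nothing and the_list can be used directly.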

Answer 2

From what I can gather (correct me if I'm wrong), you are already there :) As @mgilson pointed out, it is already a list:

#grabs the title
the_list = re.findall(r'number-link" href="(.*?)#comments">0</a>',the_webpage)
print "3"
print type(the_list)
print the_list

So you can do what you want by iterating over it:

for item in the_list:
    print item
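
To check this without hitting the network, here is a small sketch that runs the same pattern against an invented HTML fragment. The markup and the example.com URLs are assumptions made for illustration; the real page may look different.

```python
import re

# Made-up fragment imitating the markup the pattern targets.
sample_html = (
    '<a class="number-link" href="http://example.com/post-1#comments">0</a>\n'
    '<a class="number-link" href="http://example.com/post-2#comments">0</a>\n'
)

# Same pattern as in the question: the parentheses capture only the URL, so
# the surrounding quotes and the "#comments" suffix are left out of the result.
links = re.findall(r'number-link" href="(.*?)#comments">0</a>', sample_html)

# findall already returned a list of strings, so it can be looped over directly.
for link in links:
    print(link)
```

With a single capture group, findall returns only the captured text for each match, which is why the links come back without quotes.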

Answer 3

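'split' is an attribute of string objects, not list objects. The AttributeError comes from trying to use split on a list. If you print the_list, you will see that it is already a list. If you want to break the list up and show each URL on its own line, you can use print '\n'.join(the_list).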
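
A short sketch of the difference between the two methods, assuming a placeholder list of hypothetical URLs in place of the real findall result:

```python
# Hypothetical URLs standing in for the findall result.
the_list = ["http://example.com/a", "http://example.com/b"]

# join() is a string method: the separator glues the list items into one string.
joined = '\n'.join(the_list)
print(joined)

# split() works on that string (not on the list) and gives the list back.
assert joined.split('\n') == the_list
```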

Python: converting findall results to an array