Question

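I am trying to scrape a list of first names from a website so that I can eventually write them to a file. The site I am parsing is http://www.prenom-marocain.com.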

The code is certainly not perfect, but what puzzles me is this: why is there blank space between each group of names in the output?


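How can I get rid of the blank space in the output? Is array.append the right method to use here? And what is the best way to store everything in a file? Thanks in advance!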

Answer 1

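In fetch_name() you are appending empty strings to the array: on pages that have no names, the text of <ul class="arrow"> is an empty string. If you filter those out, a simple loop prints all the names without the blank gaps: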

from urllib.request import urlopen
from bs4 import BeautifulSoup as bS
import re

# Collect the internal links (one per letter page)

def get_internals():
    array=[]
    html = urlopen("http://www.prenom-marocain.com")
    soup = bS(html,"lxml")
    azlinks = soup.find("nav", {"class":"page-nav"}).findAll("a", {"href":re.compile("^p.*$")})
    for links in azlinks:
        array.append(links.attrs['href'])
    return array

# Fetch the names listed on one letter page

def fetch_name(url):
    array=[]
    html = urlopen("http://www.prenom-marocain.com/"+url)
    soup = bS(html, "lxml")
    for child in soup.findAll("ul", {"class":"arrow"}):
        text = child.text.strip()
        # Skip empty lists (pages with no names) instead of appending ""
        if not text:
            continue
        array.append(text)
    return array

alpha_array = get_internals()

first_names=[]
for links in alpha_array:
    first_names += (fetch_name(links))

for names in first_names:
    print(names)
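
On the last part of the question (storing everything in a file), here is a minimal sketch that uses only the standard library. The helper names clean_names and save_names, the sample names, and the file name first_names.txt are all made up for illustration; it assumes each scraped block is a string that may contain several names separated by newlines, plus stray blank lines.

```python
from pathlib import Path


def clean_names(blocks):
    """Split each scraped text block into lines and drop the blank ones."""
    names = []
    for block in blocks:
        for line in block.splitlines():
            name = line.strip()
            if name:  # skip empty or whitespace-only lines
                names.append(name)
    return names


def save_names(names, path):
    """Write one name per line, UTF-8 encoded (hypothetical helper)."""
    Path(path).write_text("\n".join(names) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Made-up sample standing in for the list returned by fetch_name()
    sample = ["Adam\n\n  Ali \n", "", "Badr\nBilal"]
    print(clean_names(sample))  # ['Adam', 'Ali', 'Badr', 'Bilal']
```

With the real data you would call something like save_names(clean_names(first_names), "first_names.txt"). Using array.append is fine as it is; the blank gaps come from the strings being appended, not from the method.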
