Question

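After fixing the loop that lets me extract images and text from a website, I ran into another problem when trying to save the extracted text as new rows in a CSV file.

What I do is search for the divs with the class "description", select the text I'm interested in, print the data (to check that each item is correct), and finally write the data with writerow (before that, I open the file and add a header row).

Edit: my problem is that only one row gets saved, the last one the script finds and extracts. I don't know what I'm doing wrong. Here are the two functions from the script:

main() does what I described above.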

def main(url, destino):
    """ Acceso al sitio web """
    soup = bs(urlopen(url), 'lxml')
    parsed = list(urlparse.urlparse(url))

    """ Acceso al archivo csv """
    fileName = 'datos/datos.csv'
    print fileName
    f = csv.writer(open(fileName, 'w'))
    f.writerow(["Lote", "Dato del lote", "Detalles"]) # Header

    """ Acceso a la descr. y escritura en el csv """
    description = soup.findAll(True, {'class':['description']})

    for text in description:
        loteNum = text.contents[1]
        loteDat = text.contents[3]
        detalle = text.contents[6]
        detalleE = detalle.encode("utf-8")
        print loteNum
        print loteDat
        print detalle
        f.writerow([loteNum, loteDat, detalleE])

    """ Descarga de las img. """
    for image in soup.findAll(True, {'class':['list_logo']}):
        print "Image: %(src)s" % image
        image_url = urlparse.urljoin(url, image['src'])
        filename = image["src"].split("/")[-1]
        outpath = os.path.join(destino, filename)
        urlretrieve(image_url, outpath)

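getUrl() lets me work on a specific range of the images I want to extract. I'm including it here because I don't know whether the problem could come from this function.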

def getUrl(opt, baseUrl):
    destino = "/home/ivanhercaz/monedasWiki/img"
    print "Instrucciones del script \n No te preocupes, no es complicado pero atiende a los pasos"
    print "Introduce 1 para obtener los archivos del 00001 al 00010"
    print "Introduce 2 para obtener los archivos del 00010 al 00099"
    print "Introduce 3 para obtener los archivos del 00100 al 00999"
    print "Introduce 4 para obtener los archivos del 01000 al 09999"
    print "Introduce 5 para obtener los archivos del 10000 al 19999"
    optSel = int(input(opt))
    # i es el rango
    # urlI es la transformacion de i en cadena
    # baseUrl es el enlace al sitio web de Pliego
    # url es la url completa con los parametros necesarios
    if optSel == 1:
        try:
            for i in range(0,10):
                r = str(0).zfill(4)
                urlI = str(i)
                url = baseUrl + r + urlI
                main(url, destino)
        except ValueError:
            print "Introduce el rango correcto"
    elif optSel == 2:
        try:
            for i in range(10,100):
                r = str(0).zfill(3)
                urlI = str(i)
                url = baseUrl + r + urlI
                main(url, destino)
        except ValueError:
            print "Introduce el rango correcto"
    elif optSel == 3:
        try:
            for i in range(100,1000):
                r = str(0).zfill(2)
                urlI = str(i)
                url = baseUrl + r + urlI
                main(url, destino)
        except ValueError:
            print "Introduce el rango correcto"
    elif optSel == 4:
        try:
            for i in range(1000,10000):
                r = str(0).zfill(1)
                urlI = str(i)
                url = baseUrl + r + urlI
                main(url, destino)
        except ValueError:
            print "Introduce el rango correcto"
    elif optSel == 5:
        try:
            for i in range(10000,18510):
                urlI = str(i)
                url = baseUrl + urlI
                main(url, destino)
        except ValueError:
            print "Introduce el rango correcto"
    elif optSel < 0:
        print "Valor inferior a 0"
    else:
        print "Algo ha salido mal"

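Both functions are in the same file. I'd be very grateful if you could tell me what is wrong.

Edit: I've changed the way I open and write the file, as Moses Koledoye suggested in a comment, but the script still writes only the last text. I think the problem is related to the loop that goes through the descriptions and adds a row for each text, but I can't find a way to fix it. Here is main() again.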

def main(url, destino):
    """ Acceso al sitio web """
    soup = bs(urlopen(url), 'lxml')
    parsed = list(urlparse.urlparse(url))

    """ Acceso al archivo csv """
    fileName = 'datos/datos.csv'
    print fileName

    """ Acceso a la descr. y escritura en el csv """
    description = soup.findAll(True, {'class':['description']})

    for text in description:
        loteNum = text.contents[1]
        loteDat = text.contents[3]
        detalle = text.contents[6]
        detalleE = detalle.encode("utf-8")
        print loteNum
        print loteDat
        print detalle
        header = ["Lote", "Dato del lote", "Detalles"]
        data = [loteNum, loteDat, detalleE]
        with open(fileName, 'w') as f:
            f = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
            f.writerow(header)
            f.writerow(data)

    """ Descarga de las img. """
    for image in soup.findAll(True, {'class':['list_logo']}):
        print "Image: %(src)s" % image
        image_url = urlparse.urljoin(url, image['src'])
        filename = image["src"].split("/")[-1]
        outpath = os.path.join(destino, filename)
        urlretrieve(image_url, outpath)

Answer 1

for text in description:
    # ... some functionality
    data = [loteNum, loteDat, detalleE]
    with open(fileName, 'w') as f:
        f = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
        f.writerow(header)
        f.writerow(data)

On every iteration over description, the file is opened in w(rite) mode, which overwrites the previous contents.
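The effect is easy to reproduce in isolation. The sketch below assumes Python 3 syntax (the question uses Python 2), and the rows and the temporary file path are made-up placeholders, not data from the site:

```python
import csv
import os
import tempfile

# Hypothetical sample rows standing in for the scraped values.
header = ["Lote", "Dato del lote", "Detalles"]
rows = [["1", "a", "x"], ["2", "b", "y"], ["3", "c", "z"]]

path = os.path.join(tempfile.mkdtemp(), "datos.csv")

for row in rows:
    # Mode 'w' truncates the file each time it is opened,
    # so every pass discards what the previous pass wrote.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
        writer.writerow(header)
        writer.writerow(row)

# Read the file back to see what survived the loop.
with open(path, newline="") as f:
    saved = list(csv.reader(f))

print(saved)  # only the header and the last row remain
```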

You can change the file mode to a(ppend), or simply open the file for writing once, outside the loop, as follows:

with open(fileName, 'w') as f:
    f = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
    header = ["Lote", "Dato del lote", "Detalles"]
    f.writerow(header)
    for text in description:
        loteNum, loteDat, detalle = [text.contents[i] for i in (1, 3, 6)]
        detalleE = detalle.encode("utf-8")
        print loteNum, loteDat, detalle
        data = [loteNum, loteDat, detalleE]
        f.writerow(data)
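Note that getUrl() calls main() once per URL, so a file opened with 'w' inside main() is still truncated on every call, even when the open sits outside the description loop. Append mode avoids that. A minimal sketch, assuming Python 3 syntax; append_rows is a hypothetical helper, not part of the original script, and it writes the header only when the file is missing or empty:

```python
import csv
import os


def append_rows(path, header, rows):
    """Append rows to a CSV file, writing the header only once.

    Hypothetical helper for illustration; not part of the original script.
    """
    # The header is needed only if the file does not exist yet or is empty.
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
        if need_header:
            writer.writerow(header)
        writer.writerows(rows)
```

Collecting the rows of one page in a list and calling append_rows(fileName, header, page_rows) at the end of main() (page_rows being an assumed name for that list) would keep the rows written for earlier pages.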

Adding new rows to a CSV file with Python
