Question

I only recently started using Python, and I've run into a problem.

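import re
import urllib2  # Python 2 module; in Python 3 the equivalent is urllib.request

# priceids, totalitemnumber and itemtxt2 are assumed to be defined elsewhere in the script.

# function that tells how to read the urls and how to process the data the
# way I need it.


def htmlreader(i):
    # builds the URL from the next batch of 200 ids, because this is called in a loop.
    pricedata = urllib2.urlopen(
        "http://website.com/" + (",".join(priceids.split(",")[i:i + 200]))).read()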

    # here my information processing begins but that is fine.
    pricewebstring = pricedata.split("},{")
    # results in [[1234,2345,3456],[3456,4567,5678]] for example.
    array1 = [re.findall(r"\d+", a) for a in pricewebstring]

    # writes obtained array to my text file
    itemtxt2.write(str(array1) + '\n')

i = 0
while i <= totalitemnumber:
    htmlreader(i)
    i = i + 200
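To see what a single htmlreader(i) call produces, here is the parsing step run on its own. The real response format from the site isn't shown in the question, so the pricedata string below is a made-up sample. Note that re.findall returns strings, so the inner lists hold '1234' rather than 1234:

```python
import re

# Hypothetical response body; the site's actual format is not shown in the question.
pricedata = '{"id":1234,"buy":2345,"sell":3456},{"id":3456,"buy":4567,"sell":5678}'

# Split the response into one string per record, as the question's code does.
pricewebstring = pricedata.split("},{")

# Pull every run of digits out of each record; findall returns strings, not ints.
array1 = [re.findall(r"\d+", a) for a in pricewebstring]
print(array1)  # [['1234', '2345', '3456'], ['3456', '4567', '5678']]
```

Each call therefore returns one list of lists, and writing str(array1) per call puts each batch on its own line of the file.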

See the comments in the script.

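This runs in a loop, and each iteration gives me one array (array1).

Because I write each one to a txt file, I end up with a txt file containing separate arrays. I need one big array, so the results of the htmlreader(i) calls have to be merged.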

So my output looks like this:

[[1234,2345,3456],[3456,4567,5678]]
[[6789,4567,2345],[3565,1234,2345]]

But I want:

[[1234,2345,3456],[3456,4567,5678],[6789,4567,2345],[3565,1234,2345]]

Does anyone have an idea how I can solve this?

Answer 1

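Since you want all the elements in a single list, you can simply collect them into another list, flattening each batch into it with extend, like this: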

def htmlreader(i, result):
    ...
    # extend adds each inner list to result individually, so result stays one list of lists
    result.extend([re.findall(r"\d+", a) for a in pricewebstring])

i, result = 0, []
while i <= totalitemnumber:
    htmlreader(i, result)
    i = i + 200

# write the combined list once, after the loop has finished
itemtxt2.write(str(result) + '\n')

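In this case, the lists produced by re.findall are added one by one to the result list (extend, not append, so they are not nested one level deeper). At the end, you write the whole list to the file in one go.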
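The difference between append and extend is what makes this work. A small sketch using the sample data from the question (as ints for readability):

```python
chunk1 = [[1234, 2345, 3456], [3456, 4567, 5678]]
chunk2 = [[6789, 4567, 2345], [3565, 1234, 2345]]

# append adds each batch as a single element, so batches stay nested.
appended = []
appended.append(chunk1)
appended.append(chunk2)

# extend adds each inner list of the batch as its own element.
extended = []
extended.extend(chunk1)
extended.extend(chunk2)

print(len(appended))  # 2
print(extended)
# [[1234, 2345, 3456], [3456, 4567, 5678], [6789, 4567, 2345], [3565, 1234, 2345]]
```

`extended` is exactly the single merged array the question asks for.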

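If passing result into the function is confusing, you can instead have htmlreader return its list and extend result in the loop: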

def htmlreader(i):
    ...
    return [re.findall(r"\d+", a) for a in pricewebstring]

i, result = 0, []
while i <= totalitemnumber:
    result.extend(htmlreader(i))
    i = i + 200

itemtxt2.write(str(result) + '\n')
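Below is a self-contained sketch of the second approach that runs without network access. `fake_fetch` is a hypothetical stand-in for the urllib2 call, the 450 ids are made up, and an `io.StringIO` object stands in for the real output file `itemtxt2`:

```python
import io
import re

CHUNK = 200  # the question requests 200 ids per URL


def fake_fetch(batch):
    # Hypothetical stand-in for urllib2.urlopen(...).read(): returns one
    # "{id,price}" record per id, e.g. "{1,10},{2,20}".
    return "{" + "},{".join("%s,%d" % (i, int(i) * 10) for i in batch) + "}"


def htmlreader(ids, i):
    # Fetch and parse one batch of up to CHUNK ids and return the parsed rows
    # instead of writing them, so the caller can merge all batches.
    pricedata = fake_fetch(ids[i:i + CHUNK])
    pricewebstring = pricedata.split("},{")
    return [re.findall(r"\d+", a) for a in pricewebstring]


# Hypothetical input: 450 comma-separated ids, i.e. batches of 200, 200 and 50.
priceids = ",".join(str(n) for n in range(1, 451))
ids = priceids.split(",")

result = []
for i in range(0, len(ids), CHUNK):
    result.extend(htmlreader(ids, i))

# Write the merged list once, as a single line.
itemtxt2 = io.StringIO()
itemtxt2.write(str(result) + "\n")
```

Looping with range(0, len(ids), CHUNK) also stops before an empty final batch, which `while i <= totalitemnumber` can request when the total is an exact multiple of 200.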
