Question

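I have a lot of URLs without the HTTP header. I am trying to accomplish two things:

Read a text file of URLs that lack the HTTP header, e.g. (www.google.com), and split it into text files of 1000 lines each.
Prepend "http://" to each link, e.g. (http://www.google.com).

So far I have only managed the first step.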

from itertools import zip_longest

def grouper(n, iterable, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

n = 1000

with open('sites.txt') as f:
    for i, g in enumerate(grouper(n, f, fillvalue=''), 1):
        with open('s_{0}'.format(i), 'w') as fout:
            fout.writelines(g)

Answer 1

Prepend "http://" to each link, e.g. (http://www.google.com)

If you have a list of URLs and want to prepend https:// to each item, you can use a list comprehension and string formatting.

urls = ['https://{}'.format(url) for url in urls]

If the URLs are in a file, read it and split its contents on newlines to build the list:

with open('sites.txt') as f:
    urls = ['https://{}'.format(url) for url in f.read().splitlines()]
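To cover both steps from the question in one pass, something along these lines might work. It is only a sketch: the function name `write_prefixed_chunks` and its parameters are made up for illustration, the output names follow the `s_{0}` pattern from the question, and it assumes blank lines should be skipped and one URL per line.

```python
from itertools import islice
from pathlib import Path


def write_prefixed_chunks(src, out_dir, size=1000, prefix="http://"):
    """Prefix every non-empty line of src and write the result in files of `size` lines.

    Hypothetical helper, not part of any library. Returns the list of paths written.
    """
    out_dir = Path(out_dir)
    written = []
    with open(src) as f:
        # Lazily strip whitespace, skip blank lines, and add the prefix.
        urls = (prefix + line.strip() for line in f if line.strip())
        index = 1
        while True:
            # Pull the next `size` URLs; an empty chunk means the input is exhausted.
            chunk = list(islice(urls, size))
            if not chunk:
                break
            target = out_dir / "s_{0}".format(index)
            target.write_text("\n".join(chunk) + "\n")
            written.append(target)
            index += 1
    return written
```

Calling `write_prefixed_chunks('sites.txt', '.')` should then produce `s_1`, `s_2`, and so on in the current directory. Unlike the `grouper` approach, the last file is simply shorter, so no fill value is needed.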

Note: your question has nothing to do with HTTP headers; "http://" is the scheme part of the URL.

Answer 2

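Assuming I have understood the question correctly (it is not entirely clear as written)... you can quite simply prepend a string to each item in a list: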

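def addtoeachitem(word, items):
    # `items` rather than `list`, so the built-in list type is not shadowed
    return [word + item for item in items]

which is the same as writing

def addtoeachitem(word, items):
    new = []
    for item in items:
        new.append(word + item)
    return new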

Obviously, this code assumes that everything in the list is a string; otherwise it will raise a TypeError. Adapt it to your needs.
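As one possible adaptation, here is a rough sketch that converts each item to a string, drops empty entries, and leaves URLs alone when they already start with a scheme. The name `ensure_scheme` and the choice to check only for "http://" and "https://" are assumptions for illustration, not something from the question.

```python
def ensure_scheme(items, scheme="http://"):
    """Return the items as strings, adding `scheme` only where none is present.

    Hypothetical helper: only "http://" and "https://" count as an existing scheme.
    """
    result = []
    for item in items:
        # str() guards against non-string items; strip() removes stray newlines.
        text = str(item).strip()
        if not text:
            continue  # skip empty entries
        if text.startswith(("http://", "https://")):
            result.append(text)
        else:
            result.append(scheme + text)
    return result
```

This keeps a second run over an already processed file from producing entries like "http://http://www.google.com".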

How to append http to each URL in a text file
