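I have a file containing data in the following format:
Foo
http://url.com
http://url2.com

FooBar
http://url3.com

FooBarBar
http://url9.com
I want to treat each group of n lines as one element. That is, after every line that contains only \n, I want to process the string and the URLs that follow it (the number of URLs varies). I create a folder named after the first string and then download the files from the URLs.
I currently read the file into a flat list of lines.
Now I am thinking of putting this into a list of lists, with \n serving as the separator element.
How can I achieve what I want?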
Answer 0 (score: 1)
You should not read the file in a one-liner here, because that leaves the file unclosed:
with open('C:\\filename.txt', 'r') as f:
    result = []  # Final output: one joined string per block
    entry = []  # Lines of the block currently being read
    for line in f:
        line = line.strip()  # Remove the newline and surrounding whitespace
        if line:  # A non-empty line belongs to the current block
            entry.append(line)
        elif entry:  # A blank line closes the current block, if it has content
            result.append(' '.join(entry))
            entry = []
    if entry:
        result.append(' '.join(entry))  # Add the last block
print(result)
Output:
['Foo http://url.com http://url2.com', 'FooBar http://url3.com', 'FooBarBar http://url9.com']
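If a list of lists is preferred over joined strings, as the question asks, the same grouping can be sketched with `itertools.groupby`. This is only a minimal sketch and not part of the answer above; the function name `group_blocks` and the sample lines are assumptions made for illustration.

```python
from itertools import groupby


def group_blocks(lines):
    """Group consecutive non-empty lines into lists; blank lines act as separators."""
    stripped = (line.strip() for line in lines)
    # bool('') is False, so groupby alternates between runs of text and runs of blanks.
    return [list(group) for has_text, group in groupby(stripped, key=bool) if has_text]


# Hypothetical sample input, including a doubled blank line.
sample = ['Foo\n', 'http://url.com\n', 'http://url2.com\n', '\n', '\n',
          'FooBar\n', 'http://url3.com\n']
blocks = group_blocks(sample)
print(blocks)
```

Since a file object iterates over its lines, `group_blocks(f)` should also work directly inside a `with open(...) as f:` block.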
Answer 1 (score: 0)
You should be able to iterate over the lines of the file and handle each case separately.
def urlsFromFile(path):
    files = {}
    with open(path) as f:  # Use with so the file is closed after reading
        fileName = None
        for line in f.readlines():
            line = line.rstrip('\n')  # Remove \n from end of line
            if not line:  # If the line is empty, reset the fileName
                fileName = None
            elif fileName is None:  # No current name (start of file or after a blank line): this line is the new fileName
                fileName = line
                files[fileName] = []
            else:  # We are working through the urls
                files[fileName].append(line)
    return files
print(urlsFromFile('filename.txt'))
Output (the key order may differ; Python 3.7 and later keep insertion order):
{'FooBar': ['http://url3.com'], 'Foo': ['http://url.com', 'http://url2.com'], 'FooBarBar': ['http://url9.com']}
This lets you use the result to create the directories and download the files in each list, for example:
for folder, urls in urlsFromFile('filename.txt').items():
    print('create folder {}'.format(folder))
    for url in urls:
        print('download {} to folder {}'.format(url, folder))
Output:
create folder FooBar
download http://url3.com to folder FooBar
create folder Foo
download http://url.com to folder Foo
download http://url2.com to folder Foo
create folder FooBarBar
download http://url9.com to folder FooBarBar
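The print calls above are placeholders. One possible way to fill them in with the standard library is sketched below. The names `download_all`, `base_dir` and `fetch` are invented here, the second sample URL is hypothetical, and deriving the file name from the last URL path segment (falling back to the host name) is an assumption that can still produce name collisions.

```python
import os
import posixpath
import tempfile
from urllib.parse import urlsplit
from urllib.request import urlretrieve


def download_all(files, base_dir='.', fetch=urlretrieve):
    """Create one folder per key and fetch every URL into it.

    `files` is a dict shaped like the one returned by urlsFromFile.
    `fetch(url, destination)` performs the transfer; it defaults to urlretrieve.
    """
    saved = []
    for folder, urls in files.items():
        target_dir = os.path.join(base_dir, folder)
        os.makedirs(target_dir, exist_ok=True)  # no error if the folder already exists
        for url in urls:
            parts = urlsplit(url)
            # Assumption: last path segment as file name, host name as fallback.
            name = posixpath.basename(parts.path) or parts.netloc
            destination = os.path.join(target_dir, name)
            fetch(url, destination)
            saved.append(destination)
    return saved


# Dry run: a stand-in fetch records the URLs, so nothing is actually downloaded.
calls = []
with tempfile.TemporaryDirectory() as tmp:
    saved = download_all({'Foo': ['http://url.com', 'http://url2.com/a/b.txt']},
                         base_dir=tmp, fetch=lambda url, dest: calls.append(url))
    created = os.path.isdir(os.path.join(tmp, 'Foo'))
    relative = [os.path.relpath(path, tmp) for path in saved]
print(created, relative)
```

Passing the transfer function in as a parameter keeps the folder logic testable without network access; dropping the `fetch` argument would use `urlretrieve` for real downloads.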
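Answer 2 (score: 0)
If the format is consistent, you can simply read the whole file and then split the string as needed.
Code
with open('C:\\filename.txt') as fobj:
    # strip() drops a trailing newline so the last block gets no empty element
    elements = [block.split('\n') for block in fobj.read().strip().split('\n\n')]
Resulting elements: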
[['Foo', 'http://url.com', 'http://url2.com'],
['FooBar', 'http://url3.com'],
['FooBarBar', 'http://url9.com']]
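Explanation
I always recommend using a context manager (the with statement), because it is the safer way to handle data streams: the file is closed even if an error occurs.
The splitting happens on three levels:
fobj.read() reads the whole file into a single string
.split('\n\n') splits that string into blocks at each blank line
block.split('\n') splits each block into its individual lines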
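The three levels can be followed step by step on an in-memory string. This is a small sketch in which `text` stands in for the result of `fobj.read()`; the variable names and the `.strip()` call, which drops the trailing newline so the last block gets no empty element, are choices made for this illustration.

```python
# Stand-in for fobj.read(): the sample data as one string.
text = ('Foo\nhttp://url.com\nhttp://url2.com\n\n'
        'FooBar\nhttp://url3.com\n\n'
        'FooBarBar\nhttp://url9.com\n')

# Level 1: the whole file as a single string (trailing newline removed).
whole = text.strip()

# Level 2: blocks separated by one blank line.
blocks = whole.split('\n\n')

# Level 3: each block split into its individual lines.
elements = [block.split('\n') for block in blocks]

print(blocks)
print(elements)
```

Note that this approach relies on exactly one blank line between blocks; two or more consecutive blank lines would leave empty strings in the result.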
Answer 3 (score: 0)
An iterative approach that follows the stated requirement, "create a folder with the name of the first string, then download the files from the URLs":
import os
with open('input.txt') as f:
    folder_name = None
    folder_failed = False
    for line in f:
        line = line.strip()
        if line:
            if not line.startswith('http'):
                # A non-URL line names the next folder
                try:
                    os.mkdir(os.path.join(os.getcwd(), line))
                    folder_name = line
                except OSError:
                    print(f"Creation of the directory `{line}` failed")
                    folder_failed = True
                else:
                    folder_failed = False
            elif not folder_failed:
                # downloading file
                new_file = download_file_from_url(line)  # replace with your custom function
                # save file into a folder `folder_name`
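The answer leaves `download_file_from_url` and the saving step to the reader. One possible shape for both, using only the standard library, is sketched below. Returning the raw bytes, the helper name `save_into_folder` and its `file_name` parameter are all assumptions for this illustration, and the demo reads a local `file://` URL so that it runs without network access.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlopen


def download_file_from_url(url):
    """Return the raw bytes served at `url` (hypothetical implementation)."""
    with urlopen(url) as response:
        return response.read()


def save_into_folder(data, folder_name, file_name):
    """Write `data` to folder_name/file_name and return the resulting path."""
    path = os.path.join(folder_name, file_name)
    with open(path, 'wb') as out:
        out.write(data)
    return path


# Self-contained check using a local file:// URL instead of a real web address.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, 'source.txt')
    source.write_bytes(b'hello')
    new_file = download_file_from_url(source.as_uri())
    saved_path = save_into_folder(new_file, tmp, 'copy.txt')
    copied = Path(saved_path).read_bytes()
print(copied)
```

Reading the whole response into memory is fine for small files; for large downloads, streaming the response to disk in chunks would likely be the better choice.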