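I'm new to Python (I learned it through a CodeAcademy course) and could use some help figuring this out.
I have a file, 'TestingDeleteLines.txt', with about 300 lines of text. Right now I'm trying to get it to print 10 random lines from that file and then delete those lines.
So if my file has 10 lines: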
Carrot
Banana
Strawberry
Cantaloupe
Blueberry
Snacks
Apple
Raspberry
Papaya
Watermelon
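I need it to pick randomly from those lines, tell me that it picked Blueberry, Carrot, Watermelon, and Banana, and then delete those lines.
The problem is that when Python reads a file, it reads through to the end and does not go back to delete lines. My current idea is to write the picked lines to a list, then reopen the file, match the list against the text file, and delete every line that matches.
My current problems are twofold:
1. random.sample does not seem to work, because I need the lines separated when I later append each one to a URL.
2. I don't feel that my logic (write to an array -> find matches in the text file -> delete) is the most ideal. Is there a better way to write this?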
import webbrowser
import random

# url = 'http://www.google.com'
# webbrowser.open_new_tab(url + myline)
# Eventually, I need a base URL + my 10 random lines opening in each new tab

def ShowMeTheRandoms():
    x = 1
    DeleteList = []
    lines = open('TestingDeleteLines.txt').read().splitlines()
    for x in range(0, 10):
        myline = random.choice(lines)
        print(myline)  # debugging, remove later
        DeleteList.append(myline)
        x = x + 1
    print(DeleteList)  # debugging, remove later

ShowMeTheRandoms()
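Answer 0 (score: 4)
The point is: you don't "delete" from a file; you rewrite the whole file (or another file) with the new content. The canonical way is to read the original file line by line, write the lines you want to keep to a temporary file, and then replace the old file with the new one.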
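import os

with open("/path/to/source.txt") as src, open("/path/to/temp.txt", "w") as dest:
    for line in src:
        if should_we_keep_this_line(line):  # placeholder for your own test
            dest.write(line)
# On Windows, os.rename() fails if the target exists; os.replace() (Python 3.3+) does not.
os.rename("/path/to/temp.txt", "/path/to/source.txt")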
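As a concrete illustration of the temp-file approach, here is a minimal sketch. The function name rewrite_without and its unwanted parameter are made up for this example, and it assumes Python 3.3+ so that os.replace() is available.

```python
import os
import tempfile


def rewrite_without(path, unwanted):
    """Rewrite the file at path, dropping lines whose text is in unwanted.

    Hypothetical helper for illustration; lines are compared without
    their trailing newline.
    """
    unwanted = set(unwanted)
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so the final replace
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as dest, open(path) as src:
            for line in src:
                if line.rstrip("\n") not in unwanted:
                    dest.write(line)
        os.replace(tmp_path, path)  # overwrite the original file
    except BaseException:
        os.unlink(tmp_path)  # do not leave a stray temp file behind
        raise
```

Calling rewrite_without('TestingDeleteLines.txt', ['Banana', 'Carrot']) would leave the other lines in their original order. Note that mkstemp() creates the new file with restrictive permissions, so the rewritten file may not keep the original's mode.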
Answer 1 (score: 1)
What about list.pop? It gives you the item and updates the list in one step.
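A minimal sketch of the list.pop idea, assuming the file's lines are already in a list. The helper name pop_random is made up for this example.

```python
import random


def pop_random(lines, k, rng=random):
    """Remove up to k random items from lines in place and return them.

    Hypothetical helper for illustration.
    """
    k = min(k, len(lines))
    # Pop from the highest index down, so that removing one item does not
    # shift the positions of the items still to be removed.
    indices = sorted(rng.sample(range(len(lines)), k), reverse=True)
    return [lines.pop(i) for i in indices]
```

After the call, lines holds only the lines that were not picked, ready to be written back to the file.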
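Answer 2 (score: 1)
Let's assume you have the lines from your file stored in a list named items:
>>> import random
>>> items = ['a', 'b', 'c', 'd', 'e', 'f']
>>> choices = random.sample(items, 2)  # select 2 items
>>> choices  # here are the two
['b', 'c']
>>> for i in choices:
...     items.remove(i)
...
>>> items  # tee daa, no more b or c
['a', 'd', 'e', 'f']
From here, you would overwrite your previous text file with the contents of items, joined with your preferred line ending (\r\n or \n). readlines() does not strip line endings, so if you read the file with that method you do not need to add your own.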
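The write-back step could look like the following sketch. It assumes Python 3 and that the items have no line endings of their own (for example, they came from read().splitlines()); the name write_lines is made up for this example.

```python
def write_lines(path, items, newline="\n"):
    """Overwrite path with items, one per line, using the given line ending.

    Hypothetical helper for illustration.
    """
    # newline="" stops Python from translating the line ending we chose.
    with open(path, "w", newline="") as f:
        f.write(newline.join(items))
        if items:
            f.write(newline)  # terminate the last line as well
```

For example, write_lines('TestingDeleteLines.txt', items, newline='\r\n') would write Windows-style line endings regardless of the platform.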
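Answer 3 (score: 1)
"I have a file, 'TestingDeleteLines.txt', that's about 300 lines of text. Right now, I'm trying to get it to print me 10 random lines from that file, then delete those lines."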
#!/usr/bin/env python
import random

k = 10
filename = 'TestingDeleteLines.txt'
with open(filename) as file:
    lines = file.read().splitlines()

if len(lines) > k:
    random_lines = random.sample(lines, k)
    print("\n".join(random_lines))  # print random lines

    with open(filename, 'w') as output_file:
        output_file.writelines(line + "\n"
                               for line in lines if line not in random_lines)
elif lines:  # file is too small
    print("\n".join(lines))  # print all lines
    with open(filename, 'wb', 0):  # empty the file
        pass
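This algorithm is O(n**2) in the worst case, because each of the n lines is checked against the list of sampled lines. It can be improved if necessary (you don't need to for a small file such as your input).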
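One possible way to remove the quadratic cost is to test membership against a set instead of a list. This is only a sketch; the name split_sample is made up for this example, and it keeps the behavior of the code above, where every copy of a sampled line's text is removed.

```python
import random


def split_sample(lines, k, rng=random):
    """Return (chosen, remaining) for up to k randomly chosen lines.

    Hypothetical helper for illustration. Duplicate lines that share the
    text of a chosen line are all removed, as in the list-based version.
    """
    chosen = rng.sample(lines, min(k, len(lines)))
    chosen_set = set(chosen)  # average O(1) membership test
    remaining = [line for line in lines if line not in chosen_set]
    return chosen, remaining
```

The filtering pass is then linear in the number of lines on average.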
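Answer 4 (score: 1)
To select a random line from a file, you can use the space-efficient single-pass reservoir-sampling algorithm. To delete the line, you can print everything except the chosen line: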
#!/usr/bin/env python3
import fileinput

filename = 'TestingDeleteLines.txt'  # assumed: the file from the question
with open(filename) as file:
    k = select_random_it(enumerate(file), default=[-1])[0]

if k >= 0:  # file is not empty
    with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
        for i, line in enumerate(file):
            if i != k:  # keep line
                print(line, end='')  # stdout is redirected to filename
where select_random_it() implements the reservoir-sampling algorithm:
import random

def select_random_it(iterator, default=None, randrange=random.randrange):
    """Return a random element from iterator.

    Return default if iterator is empty.
    iterator is exhausted.
    O(n)-time, O(1)-space algorithm.
    """
    # from https://stackoverflow.com/a/1456750/4279
    # select 1st item with probability 100% (if input is one item, return it)
    # select 2nd item with probability 50% (or 50% the selection stays the 1st)
    # select 3rd item with probability 33.(3)%
    # select nth item with probability 1/n
    selection = default
    for i, item in enumerate(iterator, start=1):
        if randrange(i) == 0:  # random [0..i)
            selection = item
    return selection
To print k random lines from a file and delete them:
#!/usr/bin/env python3
import random
import sys

k = 10
filename = 'TestingDeleteLines.txt'
with open(filename) as file:
    random_lines = reservoir_sample(file, k)  # get k random lines

if not random_lines:  # file is empty
    sys.exit()  # do nothing, exit immediately

print("\n".join(map(str.strip, random_lines)))  # print random lines
delete_lines(filename, random_lines)  # delete them from the file
where reservoir_sample() uses the same algorithm as select_random_it() but allows choosing k items instead of one:
import random
from itertools import islice

def reservoir_sample(iterable, k,
                     randrange=random.randrange, shuffle=random.shuffle):
    """Select *k* random elements from *iterable*.

    Use O(n) Algorithm R https://en.wikipedia.org/wiki/Reservoir_sampling

    If number of items less than *k* then return all items in random order.
    """
    it = iter(iterable)
    if not (k > 0):
        raise ValueError("sample size must be positive")

    sample = list(islice(it, k))  # fill the reservoir
    shuffle(sample)
    for i, item in enumerate(it, start=k+1):
        j = randrange(i)  # random [0..i)
        if j < k:
            sample[j] = item  # replace item with gradually decreasing probability
    return sample
and the delete_lines() utility function deletes the chosen random lines from the file:
import fileinput
import os

def delete_lines(filename, lines):
    """Delete *lines* from *filename*."""
    lines = set(lines)  # for amortized O(1) lookup
    with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
        for line in file:
            if line not in lines:
                print(line, end='')
    os.unlink(filename + '.bak')  # remove backup if there is no exception
The reservoir_sample() and delete_lines() functions do not load the whole file into memory, so they can work with arbitrarily large files.
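Answer 5 (score: 0)
Maybe you could try generating 10 random line indices (0 to 299 for a 300-line file) using
deleteLineNums = random.sample(range(len(lines)), 10)
and then delete those lines from the lines list by assigning it a filtered copy built with a list comprehension: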
linesCopy = [line for idx, line in enumerate(lines) if idx not in deleteLineNums]
lines[:] = linesCopy
Then write the lines back to 'TestingDeleteLines.txt'.
To see why the copy code above works, this post might help:
Remove items from a list while iterating
Edit: To get the lines at the randomly generated indices, simply do:
actualLines = []
for n in deleteLineNums:
    actualLines.append(lines[n])
actualLines then contains the actual text of the lines at the randomly generated indices.
Edit: Or, even better, use a list comprehension:
actualLines = [lines[n] for n in deleteLineNums]
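Putting the pieces of this answer together might look like the sketch below. The function name take_random_lines is made up for this example, and it assumes the file is small enough to read into memory.

```python
import random


def take_random_lines(path, k, rng=random):
    """Pick up to k random lines from path, remove them, and return them.

    Hypothetical helper for illustration; selection is by line index, so
    other lines with the same text are left in the file.
    """
    with open(path) as f:
        lines = f.read().splitlines()
    indices = rng.sample(range(len(lines)), min(k, len(lines)))
    picked = [lines[n] for n in indices]
    index_set = set(indices)  # fast lookup while filtering
    remaining = [line for n, line in enumerate(lines) if n not in index_set]
    with open(path, "w") as f:
        f.writelines(line + "\n" for line in remaining)
    return picked
```

Each returned line is a separate string, so it can be appended to a base URL one at a time, as the question intends.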