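I am working on a Python script that I can't seem to get right. It takes two inputs:
A data file consisting of four tab-separated columns, sorted. A stop file consisting of a list of words, also sorted.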
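The goal of the script is to drop every line of the data file whose first column appears in the stop file, and to write the remaining lines to an output file.
Here is a sample of the data file: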
abandonment-n after+n-the+n-a-j stop-n 1
abandonment-n against+n-the+ns leave-n 1
cake-n against+n-the+vg rest-v 1
abandonment-n as+n-a+vd require-v 1
abandonment-n as+n-a-j+vg-up use-v 1
Here is a sample of the stop file:
apple-n
banana-n
cake-n
pigeon-n
This is the code I have so far:
with open("input1", "rb") as oIndexFile:
    for line in oIndexFile:
        lemma = line.split()
        #print lemma
with open ("input2", "rb") as oSenseFile:
    with open("output", "wb") as oOutFile:
        for line in oSenseFile:
            concept, slot, filler, freq = line.split()
            nounsInterest = [concept, slot, filler, freq]
            #print concept
            if concept != lemma:
                outstring = '\t'.join(nounsInterest)
                oOutFile.write(outstring + '\n')
            else:
                pass
The desired output is as follows:
abandonment-n after+n-the+n-a-j-stop-n 1
abandonment-n against+n-the+ns-leave-n 1
abandonment-n as+n-a+vd-require-v 1
abandonment-n as+n-a-j+vg-up-use-v 1
Any insights?
So far, the output I get is the following, which is essentially just the input printed back unchanged:
abandonment-n after+n-the+n-a-j stop-n 1
abandonment-n against+n-the+ns leave-n 1
cake-n against+n-the+vg rest-v 1
abandonment-n as+n-a+vd require-v 1
abandonment-n as+n-a-j+vg-up use-v 1
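Some things I have tried that still don't work:
Instead of if concept != lemma:
I first tried if concept not in lemma:
which produces the same output as shown above.
I also suspected that the first input file was never being read, but even after merging it into the same block, as follows: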
with open ("input2", "rb") as oSenseFile:
    with open("tinput1", "rb") as oIndexFile:
        for line in oIndexFile:
            lemma = line.split()
            with open("out", "wb") as oOutFile:
                for line in oSenseFile:
                    concept, slot, filler, freq = line.split()
                    nounsInterest = [concept, slot, filler, freq]
                    if concept not in lemma:
                        outstring = '\t'.join(nounsInterest)
                        oOutFile.write(outstring + '\n')
                    else:
                        pass
it generates an empty output file.
I also tried a different approach:
filename = "input1.txt"
filename2 = "input2.txt"
filename3 = "output1"
def fixup(filename):
    fin1 = open(filename)
    fin2 = open(filename2, "r")
    fout = open(filename3, "w")
    for word in filename:
        words = word.split()
    for line in filename2:
        concept, slot, filler, freq = line.split()
        nounsInterest = [concept, slot, filler, freq]
        if True in [concept in line for word in toRemove]:
            pass
        else:
            outstring = '\t'.join(nounsInterest)
            fout.write(outstring + '\n')
    fin1.close()
    fin2.close()
    fout.close()
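This was adapted from here, but without success. In this case, no output is produced at all.
Can someone point me in the right direction? Although the sample files are small, I have to run this on large files. Thanks for your help.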
Answer 0 (score: 4)
I think you are trying to do something like this:
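# Text mode ('r'/'w') keeps every field a str, so join() and format() work on Python 2 and 3.
with open('input1', 'r') as indexfile:
    # Collect every stop word once; strip() drops the trailing newline.
    lemma = {x.strip() for x in indexfile}
with open('input2', 'r') as sensefile, open('output', 'w') as outfile:
    for line in sensefile:
        nouns_interest = concept, slot, filler, freq = line.split()
        # Keep only lines whose first column is not a stop word.
        if concept not in lemma:
            outfile.write('\t'.join(nouns_interest) + '\n')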
Your desired output appears to insert a hyphen between slot and filler, so you may want to use
outfile.write('{}\t{}-{}\t{}\n'.format(*nouns_interest))
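Putting the two pieces of this answer together, a minimal sketch might look like the following. The function name `filter_lines` and the in-memory sample lines are illustrative assumptions, not part of the original script; skipping lines that don't have exactly four fields is also an assumption.

```python
def filter_lines(data_lines, stop_lines):
    """Yield reformatted data lines whose first column is not a stop word."""
    # Build the stop set once; strip() removes the trailing newline.
    stop_words = {line.strip() for line in stop_lines if line.strip()}
    for line in data_lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # assumption: silently skip malformed lines
        concept, slot, filler, freq = fields
        if concept not in stop_words:
            # Join slot and filler with a hyphen, as in the desired output.
            yield '{}\t{}-{}\t{}\n'.format(concept, slot, filler, freq)


# Hypothetical sample input, mirroring the files in the question.
data = [
    "abandonment-n\tafter+n-the+n-a-j\tstop-n\t1\n",
    "cake-n\tagainst+n-the+vg\trest-v\t1\n",
]
stop = ["apple-n\n", "cake-n\n"]
result = list(filter_lines(data, stop))
```

Because the function accepts any iterable of lines, open file objects can be passed in directly. The data file is then streamed line by line, and only the stop list is held in memory, which should suit large data files.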
Answer 1 (score: 1)
I haven't checked your logic, but you are overwriting lemma for every line there. Maybe append each word to a list instead?
lemma = []
for line in oIndexFile:
    lemma.append(line.strip())  # strip() removes the newline and surrounding whitespace
Or, as @gnibbler suggests, you can use a set for efficiency:
lemma = set()
for line in oIndexFile:
    lemma.add(line.strip())
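Edit: It looks like you don't want to split the line, just strip the newline. And yes, your logic is almost correct.
This is what the second part should look like: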
with open ("data_php.txt", "rb") as oSenseFile:
    with open("out_FILTER_LINES", "wb") as oOutFile:
        for line in oSenseFile:
            concept, slot, filler, freq = line.split()
            nounsInterest = [concept, slot, filler, freq]
            #print concept
            if concept not in lemma: #check if the concept exists in lemma
                outstring = '\t'.join(nounsInterest)
                oOutFile.write(outstring + '\n')
            else:
                pass
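To make the overwriting problem concrete, here is a small sketch of what the original loop leaves in lemma and why neither comparison filters cake-n. The in-memory `stop_lines` list stands in for the stop file and is an assumption for illustration.

```python
# Stand-in for the stop file from the question.
stop_lines = ["apple-n\n", "banana-n\n", "cake-n\n", "pigeon-n\n"]

# What the original loop does: lemma is rebound on every iteration,
# so only the last line of the stop file survives.
for line in stop_lines:
    lemma = line.split()
last_only = lemma  # the list ['pigeon-n']

# A str is never equal to a list, so `concept != lemma` is always True.
str_vs_list = ("cake-n" != last_only)

# `in` only checks the single surviving word, so 'cake-n' is not found.
in_last = "cake-n" in last_only

# Collecting every stop word first makes the membership test meaningful.
stop_words = {line.strip() for line in stop_lines}
in_set = "cake-n" in stop_words
```

Here `str_vs_list` is True and `in_last` is False, which matches the unfiltered output in the question, while `in_set` is True once all stop words are collected.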
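Answer 2 (score: 1)
If you are sure that the lines in the data file don't begin with whitespace, then we don't need to split the line. This is a slight tweak of @gnibbler's answer.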
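with open('input1', 'rb') as indexfile:
    lemma = {x.strip() for x in indexfile}
with open('input2', 'rb') as sensefile, open('output', 'wb') as outfile:
    for line in sensefile:
        # Caveat: startswith() is a prefix test, so the stop word 'cake-n'
        # would also drop a line whose first column is, say, 'cake-nut'.
        if not any(line.startswith(x) for x in lemma):
            # The line is written back unchanged, newline included.
            outfile.write(line)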
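One thing to watch with the prefix test: it can match more than the first column. A hedged sketch of an exact first-column check that still avoids a full split is shown below; the helper name `first_column_is_stopped` and the sample line with `cake-nut` are hypothetical, and the sketch assumes the columns really are tab-separated as the question states.

```python
def first_column_is_stopped(line, stop_words):
    """Return True if the text before the first tab is a stop word."""
    # partition() cuts the line at the first tab only; the rest is untouched.
    first_column = line.partition("\t")[0]
    return first_column in stop_words


stop_words = {"cake-n"}

# Hypothetical line whose first column merely begins with a stop word.
near_miss = "cake-nut\tagainst+n-the+vg\trest-v\t1\n"
real_hit = "cake-n\tagainst+n-the+vg\trest-v\t1\n"

prefix_says = any(near_miss.startswith(w) for w in stop_words)
exact_says = first_column_is_stopped(near_miss, stop_words)
exact_hit = first_column_is_stopped(real_hit, stop_words)
```

With these inputs the prefix test flags the `cake-nut` line while the exact check does not. The set lookup is also a single hash probe per line, whereas `any(...)` walks the whole stop list for every line that is kept, which may matter on large files.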