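I have a text file with 8 columns. The first column is the ID and the eighth is the type. In the first column each ID is repeated on many rows, and in the eighth column each ID has several types. One of those types is H, and each ID has exactly one H row. For example (showing only the ID and type columns):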
ID type
E0 B
E0 H
E0 S
B4 B
B4 H
I want to create another file that has only one row per ID (the row with H in the eighth column). For the example above, the result would be:
ID type
E0 H
B4 H
Answer 0 (score: 0)
This is inspectorG4dget's solution, updated for Python 2.7.3.
It only considers two columns of the input CSV file, ID and type, separated by \t (tab).
Code:
import csv

# Python 2: the csv module expects files opened in binary mode ('rb'/'wb').
with open('/home/vivek/Desktop/input.csv', 'rb') as infile, open('/home/vivek/Desktop/output.csv', 'wb') as outfile:
    reader = csv.reader(infile, delimiter='\t')
    writer = csv.writer(outfile, delimiter='\t')
    # Copy the header row (ID and type).
    reader_row = next(reader)
    writer.writerow([reader_row[0], reader_row[1]])
    # Keep only the rows whose type is H.
    for row in reader:
        if row[1] == "H":
            writer.writerow(row)
Output:
ID type
E0 H
B4 H
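For Python 2.6.6, try the following code, which nests the with blocks because Python 2.6 does not accept several context managers in a single with statement. I have not tested it on Python 2.6.6, because my machine has Python 2.7.3.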
import csv

with open('/home/vivek/Desktop/input.csv', 'rb') as infile:
    with open('/home/vivek/Desktop/output.csv', 'wb') as outfile:
        reader = csv.reader(infile, delimiter='\t')
        writer = csv.writer(outfile, delimiter='\t')
        # Copy the header row (ID and type).
        reader_row = next(reader)
        writer.writerow([reader_row[0], reader_row[1]])
        # Keep only the rows whose type is H.
        for row in reader:
            if row[1] == "H":
                writer.writerow(row)
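The question mentions eight columns with the type in the eighth, while the code above tests index 1. Here is a minimal Python 3 sketch of the same filter with the column index as a parameter. It assumes a tab-delimited file; `keep_h_rows`, `type_col`, and the placeholder column names `c2` to `c7` are hypothetical names made up for illustration, not part of the original data.

```python
import csv
import io


def keep_h_rows(infile, outfile, type_col=7, wanted="H"):
    """Copy the header and every row whose type column equals `wanted`.

    `type_col` is a zero-based index, so 7 means the eighth column.
    Returns the number of data rows written.
    """
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    header = next(reader, None)
    if header is None:  # empty input: nothing to copy
        return 0
    writer.writerow(header)
    kept = 0
    for row in reader:
        # Skip short or blank rows instead of raising IndexError.
        if len(row) > type_col and row[type_col] == wanted:
            writer.writerow(row)
            kept += 1
    return kept


# In-memory sample with eight tab-separated columns (made-up filler values).
sample = (
    "ID\tc2\tc3\tc4\tc5\tc6\tc7\ttype\n"
    "E0\t1\t1\t1\t1\t1\t1\tB\n"
    "E0\t2\t2\t2\t2\t2\t2\tH\n"
    "E0\t3\t3\t3\t3\t3\t3\tS\n"
    "B4\t4\t4\t4\t4\t4\t4\tB\n"
    "B4\t5\t5\t5\t5\t5\t5\tH\n"
)
src = io.StringIO(sample)
dst = io.StringIO()
kept = keep_h_rows(src, dst)
result = dst.getvalue()
```

With real files on Python 3, open both in text mode with `newline=''` (for example `open(path, newline='')`) rather than `'rb'`/`'wb'`, as the csv module documentation recommends.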
Answer 1 (score: 0)
Assuming your file is just a plain text file in which spaces/tabs separate the columns, and the 'type' column is right at the end of each line:
with open('input.txt', 'r') as input_file:
    input_lines = input_file.readlines()
# Take the header line, and all the subsequent lines whose last character is 'H'
output_lines = input_lines[:1] + [line for line in input_lines[1:] if line.rstrip('\n').endswith('H')]
output_string = ''.join(output_lines)
with open('output.txt', 'w') as output_file:
    output_file.write(output_string)
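The code above assumes that the 'type' column ends immediately after a single-character type code. If the data may have trailing whitespace, or you may have multi-character type codes that could look like 'AH' etc., then replace the line below the comment with the following: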
output_lines = input_lines[:1] + [line for line in input_lines[1:] if line.split()[-1:] == ['H']]
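To see where the two tests differ, here is a small sketch that runs both on a few made-up lines. `ends_with_h_char` and `last_field_is_h` are hypothetical helper names, and both add a guard against blank lines that the one-liners in this answer do not have.

```python
def ends_with_h_char(line):
    # Character test: the character just before the final one (normally the
    # newline) must be 'H'. The length guard avoids IndexError on short lines.
    return len(line) >= 2 and line[-2] == "H"


def last_field_is_h(line):
    # Field test: split on any whitespace and compare the last field.
    fields = line.split()
    return bool(fields) and fields[-1] == "H"


lines = [
    "E0 H\n",    # plain case
    "E0 H  \n",  # trailing spaces after the type
    "E0 AH\n",   # multi-character type that ends in H
    "E0 H",      # last line of a file with no trailing newline
    "\n",        # blank line
]
char_results = [ends_with_h_char(line) for line in lines]
field_results = [last_field_is_h(line) for line in lines]

for line, c, f in zip(lines, char_results, field_results):
    print(repr(line), c, f)
```

The character test accepts 'AH' and misses both the padded line and the line without a newline, while the field test handles all three as intended.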
Edit: if your file is too large and you don't want to load it all into memory and operate on it, you can use a generator expression, which is evaluated lazily:
with open('input.txt', 'r') as input_file, open('output.txt', 'w') as output_file:
    # The generator is lazy, so it must be consumed while input_file is still open.
    output_lines = (line for i, line in enumerate(input_file)
                    if i == 0 or line.rstrip('\n').endswith('H'))
    for line in output_lines:
        output_file.write(line)
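All of the filters above rely on the question's statement that each ID has exactly one H row. If you would like to check that before trusting the output, something like the following sketch could work. It assumes whitespace-separated columns with the ID first and the type last; `h_counts` and the extra `C1` row are hypothetical, added only to show a failing ID.

```python
from collections import Counter


def h_counts(lines, wanted="H"):
    """Count, per ID, how many rows have `wanted` as their last field."""
    counts = Counter()
    for i, line in enumerate(lines):
        fields = line.split()
        if i == 0 or not fields:  # skip the header and blank lines
            continue
        # Adding 0 still registers the ID, so IDs with no H row show up too.
        counts[fields[0]] += int(fields[-1] == wanted)
    return counts


sample_lines = [
    "ID type\n",
    "E0 B\n", "E0 H\n", "E0 S\n",
    "B4 B\n", "B4 H\n",
    "C1 B\n",  # made-up ID with no H row
]
counts = h_counts(sample_lines)
bad_ids = sorted(id_ for id_, n in counts.items() if n != 1)
print(bad_ids)
```

Because the function only iterates over `lines`, you can pass it an open file object directly and it will read the file one line at a time.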