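Python csv reader: how to pass output to another script using the command line

Asked: 2014-03-02 03:30:18

Tags: python csv command-line

I have 2 scripts, a mapper and a reducer. Both take their input from a csv reader. The mapper script should read its input from a tab-delimited text file, dataset.csv, and the reducer's input should be the mapper's output. I want to save the reducer's output to a text file, output.txt. What is the correct chain of commands to do this?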

Mapper:

#!/usr/bin/python

import sys, csv

reader = csv.reader(sys.stdin, delimiter='\t')
# Note: writer is defined but never used; output goes through print below.
writer = csv.writer(sys.stdout, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)

for line in reader:
    if len(line) > 5: # parse only lines in the forum_node.tsv file
        if line[5] == 'question':
            _id = line[0]
            student = line[3] # author_id
        elif line[5] != 'node_type':
            _id = line[7]
            student = line[3] # author_id
        else:
            continue # ignore header

        # Emit one "thread id <TAB> student" pair per row.
        print('{0}\t{1}'.format(_id, student))

Reducer:

#!/usr/bin/python

import sys, csv

reader = csv.reader(sys.stdin, delimiter='\t')
# Note: writer is defined but never used; output goes through print below.
writer = csv.writer(sys.stdout, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)

oldID = None
students = []

for line in reader:
    if len(line) != 2:
        continue # skip rows that are not "thread id <TAB> student"

    thisID, thisStudent = line

    # A new thread id means the previous thread is complete: flush it.
    if oldID and oldID != thisID:
        print('Thread: {0}, students: {1}'.format(oldID, ', '.join(students)))
        students = []

    oldID = thisID
    students.append(thisStudent)

# Flush the last thread.
if oldID is not None:
    print('Thread: {0}, students: {1}'.format(oldID, ', '.join(students)))
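
The reducer loop only flushes a thread when the id changes, so it relies on rows for the same thread arriving next to each other; if the mapper output is not already ordered by id, it would need to be sorted first. As a minimal sketch of the same grouping, assuming that ordering holds, itertools.groupby from the standard library can do the bookkeeping. The names group_students and sample are hypothetical and used only for illustration.

```python
from itertools import groupby
from operator import itemgetter


def group_students(rows):
    """Yield (thread_id, [students]) for consecutive rows sharing a thread id."""
    for thread_id, group in groupby(rows, key=itemgetter(0)):
        yield thread_id, [student for _, student in group]


# Hypothetical sample rows, already ordered by thread id.
sample = [('1', 'a'), ('1', 'b'), ('2', 'c')]

for thread_id, students in group_students(sample):
    print('Thread: {0}, students: {1}'.format(thread_id, ', '.join(students)))
```

Like the hand-written loop, groupby only merges adjacent rows, so unsorted input would produce the same thread more than once.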

1 answer:

Answer 0 (score: 3)

Pipe the scripts together:

python mapper.py < dataset.csv | python reducer.py > output.txt

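< dataset.csv supplies the CSV file to mapper.py on stdin, and | redirects the mapper's stdout to another command. That other command is python reducer.py, and > output.txt connects that script's stdout to output.txt.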
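
If the chain ever needs to be launched from Python rather than from a shell, roughly the same wiring can be done with the standard subprocess module. This is only a sketch of the same redirections, not a replacement for the command above; the function name run_pipeline and its parameters are hypothetical.

```python
import subprocess
import sys


def run_pipeline(mapper_cmd, reducer_cmd, input_path, output_path):
    """Run `mapper_cmd < input_path | reducer_cmd > output_path`.

    Returns the exit codes as (mapper_rc, reducer_rc).
    """
    with open(input_path, 'rb') as src, open(output_path, 'wb') as dst:
        # stdin=src plays the role of "< dataset.csv".
        mapper = subprocess.Popen(mapper_cmd, stdin=src, stdout=subprocess.PIPE)
        # stdin=mapper.stdout plays the role of "|", stdout=dst of "> output.txt".
        reducer = subprocess.Popen(reducer_cmd, stdin=mapper.stdout, stdout=dst)
        # Close our copy of the pipe so the mapper sees a broken pipe
        # if the reducer exits early.
        mapper.stdout.close()
        reducer_rc = reducer.wait()
        mapper_rc = mapper.wait()
    return mapper_rc, reducer_rc
```

Calling run_pipeline([sys.executable, 'mapper.py'], [sys.executable, 'reducer.py'], 'dataset.csv', 'output.txt') would mirror the shell command, assuming both scripts sit in the current directory.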