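**My goal is to avoid importing the csv module.**
I'm working on a script that runs through a very large csv file and selectively writes rows to a new csv file.
I have the following two lines: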
with open(sys.argv[1]) as ifile, open(sys.argv[2], mode = 'w') as ofile:
    for row in ifile:
Then, after some nested if statements, this:
line = list(ifile)[row]
ofile.write(line)
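I know this isn't right. I took a stab at it, hoping someone could shed some light on how to do it properly. The essence of the question is how to refer to the line I'm currently on, so that I can write it to the new csv file using 'ofile'. Let me know if further clarification is needed. Thanks!
Edit: the full code is in this pastebin link: http://pastebin.com/a0jx85xR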
Answer 0 (score: 0)
You're close. This is all you need to do:
with open(sys.argv[1]) as ifile, open(sys.argv[2], mode='w') as ofile:
    for row in ifile:
        # ...
        # Define the condition to be met (you will have to replace this for yourself).
        # E.g.: the number of non-empty entries in the row is greater than 5:
        if len([term for term in row.split(',') if term.strip() != '']) > 5:
            ofile.write(row)
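To see why the loop variable is all that is needed, here is a small sketch using in-memory files. `io.StringIO` stands in for the real files, and the sample data, the `copy_matching` name and the `min_fields` parameter are made up for illustration. Iterating over a file object yields each line as a string, trailing newline included, so it can be passed straight to `write`.

```python
import io

def copy_matching(ifile, ofile, min_fields=3):
    """Copy lines that have at least min_fields non-empty comma-separated fields."""
    written = 0
    for row in ifile:  # row is the current line as a str, newline included
        fields = [term for term in row.split(',') if term.strip() != '']
        if len(fields) >= min_fields:
            ofile.write(row)  # row already ends in '\n'
            written += 1
    return written

source = io.StringIO("a,b,c\nd,e\nf,g,h,i\n")
target = io.StringIO()
count = copy_matching(source, target)
print(count)
print(target.getvalue())
```

By contrast, `list(ifile)[row]` cannot work: `row` is a string rather than an integer index, and `list(ifile)` would also read the rest of the file in one go.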
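Update:
In answer to the OP's question about splitting lines:
You split a line in Python by supplying a delimiter. Since this is a CSV file, you can split the line on a comma (','). For example:
If this is a line (a string):
0,1,2,3,4,5
and you apply:
line.split(',')
you get the list:
['0', '1', '2', '3', '4', '5']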
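Lines read from a file usually end in a newline and may have spaces around the commas, so in practice it often helps to strip each field after splitting. A brief sketch (the sample string is invented for illustration):

```python
line = "0, 1, 2, 3, 4, 5\n"

# A plain split keeps the surrounding whitespace and the trailing newline
raw = line.split(',')

# Stripping each field removes both
clean = [field.strip() for field in line.split(',')]

print(raw)
print(clean)
```

Here `raw` still carries the leading spaces and the final `'\n'`, while `clean` holds only the bare values.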
Update 2:
import sys

if __name__ == '__main__':
    ticker = sys.argv[3]
    # argv[4] is a string; convert it to an int, then to a bool
    allTypes = bool(int(sys.argv[4]))
    with open(sys.argv[1]) as ifile, open(sys.argv[2], mode='w') as ofile:
        all_timestamps = []  # timestamps seen so far
        n_rows = 0
        for row in ifile:
            # Split the line into its constituent terms, as described earlier.
            # SAMPLE LINE:
            # A,1,12884902522,B,B,4900,AAIR,0.1046,28800,390,B,AARCA,
            # After the split, the line becomes:
            # ['A', '1', '12884902522', 'B', 'B', '4900', 'AAIR', '0.1046', '28800', '390', 'B', 'AARCA']
            # NOW you can make comparisons against those terms. :)
            terms = [term for term in row.split(',') if term.strip() != '']
            current_timestamp = int(terms[2])
            # Compare the current timestamp against the previous one,
            # starting from row 2 (index 1):
            if n_rows >= 1:
                # Negative indices count from the end, so -1 is the last value,
                # i.e. the previous timestamp. Compare and act if the criterion is met:
                if current_timestamp - all_timestamps[-1] >= 0:
                    pass  # does nothing; replace it with whatever code you want
            # Increment n_rows every time:
            n_rows += 1
            # Always append the current timestamp to all the timestamps:
            all_timestamps.append(current_timestamp)
            if terms[6] == ticker:
                # add something to make sure chronological order hasn't been broken
                if allTypes:
                    ofile.write(row)
                # I don't know if this was a bad indent or not, but you should know
                # where this goes
                elif terms[0] in ("A", "M", "D"):
                    print(row)
                    ofile.write(row)
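My initial guess was correct. You weren't splitting the rows into their CSV components, so your comparisons against the row didn't give the right results, and you got no output. This should work now (modify it slightly to suit your goal). :)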
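The same filtering logic can be pulled into a function that works on any iterable of lines, which makes it easy to try without real files. This is only a sketch under the assumptions used in this answer: field 0 is the record type, field 2 the timestamp and field 6 the ticker. The name `select_rows`, the `wanted_types` parameter, the shortened sample rows and the choice to raise an error on out-of-order timestamps are all invented for illustration.

```python
def select_rows(lines, ticker, all_types=False, wanted_types=("A", "M", "D")):
    """Yield the lines for `ticker`, optionally restricted to some record types."""
    previous_timestamp = None
    for row in lines:
        terms = [term.strip() for term in row.split(',')]
        if len(terms) < 7:
            continue  # too short to hold a ticker in field 6
        timestamp = int(terms[2])
        # One possible chronological check: stop if time goes backwards
        if previous_timestamp is not None and timestamp < previous_timestamp:
            raise ValueError("timestamps out of order: %r" % row)
        previous_timestamp = timestamp
        if terms[6] == ticker and (all_types or terms[0] in wanted_types):
            yield row

sample = [
    "A,1,100,B,B,4900,AAIR,0.1046,28800,390,B,AARCA,\n",
    "X,2,101,B,B,4900,AAIR,0.1046,28800,390,B,AARCA,\n",
    "M,3,102,B,B,4900,ZZZ,0.1046,28800,390,B,AARCA,\n",
]
selected = list(select_rows(sample, "AAIR"))
print(selected)
```

With real files, something like `ofile.writelines(select_rows(ifile, ticker, allTypes))` would write the selected rows. A header row would need to be skipped first, since `int(terms[2])` fails on non-numeric text.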
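Answer 1 (score: 0)
Just adding to jrd1's answer. I rarely use the csv module; I just use the split and join methods on strings. I usually end up with something like this (if there is only one input and one output, I generally just use stdin and stdout).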
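import sys

for row in sys.stdin:
    # Strip the newline first; the separator could be "\t" or whatever
    # (split() with no argument splits on whitespace)
    fields = row.rstrip("\n").split(",")
    # Process the fields in some way (0-based indexing);
    # new_date_format and some_condition_is_met are placeholders for your own code
    fields[0] = str(int(fields[0]) + 55)
    fields[7] = new_date_format(fields[7])
    if some_condition_is_met:
        print(",".join(fields))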
Of course, if your csv file starts getting funky entries with quotes, embedded commas and so on, this approach becomes a lot less fun.
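To make that caveat concrete, here is what a plain `split` does to a quoted field containing a comma, followed by a minimal hand-written splitter. The `split_quoted` helper is hypothetical and handles only simple double-quoted fields (no escaped quotes, no newlines inside fields); it is a sketch, not a substitute for a full CSV parser.

```python
def split_quoted(line, delimiter=",", quote='"'):
    """Split on delimiter, ignoring delimiters inside quoted sections."""
    fields, current, in_quotes = [], [], False
    for char in line.rstrip("\n"):
        if char == quote:
            in_quotes = not in_quotes  # toggle state; the quote itself is dropped
        elif char == delimiter and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(char)
    fields.append("".join(current))
    return fields

line = '1,"Smith, John",42\n'
naive = line.rstrip("\n").split(",")
careful = split_quoted(line)
print(naive)
print(careful)
```

The naive split cuts the quoted name into two pieces, while the quote-aware version keeps it as one field. Once the data needs escaped quotes or multi-line fields, the csv module is likely the more practical choice.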