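I'm new to Python and trying to learn this part. My text file has about 25 columns and more than 50,000 rows. Column #11 (ZIP) holds each customer's ZIP code in the format "07598-XXXX", and I only want the first 5 characters, i.e. "07598". I need to do this for the entire column, but I'm not sure how to write it given my current logic. So far my code removes rows that contain certain strings, and I read the file with '|' as the delimiter so I can write it back out as CSV.
State | ZIP (#11) | Column 12 | ....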
NY | 60169-8547 | 98
NY | 60169-8973 | 58
NY | 11219-4598 | 25
NY | 11219-8475 | 12
NY | 20036-4879 | 56
How can I iterate over the ZIP column and keep only the first 5 characters? Thanks for your help!
import csv
my_file_name = "NVG.txt"
cleaned_file = "cleanNVG.csv"
remove_words = ['INAC-EIM', '-INAC', 'TO-INAC', 'TO_INAC', 'SHIP_TO-inac', 'SHIP_TOINAC']
with open(my_file_name, 'r', newline='') as infile, open(cleaned_file, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for line in csv.reader(infile, delimiter='|'):
        if not any(remove_word in element for element in line for remove_word in remove_words):
            writer.writerow(line)
Answer 0 (score: 3)
'{:.5}'.format(zip_)
where zip_ is the string containing the ZIP code. For more information on format, see: https://docs.python.org/2/library/string.html#format-string-syntax
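A quick check of the format-spec approach, assuming the ZIP values are plain strings (the values below are taken from the question's sample). For strings, the .5 precision keeps at most the first 5 characters, which should match what a [:5] slice gives:

```python
zips = ["60169-8547", "11219-4598", "07598"]

# '.5' used as a string precision keeps at most the first 5 characters
truncated = ['{:.5}'.format(z) for z in zips]

# plain slicing is an equivalent way to get the same prefix
sliced = [z[:5] for z in zips]

print(truncated)
```

A value that is already 5 characters or shorter passes through unchanged with either form.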
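Answer 1 (score: 2)
Handle the header row separately, then read row by row as usual, modifying only the second column of line, i.e. truncating it to 5 characters.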
import csv
my_file_name = "NVG.txt"
cleaned_file = "cleanNVG.csv"
remove_words = ['INAC-EIM', '-INAC', 'TO-INAC', 'TO_INAC', 'SHIP_TO-inac', 'SHIP_TOINAC']
with open(my_file_name, 'r', newline='') as infile, open(cleaned_file, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    cr = csv.reader(infile, delimiter='|')
    # iterate over title line and write it as-is
    writer.writerow(next(cr))
    for line in cr:
        if not any(remove_word in element for element in line for remove_word in remove_words):
            line[1] = line[1][:5]  # truncate
            writer.writerow(line)
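Alternatively, you can use line[1] = line[1].split("-")[0] to keep everything to the left of the dash.
Note the special handling of the header row: cr is an iterator. I simply consume its first item manually before the for loop so the header is passed through unchanged.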
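The header pass-through and the split("-") variant can be tried without touching any files. This is a minimal sketch on hypothetical in-memory data: the sample rows, the shortened remove_words list, and the ZIP_INDEX name are illustrative assumptions, and the strip() call only matters if the real cells carry padding, which the question does not confirm.

```python
import csv
import io

# Hypothetical sample; the real script would open NVG.txt instead.
sample = "State|ZIP|Col12\nNY|60169-8547|98\nNY| 11219-4598 |25\nNY|TO-INAC|12\n"
remove_words = ['INAC-EIM', '-INAC', 'TO-INAC']
ZIP_INDEX = 1  # assumption: 1 in this sample; column #11 of the real file would be index 10

infile = io.StringIO(sample)
outfile = io.StringIO(newline='')

writer = csv.writer(outfile)
cr = csv.reader(infile, delimiter='|')
writer.writerow(next(cr))  # consume the header once and write it unchanged
for line in cr:
    if not any(word in element for element in line for word in remove_words):
        # keep everything left of the dash, ignoring surrounding spaces
        line[ZIP_INDEX] = line[ZIP_INDEX].strip().split("-")[0]
        writer.writerow(line)

result = outfile.getvalue().splitlines()
print(result)
```

Unlike slicing, split("-")[0] does not assume the prefix is exactly 5 characters long, so a value without a dash is kept whole.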
Answer 2 (score: 1)
Use str[:5]
In your case:
with open(my_file_name, 'r', newline='') as infile, open(cleaned_file, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for line in csv.reader(infile, delimiter='|'):
        if not any(remove_word in element for element in line for remove_word in remove_words):
            line[1] = line[1][:5]
            writer.writerow(line)
line[1] = line[1][:5]
sets the second column of each row to its first 5 characters. The end index of a slice is exclusive, so [:5] keeps the characters at positions 0 through 4.
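The answers here hard-code index 1, which matches the excerpt shown in the question; the question itself describes ZIP as column #11, which would be index 10 in a zero-based list. One way to avoid guessing is to look the index up from the header row. This is a rough sketch only, assuming the header cell is literally named ZIP (the real header text is not shown in the question) and using hypothetical in-memory data:

```python
import csv
import io

# Hypothetical data standing in for NVG.txt
sample = "State|ZIP|Col12\nNY|60169-8547|98\nNY|20036-4879|56\n"
reader = csv.reader(io.StringIO(sample), delimiter='|')

header = next(reader)
# strip in case header cells carry padding; "ZIP" is an assumed header name
zip_index = [name.strip() for name in header].index("ZIP")

rows = []
for line in reader:
    # slice end 5 is exclusive, so this keeps characters 0..4
    line[zip_index] = line[zip_index][:5]
    rows.append(line)

print(rows)
```

If the header name is not found, list.index raises ValueError, which surfaces a layout mismatch early rather than silently truncating the wrong column.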