I have a large CSV file with the following header columns:
id, type, state, location, number of students
and the following values:
124, preschool, Pennsylvania, Pittsburgh, 1242
421, secondary school, Ohio, Cleveland, 1244
213, primary school, California, Los Angeles, 3213
155, secondary school, Pennsylvania, Pittsburgh, 2141
etc...
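The file is not sorted, and I want a new CSV file containing all schools with more than 2000 students.
The answers I have found deal with sorted CSV files, or with splitting a file after a fixed number of rows.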
Answer 0: (score: 0)
Here is a solution using the csv module:
import csv
with open('fin.csv', 'r') as fin, open('fout.csv', 'w', newline='') as fout:
    # define reader and writer objects
    reader = csv.reader(fin, skipinitialspace=True)
    writer = csv.writer(fout, delimiter=',')
    # write headers
    writer.writerow(next(reader))
    # iterate and write rows based on condition
    for i in reader:
        if int(i[-1]) > 2000:
            writer.writerow(i)
Result:
id,type,state,location,number of students
213,primary school,California,Los Angeles,3213
155,secondary school,Pennsylvania,Pittsburgh,2141
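If the student count is not guaranteed to stay in the last column, the row can be looked up by header name instead of by position. The following is only a sketch under that assumption: `filter_rows`, its parameters, and the in-memory sample are hypothetical names for illustration, and the column name is taken from the header shown in the question.

```python
import csv
import io

def filter_rows(src, dst, column="number of students", threshold=2000):
    """Copy rows whose `column` value exceeds `threshold` from src to dst.

    Returns the number of rows written (excluding the header).
    """
    reader = csv.DictReader(src, skipinitialspace=True)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    kept = 0
    for row in reader:
        if int(row[column]) > threshold:
            writer.writerow(row)
            kept += 1
    return kept

# In-memory sample built from the rows in the question
sample = io.StringIO(
    "id,type,state,location,number of students\n"
    "124, preschool, Pennsylvania, Pittsburgh, 1242\n"
    "421, secondary school, Ohio, Cleveland, 1244\n"
    "213, primary school, California, Los Angeles, 3213\n"
    "155, secondary school, Pennsylvania, Pittsburgh, 2141\n"
)
out = io.StringIO()
kept = filter_rows(sample, out)
print(out.getvalue(), end="")
```

With real files, the two `io.StringIO` objects would be replaced by file objects opened with `newline=''`, as the csv module documentation recommends.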
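Answer 1: (score: 0)
If you only want to read the file and avoid any other processing, you can use a regular expression (assuming the count is the last column and its values are positive integers):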
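import re
# Matches an integer greater than 2000 (no leading zeros) at the end of the line:
# five or more digits, four digits starting with 3-9, or 2xxx other than 2000
pattern = re.compile(r'(?:[1-9][0-9]{4,}|[3-9][0-9]{3}|2(?!000)[0-9]{3})\s*$')
# Open the output in text mode; writing str to a 'wb' file fails on Python 3
with open('Test.txt') as f, open('Test1.txt', 'w') as f1:
    for line in f:
        if pattern.search(line):
            f1.write(line)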
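A regex compares characters rather than numbers, so boundary values such as exactly 2000 or five-digit counts like 10000 are easy to get wrong. As a hedged alternative that still avoids full CSV parsing, the last field can be split off and compared as an integer. The helper name `exceeds` and the sample lines are hypothetical, and the sketch assumes no field after the last comma contains a quoted comma.

```python
def exceeds(line, threshold=2000):
    """Return True if the last comma-separated field is an integer > threshold."""
    last = line.rstrip("\r\n").rsplit(",", 1)[-1].strip()
    try:
        return int(last) > threshold
    except ValueError:
        # Header line or non-numeric field: never selected
        return False

lines = [
    "id,type,state,location,number of students\n",
    "124, preschool, Pennsylvania, Pittsburgh, 1242\n",
    "155, secondary school, Pennsylvania, Pittsburgh, 2141\n",
    "900, primary school, Ohio, Cleveland, 2000\n",    # boundary: not > 2000
    "901, primary school, Ohio, Cleveland, 10000\n",   # five digits
]
kept = [line for line in lines if exceeds(line)]
print("".join(kept), end="")
```

Like the regex version, this drops the header line, so it would need to be written separately if the output should keep it.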
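The regex filter can also be written in bash, although a `while read` loop is often slow on large inputs, so it is worth timing before relying on it for speed: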
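# ERE for an integer greater than 2000 (no leading zeros) at the end of the line
K='([1-9][0-9]{4,}|[3-9][0-9]{3}|2[1-9][0-9]{2}|20[1-9][0-9]|200[1-9])$'
# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r line; do
    if [[ $line =~ $K ]]; then printf '%s\n' "$line"; fi
done < Test.txt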