Delete rows in a csv file whose first column contains 6 digits

Time: 2013-06-09 08:40:47

Tags: python csv sed awk grep

I have a csv file like this:

"5478",a,56.40,-0.40 ,55.50,57.50,55.30,56.74,"862,971","48,962,460","695",56.40,56.60,"127,474,332",56.40,60.30,52.50
"5480",b,21.90,-0.25 ,21.80,22.00,21.80,21.87,"1,598,041","34,950,597","590",21.90,21.95,"199,097,830",21.90,23.40,20.40
"70462P",c,0.01,-0.01 ,0.01,0.01,0.01,0.01,"99,000","990","1",0.01,0.06,"5,000,000",0.01,0.31,0.01
"70465P",d, ---,--- ,---,---,---,0.02,"0","0","0",0.01,0.03,"20,000,000",0.02,0.32,0.01
"8935",bt,5.02,-0.02 ,4.95,5.19,4.92,5.05,"949,102","4,791,070","290",5.02,5.07,"201,902,107",5.02,5.37,4.67
1333,tnd,21.40,-0.60 ,22.00,22.20,21.20,21.52,"1,519,292","32,692,804","631",21.40,21.50,"102,525,625",21.40,22.85,19.95

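I want to check the first column and delete the row if the value has more than 4 digits; for example, the "70462P" and "70465P" rows above would be deleted. How can I do this? Many thanks.

P.S. This is stock information downloaded from a stock center, but I found that the format changed recently. The previous format looks like the last row, where the first column has no quotes (""). Is it possible to filter both formats, or should I handle the two cases separately?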

5 answers:

Answer 0: (score: 3)

Here is a sed solution:

sed -e '/^"[0-9]\{5\}/d' in-file > out-file

You can also use the -i option to edit the file in place:

sed -i -e '/^"[0-9]\{5\}/d' file
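
For readers who prefer Python, the same delete-by-pattern idea can be sketched with the `re` module. This is only an illustration and not part of the answer above: the function name `drop_long_codes` is hypothetical, and the optional-quote pattern `"?` is an assumption added to cover the older unquoted format mentioned in the question.

```python
import re

# Assumption: a line is dropped when its first column starts with five or
# more digits, with or without a leading double quote.
LONG_CODE = re.compile(r'^"?[0-9]{5}')

def drop_long_codes(lines):
    """Return the lines whose first column does not start with 5+ digits."""
    return [line for line in lines if not LONG_CODE.match(line)]

# Shortened sample rows; the unquoted 70465P row is a hypothetical
# old-format variant, not taken from the question's data.
sample = [
    '"5478",a,56.40',
    '"70462P",c,0.01',
    '1333,tnd,21.40',
    '70465P,d,---',
]
kept = drop_long_codes(sample)
print(kept)
```

Because the quote is optional in the pattern, one pass should handle both the old and the new file format.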

Answer 1: (score: 1)

I am not sure which language you want to use, since you tagged both awk and sed, but you can simply use grep:

egrep '^\"[0-9]{1,4}\"' file.txt

Answer 2: (score: 1)

AWK

awk -F, '$1~ /^\"[0-9][0-9]?[0-9]?[0-9]?\"$/' file

GNU sed

sed '/^\"[0-9]\{1,4\}\"/!d' file

Answer 3: (score: 0)

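import csv
import shutil
import tempfile

# Copy the rows whose first field has at most 4 characters to a temporary
# file, then replace the original file with it (Python 3 syntax).
with open('data.csv', newline='') as fin, \
     tempfile.NamedTemporaryFile('w', newline='', delete=False) as fout:
    w = csv.writer(fout)
    for row in csv.reader(fin):
        # csv.reader strips the quotes, so "5478" and 1333 both have length 4
        if row and len(row[0]) <= 4:
            w.writerow(row)

shutil.move(fout.name, 'data.csv')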
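
One side effect of round-tripping through `csv.writer` is that the quoting may change (for example, `"5478"` is written back as `5478` under the default quoting rules). If the original formatting matters, a possible variant, sketched here under assumptions, uses `csv.reader` only to parse the first field and then emits the untouched raw line. The helper name `keep_short_rows` and its `max_len` parameter are hypothetical, and the sketch assumes no field contains an embedded newline.

```python
import csv
import io

def keep_short_rows(text, max_len=4):
    """Return the raw lines of `text` whose first CSV field has at most
    `max_len` characters, leaving the original quoting untouched."""
    kept = []
    for raw in io.StringIO(text):
        # Parse just this one line; csv.reader removes surrounding quotes.
        # Assumption: every record fits on a single line.
        row = next(csv.reader([raw]), [])
        if row and len(row[0]) <= max_len:
            kept.append(raw)
    return ''.join(kept)

# Shortened rows modelled on the question's data.
data = (
    '"5478",a,56.40,"862,971"\n'
    '"70462P",c,0.01,"99,000"\n'
    '1333,tnd,21.40,"1,519,292"\n'
)
result = keep_short_rows(data)
print(result, end='')
```

Since the length check runs on the parsed field rather than the raw text, quoted and unquoted first columns are treated the same way.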

Answer 4: (score: 0)

You can do it like this. I assume lines is an iterator that yields one line of the CSV file per loop iteration.

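for line in lines:
    # Take the first column and strip the surrounding quotes, so that
    # "5478" counts as 4 characters rather than 6.
    first = line.split(',', 1)[0].strip('"')
    if len(first) <= 4:  # consider this line
        print(line)  # process this line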