Python - Delete rows from a CSV file based on column criteria

Time: 2016-02-03 18:58:30

Tags: python csv

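Every time the status changes, I only need to return the first row of each run of the same 'Status'. Here is an excerpt; for example, from this dataset I only need rows 1, 2, 5, 6, 8, 10, 11, 12, 13, 15, 18, 19, 21, 22.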

Row,Serial Number,Time,Status 
1,1400004,3/10/2014 11:52,GREEN 
2,1400004,3/15/2014 11:45,YELLOW 
3,1400004,3/29/2014 7:59,YELLOW 
4,1400004,4/16/2014 15:59,YELLOW 
5,1400004,5/10/2014 8:18,GREEN 
6,1400004,5/11/2014 15:28,YELLOW 
7,1400004,5/24/2014 7:56,YELLOW 
8,1400004,5/26/2014 7:59,GREEN 
9,1400004,5/28/2014 8:26,GREEN 
10,1400004,6/13/2014 17:29,YELLOW 
11,1400004,6/15/2014 15:12,GREEN 
12,1400004,6/17/2014 8:57,YELLOW 
13,1400007,1/3/2014  11:55,GREEN 
14,1400007,1/18/2014 5:35,GREEN 
15,1400007,1/18/2014 18:32,YELLOW 
16,1400007,1/19/2014 21:50,YELLOW 
17,1400007,1/21/2014 10:56,YELLOW 
18,1400007,1/27/2014 8:15,GREEN 
19,1400007,2/6/2014  9:47,YELLOW 
20,1400007,2/12/2014 12:44,YELLOW 
21,1400007,2/18/2014 12:40,GREEN 
22,1400007,2/24/2014 12:08,YELLOW 

Here is my code. I'm close, but it's a little off.

import csv
with open('NEW2.csv', 'rb') as f:
    csv_input = csv.reader(f)
    entries = []
    for x in csv_input:
        if x[3] == csv_input.next()[3]:
            pass
        else:
            entries.append(x)
    print entries
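Why the attempt is off: csv_input.next() (Python 2; next(csv_input) in Python 3) advances the same iterator that the for loop is consuming, so the row fetched for the comparison is never visited by the loop body. The code therefore only looks at every other row (starting with the header), compares each one with the row after it (which would keep the last row of a run rather than the first), and raises StopIteration when the current row is the last one. A minimal Python 3 illustration on a hypothetical in-memory sample, recording only which rows the loop body sees:

```python
import csv
import io

# Hypothetical stand-in for NEW2.csv; only the row numbers matter here
SAMPLE = "Row,Status\n1,GREEN\n2,YELLOW\n3,YELLOW\n4,YELLOW\n5,GREEN\n"

reader = csv.reader(io.StringIO(SAMPLE))
seen_by_loop = []
for row in reader:
    seen_by_loop.append(row[0])
    try:
        next(reader)  # same effect as csv_input.next(): the following row is consumed here
    except StopIteration:  # raised when the current row is the last one
        break

print(seen_by_loop)  # ['Row', '2', '4']
```

Rows 1, 3 and 5 are swallowed by next() and never reach the loop body.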

3 answers:

Answer 0 (score: 0)

Try this on for size:

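import csv
import itertools
import operator

answer = []
with open('path/to/file', newline='') as infile:
    reader = csv.reader(infile)  # the data is comma-delimited, not tab-delimited
    next(reader)                 # skip the header row
    # groupby(iterable, key): a new group starts each time Status (column index 3) changes
    for k, group in itertools.groupby(reader, operator.itemgetter(3)):
        answer.append(next(group))  # keep only the first row of each run

for row in answer:
    print(','.join(row))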
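The groupby approach works because itertools.groupby only groups *consecutive* items with equal keys, so each group is exactly one run of the same status and next(group) is the first row of that run. Note that groupby takes the iterable first and the key function second. A minimal Python 3 sketch, assuming an in-memory excerpt of the data (SAMPLE and the helper first_of_each_run are hypothetical names), that also writes the kept rows back out as CSV, which is what "deleting" the other rows amounts to:

```python
import csv
import io
import itertools
import operator

# Hypothetical excerpt of the question's data, held in memory instead of a file
SAMPLE = """Row,Serial Number,Time,Status
1,1400004,3/10/2014 11:52,GREEN
2,1400004,3/15/2014 11:45,YELLOW
3,1400004,3/29/2014 7:59,YELLOW
4,1400004,4/16/2014 15:59,YELLOW
5,1400004,5/10/2014 8:18,GREEN
"""

def first_of_each_run(rows, status_col=3):
    """Yield the first row of every run of consecutive rows sharing a status."""
    # groupby() starts a new group whenever the key changes, so each group is one run
    for _, group in itertools.groupby(rows, key=operator.itemgetter(status_col)):
        yield next(group)

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # keep the header out of the grouping
kept = list(first_of_each_run(reader))

# Write the header plus the kept rows; every other row is effectively deleted
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
writer.writerows(kept)
print(out.getvalue())
```

To write to a real file instead, open one with open('filtered.csv', 'w', newline='') (a hypothetical name) and pass it to csv.writer. If runs should also restart whenever the serial number changes, the key could be operator.itemgetter(1, 3); for the question's data the result is the same, because the status also changes at row 13.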

Answer 1 (score: 0)

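This should do it. Keep a marker for the previous status and update it each time a new status is found. If the current status differs from the previous one, add the row to the entries.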

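with open('so_data.txt', 'r') as f:
    prev = None
    f.readline()  # skip the header line
    entries = []
    for i, line in enumerate(f, start=1):
        # split on commas: the Time field contains a space, so a bare split()
        # would return "11:52,GREEN" instead of "GREEN"
        curStat = line.strip().split(',')[-1]
        if curStat != prev:
            entries.append(i)  # i is the data row number
            # entries.append(line)  # to keep the line instead of the line number
            prev = curStat
print(entries)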
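The same previous-value marker can be wrapped in a function that lets the csv module do the splitting (unlike str.split() with no argument, which splits on whitespace and trips over the space inside Time) and strips the trailing whitespace from each status before comparing. A minimal sketch; the function name status_change_rows and the in-memory SAMPLE are hypothetical, and it assumes Status is the fourth column:

```python
import csv
import io

# Hypothetical in-memory sample; note the trailing space after each status, as in the question
SAMPLE = (
    "Row,Serial Number,Time,Status \n"
    "1,1400004,3/10/2014 11:52,GREEN \n"
    "2,1400004,3/15/2014 11:45,YELLOW \n"
    "3,1400004,3/29/2014 7:59,YELLOW \n"
    "4,1400004,4/16/2014 15:59,YELLOW \n"
    "5,1400004,5/10/2014 8:18,GREEN \n"
)

def status_change_rows(f):
    """Return the Row values where Status differs from the previous row's Status."""
    reader = csv.reader(f)
    next(reader)  # skip the header line
    prev = None   # marker: status of the most recently seen row
    changes = []
    for row in reader:
        status = row[3].strip()  # drop the trailing whitespace
        if status != prev:
            changes.append(int(row[0]))
        prev = status
    return changes

print(status_change_rows(io.StringIO(SAMPLE)))  # [1, 2, 5]
```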

Answer 2 (score: 0)

I would use pandas for this task. We just add another column, "prev_status", and print only the rows where prev_status != status.

import pandas as pd

def strip(text):
    try:
        return text.strip()
    except AttributeError:
        return text

df = pd.read_csv(
        'status.csv', 
        index_col=['row'],                  # use "row" column as index
        parse_dates=['time'],               # parse time as date/time
        names=['row','serial_number','time','status'],  # let's define column names
        skiprows=1,                         # skip header row
        converters={'status': strip}        # get rid of trailing whitespace
)

# let's create a new column [prev_status]
# and fill it with the "previous" status
df['prev_status'] = df.status.shift(1)

# print(df)
# .ix was removed in pandas 1.0; boolean indexing with .loc does the same here
print(df.loc[df['status'] != df['prev_status']])

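I used the "strip" converter because the provided CSV has trailing whitespace. If your CSV file has no trailing whitespace, you don't need the "converters" parameter and can remove the "strip" function.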

Output:

     serial_number                time  status prev_status
row
1          1400004 2014-03-10 11:52:00   GREEN         NaN
2          1400004 2014-03-15 11:45:00  YELLOW       GREEN
5          1400004 2014-05-10 08:18:00   GREEN      YELLOW
6          1400004 2014-05-11 15:28:00  YELLOW       GREEN
8          1400004 2014-05-26 07:59:00   GREEN      YELLOW
10         1400004 2014-06-13 17:29:00  YELLOW       GREEN
11         1400004 2014-06-15 15:12:00   GREEN      YELLOW
12         1400004 2014-06-17 08:57:00  YELLOW       GREEN
13         1400007 2014-01-03 11:55:00   GREEN      YELLOW
15         1400007 2014-01-18 18:32:00  YELLOW       GREEN
18         1400007 2014-01-27 08:15:00   GREEN      YELLOW
19         1400007 2014-02-06 09:47:00  YELLOW       GREEN
21         1400007 2014-02-18 12:40:00   GREEN      YELLOW
22         1400007 2014-02-24 12:08:00  YELLOW       GREEN
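The shift-and-compare idea can also be written without the helper column, reading the header from the file instead of passing names=. A minimal sketch, assuming pandas is installed and using a hypothetical in-memory excerpt (SAMPLE) that keeps the question's trailing spaces:

```python
import io

import pandas as pd

# Hypothetical in-memory excerpt of the question's CSV (trailing spaces included)
SAMPLE = (
    "Row,Serial Number,Time,Status \n"
    "1,1400004,3/10/2014 11:52,GREEN \n"
    "2,1400004,3/15/2014 11:45,YELLOW \n"
    "3,1400004,3/29/2014 7:59,YELLOW \n"
    "4,1400004,4/16/2014 15:59,YELLOW \n"
    "5,1400004,5/10/2014 8:18,GREEN \n"
)

df = pd.read_csv(io.StringIO(SAMPLE), index_col=0)  # "Row" becomes the index
df.columns = df.columns.str.strip()                 # "Status " -> "Status"
status = df["Status"].str.strip()                   # drop trailing whitespace in values

# True where the status differs from the row above; the first row is compared
# against NaN from shift(), so it is always kept
changed = status.ne(status.shift())
print(df[changed])
```

Comparing against the shifted Series directly avoids storing prev_status in the frame; add the column back if you want it in the printed output.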