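Whenever the status changes, I need to return only the first row of each run of the same 'status'. Here is an excerpt; for example, from this dataset I only need rows 1, 2, 5, 6, 8, 10, 11, 12, 13, 15, 18, 19, 21.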
Row,Serial Number,Time,Status
1,1400004,3/10/2014 11:52,GREEN
2,1400004,3/15/2014 11:45,YELLOW
3,1400004,3/29/2014 7:59,YELLOW
4,1400004,4/16/2014 15:59,YELLOW
5,1400004,5/10/2014 8:18,GREEN
6,1400004,5/11/2014 15:28,YELLOW
7,1400004,5/24/2014 7:56,YELLOW
8,1400004,5/26/2014 7:59,GREEN
9,1400004,5/28/2014 8:26,GREEN
10,1400004,6/13/2014 17:29,YELLOW
11,1400004,6/15/2014 15:12,GREEN
12,1400004,6/17/2014 8:57,YELLOW
13,1400007,1/3/2014 11:55,GREEN
14,1400007,1/18/2014 5:35,GREEN
15,1400007,1/18/2014 18:32,YELLOW
16,1400007,1/19/2014 21:50,YELLOW
17,1400007,1/21/2014 10:56,YELLOW
18,1400007,1/27/2014 8:15,GREEN
19,1400007,2/6/2014 9:47,YELLOW
20,1400007,2/12/2014 12:44,YELLOW
21,1400007,2/18/2014 12:40,GREEN
22,1400007,2/24/2014 12:08,YELLOW
Here is my code. I'm close, but it's a bit off.
import csv
with open('NEW2.csv', 'rb') as f:
    csv_input = csv.reader(f)
    entries = []
    for x in csv_input:
        if x[3] == csv_input.next()[3]:
            pass
        else:
            entries.append(x)
print entries
Answer 0 (score: 0)
Try this on for size:
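import csv
import itertools
import operator

answer = []
with open('path/to/file') as infile:
    reader = csv.reader(infile)  # the sample data is comma-separated
    next(reader)  # skip the header row
    # groupby takes the iterable first, then the key; Status is column index 3
    for k, group in itertools.groupby(reader, key=operator.itemgetter(3)):
        answer.append(next(group))  # first row of each run of equal statuses
for row in answer:
    print('\t'.join(row))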
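To see what groupby does here without a file on disk, here is a minimal sketch run on a few rows copied from the question's data. The `rows` list and the `first_of_each_run` helper are names made up for this illustration and are not part of the answer above.

```python
import itertools
import operator

# A few rows copied from the question's sample data (header omitted).
rows = [
    ['1', '1400004', '3/10/2014 11:52', 'GREEN'],
    ['2', '1400004', '3/15/2014 11:45', 'YELLOW'],
    ['3', '1400004', '3/29/2014 7:59', 'YELLOW'],
    ['4', '1400004', '4/16/2014 15:59', 'YELLOW'],
    ['5', '1400004', '5/10/2014 8:18', 'GREEN'],
]

def first_of_each_run(rows, column=3):
    """Return the first row of every run of consecutive equal values in `column`."""
    key = operator.itemgetter(column)
    # groupby only groups *adjacent* items with the same key, which is
    # the "status changed" behaviour wanted here, so the rows are not sorted.
    return [next(group) for _, group in itertools.groupby(rows, key=key)]

print([row[0] for row in first_of_each_run(rows)])  # ['1', '2', '5']
```

Because groupby starts a new group every time the key changes, taking `next(group)` yields exactly one row per status change.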
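Answer 1 (score: 0)
This should do it. Keep a marker for the previous status and update it after every row. If the current status differs from the previous one, add the row to entries.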
with open('so_data.txt', 'r') as f:
    prev = None
    f.readline()  # skip the header line
    entries = []
    for i, line in enumerate(f):
        # the status is the last comma-separated field
        curStat = line.strip().split(',')[-1].strip()
        if not prev or curStat != prev:
            entries.append(i + 1)
            # entries.append(line)  # for the line instead of the line number
        prev = curStat
print(entries)
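For comparison, the loop in the question calls `csv_input.next()` inside the `for` loop, so every iteration consumes two rows and rows are only ever compared in pairs. A sketch of the previous-status idea built on the csv module might look like the following. The function name `status_changes` and the inline sample are assumptions for illustration; the data is read from a string so the example runs on its own.

```python
import csv
import io

# Inline sample, assumed to have the same layout as the question's file.
SAMPLE = """Row,Serial Number,Time,Status
13,1400007,1/3/2014 11:55,GREEN
14,1400007,1/18/2014 5:35,GREEN
15,1400007,1/18/2014 18:32,YELLOW
16,1400007,1/19/2014 21:50,YELLOW
18,1400007,1/27/2014 8:15,GREEN
"""

def status_changes(lines):
    """Yield each row whose Status differs from the previous row's Status."""
    prev = None
    for row in csv.DictReader(lines):
        status = row['Status'].strip()  # tolerate trailing whitespace
        if status != prev:
            yield row
        prev = status

changed = [row['Row'] for row in status_changes(io.StringIO(SAMPLE))]
print(changed)  # ['13', '15', '18']
```

Each row is read exactly once, and the first data row is always kept because `prev` starts out as `None`.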
Answer 2 (score: 0)
I would use pandas for this task. We simply add another column, prev_status, and print only the rows where prev_status != status.
import pandas as pd

def strip(text):
    try:
        return text.strip()
    except AttributeError:
        return text

df = pd.read_csv(
    'status.csv',
    index_col=['row'],  # use the "row" column as the index
    parse_dates=['time'],  # parse time as date/time
    names=['row', 'serial_number', 'time', 'status'],  # define the column names
    skiprows=1,  # skip the header row
    converters={'status': strip}  # get rid of trailing whitespace
)
# create a new column, prev_status,
# and fill it with the previous row's status
df['prev_status'] = df.status.shift(1)
#print(df)
# .loc is used here because .ix was removed in pandas 1.0
print(df.loc[df['status'] != df['prev_status']])
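I used the strip converter because the provided CSV has trailing whitespace. If your CSV file has no trailing whitespace, you don't need the converters parameter and can remove the strip function.
Output: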
     serial_number                time  status prev_status
row
1          1400004 2014-03-10 11:52:00   GREEN         NaN
2          1400004 2014-03-15 11:45:00  YELLOW       GREEN
5          1400004 2014-05-10 08:18:00   GREEN      YELLOW
6          1400004 2014-05-11 15:28:00  YELLOW       GREEN
8          1400004 2014-05-26 07:59:00   GREEN      YELLOW
10         1400004 2014-06-13 17:29:00  YELLOW       GREEN
11         1400004 2014-06-15 15:12:00   GREEN      YELLOW
12         1400004 2014-06-17 08:57:00  YELLOW       GREEN
13         1400007 2014-01-03 11:55:00   GREEN      YELLOW
15         1400007 2014-01-18 18:32:00  YELLOW       GREEN
18         1400007 2014-01-27 08:15:00   GREEN      YELLOW
19         1400007 2014-02-06 09:47:00  YELLOW       GREEN
21         1400007 2014-02-18 12:40:00   GREEN      YELLOW
22         1400007 2014-02-24 12:08:00  YELLOW       GREEN
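If the extra prev_status column isn't needed in the result, the same filter can be written without it. The following is only a sketch: the inline sample is an assumption that mirrors the question's CSV layout, and it keeps the original header names instead of renaming the columns.

```python
import io
import pandas as pd

# Inline sample, assumed to match the question's CSV layout.
SAMPLE = """Row,Serial Number,Time,Status
1,1400004,3/10/2014 11:52,GREEN
2,1400004,3/15/2014 11:45,YELLOW
3,1400004,3/29/2014 7:59,YELLOW
5,1400004,5/10/2014 8:18,GREEN
"""

df = pd.read_csv(io.StringIO(SAMPLE))
status = df['Status'].str.strip()  # tolerate trailing whitespace
# Keep rows whose status differs from the previous row's status.
# The first row is compared with a missing value, so it is kept.
changed = df[status != status.shift()]
print(changed['Row'].tolist())  # [1, 2, 5]
```

Comparing the column with its own shifted copy builds the boolean mask in one step, so no helper column has to be dropped afterwards.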