Structuring output

Time: 2016-07-06 14:22:40

Tags: python-2.7 csv pandas dataset output

I am trying to write correctly structured output to a CSV file.

Input:

00022d9064bc,1073260801,1073260803,819251,440006
00022d9064bc,1073260803,1073260810,819213,439954
00904b4557d3,1073260803,1073261920,817526,439458
00022de73863,1073260804,1073265410,817558,439525
00904b14b494,1073260804,1073262625,817558,439525
00022d1406df,1073260807,1073260809,820428,438735
00022d9064bc,1073260801,1073260803,819251,440006
00022dba8f51,1073260801,1073260803,819251,440006
00022de1c6c1,1073260801,1073260803,819251,440006
003065f30f37,1073260801,1073260803,819251,440006
00904b48a3b6,1073260801,1073260803,819251,440006
00904b83a0ea,1073260803,1073260810,819213,439954
00904b85d3cf,1073260803,1073261920,817526,439458
00904b14b494,1073260804,1073265410,817558,439525
00904b99499c,1073260804,1073262625,817558,439525
00904bb96e83,1073260804,1073265163,817558,439525
00904bf91b75,1073260804,1073263786,817558,439525

Code:

import pandas as pd
from datetime import datetime,time
import numpy as np

fn = r'00_Dart.csv'
cols = ['UserID','StartTime','StopTime', 'gps1', 'gps2']
df = pd.read_csv(fn, header=None, names=cols)

df['m'] = df.StopTime + df.StartTime
df['d'] = df.StopTime - df.StartTime

# 'start' and 'end' for the reporting DF: `r`
# which will contain equal intervals (1 hour in this case)
start = pd.to_datetime(df.StartTime.min(), unit='s').date()
end = pd.to_datetime(df.StopTime.max(), unit='s').date() + pd.Timedelta(days=1)

# building reporting DF: `r`
freq = '1H'  # 1 Hour frequency
idx = pd.date_range(start, end, freq=freq)
r = pd.DataFrame(index=idx)
r['start'] = (r.index - pd.datetime(1970,1,1)).total_seconds().astype(np.int64)

# 1 hour in seconds, minus one second (so that we will not count it twice)
interval = 60*60 - 1

r['LogCount'] = 0
r['UniqueIDCount'] = 0

for i, row in r.iterrows():
    # Interval overlap test:
    # https://en.wikipedia.org/wiki/Interval_tree#Overlap_test
    # I've slightly simplified the calculations of m and d
    # by getting rid of the division by 2,
    # because it can be eliminated as a common term
    u = df[np.abs(df.m - 2*row.start - interval) < df.d + interval].UserID
    r.ix[i, ['LogCount', 'UniqueIDCount']] = [len(u), u.nunique()]

r['Day'] = pd.to_datetime(r.start, unit='s').dt.weekday_name.str[:3]
r['StartTime'] = pd.to_datetime(r.start, unit='s').dt.time
r['EndTime'] = pd.to_datetime(r.start + interval + 1, unit='s').dt.time

#df.to_csv((r[r.LogCount > 0])'example.csv')

#print(r[r.LogCount > 0]) -- This gives the correct count and unique count but I want to write the output in a structure.

print (r['StartTime'], ['EndTime'], ['Day'], ['LogCount'], ['UniqueIDCount'])

Output: this is the output I get, which is not what I want.

(2004-01-05 00:00:00    00:00:00
2004-01-05 01:00:00    01:00:00
2004-01-05 02:00:00    02:00:00
2004-01-05 03:00:00    03:00:00
2004-01-05 04:00:00    04:00:00
2004-01-05 05:00:00    05:00:00
2004-01-05 06:00:00    06:00:00
2004-01-05 07:00:00    07:00:00
2004-01-05 08:00:00    08:00:00
2004-01-05 09:00:00    09:00:00

The expected output header is:

StartTime, EndTime, Day, Count, UniqueIDCount

How do I construct the write statement in my code so that the output CSV contains the columns listed above?

1 answer:

Answer 0 (score: 1)

Try this:

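# Select the report columns in the desired order
rout = r[['StartTime', 'EndTime', 'Day', 'LogCount', 'UniqueIDCount']]
print(rout)
# index=False keeps the DatetimeIndex out of the CSV
rout.to_csv('results.csv', index=False)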
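
The expected header in the question uses Count rather than LogCount, and the commented-out print keeps only rows where LogCount > 0. A minimal sketch of how both could be folded into the same approach follows. The small DataFrame here is a hypothetical stand-in for `r` with made-up counts, not the asker's real data, and the rename and filter are assumptions about what the asker wants:

```python
import pandas as pd

# Hypothetical stand-in for the reporting DataFrame `r` from the question;
# the counts below are invented for illustration only.
r = pd.DataFrame({
    'start': [1073260800, 1073264400, 1073268000],
    'LogCount': [17, 6, 0],
    'UniqueIDCount': [14, 5, 0],
    'Day': ['Mon', 'Mon', 'Mon'],
    'StartTime': ['00:00:00', '01:00:00', '02:00:00'],
    'EndTime': ['01:00:00', '02:00:00', '03:00:00'],
})

cols = ['StartTime', 'EndTime', 'Day', 'LogCount', 'UniqueIDCount']

# Keep only intervals that contain at least one log entry, select the
# columns in the desired order, and rename LogCount to match the header.
rout = (r.loc[r['LogCount'] > 0, cols]
          .rename(columns={'LogCount': 'Count'}))

# With no path argument, to_csv returns the CSV text as a string,
# which makes it easy to inspect before writing a file.
csv_text = rout.to_csv(index=False)
print(csv_text)
```

Passing a filename instead, as in `rout.to_csv('results.csv', index=False)`, writes the same text to disk.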