Question

I am trying to write correctly structured output to a CSV file.

Input:

00022d9064bc,1073260801,1073260803,819251,440006
00022d9064bc,1073260803,1073260810,819213,439954
00904b4557d3,1073260803,1073261920,817526,439458
00022de73863,1073260804,1073265410,817558,439525
00904b14b494,1073260804,1073262625,817558,439525
00022d1406df,1073260807,1073260809,820428,438735
00022d9064bc,1073260801,1073260803,819251,440006
00022dba8f51,1073260801,1073260803,819251,440006
00022de1c6c1,1073260801,1073260803,819251,440006
003065f30f37,1073260801,1073260803,819251,440006
00904b48a3b6,1073260801,1073260803,819251,440006
00904b83a0ea,1073260803,1073260810,819213,439954
00904b85d3cf,1073260803,1073261920,817526,439458
00904b14b494,1073260804,1073265410,817558,439525
00904b99499c,1073260804,1073262625,817558,439525
00904bb96e83,1073260804,1073265163,817558,439525
00904bf91b75,1073260804,1073263786,817558,439525

Code:

import pandas as pd
from datetime import datetime,time
import numpy as np

fn = r'00_Dart.csv'
cols = ['UserID','StartTime','StopTime', 'gps1', 'gps2']
df = pd.read_csv(fn, header=None, names=cols)

df['m'] = df.StopTime + df.StartTime
df['d'] = df.StopTime - df.StartTime

# 'start' and 'end' for the reporting DF: `r`
# which will contain equal intervals (1 hour in this case)
start = pd.to_datetime(df.StartTime.min(), unit='s').date()
end = pd.to_datetime(df.StopTime.max(), unit='s').date() + pd.Timedelta(days=1)

# building reporting DF: `r`
freq = '1H'  # 1 Hour frequency
idx = pd.date_range(start, end, freq=freq)
r = pd.DataFrame(index=idx)
r['start'] = (r.index - pd.Timestamp(1970, 1, 1)).total_seconds().astype(np.int64)

# 1 hour in seconds, minus one second (so that we will not count it twice)
interval = 60*60 - 1

r['LogCount'] = 0
r['UniqueIDCount'] = 0

for i, row in r.iterrows():
    # intervals overlap test
    # https://en.wikipedia.org/wiki/Interval_tree#Overlap_test
    # I've slightly simplified the calculations of m and d
    # by getting rid of the division by 2,
    # because the common factor cancels out
    u = df[np.abs(df.m - 2*row.start - interval) < df.d + interval].UserID
    r.loc[i, ['LogCount', 'UniqueIDCount']] = [len(u), u.nunique()]

r['Day'] = pd.to_datetime(r.start, unit='s').dt.day_name().str[:3]
r['StartTime'] = pd.to_datetime(r.start, unit='s').dt.time
r['EndTime'] = pd.to_datetime(r.start + interval + 1, unit='s').dt.time

#df.to_csv((r[r.LogCount > 0])'example.csv')

#print(r[r.LogCount > 0]) -- This gives the correct count and unique count but I want to write the output in a structure.

print (r['StartTime'], ['EndTime'], ['Day'], ['LogCount'], ['UniqueIDCount'])

Output: this is the output I get, which is not what I want.

(2004-01-05 00:00:00    00:00:00
2004-01-05 01:00:00    01:00:00
2004-01-05 02:00:00    02:00:00
2004-01-05 03:00:00    03:00:00
2004-01-05 04:00:00    04:00:00
2004-01-05 05:00:00    05:00:00
2004-01-05 06:00:00    06:00:00
2004-01-05 07:00:00    07:00:00
2004-01-05 08:00:00    08:00:00
2004-01-05 09:00:00    09:00:00

The expected output header is

StartTime, EndTime, Day, Count, UniqueIDCount

How do I construct the write statement in the code so that the output CSV contains the columns above?

Answer 1

Try this. Select the columns you want by passing a list of names, then write that frame out with to_csv:

rout = r[['StartTime', 'EndTime', 'Day', 'LogCount', 'UniqueIDCount']]
print(rout)
rout.to_csv('results.csv', index=False)
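
The expected header in the question uses Count, while the frame has a LogCount column, and the commented-out print in the question filters on LogCount > 0. Here is a minimal sketch of how both could be folded into the same select-and-write step. The small `r` frame below is a hypothetical stand-in with made-up values, not the asker's real data, and renaming LogCount to Count is my assumption about what the asker wants.

```python
import pandas as pd

# Hypothetical stand-in for the reporting frame `r` from the question;
# the counts below are made-up illustrative values.
r = pd.DataFrame(
    {
        "StartTime": ["00:00:00", "01:00:00", "02:00:00"],
        "EndTime": ["01:00:00", "02:00:00", "03:00:00"],
        "Day": ["Mon", "Mon", "Mon"],
        "LogCount": [17, 3, 0],
        "UniqueIDCount": [14, 3, 0],
    }
)

columns = ["StartTime", "EndTime", "Day", "LogCount", "UniqueIDCount"]

# Keep only intervals that have at least one log entry, pick the columns
# in the desired order, and rename LogCount to match the expected header.
rout = r.loc[r["LogCount"] > 0, columns].rename(columns={"LogCount": "Count"})

# With no path argument, to_csv returns the CSV text instead of writing a file.
# Passing a path such as 'results.csv' writes the same content to disk.
csv_text = rout.to_csv(index=False)
print(csv_text)
```

index=False matters here because the index of `r` is the hourly timestamp, which would otherwise be written as an unnamed first column.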
