I am working with a huge data frame:
reader = pd.read_csv("D:/...path.../test.csv", names=["id_easy", "ordinal", "latitude", "longitude", "epoch", "weekday"],
                     parse_dates=['epoch'], chunksize=n_rows, error_bad_lines=False)
day_names = (('0:00', '1:00'), ('1:00', '2:00'), ('2:00', '3:00'), ('3:00', '4:00'), ('4:00', '5:00'), ('5:00', '6:00'),
             ('6:00', '7:00'), ('7:00', '8:00'), ('8:00', '9:00'), ('9:00', '10:00'), ('10:00', '11:00'), ('11:00', '12:00'),
             ('12:00', '13:00'), ('13:00', '14:00'), ('14:00', '15:00'), ('15:00', '16:00'), ('16:00', '17:00'), ('17:00', '18:00'),
             ('18:00', '19:00'), ('19:00', '20:00'), ('20:00', '21:00'), ('21:00', '22:00'), ('22:00', '23:00'), ('23:00', '00:00'))
for df in reader:
    if not df.empty:
        df['epoch'] = pd.to_datetime(df.epoch, unit='s')
        df.index = pd.to_datetime(df.epoch)
        for day in day_names:
            day_df = df.between_time[day]  # ERROR IS HERE
            if not day_df.empty:
                day_df.to_csv(f'{day}.csv', index=False, header=False, mode='a')
TypeError: 'method' object is not subscriptable
The desired output is 24 .csv files, for example: final1, final2, ..., final24
e35f652a 68 11.9125 3.7432 1465084811 Sunday
e35f652a 69 11.8992 3.7412 1465084870 Sunday
e35f652a 70 11.8866 3.7342 1465084930 Sunday
e35f652a 71 11.8755 3.7321 1465084990 Sunday
e35f652a 72 11.8675 3.7247 1465085050 Sunday
The problem is more or less similar to this question.
Answer 0 (score: 3)
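Because DataFrame.between_time() is a method, it has to be called rather than subscripted. Change the [] used for indexing to (), and select the first and second values of the tuple by indexing: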
for day in day_names:
    day_df = df.between_time(day[0], day[1])
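To see the difference in isolation, here is a minimal sketch on a tiny synthetic frame. The three epoch values are made up for illustration (two fall shortly after midnight, one shortly after 01:00), and error_message is just a helper name used here:

```python
import pandas as pd

# Synthetic data: three epoch timestamps in seconds, made up for illustration.
df = pd.DataFrame({"epoch": [1465084811, 1465084870, 1465088500]})
df.index = pd.to_datetime(df["epoch"], unit="s")

day = ("0:00", "1:00")

# Subscripting the bound method reproduces the error from the question.
try:
    df.between_time[day]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Calling the method with a start and an end time works.
day_df = df.between_time(day[0], day[1])
```

Here day_df keeps only the two rows whose time of day lies between 0:00 and 1:00.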
Or change the loop to unpack the tuples:
for s, e in day_names:
    day_df = df.between_time(s, e)
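Since the question asks for files named final1 ... final24, one possible way to get there is to number the windows with enumerate. The sketch below is only an illustration under assumptions: split_by_hour is a hypothetical helper, the sample frame is synthetic, and the output goes to a temporary directory rather than the asker's path.

```python
import os
import tempfile

import pandas as pd

# The same 24 one-hour windows as day_names in the question, built in a loop
# (the last window ends at "0:00", which is equivalent to "00:00").
day_names = tuple((f"{h}:00", f"{(h + 1) % 24}:00") for h in range(24))


def split_by_hour(df, out_dir):
    """Append the rows of each one-hour window to final1.csv ... final24.csv.

    Hypothetical helper; expects df to have a DatetimeIndex.
    Returns the paths of the files that received rows.
    """
    written = []
    for i, (s, e) in enumerate(day_names, start=1):
        day_df = df.between_time(s, e)
        if not day_df.empty:
            path = os.path.join(out_dir, f"final{i}.csv")
            day_df.to_csv(path, index=False, header=False, mode="a")
            written.append(path)
    return written


# Synthetic chunk; in the question this would be each df from the reader.
df = pd.DataFrame({
    "id_easy": ["e35f652a"] * 3,
    "epoch": [1465084811, 1465084870, 1465088500],
})
df.index = pd.to_datetime(df["epoch"], unit="s")

out_dir = tempfile.mkdtemp()
written = split_by_hour(df, out_dir)
```

Two things to keep in mind. Naming the file from the window number avoids file names such as ('0:00', '1:00').csv, which contain colons and are not valid on Windows. Also, between_time includes both endpoints by default, so a row stamped exactly on the hour would be written to two neighbouring files; recent pandas versions accept an inclusive argument on between_time to change that.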