When reading a text file, I came across an odd format in which the date and time are in separate columns, as shown below (the file is tab-delimited).
temp
room 1
Date Time simulation
Fri, 01/Jan 00:30 11.94
01:30 12
02:30 12.04
03:30 12.06
04:30 12.08
05:30 12.09
06:30 11.99
07:30 12.01
08:30 12.29
09:30 12.46
10:30 12.35
11:30 12.25
12:30 12.19
13:30 12.12
14:30 12.04
15:30 11.96
16:30 11.9
17:30 11.92
18:30 11.87
19:30 11.79
20:30 12
21:30 12.16
22:30 12.27
23:30 12.3
Sat, 02/Jan 00:30 12.25
01:30 12.19
02:30 12.14
03:30 12.11
etc.
I want to:
- parse the date and time from the two columns ([0], [1]);
- shift all timestamps 30 minutes earlier, i.e. replace :30 with :00.
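Both goals can be sketched on a small in-memory frame. This assumes that the date appears only on the first row of each day, as in the sample, and that the year is known. The file does not state a year, so `YEAR = 2021` is a hypothetical choice (1 January 2021 was a Friday). Subtracting a `pd.Timedelta` shifts every timestamp, not only those ending in `:30`, and rolls back across midnight correctly.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question: the date is only on the first row of each day
df = pd.DataFrame({
    "Date": ["Fri, 01/Jan", np.nan, np.nan, "Sat, 02/Jan"],
    "Time": ["00:30", "01:30", "02:30", "00:30"],
    "simulation": [11.94, 12.0, 12.04, 12.25],
})

YEAR = 2021  # assumption: the file does not state the year

# Forward-fill the date, drop the weekday name and append the year -> "01/Jan/2021"
date = df["Date"].ffill().str.split(", ").str[1] + f"/{YEAR}"

# Parse date and time together into one timestamp column
df["timestamp"] = pd.to_datetime(date + " " + df["Time"], format="%d/%b/%Y %H:%M")

# Shift every timestamp 30 minutes earlier (00:30 -> 00:00, 01:30 -> 01:00, ...)
df["timestamp"] -= pd.Timedelta(minutes=30)

print(df[["timestamp", "simulation"]])
```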
I used the following code:
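import pandas as pd
from datetime import datetime
# Parses only the Time column, turning :30 into :00
# (pd.datetime is gone from recent pandas, so datetime is imported directly)
timeparse = lambda x: datetime.strptime(x.replace(':30', ':00'), '%H:%M')
df = pd.read_csv('Chart_1.txt',
                 sep='\t',
                 skiprows=1,
                 date_parser=timeparse,
                 parse_dates=['Time'],
                 header=1)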
This seems to parse the time but not the date (obviously, since that is what I told it to do).
Also, skiprows helps pandas find the Date and Time headers, but it discards the temp and room 1 header lines, which I need.
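One way to keep the `temp` and `room 1` lines is to read them separately and then let `read_csv` skip the same two lines. This is a sketch assuming the layout shown above: two metadata lines, then the tab-separated header, with date-less rows leaving the Date cell empty (a leading tab). `io.StringIO(raw)` stands in for `open('Chart_1.txt')`; with the real file, open it for the first step and pass the path to `read_csv`. The names `raw` and `meta` are illustrative.

```python
import io
from itertools import islice

import pandas as pd

# Stand-in for Chart_1.txt: two metadata lines, then a tab-separated table
raw = (
    "temp\n"
    "room 1\n"
    "Date\tTime\tsimulation\n"
    "Fri, 01/Jan\t00:30\t11.94\n"
    "\t01:30\t12\n"
)

# Read the first two lines (the metadata) ourselves so they are not lost
with io.StringIO(raw) as fh:
    meta = [line.rstrip("\n") for line in islice(fh, 2)]

# skiprows=2 drops the same two lines, so "Date Time simulation" becomes the header
df = pd.read_csv(io.StringIO(raw), sep="\t", skiprows=2)

print(meta)
print(df)
```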
Answer (score: 0)
You can use:
import pandas as pd
df = pd.read_csv('Chart_1.txt', sep='\t')
#get the temperature label (the third column header) into variable temp
temp = df.columns[2]
print (temp)
#Dry resultant temperature (°C)
#get the .aps file name (second row, third column) into variable aps
aps = df.iloc[1, 2]
print (aps)
#AE4854c_Campshill_openings reduced_communal areas increased openings2.aps
#mask of rows whose first column contains '/', i.e. the rows that carry a date
mask = df.iloc[:, 0].str.contains('/', na=False)
#shift the rows that do NOT contain a date one column to the right
df1 = df[~mask].shift(1, axis=1)
#get the rows with dates
df2 = df[mask]
#concat df1 and df2, then sort the index to restore the original row order
df = pd.concat([df1, df2]).sort_index()
#assign new column names:
#the first 3 are custom, the rest come from the first row, fourth column onward
df.columns = ['date','time','no name'] + df.iloc[0, 3:].tolist()
#remove the first 2 rows
df = df[2:]
#fill NaN values in column date by forward filling
df.date = df.date.ffill()
#convert column to datetime (the file has no year, so pandas uses 1900)
df.date = pd.to_datetime(df.date, format='%a, %d/%b')
#replace :30 with :00, i.e. shift every time 30 minutes earlier
df.time = df.time.str.replace(':30', ':00', regex=False)
print (df.head())
date time no name 3F_T09_SE_SW_Bed1 GF_office_S GF_office_W_tea \
2 1900-01-01 00:00 11.94 11.47 14.72 16.66
3 1900-01-01 01:00 12.00 11.63 14.83 16.69
4 1900-01-01 02:00 12.04 11.73 14.85 16.68
5 1900-01-01 03:00 12.06 11.80 14.83 16.65
6 1900-01-01 04:00 12.08 11.84 14.79 16.62
GF_Act.Room GF_Communal areas GF_Reception GF_Ent Lobby ... \
2 17.41 12.74 12.93 10.85 ...
3 17.45 12.74 13.14 11.00 ...
4 17.44 12.71 13.23 11.09 ...
5 17.41 12.68 13.27 11.16 ...
6 17.36 12.65 13.28 11.21 ...
2F_S01_SE_SW_Bedroom 2F_S01_SE_SW_Int Circ 2F_S01_SE_SW_Storage_int circ \
2 12.58 12.17 12.54
3 12.64 12.22 12.49
4 12.68 12.27 12.48
5 12.70 12.30 12.49
6 12.71 12.31 12.51
GF_G01_SE_SW_Bedroom GF_G01_SE_SW_Storage_Bed 3F_T09_SE_SW_Bathroom \
2 14.51 14.61 11.49
3 14.55 14.59 11.50
4 14.56 14.59 11.52
5 14.55 14.58 11.54
6 14.54 14.57 11.56
3F_T09_SE_SW_Circ 3F_T09_SE_SW_Storage_int circ GF_Lounge GF_Cafe
2 11.52 11.38 12.83 12.86
3 11.56 11.35 13.03 13.03
4 11.61 11.36 13.13 13.13
5 11.65 11.39 13.17 13.17
6 11.68 11.42 13.18 13.18
[5 rows x 31 columns]
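The mask-and-shift step is the core of this answer. Below is a minimal, self-contained sketch of just that step on a hypothetical three-row frame, assuming (as the shift implies) that rows without a date have their time pushed into the first column. The column names `date`, `time` and `simulation` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical misaligned frame: only the first row has a date, the others start one column early
df = pd.DataFrame({
    0: ["Fri, 01/Jan", "01:30", "02:30"],
    1: ["00:30", "12", "12.04"],
    2: ["11.94", np.nan, np.nan],
})

# Rows whose first cell contains '/' are the ones that carry a date
mask = df.iloc[:, 0].str.contains("/", na=False)

# Shift the date-less rows one column to the right, then restore the original row order
fixed = pd.concat([df[~mask].shift(1, axis=1), df[mask]]).sort_index()
fixed.columns = ["date", "time", "simulation"]

# Forward-fill the date and convert the readings to numbers
fixed["date"] = fixed["date"].ffill()
fixed["simulation"] = pd.to_numeric(fixed["simulation"])
print(fixed)
```

After this, the `date` and `time` columns can be combined into a single timestamp once a year is chosen.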