Custom datetime parsing to combine date and time after reading a CSV in Pandas

Asked: 2016-11-02 14:56:49

Tags: csv parsing datetime pandas time

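When reading a text file, I ran into an odd format in which the date and the time are held in separate columns, as shown below (the file is tab-delimited).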

        temp
        room 1
Date    Time    simulation
Fri, 01/Jan 00:30   11.94
    01:30   12
    02:30   12.04
    03:30   12.06
    04:30   12.08
    05:30   12.09
    06:30   11.99
    07:30   12.01
    08:30   12.29
    09:30   12.46
    10:30   12.35
    11:30   12.25
    12:30   12.19
    13:30   12.12
    14:30   12.04
    15:30   11.96
    16:30   11.9
    17:30   11.92
    18:30   11.87
    19:30   11.79
    20:30   12
    21:30   12.16
    22:30   12.27
    23:30   12.3
Sat, 02/Jan 00:30   12.25
    01:30   12.19
    02:30   12.14
    03:30   12.11
etc.

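I would like to:

  • parse the date and time from the first two columns ([0], [1]) together;

  • shift all timestamps back by 30 minutes, i.e. replace :30 with :00.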

I used the following code:

timeparse = lambda x: pd.datetime.strptime(x.replace(':30',':00'), '%H:%M')

df = pd.read_csv('Chart_1.txt',
    sep='\t',
    skiprows=1,
    date_parser=timeparse,
    parse_dates=['Time'],
    header=1)

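This appears to parse the time but not the date (which is, admittedly, exactly what I told it to do). Also, skipping rows is useful for locating the Date / Time header row, but it discards the header rows "temp" and "room 1", which I need.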

1 Answer:

Answer 0 (score: 0)

You can use:

import pandas as pd


df = pd.read_csv('Chart_1.txt', sep='\t')
# store the temperature label (the third column's header) in the variable temp
temp = df.columns[2]
print (temp)
Dry resultant temperature (°C)

# store the .aps file name (second row, third column) in the variable aps
aps = df.iloc[1, 2]
print (aps)
AE4854c_Campshill_openings reduced_communal areas increased openings2.aps

# build a mask from the first column: rows whose value contains '/' hold a date
mask = df.iloc[:, 0].str.contains('/',na=False)
# shift rows that do NOT contain a date one column to the right
df1 = df[~mask].shift(1, axis=1)
# select the rows that contain a date
df2 = df[mask]
# concatenate df1 and df2, then restore the original row order by sorting the index
df = pd.concat([df1, df2]).sort_index()
# assign new column names:
# the first 3 are custom, the rest come from the first row (fourth column onwards)
df.columns = ['date','time','no name'] + df.iloc[0, 3:].tolist()
# drop the first 2 rows
df = df[2:]
# fill NaN values in the date column by forward filling
df.date = df.date.ffill()
# convert the date column to datetime (no year in the data, so it defaults to 1900)
df.date = pd.to_datetime(df.date, format='%a, %d/%b')
# replace the :30 minutes with :00
df.time = df.time.str.replace(':30', ':00')
print (df.head())
       date   time no name 3F_T09_SE_SW_Bed1 GF_office_S GF_office_W_tea  \
2 1900-01-01  00:00   11.94             11.47       14.72           16.66   
3 1900-01-01  01:00   12.00             11.63       14.83           16.69   
4 1900-01-01  02:00   12.04             11.73       14.85           16.68   
5 1900-01-01  03:00   12.06             11.80       14.83           16.65   
6 1900-01-01  04:00   12.08             11.84       14.79           16.62   

  GF_Act.Room GF_Communal areas GF_Reception GF_Ent Lobby   ...    \
2       17.41             12.74        12.93        10.85   ...     
3       17.45             12.74        13.14        11.00   ...     
4       17.44             12.71        13.23        11.09   ...     
5       17.41             12.68        13.27        11.16   ...     
6       17.36             12.65        13.28        11.21   ...     

  2F_S01_SE_SW_Bedroom 2F_S01_SE_SW_Int Circ 2F_S01_SE_SW_Storage_int circ  \
2                12.58                 12.17                         12.54   
3                12.64                 12.22                         12.49   
4                12.68                 12.27                         12.48   
5                12.70                 12.30                         12.49   
6                12.71                 12.31                         12.51   

  GF_G01_SE_SW_Bedroom GF_G01_SE_SW_Storage_Bed 3F_T09_SE_SW_Bathroom  \
2                14.51                    14.61                 11.49   
3                14.55                    14.59                 11.50   
4                14.56                    14.59                 11.52   
5                14.55                    14.58                 11.54   
6                14.54                    14.57                 11.56   

  3F_T09_SE_SW_Circ 3F_T09_SE_SW_Storage_int circ GF_Lounge GF_Cafe  
2             11.52                         11.38     12.83   12.86  
3             11.56                         11.35     13.03   13.03  
4             11.61                         11.36     13.13   13.13  
5             11.65                         11.39     13.17   13.17  
6             11.68                         11.42     13.18   13.18  

[5 rows x 31 columns]
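
The result above still keeps date and time in two separate columns. If a single datetime column is wanted, one possible approach is sketched below. It is only a sketch under several assumptions: the sample data is a hypothetical tab-separated excerpt modelled on the question, continuation rows are assumed to begin with an empty date field, the year 2016 is an assumption (the file carries no year), and the column name timestamp is made up for illustration. It also assumes English day and month abbreviations.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample modelled on the question's data;
# continuation rows are assumed to start with an empty date field.
raw = (
    "Date\tTime\tsimulation\n"
    "Fri, 01/Jan\t00:30\t11.94\n"
    "\t01:30\t12\n"
    "\t23:30\t12.3\n"
    "Sat, 02/Jan\t00:30\t12.25\n"
    "\t01:30\t12.19\n"
)

df = pd.read_csv(io.StringIO(raw), sep="\t")

# Carry each date down to the rows that only list a time.
df["Date"] = df["Date"].ffill()

# The file has no year, so one is assumed here (2016 starts on a Friday,
# which matches "Fri, 01/Jan"); replace it with the real year if known.
year = 2016
stamp = df["Date"] + f"/{year} " + df["Time"]
df["timestamp"] = pd.to_datetime(stamp, format="%a, %d/%b/%Y %H:%M")

# Move every timestamp 30 minutes earlier (00:30 becomes 00:00).
df["timestamp"] -= pd.Timedelta(minutes=30)

# Use the combined timestamp as the index and drop the two source columns.
df = df.drop(columns=["Date", "Time"]).set_index("timestamp")
print(df)
```

Subtracting a pd.Timedelta performs a real 30-minute shift rather than a string substitution, so it would also behave sensibly if some times did not end in :30.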