Question

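I am trying to import a space-delimited .dat file with pandas and pull values out of it to build a date. The data looks like this (three rows taken from the full dataset for reference):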

2.0140000e+003  1.0000000e+000  1.0000000e+000  0.0000000e+000  0.0000000e+000 0.0000000e+000  2.7454583e+000  1.8333542e+002 -3.3580352e+001
2.0140000e+003  1.0000000e+000  2.0000000e+000  0.0000000e+000  0.0000000e+000  0.0000000e+000 -6.1330625e+000  2.5187292e+002 -1.3752231e+001
2.0140000e+003  1.0000000e+000  3.0000000e+000  0.0000000e+000  0.0000000e+000  0.0000000e+000 -3.0905729e+001  2.1295208e+002 -2.4507273e+001

The first six numbers make up the date (year, month, day, hour, minute, second).

I can import the data with:

df = pd.read_csv('daily.dat', sep=r'\s+', header=None)

and it is split into columns correctly.

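However, I want to strip the first six entries of each row out into a date. For example, for the first row, the first six numbers (or the first six columns once imported into df) should become: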

2014-01-01 00:00:00

Any help?

Answer 1

Demo:

When you read a CSV/dat file that has no column names (no header), you get a DataFrame with numeric column names, like this:

In [139]: df
Out[139]:
        0    1    2    3    4    5          6          7          8
0  2014.0  1.0  1.0  0.0  0.0  0.0   2.745458  183.33542 -33.580352
1  2014.0  1.0  2.0  0.0  0.0  0.0  -6.133063  251.87292 -13.752231
2  2014.0  1.0  3.0  0.0  0.0  0.0 -30.905729  212.95208 -24.507273

Columns:

In [140]: df.columns
Out[140]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8], dtype='int64')

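pd.to_datetime can assemble a datetime from multiple columns. From the documentation:

Assembling a datetime from multiple columns of a DataFrame. The keys can be common abbreviations like ['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns'] or plurals of the same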
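To make the quoted behaviour concrete, here is a minimal sketch on a hand-built frame. The frame `parts` and its values are made up for illustration; the only thing taken from the documentation is that the column names act as the keys pd.to_datetime looks for.

```python
import pandas as pd

# Hypothetical two-row frame; each column name is one datetime component.
parts = pd.DataFrame({
    "year": [2014, 2014],
    "month": [1, 1],
    "day": [1, 2],
    "hour": [0, 12],
    "minute": [0, 30],
    "second": [0, 15],
})

# Passing the whole frame yields one timestamp per row.
stamps = pd.to_datetime(parts)
print(stamps)
```

The result is a Series with a datetime dtype, so it can be assigned straight to a new column.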

In [141]: cols = ['year','month','day','hour','minute','second']

In [142]: df['date'] = pd.to_datetime(df.iloc[:, :6].rename(columns=lambda c: cols[c]))

Drop the first 6 columns:

In [143]: df = df.iloc[:, 6:]

In [144]: df
Out[144]:
           6          7          8       date
0   2.745458  183.33542 -33.580352 2014-01-01
1  -6.133063  251.87292 -13.752231 2014-01-02
2 -30.905729  212.95208 -24.507273 2014-01-03

Alternatively (thanks @Idlehands for the idea), we can drop them like this:

df = df.drop(columns=df.columns[:6])
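For reference, the steps above can be put together in one self-contained sketch. It reads the three sample rows from the question out of an in-memory string rather than daily.dat, and it names the columns at read time instead of renaming them afterwards. The value column names v1, v2 and v3 are placeholders, since the question does not say what those columns hold.

```python
import io
import pandas as pd

# The three sample rows from the question, held in memory instead of daily.dat.
raw = """\
2.0140000e+003  1.0000000e+000  1.0000000e+000  0.0000000e+000  0.0000000e+000 0.0000000e+000  2.7454583e+000  1.8333542e+002 -3.3580352e+001
2.0140000e+003  1.0000000e+000  2.0000000e+000  0.0000000e+000  0.0000000e+000  0.0000000e+000 -6.1330625e+000  2.5187292e+002 -1.3752231e+001
2.0140000e+003  1.0000000e+000  3.0000000e+000  0.0000000e+000  0.0000000e+000  0.0000000e+000 -3.0905729e+001  2.1295208e+002 -2.4507273e+001
"""

date_cols = ["year", "month", "day", "hour", "minute", "second"]
value_cols = ["v1", "v2", "v3"]  # placeholder names, not from the question

# Name all nine columns while reading, so no rename step is needed.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None,
                 names=date_cols + value_cols)

# Assemble the date from the six named columns, then drop them.
df["date"] = pd.to_datetime(df[date_cols])
df = df.drop(columns=date_cols)
print(df)
```

Naming the columns up front is only a convenience; the rename-based approach above should give the same dates.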

Answer 2

You can try this:

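import pandas as pd
from datetime import datetime

# Raw string so the regex escape \s is not treated as a string escape.
df = pd.read_csv('daily.dat', sep=r'\s+', header=None)

def to_datetime(year,month,day,hour,minute,second):
    # The values are read as floats, so cast each one to int first.
    return datetime(int(year),int(month),int(day),int(hour),int(minute),int(second))

# Build one datetime per row, then store it as a string.
df['datetime'] = df.apply(lambda x: to_datetime(x[0], x[1], x[2], x[3], x[4], x[5]), axis=1).apply(str)

# Drop the six original date columns (labels 0-5).
df.drop(columns=list(range(6)), inplace=True)

print(df)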

# output:
#           6          7          8             datetime
#0   2.745458  183.33542 -33.580352  2014-01-01 00:00:00
#1  -6.133063  251.87292 -13.752231  2014-01-02 00:00:00
#2 -30.905729  212.95208 -24.507273  2014-01-03 00:00:00
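One design point worth noting: the final .apply(str) stores the dates as text. The sketch below, on a made-up two-row frame standing in for the first six columns, shows the difference between keeping the datetime values and converting them to strings. The names `row_wise` and `as_text` are illustrative only.

```python
from datetime import datetime
import pandas as pd

# Hypothetical stand-in for the first six columns after read_csv(header=None).
df = pd.DataFrame([
    [2014.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [2014.0, 1.0, 2.0, 0.0, 0.0, 0.0],
])

# Row-wise construction as in this answer, without the final .apply(str).
# pd.to_datetime makes sure the result has a datetime dtype.
row_wise = pd.to_datetime(
    df.apply(lambda r: datetime(*(int(v) for v in r)), axis=1)
)

# The string version, matching what the answer stores in the column.
as_text = row_wise.apply(str)

print(row_wise.dt.day.tolist())  # datetime accessors work on row_wise
print(as_text.tolist())          # plain strings, no .dt accessor
```

If the column is only ever printed, strings are fine; for filtering or resampling by date, keeping the datetime dtype is probably the more convenient choice.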
