Question

I have a CSV file that looks like this:

time, Numbers
[30/Apr/1998:21:30:17,24736
[30/Apr/1998:21:30:53,24736
[30/Apr/1998:21:31:12,24736
[30/Apr/1998:21:31:19,3781
[30/Apr/1998:21:31:22,-
[30/Apr/1998:21:31:27,24736
[30/Apr/1998:21:31:29,-
[30/Apr/1998:21:31:29,-
[30/Apr/1998:21:31:32,929
[30/Apr/1998:21:31:43,-
[30/Apr/1998:21:31:44,1139
[30/Apr/1998:21:31:52,24736
[30/Apr/1998:21:31:52,3029
[30/Apr/1998:21:32:06,24736
[30/Apr/1998:21:32:16,-
[30/Apr/1998:21:32:16,-
[30/Apr/1998:21:32:17,-
[30/Apr/1998:21:32:30,14521
[30/Apr/1998:21:32:33,11324
[30/Apr/1998:21:32:35,24736
[30/Apr/1998:21:32:3l8,671
[30/Apr/1998:21:32:38,1512
[30/Apr/1998:21:32:38,1136
[30/Apr/1998:21:32:38,1647
[30/Apr/1998:21:32:38,1271
[30/Apr/1998:21:32:52,5933
[30/Apr/1998:21:32:58,-
[30/Apr/1998:21:32:59,231
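... and so on, up to one billion rows.

Ignore the Numbers column. My concern is converting this date format from the CSV file into pandas timestamps so that I can plot the dataset and visualize it over time. I am new to data science, and this is my approach: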

step 1: read the whole time column from my CSV file into an array,
step 2: split each value at the colon (:) between the date and the time, producing two new arrays, one for the date and one for the time,
step 3: remove "[" from the date array,
step 4: replace the forward slashes with dashes in the date array,
step 5: join the date and time arrays back together into a single pandas-compatible format,

so that each value looks like this: 2017-03-22 15:16:45. As you can tell, I am a novice, and my approach is naive and probably wrong. I would be very glad if someone could help with a code snippet. Thanks.

Answer 1

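You can pass a format to pd.to_datetime(), in this case: [%d/%b/%Y:%H:%M:%S. Watch out for malformed data, though: see row 3 of the sample data below ([30/Apr/1998:21:32:3l8,671). To avoid raising an error, you can pass errors='coerce', which returns Not a Time (NaT) for any value that cannot be parsed.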

The other option is to fix those rows manually, or to write some kind of regex/replace function first.
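As a rough sketch of the regex/replace idea, the snippet below strips any character that is not a digit or a colon from the time portion before parsing. The sample values and the names raw, time_part and cleaned are illustrative only, and the cleanup rests on the assumption that a stray character such as the "l" in "3l8" is a typo that can simply be dropped, which may not hold for real data.

```python
import pandas as pd

# Hypothetical sample: the second value carries the stray letter seen in the question's data
raw = pd.Series(["[30/Apr/1998:21:32:35", "[30/Apr/1998:21:32:3l8"])

# Split at the first colon only: the date part legitimately contains letters (the month),
# while the time part should contain nothing but digits and colons.
parts = raw.str.split(":", n=1, expand=True)

# Assumption: any other character in the time part is a typo and can be removed
time_part = parts[1].str.replace(r"[^0-9:]", "", regex=True)
cleaned = parts[0] + ":" + time_part

# Values that still fail to match the format become NaT instead of raising
parsed = pd.to_datetime(cleaned, format="[%d/%b/%Y:%H:%M:%S", errors="coerce")
print(parsed)
```

Keeping errors='coerce' in the final call means anything the cleanup does not repair still ends up as NaT instead of raising.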

A complete example using the format string together with errors='coerce':

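import pandas as pd
from io import StringIO  # pd.compat.StringIO is no longer available in recent pandas versions

data = '''\
time, Numbers
[30/Apr/1998:21:30:17,24736
[30/Apr/1998:21:30:53,24736
[30/Apr/1998:21:32:3l8,671
[30/Apr/1998:21:32:38,1512
[30/Apr/1998:21:32:38,1136
[30/Apr/1998:21:32:58,-
[30/Apr/1998:21:32:59,231'''

# Wrap the sample text in a file-like object so read_csv can parse it
fileobj = StringIO(data)
# '-' marks a missing value; skipinitialspace drops the space before "Numbers" in the header
df = pd.read_csv(fileobj, sep=',', na_values=['-'], skipinitialspace=True)

# Parse the log-style timestamps; values that do not match the format become NaT
df['time'] = pd.to_datetime(df['time'], format='[%d/%b/%Y:%H:%M:%S', errors='coerce')
print(df)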

Returns:

                 time  Numbers
0 1998-04-30 21:30:17  24736.0
1 1998-04-30 21:30:53  24736.0
2                 NaT    671.0
3 1998-04-30 21:32:38   1512.0
4 1998-04-30 21:32:38   1136.0
5 1998-04-30 21:32:58      NaN
6 1998-04-30 21:32:59    231.0

Note: na_values=['-'] is used here to help pandas understand that the Numbers column holds numbers rather than strings.

We can now perform groupby operations (for example, per minute).
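One possible way to group by minute is sketched below. It is not necessarily the grouping the answer had in mind: the use of pd.Grouper, the count and mean aggregations, and the name per_minute are all choices made for illustration, and the sample data is a shortened, cleaned-up version of the data above.

```python
from io import StringIO

import pandas as pd

# Shortened, illustrative sample in the same layout as the question's CSV
data = """time,Numbers
[30/Apr/1998:21:30:17,24736
[30/Apr/1998:21:30:53,24736
[30/Apr/1998:21:32:38,1512
[30/Apr/1998:21:32:38,1136
[30/Apr/1998:21:32:58,-
"""

df = pd.read_csv(StringIO(data), na_values=["-"])
df["time"] = pd.to_datetime(df["time"], format="[%d/%b/%Y:%H:%M:%S", errors="coerce")

# Drop rows whose timestamp could not be parsed, then bucket by calendar minute
per_minute = (
    df.dropna(subset=["time"])
      .groupby(pd.Grouper(key="time", freq="1min"))["Numbers"]
      .agg(["count", "mean"])
)
print(per_minute)
```

Grouping with pd.Grouper keeps the full timestamp for each bucket, so minutes from different hours or days are not merged. Empty minutes appear with a count of 0, which can be useful when plotting request rates over time.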

Converting a raw date format to a pandas datetime object
