Question

我有70000多个数据点的数据集（见图）

如您所见，在“日期”列中，格式的另一半（更混乱）与另一半（更清晰）相比有所不同。如何将整个格式作为数据框的后半部分？

我知道如何手动执行操作，但是要花很多时间！

谢谢！

编辑

lazy var timeFormatter: DateFormatter = {
    let formatter = DateFormatter()
    formatter.dateStyle = .medium
    formatter.timeStyle = .medium
    return formatter
}()

日期格式不正确

[ STRANGE DATE

编辑2

两种数据格式：

2012-01-01 00:00:00
2020-07-21T22：45：00 + 00：00

Answer 1

我已经尝试了以下方法，但仍有效，请注意，此方法假设两个关键假设：

1-您的示例中的开始日期遵循两种格式中的一种并且只有一种！

2-最终输出是字符串！

如果是这样，这应该可以解决问题，否则，这是一个起点，可以更改为您想要的样子：

import pandas as pd
import datetime

#data sample
d = {'date':['20090602123000', '20090602124500', '2020-07-22 18:45:00+00:00', '2020-07-22 19:00:00+00:00']}

#create dataframe
df = pd.DataFrame(data = d)
print(df)

                        date
0             20090602123000
1             20090602124500
2  2020-07-22 18:45:00+00:00
3  2020-07-22 19:00:00+00:00

#loop over records
for i, row in df.iterrows():
    #get date
    dateString = df.at[i,'date']
    #check if it's the undesired format or the desired format
    #NOTE i'm using the '+' substring to identify that, this comes to my first assumption above that you only have two formats and that should work
    if '+' not in dateString:
        #reformat datetime
        #NOTE: this is comes to my second assumption where i'm producing it into a string format to add the '+00:00'
        df['date'].loc[df.index == i] = str(datetime.datetime.strptime(dateString, '%Y%m%d%H%M%S')) + '+00:00'
    else:
        continue

print(df)

                        date
0  2009-06-02 12:30:00+00:00
1  2009-06-02 12:45:00+00:00
2  2020-07-22 18:45:00+00:00
3  2020-07-22 19:00:00+00:00

Answer 2

您可以格式化数据框的第一部分

import datetime as dt
df['date'] = df['date'].apply(lambda x: dt.datetime.fromtimestamp(int(str(x)) / 1000).strftime('%Y-%m-%d %H:%M:%S') if str(x).isdigit() else x)

这将检查值的所有字符是否都是数字，然后将日期格式化为第二部分

编辑

时间戳似乎以毫秒为单位，而时间戳应该以秒为单位=> / 1000

在熊猫数据框中修改行值的格式

2 个答案: