Pandas wrong datetime

Time: 2017-10-13 12:49:59

Tags: python-3.x pandas

Data scraped:

[screenshot of the scraped files]

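I scraped data from a website, and the data includes timestamps. As you can see, I have no timestamps between 2017-09-14 13:56:28 and 2017-09-16 14:43:05, but when I load the scraped files with the following code: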

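import glob

import pandas as pd

path = 'law_scraped'
files = glob.glob(path + "/*.csv")

frame = pd.DataFrame()

for f in files:
    df = pd.read_csv(f)

    # Replace the "|" separator with a space (regex=False treats "|" literally)
    df['dtScraped'] = df['dtScraped'].str.replace("|", " ", regex=False)

    try:
        # First attempt: parse with an explicit format
        df['dtScraped'] = pd.to_datetime(df['dtScraped'], format="%Y/%m/%d %H:%M:%S")
    except Exception as e:
        # Fallback: let pandas infer the format
        df['dtScraped'] = pd.to_datetime(df['dtScraped'])

    frame = pd.concat([frame, df], ignore_index=True)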

I get datetimes that do not match the data, as shown below:

+-----------+---------------------+-------+-------------------+
|           |        dtScraped    | odds  |  team             |
+-----------+---------------------+-------+-------------------+
|     15117 | 2017-09-14 14:00:00 | 7.75  | Feyenoord         |
|     15118 | 2017-09-14 14:00:00 | 1.446 | Manchester City   |
|     15119 | 2017-09-14 14:00:00 | 5.01  | Draw              |
|     15120 | 2017-09-14 14:00:00 | 4.73  | NK Maribor        |
|     15121 | 2017-09-14 14:00:00 | 1.869 | Spartak Moscow    |
|     15122 | 2017-09-14 14:00:00 | 3.65  | Draw              |
|     15123 | 2017-09-14 14:00:00 | 1.694 | Liverpool         |
|     15124 | 2017-09-14 14:00:00 | 5.16  | Sevilla           |
|     15125 | 2017-09-14 14:00:00 | 4.25  | Draw              |
|     15126 | 2017-09-14 14:00:00 | 3.53  | Shakhtar Donetsk  |
|     15127 | 2017-09-14 14:00:00 | 2.19  | Napoli            |
|     15128 | 2017-09-14 14:00:00 | 3.58  | Draw              |
|     15129 | 2017-09-14 14:00:00 | 2.15  | RB Leipzig        |
|     15130 | 2017-09-14 14:00:00 | 3.5   | AS Monaco         |
|     15131 | 2017-09-14 14:00:00 | 3.73  | Draw              |
|     15132 | 2017-09-14 14:00:00 | 1.044 | Real Madrid       |
|     15133 | 2017-09-14 14:00:00 | 34.68 | APOEL Nicosia     |
|     15134 | 2017-09-14 14:00:00 | 23.04 | Draw              |
|     15135 | 2017-09-14 14:00:00 | 2.33  | Tottenham Hotspur |
|     15136 | 2017-09-14 14:00:00 | 3.12  | Borussia Dortmund |
|     15137 | 2017-09-14 14:00:00 | 3.69  | Draw              |
|     15138 | 2017-09-14 14:00:00 | 1.52  | FC Porto          |
|     15139 | 2017-09-14 14:00:00 | 7.63  | Besiktas JK       |
|     15140 | 2017-09-14 14:00:00 | 4.32  | Draw              |
|    144009 | 2017-09-14 14:00:00 | 7.75  | Feyenoord         |
|    144010 | 2017-09-14 14:00:00 | 1.446 | Manchester City   |
|    144011 | 2017-09-14 14:00:00 | 5.01  | Draw              |
|    144012 | 2017-09-14 14:00:00 | 4.609 | NK Maribor        |
|    144013 | 2017-09-14 14:00:00 | 1.892 | Spartak Moscow    |
|    144014 | 2017-09-14 14:00:00 | 3.64  | Draw              |
|    144015 | 2017-09-14 14:00:00 | 1.694 | Liverpool         |
|    144016 | 2017-09-14 14:00:00 | 5.16  | Sevilla           |
|    144017 | 2017-09-14 14:00:00 | 4.25  | Draw              |
|    144018 | 2017-09-14 14:00:00 | 3.53  | Shakhtar Donetsk  |
|    144019 | 2017-09-14 14:00:00 | 2.19  | Napoli            |
|    144020 | 2017-09-14 14:00:00 | 3.58  | Draw              |
|    144021 | 2017-09-14 14:00:00 | 2.15  | RB Leipzig        |
|    144022 | 2017-09-14 14:00:00 | 3.5   | AS Monaco         |
|    144023 | 2017-09-14 14:00:00 | 3.73  | Draw              |
|    144024 | 2017-09-14 14:00:00 | 1.044 | Real Madrid       |
|    144025 | 2017-09-14 14:00:00 | 34.68 | APOEL Nicosia     |
|    144026 | 2017-09-14 14:00:00 | 23.04 | Draw              |
|    144027 | 2017-09-14 14:00:00 | 2.33  | Tottenham Hotspur |
|    144028 | 2017-09-14 14:00:00 | 3.12  | Borussia Dortmund |
|    144029 | 2017-09-14 14:00:00 | 3.69  | Draw              |
|    144030 | 2017-09-14 14:00:00 | 1.52  | FC Porto          |
|    144031 | 2017-09-14 14:00:00 | 7.63  | Besiktas JK       |
|    144032 | 2017-09-14 14:00:00 | 4.32  | Draw              |
+-----------+---------------------+-------+-------------------+

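1 Answer:

Answer 0 (score: 0)

Assuming your timestamps have the same format as the file names in the screenshot, this should work (after replacing "|" with " "):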

df['dtScraped'] = pd.to_datetime(df['dtScraped'], format="%Y-%m-%d %H-%M-%S")
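
To illustrate, here is a minimal sketch with made-up sample values. It assumes the raw column looks like `2017-09-14|13-56-28`; the question does not show the raw strings, so that format and the variable names `raw`, `cleaned`, `parsed` and `mismatch_raised` are assumptions. Under that assumption, the slash-and-colon format from the question cannot match, so the `try` branch raises and the code silently falls back to letting pandas guess. Parsing with a format that matches the data avoids the guessing.

```python
import pandas as pd

# Hypothetical sample values; the real raw format is an assumption.
raw = pd.Series(["2017-09-14|13-56-28", "2017-09-16|14-43-05"])

# Replace the literal "|" separator with a space.
cleaned = raw.str.replace("|", " ", regex=False)

# The format from the question (slashes and colons) does not match
# hyphen-separated values, so pandas raises a ValueError here.
try:
    pd.to_datetime(cleaned, format="%Y/%m/%d %H:%M:%S")
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# A format that matches the data parses without any guessing.
parsed = pd.to_datetime(cleaned, format="%Y-%m-%d %H-%M-%S")

print(mismatch_raised)
print(parsed)
```

If a file might contain a different format, it is probably safer to let the `ValueError` surface (or log the offending file) than to fall back to inferred parsing, since a wrong guess produces plausible-looking but incorrect timestamps.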