Data scraping
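So I scraped data from a website, and it includes timestamps. As you can see, I have no dates between 2017-09-14 13:56:28 and 2017-09-16 14:43:05, but when I process the scraped files with the following code: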
import glob

import pandas as pd

path = 'law_scraped'
files = glob.glob(path + "/*.csv")
frame = pd.DataFrame()
for f in files:
    df = pd.read_csv(f)
    # Replace the literal "|" separator with a space
    df['dtScraped'] = df['dtScraped'].str.replace("|", " ", regex=False)
    try:
        df['dtScraped'] = pd.to_datetime(df['dtScraped'], format="%Y/%m/%d %H:%M:%S")
    except Exception as e:
        # Fall back to letting pandas infer the format
        df['dtScraped'] = pd.to_datetime(df['dtScraped'])
    frame = pd.concat([frame, df], ignore_index=True)
I get datetimes that do not match the data, as shown below:
+-----------+---------------------+-------+-------------------+
| | dtScraped | odds | team |
+-----------+---------------------+-------+-------------------+
| 15117 | 2017-09-14 14:00:00 | 7.75 | Feyenoord |
| 15118 | 2017-09-14 14:00:00 | 1.446 | Manchester City |
| 15119 | 2017-09-14 14:00:00 | 5.01 | Draw |
| 15120 | 2017-09-14 14:00:00 | 4.73 | NK Maribor |
| 15121 | 2017-09-14 14:00:00 | 1.869 | Spartak Moscow |
| 15122 | 2017-09-14 14:00:00 | 3.65 | Draw |
| 15123 | 2017-09-14 14:00:00 | 1.694 | Liverpool |
| 15124 | 2017-09-14 14:00:00 | 5.16 | Sevilla |
| 15125 | 2017-09-14 14:00:00 | 4.25 | Draw |
| 15126 | 2017-09-14 14:00:00 | 3.53 | Shakhtar Donetsk |
| 15127 | 2017-09-14 14:00:00 | 2.19 | Napoli |
| 15128 | 2017-09-14 14:00:00 | 3.58 | Draw |
| 15129 | 2017-09-14 14:00:00 | 2.15 | RB Leipzig |
| 15130 | 2017-09-14 14:00:00 | 3.5 | AS Monaco |
| 15131 | 2017-09-14 14:00:00 | 3.73 | Draw |
| 15132 | 2017-09-14 14:00:00 | 1.044 | Real Madrid |
| 15133 | 2017-09-14 14:00:00 | 34.68 | APOEL Nicosia |
| 15134 | 2017-09-14 14:00:00 | 23.04 | Draw |
| 15135 | 2017-09-14 14:00:00 | 2.33 | Tottenham Hotspur |
| 15136 | 2017-09-14 14:00:00 | 3.12 | Borussia Dortmund |
| 15137 | 2017-09-14 14:00:00 | 3.69 | Draw |
| 15138 | 2017-09-14 14:00:00 | 1.52 | FC Porto |
| 15139 | 2017-09-14 14:00:00 | 7.63 | Besiktas JK |
| 15140 | 2017-09-14 14:00:00 | 4.32 | Draw |
| 144009 | 2017-09-14 14:00:00 | 7.75 | Feyenoord |
| 144010 | 2017-09-14 14:00:00 | 1.446 | Manchester City |
| 144011 | 2017-09-14 14:00:00 | 5.01 | Draw |
| 144012 | 2017-09-14 14:00:00 | 4.609 | NK Maribor |
| 144013 | 2017-09-14 14:00:00 | 1.892 | Spartak Moscow |
| 144014 | 2017-09-14 14:00:00 | 3.64 | Draw |
| 144015 | 2017-09-14 14:00:00 | 1.694 | Liverpool |
| 144016 | 2017-09-14 14:00:00 | 5.16 | Sevilla |
| 144017 | 2017-09-14 14:00:00 | 4.25 | Draw |
| 144018 | 2017-09-14 14:00:00 | 3.53 | Shakhtar Donetsk |
| 144019 | 2017-09-14 14:00:00 | 2.19 | Napoli |
| 144020 | 2017-09-14 14:00:00 | 3.58 | Draw |
| 144021 | 2017-09-14 14:00:00 | 2.15 | RB Leipzig |
| 144022 | 2017-09-14 14:00:00 | 3.5 | AS Monaco |
| 144023 | 2017-09-14 14:00:00 | 3.73 | Draw |
| 144024 | 2017-09-14 14:00:00 | 1.044 | Real Madrid |
| 144025 | 2017-09-14 14:00:00 | 34.68 | APOEL Nicosia |
| 144026 | 2017-09-14 14:00:00 | 23.04 | Draw |
| 144027 | 2017-09-14 14:00:00 | 2.33 | Tottenham Hotspur |
| 144028 | 2017-09-14 14:00:00 | 3.12 | Borussia Dortmund |
| 144029 | 2017-09-14 14:00:00 | 3.69 | Draw |
| 144030 | 2017-09-14 14:00:00 | 1.52 | FC Porto |
| 144031 | 2017-09-14 14:00:00 | 7.63 | Besiktas JK |
| 144032 | 2017-09-14 14:00:00 | 4.32 | Draw |
+-----------+---------------------+-------+-------------------+
Answer 0 (score: 0)
Assuming your timestamps are in the same format as the file names in the screenshot, this should work (after replacing "|" with " "):
df['dtScraped'] = pd.to_datetime(df['dtScraped'], format="%Y-%m-%d %H-%M-%S")
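As a minimal sketch of why the explicit format matters: the sample strings below are hypothetical and assume the time part is hyphen-separated (as the format string above implies), which the question does not confirm. With an explicit format and no fallback, a value that does not match raises an error instead of being silently guessed.

```python
import pandas as pd

# Hypothetical sample values; the hyphenated time part is an assumption
# based on the answer's format string, not on the actual scraped files.
raw = pd.Series(["2017-09-14|13-56-28", "2017-09-16|14-43-05"])

# Replace the literal "|" separator with a space (no regex interpretation).
cleaned = raw.str.replace("|", " ", regex=False)

# Parse with an explicit format. Without a try/except fallback, any value
# that does not match the format raises instead of being inferred.
parsed = pd.to_datetime(cleaned, format="%Y-%m-%d %H-%M-%S")
print(parsed)
```

If the real files use a different layout, the format string would need to be adjusted to match; dropping the fallback branch from the loading loop makes such mismatches visible immediately.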