I have a Python script that produces dates in the wrong format.
import csv
import urllib
import requests
import numpy as np
from urllib.request import urlopen
from matplotlib.dates import DateFormatter
import matplotlib.pyplot as plt
import pandas as pd
import io
link = 'https://health-infobase.canada.ca/src/data/covidLive/covid19.csv'
s = requests.get(link).content
coviddata = pd.read_csv(io.StringIO(s.decode('utf-8')),
                        parse_dates=['date'],
                        index_col=['date'],
                        na_values=['999.99'])
prinput = 'Quebec'
ispr = coviddata['prname'] == prinput
covidpr = coviddata[ispr]
print(covidpr)
The data it produces seems to have the dates scrambled, as shown below.
            pruid  prname prnameFR  ...  numtotal  numtoday  numtested
date                                ...
2020-01-03     24  Quebec   Québec  ...         1         1        NaN
2020-03-03     24  Quebec   Québec  ...         1         0        NaN
2020-05-03     24  Quebec   Québec  ...         2         1        NaN
2020-06-03     24  Quebec   Québec  ...         2         0        NaN
2020-07-03     24  Quebec   Québec  ...         2         0        NaN
2020-08-03     24  Quebec   Québec  ...         3         1        NaN
2020-09-03     24  Quebec   Québec  ...         4         1        NaN
2020-11-03     24  Quebec   Québec  ...         7         3        NaN
2020-12-03     24  Quebec   Québec  ...        13         6        NaN
2020-03-13     24  Quebec   Québec  ...        17         4        NaN
2020-03-14     24  Quebec   Québec  ...        17         0        NaN
By contrast, here is another snippet that works.
import csv
import urllib
import requests
from urllib.request import urlopen
from matplotlib.dates import DateFormatter
import matplotlib.pyplot as plt
from datetime import datetime
link = 'https://health-infobase.canada.ca/src/data/covidLive/covid19.csv'
text = requests.get(link).text
lines = text.splitlines()
infile = csv.DictReader(lines)
prinput = input("Enter province(EN):")
xvalues=[]
yvalues=[]
for row in infile:
    if row['prname'] == prinput:
        xvalues.append(row['date'])
        yvalues.append(row['numconf'])
        print(row['prname'], row['date'], row['numconf'])
It produces the correct dates:
Quebec 01-03-2020 1
Quebec 03-03-2020 1
Quebec 05-03-2020 2
Quebec 06-03-2020 2
Quebec 07-03-2020 2
Quebec 08-03-2020 3
Quebec 09-03-2020 4
Quebec 11-03-2020 7
Quebec 12-03-2020 13
Quebec 13-03-2020 17
Quebec 14-03-2020 17
Quebec 15-03-2020 24
Quebec 16-03-2020 39
Quebec 17-03-2020 50
What is wrong with the first script?
Answer 0 (score: 0)
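Because the parse_dates argument is used, pandas interprets the "date" column as datetime objects. This is useful for plotting data over time or for resampling data over a given period. If you want to change the datetime format when printing the dataset, you can do so with the dt.strftime method of the datetime series.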
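One more thing worth checking in the first script: the working script prints dates such as 01-03-2020 and 13-03-2020, which suggests the file stores them day-first (DD-MM-YYYY). By default pandas reads an ambiguous string like 01-03-2020 month-first, which would explain why it shows up as 2020-01-03 while 13-03-2020 comes out as expected. Below is a minimal sketch of a possible fix under that assumption. The inline sample_csv is made-up data standing in for the real download, and the variable names are only illustrative.

```python
import io

import pandas as pd

# Made-up sample standing in for the real CSV; dates are assumed to be DD-MM-YYYY
sample_csv = """prname,date,numconf
Quebec,01-03-2020,1
Quebec,03-03-2020,1
Quebec,13-03-2020,17
Ontario,01-03-2020,3
"""

# Option 1: tell read_csv that the day comes first in ambiguous dates
covid_df = pd.read_csv(
    io.StringIO(sample_csv),
    parse_dates=['date'],
    dayfirst=True,
    index_col='date',
)

# Option 2: read the column as text, then parse it with an explicit format
raw_df = pd.read_csv(io.StringIO(sample_csv))
raw_df['date'] = pd.to_datetime(raw_df['date'], format='%d-%m-%Y')

# Filter one province; with day-first parsing every date falls in March
quebec = covid_df[covid_df['prname'] == 'Quebec']
print(quebec)

# The date is the index here, so strftime is called on the index directly
# (the .dt accessor applies to Series/columns, not to an index)
date_labels = quebec.index.strftime('%d-%m-%Y')
print(list(date_labels))
```

If the real file does use DD-MM-YYYY, adding dayfirst=True to the read_csv call in the question should be the smallest change. The explicit format in option 2 is stricter, since it fails loudly on a date that does not match.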
For example, to add a column holding the formatted date string:
# Import pandas
import pandas as pd
# Read in dataframe from url
covid_df = pd.read_csv("https://health-infobase.canada.ca/src/data/covidLive/covid19.csv",
parse_dates=['date'], na_values=[999.99])
# Create new column date-str that's the string interpretation of the 'date' column
covid_df['date-str'] = covid_df['date'].dt.strftime("%d-%m-%Y")
# Show the top of the dataframe
covid_df.head()
"""
   pruid            prname              prnameFR       date  ...  numtotal  numtoday  numtested    date-str
0     35           Ontario               Ontario 2020-01-31  ...         3         3        NaN  31-01-2020
1     59  British Columbia  Colombie-Britannique 2020-01-31  ...         1         1        NaN  31-01-2020
2      1            Canada                Canada 2020-01-31  ...         4         4        NaN  31-01-2020
3     35           Ontario               Ontario 2020-08-02  ...         3         0        NaN  02-08-2020
4     59  British Columbia  Colombie-Britannique 2020-08-02  ...         4         3        NaN  02-08-2020
"""
# Show dtypes and properties of each column of the dataframe
covid_df.info()
"""
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 302 entries, 0 to 301
Data columns (total 11 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   pruid      302 non-null    int64
 1   prname     302 non-null    object
 2   prnameFR   302 non-null    object
 3   date       302 non-null    datetime64[ns]
 4   numconf    302 non-null    int64
 5   numprob    302 non-null    int64
 6   numdeaths  302 non-null    int64
 7   numtotal   302 non-null    int64
 8   numtoday   302 non-null    int64
 9   numtested  0 non-null      float64
 10  date-str   302 non-null    object
dtypes: datetime64[ns](1), float64(1), int64(6), object(3)
memory usage: 26.1+ KB
"""