Update rows after comparing values in a pandas DataFrame

Time: 2020-07-16 13:03:13

Tags: python pandas

I connect to an API that organizes COVID-19 data for Brazil by state and city, as follows:

# Libraries
import pandas as pd
from pandas import Series, DataFrame  # Panel was removed from pandas, so it is no longer imported
import matplotlib.pyplot as plt
from matplotlib.pyplot import plot_date, axis, show, gcf
import numpy as np
import urllib.request  # import the submodule explicitly; urllib.request.Request is used below
from http.cookiejar import CookieJar
from datetime import datetime, timedelta

cj = CookieJar()

url_Bso = "https://brasil.io/api/dataset/covid19/caso_full/data?state=MG&city=Barroso"
req_Bso = urllib.request.Request(url_Bso, None, {"User-Agent": "python-urllib"})
opener_Bso = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
response_Bso = opener_Bso.open(req_Bso)
raw_response_Bso = response_Bso.read()

json_Bso = pd.read_json(raw_response_Bso)
results_Bso = json_Bso['results']
results_Bso = results_Bso.to_dict().values()
df_Bso = pd.DataFrame(results_Bso)
df_Bso.head(5)

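This API compiles the data published by the state health departments. However, the state and city health departments' records differ, and the state records lag behind the city's. I want to update the values for Thursday and for Saturday (the day the epidemiological week ends). I am trying the following: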

saturday = datetime.today() + timedelta(days=-5)
yesterday = datetime.today() + timedelta(days=-1)
last_available_confirmed_day_Bso_saturday = 51
last_available_confirmed_day_Bso_yesterday = 54
df_Bso = df_Bso.loc[df_Bso['date'] == saturday, ['last_available_confirmed']] = last_available_confirmed_day_Bso_saturday
df_Bso = df_Bso.loc[df_Bso['date'] == yesterday, ['last_available_confirmed']] = last_available_confirmed_day_Bso_yesterday
df_Bso

However, I get this error:

> AttributeError: 'int' object has no attribute 'loc'

I need another DataFrame with the updated values for those days. Can anyone help?

1 answer:

Answer 0: (score: 2)

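You have to adjust the dates. The date column of your DataFrame holds strings, so comparing it with datetime objects never matches. You can either convert the column to datetime or, as in the code that follows, format the target dates as strings in the same '%Y-%m-%d' form.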
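The AttributeError in the question has a separate cause: the chained assignment `df_Bso = df_Bso.loc[...] = value`. Python assigns the right-hand value to the targets from left to right, so the name is rebound to the integer before `.loc` is looked up on it. The sketch below reproduces this on a toy frame; the frame and the names `broken`, `fixed` and `error_message` are illustrative assumptions, not part of the original code.

```python
import pandas as pd

# Toy frame standing in for df_Bso (illustrative values, not API data).
df = pd.DataFrame({
    "date": ["2020-07-11", "2020-07-15"],
    "last_available_confirmed": [39, 44],
})

# Problem: in `a = b = value`, value is bound to `a` first and then to `b`.
# `broken` is therefore already the int 51 when `broken.loc` is evaluated.
broken = df.copy()
error_message = None
try:
    broken = broken.loc[broken["date"] == "2020-07-11", ["last_available_confirmed"]] = 51
except AttributeError as exc:
    error_message = str(exc)

# Fix: assign through .loc only, on a copy, without rebinding the name.
fixed = df.copy()
fixed.loc[fixed["date"] == "2020-07-11", "last_available_confirmed"] = 51
```

Working on a copy keeps the original frame untouched, which gives the separate DataFrame with updated values that the question asks for.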

today = datetime.now()

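# weekday() returns Monday=0 ... Sunday=6; these are the days elapsed
# since the most recent Saturday and Thursday (0 if today is that day)
last_sat_num = (today.weekday() + 2) % 7
last_thu_num = (today.weekday() + 4) % 7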

last_sat = today - timedelta(last_sat_num)
last_thu = today - timedelta(last_thu_num)
last_sat_str = last_sat.strftime('%Y-%m-%d')
last_thu_str = last_thu.strftime('%Y-%m-%d')

last_available_confirmed_day_Bso_sat = 51
last_available_confirmed_day_Bso_thu = 54

df_Bso2 = df_Bso.copy()
df_Bso2.loc[df_Bso2['date'] == last_sat_str, ['last_available_confirmed']] = last_available_confirmed_day_Bso_sat
df_Bso2.loc[df_Bso2['date'] == last_thu_str, ['last_available_confirmed']] = last_available_confirmed_day_Bso_thu

df_Bso2[['date', 'last_available_confirmed']].head(10)

Output

         date  last_available_confirmed
0  2020-07-15                        44
1  2020-07-14                        43
2  2020-07-13                        40
3  2020-07-12                        40
4  2020-07-11                        51
5  2020-07-10                        39
6  2020-07-09                        36
7  2020-07-08                        36
8  2020-07-07                        27
9  2020-07-06                        27
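The other route mentioned above, converting the column to datetime instead of formatting the targets as strings, could look roughly like this. It is a minimal sketch under assumptions: the toy frame, the fixed `today` value and the helper `most_recent_weekday` are hypothetical, and `(weekday - target) % 7` is the same arithmetic as the `+ 2` and `+ 4` offsets used earlier, written for an arbitrary target day.

```python
import pandas as pd


def most_recent_weekday(today: pd.Timestamp, target: int) -> pd.Timestamp:
    """Latest date on or before `today` whose weekday() equals `target`.

    Weekdays follow the datetime convention: Monday=0 ... Sunday=6.
    """
    days_back = (today.weekday() - target) % 7
    return today - pd.Timedelta(days=days_back)


# Toy data shaped like the API output (illustrative values).
df = pd.DataFrame({
    "date": ["2020-07-16", "2020-07-15", "2020-07-11", "2020-07-09"],
    "last_available_confirmed": [44, 44, 39, 36],
})

updated = df.copy()
updated["date"] = pd.to_datetime(updated["date"])  # strings -> datetime64

today = pd.Timestamp("2020-07-16")  # a Thursday, fixed for reproducibility
last_sat = most_recent_weekday(today, 5)  # Saturday
last_thu = most_recent_weekday(today, 3)  # Thursday (today itself here)

# Compare datetime64 values with Timestamps; no string formatting needed.
updated.loc[updated["date"] == last_sat, "last_available_confirmed"] = 51
updated.loc[updated["date"] == last_thu, "last_available_confirmed"] = 54
```

In real use, `today` would more likely be `pd.Timestamp.today().normalize()`, which drops the time of day so that the equality comparison against midnight timestamps can match.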