I have a large dataset whose column labels are dates. To illustrate my problem, I have built a similar, smaller dataset:
import pandas as pd
Cities = ['San Francisco', 'Los Angeles', 'New York', 'Huston', 'Chicago']
Jan = [10, 20, 15, 10, 35]
Feb = [12, 23, 17, 15, 41]
Mar = [15, 29, 21, 21, 53]
Apr = [27, 48, 56, 49, 73]
data = pd.DataFrame({'City': Cities, '01/01/20': Jan, '02/01/20': Feb, '03/01/20': Mar, '04/01/20': Apr})
print (data)
            City  01/01/20  02/01/20  03/01/20  04/01/20
0  San Francisco        10        12        15        27
1    Los Angeles        20        23        29        48
2       New York        15        17        21        56
3         Huston        10        15        21        49
4        Chicago        35        41        53        73
I want to plot the data for each city over time. This is my attempt:
import matplotlib.pyplot as plt
cols = data.columns
dates = data.loc[:, cols[1:]].columns
San_Francisco = []
Los_Angeles = []
New_York = []
Huston = []
Chicago = []
for i in dates:
    San_Francisco.append(data[data['City'] == 'San Francisco'][i].sum())
    Los_Angeles.append(data[data['City'] == 'Los Angeles'][i].sum())
    New_York.append(data[data['City'] == 'New York'][i].sum())
    Huston.append(data[data['City'] == 'Huston'][i].sum())
    Chicago.append(data[data['City'] == 'Chicago'][i].sum())
plt.plot(dates, San_Francisco, label='San Francisco')
plt.plot(dates, Los_Angeles, label='Los Angeles')
plt.plot(dates, New_York, label='New York')
plt.plot(dates, Huston, label='Huston')
plt.plot(dates, Chicago, label='Chicago')
plt.legend()
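The result is what I want, but my approach is inefficient for a large dataset. How can I speed it up? Also, in the plotting part, I have a large number of city rows, and hard-coding the names by hand is tedious. Is there a better way?
Thanks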
Answer 0 (score: 5)
If some values of City may be duplicated, first aggregate with GroupBy.sum, then transpose with DataFrame.T, and finally plot with DataFrame.plot:
data.groupby('City').sum().T.plot()
If the City column always has unique values, you can use DataFrame.set_index instead:
data.set_index("City").T.plot()
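For reference, here is a minimal end-to-end sketch of the groupby approach applied to the sample data from the question. It is not part of the original answer. The conversion of the column labels with pd.to_datetime is an added assumption: it presumes the labels are month/day/year strings, which seems to fit the Jan to Apr variable names but is not stated in the question. matplotlib must be installed for DataFrame.plot to work.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    "City": ["San Francisco", "Los Angeles", "New York", "Huston", "Chicago"],
    "01/01/20": [10, 20, 15, 10, 35],
    "02/01/20": [12, 23, 17, 15, 41],
    "03/01/20": [15, 29, 21, 21, 53],
    "04/01/20": [27, 48, 56, 49, 73],
})

# Sum rows that share a city (each city appears once here, so the values
# are unchanged), then transpose: dates become the index, cities the columns.
by_date = data.groupby("City").sum().T

# Assumption: labels are month/day/year. Parsing them gives a real date axis.
by_date.index = pd.to_datetime(by_date.index, format="%m/%d/%y")

# One line per city; the legend is built from the column names,
# so no city name has to be hard-coded.
ax = by_date.plot()
```

Because the reshaping is done by pandas in a few vectorized calls, there is no Python-level loop over dates and no per-city list to maintain. Note that groupby sorts the city names alphabetically, so the legend order may differ from the row order in the original frame.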