How to fix a pandas data structure error: "KeyError"

Asked: 2019-04-16 18:47:23

Tags: python

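My program reads data from a CSV file and predicts the "happiness" of the data. It works fine until it reaches one particular part of my code. After a lot of debugging, I narrowed the problem down to a single line.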

I have already tried changing some of the keyword arguments used to get the data from the CSV file and from the dictionary.

import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
# Common imports
import numpy as np
import numpy.random as rnd
import os

# to make this notebook's output stable across runs
rnd.seed(42)

# To plot pretty figures
# %matplotlib inline
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12

# Where to save the figures
# Raw string so the backslashes in the Windows path are not treated as escapes
PROJECT_ROOT_DIR = r"C:\Users\gunja\Desktop\Proggraming\Basic Python\Machine_Learning_Tensorflow"
CHAPTER_ID = "fundamentals"


def save_fig(fig_id, tight_layout=True):
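    # "images" must not start with a backslash: a rooted segment makes
    # os.path.join discard the directories of PROJECT_ROOT_DIR on Windows
    path = os.path.join(PROJECT_ROOT_DIR, "images",
                        CHAPTER_ID, fig_id + ".png")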
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format='png', dpi=300)

oecd_bli = pd.read_csv("data_csv.csv", thousands=',')
oecd_bli = oecd_bli[oecd_bli["INEQUALITY"] == "TOT"]
oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
print(oecd_bli.head(2))
print(oecd_bli["Life satisfaction"].head())

gdp_per_capita = pd.read_csv("WEO_Data_act.xls", thousands=',', delimiter='\t',
                             encoding='latin1', na_values="n/a")
gdp_per_capita.rename(columns={"2015": "GDP per capita"}, inplace=True)
gdp_per_capita.set_index("Country", inplace=True)
print(gdp_per_capita.head(2))

full_country_stats = pd.merge(
    left=oecd_bli, right=gdp_per_capita, left_index=True, right_index=True)
full_country_stats.sort_values(by="GDP per capita", inplace=True)
print(full_country_stats)
print(full_country_stats[["GDP per capita", 'Life satisfaction']].loc["United States"])

remove_indices = [0, 1, 6, 8, 33, 34, 35]
keep_indices = list(set(range(36)) - set(remove_indices))

sample_data = full_country_stats[[
    "GDP per capita", 'Life satisfaction']].iloc[keep_indices]
missing_data = full_country_stats[[
    "GDP per capita", 'Life satisfaction']].iloc[remove_indices]

sample_data.plot(kind='scatter', x="GDP per capita",
                 y='Life satisfaction', figsize=(5, 3))
plt.axis([0, 60000, 0, 10])
position_text = {
    "Hungary": (5000, 1),
    "Korea": (18000, 1.7),
    "France": (29000, 2.4),
    "Australia": (40000, 3.0),
    "United States": (52000, 3.8),
}
for country, pos_text in position_text.items():
    print("Good 1")
    pos_data_x, pos_data_y = sample_data.loc[country]
    print("Good 2")
    # Keep the country name as the label. print() returns None, so assigning
    # its result (as in `... else print(country)`) would set the label to None.
    print(country)
    print("Good 3")
    plt.annotate(country, xy=(pos_data_x, pos_data_y), xytext=pos_text,
                 arrowprops=dict(facecolor='black', width=0.5, shrink=0.1, headwidth=5))
    print("good 4")
    plt.plot(pos_data_x, pos_data_y, "ro")
    print("Good 5")
save_fig('money_happy_scatterplot')
plt.show()

This is the line that raises the error. Something goes wrong once the loop has gone through Australia.

pos_data_x, pos_data_y = sample_data.loc[country]

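My program should plot all of the results, but it keeps giving me "KeyError: 'United States'". That might mean something is actually wrong with the code for the United States, but at the moment I can't tell what. Thanks for any replies.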
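
One possible cause worth checking, shown here as a minimal sketch with made-up data: remove_indices is applied with .iloc, so it drops rows by position after the frame has been sorted by "GDP per capita". If "United States" happens to sit at one of those positions in this particular dataset, it is no longer in sample_data, and sample_data.loc["United States"] raises KeyError. The frame `stats`, its values, and the names `remove_positions`, `wanted` and `missing` are hypothetical stand-ins, not the real data or code.

```python
import pandas as pd

# Hypothetical stand-in for full_country_stats; the numbers are invented.
stats = pd.DataFrame(
    {
        "GDP per capita": [12000.0, 27000.0, 37000.0, 50000.0, 55000.0],
        "Life satisfaction": [4.9, 5.8, 6.5, 7.3, 7.2],
    },
    index=["Hungary", "Korea", "France", "Australia", "United States"],
)
stats.sort_values(by="GDP per capita", inplace=True)

# Positional removal: which country is dropped depends on the sort order.
remove_positions = [4]
keep_positions = sorted(set(range(len(stats))) - set(remove_positions))
sample = stats.iloc[keep_positions]

# Check which labels are actually available before calling .loc on them.
wanted = ["Hungary", "Korea", "France", "Australia", "United States"]
missing = [c for c in wanted if c not in sample.index]
print("Not in sample:", missing)

for country in wanted:
    if country not in sample.index:
        continue  # skip labels that were removed instead of raising KeyError
    x, y = sample.loc[country]
    print(country, x, y)
```

Running the same membership check against the real sample_data (for example, "United States" in sample_data.index) would confirm or rule out this explanation. If the row turns out to be missing, the annotation values could be read from full_country_stats instead, or the positions in remove_indices adjusted to match the dataset.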

0 answers:

No answers