I load my dataset like this:
import pandas as pd
gdp = pd.read_csv(r"gdpproject.csv", encoding="ISO-8859-1")
gdp.head(2)
gdp.tail(2)
This gives me the output:
Country.Name Indicator.Name 2004 2005
0 World GDP 5.590000e+13 5.810000e+13
1 World Health 5.590000e+13 5.810000e+13
086 Zimbabwe GDP per capita 8.681564e+02 8.082944e+02
089 Zimbabwe Population 1.277751e+07 1.294003e+07
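As you can see right away, each country has multiple indicators.
What I want to do is create a new indicator from two existing ones, and create it for every unique country.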
for i in series(gdp['Country.Name']):
gdp['Military Spending'] = 100 / gdp['Military percent of GDP'] *
gdp['GDP']
It gives me this error message:
NameError Traceback (most recent call last)
<ipython-input-37-d817ea1522fc> in <module>()
----> 1 for i in series(gdp1['Country.Name']):
2 gdp1['Military Spending'] = 100 / gdp1['Military percent of GDP'] *
gdp1['GDP']
NameError: name 'series' is not defined
How can I make this loop work? I also tried simply
for i in gdp['Country.Name']
but I still get an error message.
Please help!
Answer 0 (score: 0)
Suppose you have the following input DataFrame (note that the Military percent of GDP data does not appear in your example):
>>> gdp
Country.Name Indicator.Name 2004 2005
0 World GDP 5.590000e+13 5.810000e+13
1 World Military percent of GDP 2.100000e+00 2.300000e+00
2 Zimbabwe GDP 1.628900e+10 1.700000e+10
3 Zimbabwe Military percent of GDP 2.000000e+00 2.100000e+00
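You can then create two helper DataFrames, df_gdp and df_mpgdp, holding the 2004 and 2005 values for GDP and Military percent of GDP respectively. Next, create df_msp, which holds the new rows whose Indicator.Name is Military Spending, and finally append it to the original DataFrame. Note that in some places we need reset_index so that the calculation is aligned on the expected index.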
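To see why reset_index matters here, the following is a minimal sketch rather than part of the solution itself. The Series names gdp_2004 and pct_2004 are made up for illustration; they mimic the row labels left over after filtering the sample DataFrame by indicator (rows 0 and 2 for GDP, rows 1 and 3 for the percentage).

```python
import pandas as pd

# Hypothetical stand-ins for the filtered columns: same length,
# but different row labels (0, 2 versus 1, 3).
gdp_2004 = pd.Series([5.59e13, 1.6289e10], index=[0, 2])
pct_2004 = pd.Series([2.1, 2.0], index=[1, 3])

# pandas aligns arithmetic on index labels. No labels match here,
# so the result has four rows and every one of them is NaN.
misaligned = gdp_2004 * pct_2004
print(misaligned.isna().all())  # True

# After reset_index(drop=True) both Series are labelled 0, 1,
# so the values pair up by position.
aligned = gdp_2004.reset_index(drop=True) * pct_2004.reset_index(drop=True)
print(aligned.notna().all())  # True
```

Pairing by position only gives the right answer if both filtered frames list the countries in the same order, which is worth checking on real data.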
The code below should do what you want:
import pandas as pd
gdp = pd.DataFrame( [
["World", "GDP", 5.590000e+13, 5.810000e+13],
["World", "Military percent of GDP", 2.1, 2.3],
["Zimbabwe", "GDP", 16289e6, 17000e6],
["Zimbabwe", "Military percent of GDP", 2, 2.1]])
gdp.columns = ["Country.Name", "Indicator.Name", "2004", "2005"]
df_gdp = gdp[gdp["Indicator.Name"] == "GDP"]
df_mpgdp = gdp[gdp["Indicator.Name"] == "Military percent of GDP"]
df_msp = pd.DataFrame()
df_msp["Country.Name"] = df_gdp["Country.Name"].reset_index(drop=True)
df_msp["Indicator.Name"] = "Military Spending"
# Military percent of GDP is a percentage, so spending = percent / 100 * GDP
# (the 100 / percent * GDP from the question would come out larger than GDP itself).
df_msp["2004"] = df_mpgdp["2004"].reset_index(drop=True) / 100 * df_gdp["2004"].reset_index(drop=True)
df_msp["2005"] = df_mpgdp["2005"].reset_index(drop=True) / 100 * df_gdp["2005"].reset_index(drop=True)
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
gdp_out = pd.concat([gdp, df_msp])
gdp_out = gdp_out.sort_values(["Country.Name", "Indicator.Name"])
gdp_out = gdp_out.reset_index(drop=True)
The final output DataFrame is:
>>> gdp_out
Country.Name Indicator.Name 2004 2005
0 World GDP 5.590000e+13 5.810000e+13
1 World Military Spending 1.173900e+12 1.336300e+12
2 World Military percent of GDP 2.100000e+00 2.300000e+00
3 Zimbabwe GDP 1.628900e+10 1.700000e+10
4 Zimbabwe Military Spending 3.257800e+08 3.570000e+08
5 Zimbabwe Military percent of GDP 2.000000e+00 2.100000e+00
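As a possible alternative, the same result can be computed without relying on row order by indexing both helper frames on Country.Name, so pandas matches the countries by name. This is only a sketch under two assumptions: each country has exactly one row per indicator, and Military percent of GDP is a percentage, so spending is taken as percent / 100 * GDP. The names years and df_pct are introduced here for illustration.

```python
import pandas as pd

gdp = pd.DataFrame(
    [
        ["World", "GDP", 5.59e13, 5.81e13],
        ["World", "Military percent of GDP", 2.1, 2.3],
        ["Zimbabwe", "GDP", 16289e6, 17000e6],
        ["Zimbabwe", "Military percent of GDP", 2.0, 2.1],
    ],
    columns=["Country.Name", "Indicator.Name", "2004", "2005"],
)

years = ["2004", "2005"]  # hypothetical helper: the year columns to compute

# One row per country for each indicator, indexed by country name.
df_gdp = gdp[gdp["Indicator.Name"] == "GDP"].set_index("Country.Name")[years]
df_pct = gdp[gdp["Indicator.Name"] == "Military percent of GDP"].set_index("Country.Name")[years]

# Arithmetic aligns on the country index, so row order no longer matters.
# Assumption: the indicator is a percentage, so spending = pct / 100 * GDP.
df_msp = (df_pct / 100 * df_gdp).reset_index()
df_msp.insert(1, "Indicator.Name", "Military Spending")

# Stack the new rows under the original ones and restore a clean ordering.
gdp_out = (
    pd.concat([gdp, df_msp], ignore_index=True)
    .sort_values(["Country.Name", "Indicator.Name"])
    .reset_index(drop=True)
)
print(gdp_out)
```

No explicit loop over countries is needed: the arithmetic is applied to every country at once, and a country missing one of the two indicators would show up as NaN instead of being silently paired with the wrong row.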