How to add rows to a DataFrame from a list in pandas?

Time: 2017-02-07 22:40:39

Tags: python pandas dataframe

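I have yearly information (COUNT) for countries stored in a DataFrame. However, some countries are missing in some years.

If I have the complete list of countries, what is the best way to add them under the corresponding years and fill in 0 for the missing COUNT values?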

     DATE         COUNTRY  COUNTRY_ID  COUNT
0    1980   United States         840     42
42   1980  Czech Republic         203      2
95   1980         Hungary         348      1
96   1980   Great Britain         826      1
97   1980    South Africa         710      1
98   1982   United States         840     42
140  1982        Paraguay         600      2
.
.
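To try the answers below, the visible rows of the table can be rebuilt as a DataFrame. This is a sketch that uses only the seven rows shown; the elided rows (".") are not reconstructed, and the original index labels are kept.

```python
import pandas as pd

# Only the rows visible in the question; the elided rows are not reproduced here
df = pd.DataFrame({
    'DATE': [1980, 1980, 1980, 1980, 1980, 1982, 1982],
    'COUNTRY': ['United States', 'Czech Republic', 'Hungary', 'Great Britain',
                'South Africa', 'United States', 'Paraguay'],
    'COUNTRY_ID': [840, 203, 348, 826, 710, 840, 600],
    'COUNT': [42, 2, 1, 1, 1, 42, 2],
}, index=[0, 42, 95, 96, 97, 98, 140])
```

With only these rows the years run from 1980 to 1982 and there are six distinct countries, so a complete grid has 6 × 3 = 18 rows.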

2 answers:

Answer 0 (score: 1)

One way to do this is to build every DATE, COUNTRY combination, reindex the DataFrame on those combinations, and finally fill in the missing values.

import pandas as pd

# Assume that we want all years, not just the ones seen
years = range(df['DATE'].min(), df['DATE'].max() + 1)

# Get all (DATE, COUNTRY) combinations
idx = pd.MultiIndex.from_product([years, df['COUNTRY'].unique()], names=['DATE', 'COUNTRY'])

# Reindex by first putting DATE and COUNTRY into the index;
# combinations that did not exist become rows of NaN
df1 = df.set_index(['DATE', 'COUNTRY']).reindex(idx).reset_index()

# Fill back in missing IDs from a one-row-per-country lookup
country_map = df.drop_duplicates('COUNTRY').set_index('COUNTRY')['COUNTRY_ID']
df1['COUNTRY_ID'] = df1['COUNTRY'].map(country_map)

# Fill in 0 for COUNT and convert back to int
df1['COUNT'] = df1['COUNT'].fillna(0).astype(int)

    DATE         COUNTRY  COUNTRY_ID  COUNT
0   1980   United States         840     42
1   1980  Czech Republic         203      2
2   1980         Hungary         348      1
3   1980   Great Britain         826      1
4   1980    South Africa         710      1
5   1980        Paraguay         600      0
6   1981   United States         840      0
7   1981  Czech Republic         203      0
8   1981         Hungary         348      0
9   1981   Great Britain         826      0
10  1981    South Africa         710      0
11  1981        Paraguay         600      0
12  1982   United States         840     42
13  1982  Czech Republic         203      0
14  1982         Hungary         348      0
15  1982   Great Britain         826      0
16  1982    South Africa         710      0
17  1982        Paraguay         600      2
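The question mentions having a complete list of countries, while the answer above takes the countries from `df['COUNTRY'].unique()`, so a country absent from every year would never be added. A minimal sketch of the same reindex idea driven by an external list, assuming a hypothetical `all_countries` frame holding `COUNTRY` and `COUNTRY_ID` (the `Canada` row is illustrative only and does not come from the question):

```python
import pandas as pd

# Sample data: the rows visible in the question
df = pd.DataFrame({
    'DATE': [1980, 1980, 1980, 1980, 1980, 1982, 1982],
    'COUNTRY': ['United States', 'Czech Republic', 'Hungary', 'Great Britain',
                'South Africa', 'United States', 'Paraguay'],
    'COUNTRY_ID': [840, 203, 348, 826, 710, 840, 600],
    'COUNT': [42, 2, 1, 1, 1, 42, 2],
})

# Hypothetical complete country list; 'Canada' never appears in df
all_countries = pd.DataFrame({
    'COUNTRY': ['United States', 'Czech Republic', 'Hungary', 'Great Britain',
                'South Africa', 'Paraguay', 'Canada'],
    'COUNTRY_ID': [840, 203, 348, 826, 710, 600, 124],
})

years = range(df['DATE'].min(), df['DATE'].max() + 1)

# Every (DATE, COUNTRY) pair, taking countries from the full list instead of df
idx = pd.MultiIndex.from_product([years, all_countries['COUNTRY']],
                                 names=['DATE', 'COUNTRY'])

# Reindex only the COUNT column; fill_value=0 fills the new pairs directly
df1 = (df.set_index(['DATE', 'COUNTRY'])['COUNT']
         .reindex(idx, fill_value=0)
         .reset_index())

# IDs come from the full list, so countries absent from df still get one
df1['COUNTRY_ID'] = df1['COUNTRY'].map(all_countries.set_index('COUNTRY')['COUNTRY_ID'])
df1 = df1[['DATE', 'COUNTRY', 'COUNTRY_ID', 'COUNT']]
```

Reindexing only the `COUNT` Series with `fill_value=0` avoids the intermediate NaN and the `fillna`/`astype` step, and looking IDs up in the full list covers countries that `df` never mentions.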

Answer 1 (score: 0)

Also consider a cross-join merge route (for those of us with an SQL mindset):

import pandas as pd

# ASSIGN KEY COLUMN (on a copy via assign, so df itself is left unchanged)
keyed = df.assign(KEY=1)

# CREATE DF OF DATES RANGE
dates = pd.DataFrame({'DATE': list(range(df['DATE'].min(), df['DATE'].max() + 1)),
                      'COUNT': 0, 'KEY': 1})

# CROSS JOIN MERGE: every row of df paired with every year
mdf = keyed.merge(dates, on=['KEY'])

# REASSIGN COUNT: zero it wherever the paired year is not the row's own year
mdf.loc[mdf['DATE_x'] != mdf['DATE_y'], 'COUNT_x'] = 0

# CLEAN UP DF (COLS AND ROWS): summing per (DATE, COUNTRY, COUNTRY_ID) keeps the
# real count of a country that appears in several years, since every other copy
# of that row was zeroed above
mdf = mdf[['DATE_y', 'COUNTRY', 'COUNTRY_ID', 'COUNT_x']]\
           .rename(columns={'DATE_y': 'DATE', 'COUNT_x': 'COUNT'})\
           .groupby(['DATE', 'COUNTRY', 'COUNTRY_ID'], as_index=False)['COUNT'].sum()

#     DATE         COUNTRY  COUNTRY_ID  COUNT
# 0   1980  Czech Republic         203      2
# 1   1980   Great Britain         826      1
# 2   1980         Hungary         348      1
# 3   1980        Paraguay         600      0
# 4   1980    South Africa         710      1
# 5   1980   United States         840     42
# 6   1981  Czech Republic         203      0
# 7   1981   Great Britain         826      0
# 8   1981         Hungary         348      0
# 9   1981        Paraguay         600      0
# 10  1981    South Africa         710      0
# 11  1981   United States         840      0
# 12  1982  Czech Republic         203      0
# 13  1982   Great Britain         826      0
# 14  1982         Hungary         348      0
# 15  1982        Paraguay         600      2
# 16  1982    South Africa         710      0
# 17  1982   United States         840     42
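In pandas 1.2 and later, `merge` accepts `how='cross'`, so the helper `KEY` column is no longer needed. The sketch below is a variation on the answer rather than its exact method: it cross-joins the distinct countries with the year range and then left-joins the observed counts, much like `CROSS JOIN` followed by `LEFT JOIN` in SQL. It rebuilds the sample frame from the rows shown in the question.

```python
import pandas as pd

# Sample data: the rows visible in the question
df = pd.DataFrame({
    'DATE': [1980, 1980, 1980, 1980, 1980, 1982, 1982],
    'COUNTRY': ['United States', 'Czech Republic', 'Hungary', 'Great Britain',
                'South Africa', 'United States', 'Paraguay'],
    'COUNTRY_ID': [840, 203, 348, 826, 710, 840, 600],
    'COUNT': [42, 2, 1, 1, 1, 42, 2],
})

# One row per country, like SELECT DISTINCT COUNTRY, COUNTRY_ID
countries = df[['COUNTRY', 'COUNTRY_ID']].drop_duplicates()

# All years in the observed range
years = pd.DataFrame({'DATE': range(df['DATE'].min(), df['DATE'].max() + 1)})

# CROSS JOIN years x countries (how='cross' requires pandas >= 1.2)
grid = years.merge(countries, how='cross')

# LEFT JOIN the observed counts; unmatched pairs get NaN, replaced by 0
result = (grid.merge(df[['DATE', 'COUNTRY', 'COUNT']], on=['DATE', 'COUNTRY'], how='left')
              .fillna({'COUNT': 0})
              .astype({'COUNT': int}))
```

Because each (DATE, COUNTRY) pair is matched at most once by the left join, no deduplication step is needed afterwards.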