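I have a list of dictionaries, each with two keys. The first key is a shared index and the second is a column name. I want to convert this list into a pandas DataFrame
object. However, when I do this I get duplicate index rows, each with one column blank.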
Using this code:
import pandas as pd
l = [{'col_a': 0, 'idx': 0},
{'col_b': 5, 'idx': 0},
{'col_a': 1, 'idx': 1},
{'col_b': 6, 'idx': 1},
{'col_a': 2, 'idx': 2},
{'col_b': 7, 'idx': 2},
{'col_a': 3, 'idx': 3},
{'col_b': 8, 'idx': 3},
{'col_a': 4, 'idx': 4},
{'col_b': 9, 'idx': 4}]
df = pd.DataFrame(l)
df = df.set_index('idx')
I get:
col_a col_b
idx
0 0.0 NaN
0 NaN 5.0
1 1.0 NaN
1 NaN 6.0
2 2.0 NaN
2 NaN 7.0
3 3.0 NaN
3 NaN 8.0
4 4.0 NaN
4 NaN 9.0
But I want this:
col_a col_b
idx
0 0.0 5.0
1 1.0 6.0
2 2.0 7.0
3 3.0 8.0
4 4.0 9.0
Any ideas? Thanks!
Answer 0 (score: 5)
You can group on idx and then take .first():
In [10]: df
Out[10]:
col_a col_b idx
0 0.0 NaN 0
1 NaN 5.0 0
2 1.0 NaN 1
3 NaN 6.0 1
4 2.0 NaN 2
5 NaN 7.0 2
6 3.0 NaN 3
7 NaN 8.0 3
8 4.0 NaN 4
9 NaN 9.0 4
In [11]: df.groupby("idx").first()
Out[11]:
col_a col_b
idx
0 0.0 5.0
1 1.0 6.0
2 2.0 7.0
3 3.0 8.0
4 4.0 9.0
Or call pivot_table:
In [36]: df.pivot_table(index="idx")
Out[36]:
col_a col_b
idx
0 0.0 5.0
1 1.0 6.0
2 2.0 7.0
3 3.0 8.0
4 4.0 9.0
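A self-contained sketch of both suggestions, starting from the list in the question (the name records is just for illustration). Note that pivot_table aggregates with the mean by default, so it only reproduces the desired frame because each idx has exactly one non-null value per column; passing aggfunc='first' makes that intent explicit.

```python
import pandas as pd

# The list of dicts from the question: each dict holds one column value plus its idx
records = [
    {'col_a': 0, 'idx': 0}, {'col_b': 5, 'idx': 0},
    {'col_a': 1, 'idx': 1}, {'col_b': 6, 'idx': 1},
    {'col_a': 2, 'idx': 2}, {'col_b': 7, 'idx': 2},
    {'col_a': 3, 'idx': 3}, {'col_b': 8, 'idx': 3},
    {'col_a': 4, 'idx': 4}, {'col_b': 9, 'idx': 4},
]
df = pd.DataFrame(records)

# .first() takes the first non-null value of each column within each idx group
by_first = df.groupby('idx').first()

# pivot_table aggregates with the mean by default; aggfunc='first' states the intent explicitly
by_pivot = df.pivot_table(index='idx', aggfunc='first')

print(by_first)
print(by_pivot)
```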
Answer 1 (score: 1)
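Just take the sum over index level 0 (older pandas versions allowed df.sum(level=0), but the level argument to DataFrame reductions was removed in pandas 2.0):
df.groupby(level=0).sum()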
col_a col_b
idx
0 0.0 5.0
1 1.0 6.0
2 2.0 7.0
3 3.0 8.0
4 4.0 9.0
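A self-contained sketch of the groupby form of this answer, with one caveat worth knowing: GroupBy.sum returns 0 for a group whose values are all NaN, so an idx that is missing its col_b entry would silently get 0.0 instead of NaN. Passing min_count=1 keeps NaN in that case. The extra record for idx 5 below is hypothetical, added only to show the difference.

```python
import pandas as pd

records = [
    {'col_a': 0, 'idx': 0}, {'col_b': 5, 'idx': 0},
    {'col_a': 1, 'idx': 1}, {'col_b': 6, 'idx': 1},
    {'col_a': 2, 'idx': 2}, {'col_b': 7, 'idx': 2},
    {'col_a': 3, 'idx': 3}, {'col_b': 8, 'idx': 3},
    {'col_a': 4, 'idx': 4}, {'col_b': 9, 'idx': 4},
    {'col_a': 10, 'idx': 5},  # hypothetical extra record: idx 5 has no col_b
]
df = pd.DataFrame(records).set_index('idx')

# Default min_count=0: an all-NaN group sums to 0.0
summed = df.groupby(level=0).sum()

# min_count=1: a group needs at least one non-NaN value, otherwise the result is NaN
summed_nan = df.groupby(level=0).sum(min_count=1)

print(summed)
print(summed_nan)
```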
Answer 2 (score: 0)
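DSM's answer works for your example, but if one index can have more than one col_a value it can lose data, because .first() keeps only the first non-null value per group. This longer code handles that case by outer-merging one small DataFrame per column, so every value is kept (an idx with two col_a values ends up as two rows).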
import pandas as pd
l = [{'col_a': 0, 'idx': 0},
{'col_b': 5, 'idx': 0},
{'col_a': 1, 'idx': 1},
{'col_b': 6, 'idx': 1},
{'col_a': 2, 'idx': 2},
{'col_b': 7, 'idx': 2},
{'col_a': 3, 'idx': 3},
{'col_b': 8, 'idx': 3},
{'col_a': 4, 'idx': 4},
{'col_b': 9, 'idx': 4}]
# To flatten (unnest) a list of lists
flatten = lambda x: [item for sublist in x for item in sublist]
# Get all unique column names (in case there are more than two); sorted for a stable column order
all_unique_cols = sorted(set(flatten([tuple(x.keys()) for x in l])))
all_unique_cols.remove('idx')  # all except the index column name
df = pd.DataFrame()
# For each of these columns, build a small DataFrame, then outer-merge them on their shared 'idx' column
for i, col in enumerate(all_unique_cols):
    if i == 0:
        df = pd.DataFrame([x for x in l if col in x.keys()])
    else:
        df_tmp = pd.DataFrame([x for x in l if col in x.keys()])
        df = pd.merge(df, df_tmp, how='outer')
df = df.set_index('idx')
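Before settling on .first(), it can be worth checking whether any idx actually has more than one value for the same column, since that is exactly the case where .first() would discard data. A minimal sketch, assuming a hypothetical second col_a value for idx 0: count() gives the number of non-null values per column in each group, and the fallback keeps every value as a list instead of picking one.

```python
import pandas as pd

records = [
    {'col_a': 0, 'idx': 0}, {'col_b': 5, 'idx': 0},
    {'col_a': 99, 'idx': 0},  # hypothetical duplicate col_a value for idx 0
    {'col_a': 1, 'idx': 1}, {'col_b': 6, 'idx': 1},
]
df = pd.DataFrame(records)

# Number of non-null values per column within each idx group
counts = df.groupby('idx').count()
has_collisions = bool((counts > 1).any().any())

if has_collisions:
    # Keep every non-null value of each cell as a list instead of picking one
    result = df.groupby('idx').agg(lambda s: s.dropna().tolist())
else:
    result = df.groupby('idx').first()

print(result)
```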
Answer 3 (score: -1)
How about initializing the values and the index separately?
import pandas as pd
l = []
ix = []
for i in range(5):
    l.append({'col_a': i, 'col_b': i + 5})
    ix.append(i)
df = pd.DataFrame(l, index=ix)
Output:
col_a col_b
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9
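Along the same lines, if the list of dicts itself cannot be built differently, it can be reshaped into one dict per idx before pandas sees it. A minimal sketch, assuming the col_a/col_b/idx layout from the question (merged is just an illustrative name). Later dicts overwrite earlier ones for the same column, so this has the same duplicate-value caveat as .first(), except that here the last value wins.

```python
from collections import defaultdict

import pandas as pd

records = [
    {'col_a': 0, 'idx': 0}, {'col_b': 5, 'idx': 0},
    {'col_a': 1, 'idx': 1}, {'col_b': 6, 'idx': 1},
    {'col_a': 2, 'idx': 2}, {'col_b': 7, 'idx': 2},
    {'col_a': 3, 'idx': 3}, {'col_b': 8, 'idx': 3},
    {'col_a': 4, 'idx': 4}, {'col_b': 9, 'idx': 4},
]

# Collect the column values of all dicts that share the same idx
merged = defaultdict(dict)
for rec in records:
    row = dict(rec)          # copy so the input dicts are not modified
    key = row.pop('idx')
    merged[key].update(row)  # a later value for the same column overwrites an earlier one

# orient='index': outer keys become the index, inner dicts become the rows
df = pd.DataFrame.from_dict(merged, orient='index')
df.index.name = 'idx'

print(df)
```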