I created a dictionary as shown below.
Here is the data:
GDS3:
ABC_1 ABC_2 BBB_1
cat elf 123
dog run 456
bird burp 789
GDS4:
ABC_3 ABC_4 BCB_a
beer yes 234
wine no 543
gin yes 743
GDS5:
ABC_5 ABC_6 BCD_c
lol yea 543
lmao NaN 446
asl NaN 777
#create a dictionary in which all columns that start with the same 3 characters will be grouped in the same key.
dict_2013 = {k: g for k, g in GDS3.groupby(by=lambda x: x[:3].lower(), axis=1)}
dict_2014 = {k: g for k, g in GDS4.groupby(by=lambda x: x[:3].lower(), axis=1)}
dict_2015 = {k: g for k, g in GDS5.groupby(by=lambda x: x[:3].lower(), axis=1)}
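On pandas 2.1 and later, groupby(..., axis=1) is deprecated and emits a FutureWarning. Below is a minimal sketch that builds the same prefix-to-DataFrame mapping with boolean column selection instead. split_by_prefix is a hypothetical helper name, and the sample frame is GDS3 from above, rebuilt by hand:

```python
import pandas as pd


def split_by_prefix(df, n=3):
    """Hypothetical helper: map each lower-cased n-character column prefix to a sub-DataFrame."""
    prefixes = df.columns.str[:n].str.lower()
    # pd.unique keeps the prefixes in order of first appearance.
    return {p: df.loc[:, prefixes == p] for p in pd.unique(prefixes)}


# GDS3 rebuilt from the sample data in the question.
GDS3 = pd.DataFrame({"ABC_1": ["cat", "dog", "bird"],
                     "ABC_2": ["elf", "run", "burp"],
                     "BBB_1": [123, 456, 789]})
dict_2013 = split_by_prefix(GDS3)
print(list(dict_2013))  # ['abc', 'bbb']
```

The keys are lower-cased exactly as in the lambda above, so the group is looked up as 'abc', not 'ABC'.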
# start with year 2013:
global_dict = dict_2013
# if key in the new dictionary is in the old dictionary then
# add the values from the new dictionary key to the old dictionary key
# else if the new dictionary key does not exist in the old dictionary then add a new key with the new values
for key, val in dict_2014.items():
    if key in global_dict:
        global_dict[key] = [global_dict[key], val]
    else:
        global_dict[key] = val
for key, val in dict_2015.items():  # to add items
    if key in global_dict:
        global_dict[key] = [global_dict[key], val]
    else:
        global_dict[key] = val
This is the output I want (one DataFrame per key):
df_ABC:
ABC_1 ABC_2 ABC_3 ABC_4 ABC_5
cat elf beer yes lol
dog run wine no lmao
bird burp gin yes asl
df_BBB:
BBB_1
cat
dog
bird
In other words, I want to turn each key into its own DataFrame (for every key), so I tried the following:
ABC_dataframe=pd.DataFrame(global_dict['ABC'])
When I run this, I get the following error:
TypeError: Expected list, got DataFrame
This is strange, because global_dict['ABC'] is a list (I checked with type(global_dict['ABC'])).
How can I fix this? I tried flattening the list, but I still ran into problems.
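A likely explanation: the value is a list, but not a flat list of DataFrames. Each pass of the merge loop wraps the previous value in a new two-element list, so a key present in all three years ends up as [[df_2013, df_2014], df_2015], while a key present in only one year stays a bare DataFrame. The sketch below replays the loop from the question, with strings standing in for the DataFrames (an assumption to keep it self-contained):

```python
# Strings stand in for the per-year DataFrames; the loop body matches the question's.
global_dict = {"abc": "abc_2013", "bbb": "bbb_2013"}
dict_2014 = {"abc": "abc_2014"}
dict_2015 = {"abc": "abc_2015"}

for new_dict in (dict_2014, dict_2015):
    for key, val in new_dict.items():
        if key in global_dict:
            global_dict[key] = [global_dict[key], val]
        else:
            global_dict[key] = val

print(global_dict["abc"])  # [['abc_2013', 'abc_2014'], 'abc_2015']
print(global_dict["bbb"])  # bbb_2013 (not a list at all)
```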
Answer 0 (score: 2)
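The most confusing part of your logic is that the values of global_dict have mixed types (a DataFrame or a list). Keep the object type consistent: pick lists, and append to the list every time you add a value.
The Pythonic solution is to use a collections.defaultdict of list objects: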
from collections import defaultdict

global_dict = defaultdict(list, {k: [v] for k, v in dict_2013.items()})
for key, val in dict_2014.items():
    global_dict[key].append(val)
for key, val in dict_2015.items():
    global_dict[key].append(val)
Then use pd.concat with axis=1:
abc = pd.concat(global_dict['abc'], axis=1)
print(abc)
ABC_1 ABC_2 ABC_3 ABC_4 ABC_5 ABC_6
0 cat elf beer yes lol yea
1 dog run wine no lmao NaN
2 bird burp gin yes asl NaN
I can't explain why ABC_6 is missing from your expected result.
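A minimal sketch that combines the two steps of this answer for any number of years and builds every per-key DataFrame at once. merge_dicts is a hypothetical helper, and the small input dicts are stand-ins built from the question's data. Unlike global_dict = dict_2013 in the question, it returns a new dict, so dict_2013 is not modified:

```python
from collections import defaultdict

import pandas as pd


def merge_dicts(*dicts):
    """Hypothetical helper: gather every dict's values into one list per key."""
    merged = defaultdict(list)
    for d in dicts:
        for key, val in d.items():
            merged[key].append(val)
    return dict(merged)


# Small stand-ins for dict_2013 / dict_2014, built from the question's data.
dict_2013 = {"abc": pd.DataFrame({"ABC_1": ["cat", "dog", "bird"]}),
             "bbb": pd.DataFrame({"BBB_1": [123, 456, 789]})}
dict_2014 = {"abc": pd.DataFrame({"ABC_3": ["beer", "wine", "gin"]}),
             "bcb": pd.DataFrame({"BCB_a": [234, 543, 743]})}

global_dict = merge_dicts(dict_2013, dict_2014)

# One DataFrame per key; axis=1 places the frames side by side, aligned on the row index.
dfs = {key: pd.concat(frames, axis=1) for key, frames in global_dict.items()}
print(dfs["abc"])
```

Because every value is a list, even for keys that appear in only one year, pd.concat can be applied uniformly to all of them.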
Answer 1 (score: 2)
If GDS3, GDS4 and GDS5 are already DataFrames, you can do this with pd.concat and groupby:
tdf = pd.concat([GDS3, GDS4, GDS5], axis=1)
g = tdf.groupby(tdf.columns.str[:3], axis=1)
# Now, let's create a dictionary of dataframes grouped
# by the first three letters of each column.
df_list = {}
for n, i in g:
    df_list[n] = i
print(df_list['ABC'])
print(df_list['BBB'])
Or, as @jpp suggested, use:
dict_dfs = dict(tuple(g))
print(dict_dfs['ABC'])
print(dict_dfs['BBB'])
Output:
ABC_1 ABC_2 ABC_3 ABC_4 ABC_5 ABC_6
0 cat elf beer yes lol yea
1 dog run wine no lmao NaN
2 bird burp gin yes asl NaN
BBB_1
0 123
1 456
2 789
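The groupby(..., axis=1) call above is deprecated from pandas 2.1 on; the deprecation note suggests grouping the transposed frame instead. A sketch of that variant, using GDS3 and GDS4 rebuilt by hand from the question's data. Transposing mixed columns produces object dtype, so infer_objects() is called to try to restore numeric dtypes:

```python
import pandas as pd

# GDS3 and GDS4 rebuilt from the question's sample data.
GDS3 = pd.DataFrame({"ABC_1": ["cat", "dog", "bird"],
                     "ABC_2": ["elf", "run", "burp"],
                     "BBB_1": [123, 456, 789]})
GDS4 = pd.DataFrame({"ABC_3": ["beer", "wine", "gin"],
                     "ABC_4": ["yes", "no", "yes"],
                     "BCB_a": [234, 543, 743]})

tdf = pd.concat([GDS3, GDS4], axis=1)

# Rows of tdf.T are the original columns: group them by their 3-character prefix,
# then transpose each group back to the original orientation.
dict_dfs = {key: grp.T.infer_objects()
            for key, grp in tdf.T.groupby(tdf.columns.str[:3])}

print(dict_dfs["ABC"])
print(dict_dfs["BBB"])
```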