Create separate columns in a DataFrame from the key pairs in a dictionary

Posted: 2016-03-09 11:12:48

Tags: python dictionary pandas dataframe

Here is part of the dictionary I created:

defaultdict (int,
         {"['por', 'rus']": 80,
         "['nld', 'slv']": 4,
         "['jpn', 'pol']": 48,
         "['ces', 'epo']": 4,
         "['oci', 'ron']": 4,
         "['lit', 'mkd']": 2,
         "['deu', 'ewe']": 2,
         "['cat', 'ron']": 4,
         "['ces', 'ita']": 18,
         "['est', 'fra']": 14,
         "['hin', 'mal']": 4,

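I would like to have three columns: column1 with the first element of the key, column2 with the second element of the key, and column3 with the value.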

When I create the DataFrame with:

pairs_df = pd.DataFrame(list(pairs.iteritems()), columns = ['column1','column2'])
pairs_df.head()

Output:

           column1  column2
0   ['por', 'rus']       80
1   ['est', 'fra']       14
2   ['nld', 'slv']        4
3   ['jpn', 'pol']       48
4   ['ces', 'epo']        4
5   ['hin', 'mal']        4
6   ['oci', 'ron']        4
7   ['lit', 'mkd']        2
8   ['deu', 'ewe']        2
9   ['cat', 'ron']        4
10  ['ces', 'ita']       18

The keys end up in a single column, and I can't work out how to split them so that I get three columns.
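The underlying problem is that each key is a string that only looks like a list, so there is nothing for pandas to split. A minimal Python 3 sketch, using a hypothetical two-entry sample with the same key format, showing that ast.literal_eval turns such a key into a real list:

```python
from ast import literal_eval
from collections import defaultdict

# Hypothetical two-entry sample with the same key format as the question
pairs = defaultdict(int, {"['por', 'rus']": 80, "['nld', 'slv']": 4})

for key, count in pairs.items():
    # The key is a str; literal_eval parses it into an actual list of two codes
    parsed = literal_eval(key)
    print(type(key).__name__, type(parsed).__name__, parsed, count)
```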

3 answers:

Answer 0 (score: 3)

Is this what you want?

import re

mydict=  {"['por', 'rus']": 80,
         "['nld', 'slv']": 4,
         "['jpn', 'pol']": 48,
         "['ces', 'epo']": 4,
         "['oci', 'ron']": 4,
         "['lit', 'mkd']": 2,
         "['deu', 'ewe']": 2,
         "['cat', 'ron']": 4,
         "['ces', 'ita']": 18,
         "['est', 'fra']": 14,
         "['hin', 'mal']": 4}


# this is where you seem to be stuck
for k,v in mydict.iteritems():
    print k,v    # keys are still strings, not lists

# this is the resolution, separation of the keys into two strings    
for k,v in mydict.iteritems():
    a=re.findall('\w{3}',k) 
    print a[0],a[1],v

Output:

['por', 'rus'] 80
['nld', 'slv'] 4
['jpn', 'pol'] 48
['ces', 'epo'] 4
['oci', 'ron'] 4
['lit', 'mkd'] 2
['deu', 'ewe'] 2
['cat', 'ron'] 4
['ces', 'ita'] 18
['est', 'fra'] 14
['hin', 'mal'] 4
por rus 80
nld slv 4
jpn pol 48
ces epo 4
oci ron 4
lit mkd 2
deu ewe 2
cat ron 4
ces ita 18
est fra 14
hin mal 4
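The pattern \w{3} assumes every language code is exactly three word characters long. A slightly more general Python 3 sketch, assuming the keys always wrap their items in single quotes, captures whatever sits between the quotes instead; the second key below is a hypothetical example of codes that are not three letters long:

```python
import re

# Hypothetical keys: the second one holds codes that are not three letters long
mydict = {"['por', 'rus']": 80, "['zh-Hans', 'en']": 3}

for k, v in mydict.items():
    naive = re.findall(r'\w{3}', k)       # the three-character pattern used above
    quoted = re.findall(r"'([^']*)'", k)  # everything between a pair of single quotes
    print(naive, quoted, v)
```

For the first key both patterns agree; for the second one the three-character pattern only finds a fragment, while the quote-based pattern returns both codes.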

Now, if you like, you can append them to lists:

x,y,z=[],[],[]
for k,v in mydict.iteritems():
    a=re.findall('\w{3}',k)
    x.append(a[0])
    y.append(a[1])
    z.append(v)
print x,y,z

Or, if you prefer a pandas DataFrame:

import pandas as pd
df = pd.DataFrame({'a': x, 'b': y,'c':z})
print df

Output:

['por', 'nld', 'jpn', 'ces', 'oci', 'lit', 'deu', 'cat', 'ces', 'est', 'hin'] ['rus', 'slv', 'pol', 'epo', 'ron', 'mkd', 'ewe', 'ron', 'ita', 'fra', 'mal'] [80, 4, 48, 4, 4, 2, 2, 4, 18, 14, 4]
      a    b   c
0   por  rus  80
1   nld  slv   4
2   jpn  pol  48
3   ces  epo   4
4   oci  ron   4
5   lit  mkd   2
6   deu  ewe   2
7   cat  ron   4
8   ces  ita  18
9   est  fra  14
10  hin  mal   4
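The three parallel lists can also be built in a single pass: produce one [code1, code2, count] row per key, then let zip(*...) transpose the rows into columns. A Python 3 sketch on a shortened copy of mydict, assuming every key yields exactly two codes:

```python
import re
import pandas as pd

# Shortened copy of the question's dictionary
mydict = {"['por', 'rus']": 80, "['nld', 'slv']": 4, "['jpn', 'pol']": 48}

# Each generated row is [first_code, second_code, count];
# zip(*rows) regroups them into all first codes, all second codes, all counts
x, y, z = zip(*(re.findall(r'\w{3}', k) + [v] for k, v in mydict.items()))
df = pd.DataFrame({'a': x, 'b': y, 'c': z})
print(df)
```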

Answer 1 (score: 2)

import pandas as pd
from collections import defaultdict
from ast import literal_eval

pairs = defaultdict (int,
            {"['por', 'rus']": 80,
             "['nld', 'slv']": 4,
             "['jpn', 'pol']": 48,
             "['ces', 'epo']": 4,
             "['oci', 'ron']": 4,
             "['lit', 'mkd']": 2,
             "['deu', 'ewe']": 2,
             "['cat', 'ron']": 4,
             "['ces', 'ita']": 18,
             "['est', 'fra']": 14,
             "['hin', 'mal']": 4})


df = pd.DataFrame(list(pairs.iteritems()), columns = ['column1','column2'])
print df
           column1  column2
0   ['por', 'rus']       80
1   ['est', 'fra']       14
2   ['nld', 'slv']        4
3   ['jpn', 'pol']       48
4   ['ces', 'epo']        4
5   ['hin', 'mal']        4
6   ['oci', 'ron']        4
7   ['lit', 'mkd']        2
8   ['deu', 'ewe']        2
9   ['cat', 'ron']        4
10  ['ces', 'ita']       18

print type(df.at[0,'column1'])
<type 'str'>

You can first convert the strings in column1 to lists with literal_eval, then create a new DataFrame from those lists with from_records, and finally concat it with column2:

#change type string to list
df['column1'] = df['column1'].apply(literal_eval)
print df
       column1  column2
0   [por, rus]       80
1   [est, fra]       14
2   [nld, slv]        4
3   [jpn, pol]       48
4   [ces, epo]        4
5   [hin, mal]        4
6   [oci, ron]        4
7   [lit, mkd]        2
8   [deu, ewe]        2
9   [cat, ron]        4
10  [ces, ita]       18

print type(df.at[0,'column1'])
<type 'list'>

df1 = pd.DataFrame.from_records([x for x in df['column1']], columns=['a','b'])
print df1
      a    b
0   por  rus
1   est  fra
2   nld  slv
3   jpn  pol
4   ces  epo
5   hin  mal
6   oci  ron
7   lit  mkd
8   deu  ewe
9   cat  ron
10  ces  ita

print pd.concat([df1, df['column2']], axis=1)
      a    b  column2
0   por  rus       80
1   est  fra       14
2   nld  slv        4
3   jpn  pol       48
4   ces  epo        4
5   hin  mal        4
6   oci  ron        4
7   lit  mkd        2
8   deu  ewe        2
9   cat  ron        4
10  ces  ita       18
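Once column1 holds real lists, a common alternative to from_records plus concat is to pass tolist() straight to the DataFrame constructor while keeping the original index, so the new columns line up with column2. A Python 3 sketch on a small hypothetical frame shaped like the one above:

```python
from ast import literal_eval
import pandas as pd

# Small hypothetical sample in the question's format
pairs = {"['por', 'rus']": 80, "['est', 'fra']": 14, "['nld', 'slv']": 4}
df = pd.DataFrame(list(pairs.items()), columns=['column1', 'column2'])
df['column1'] = df['column1'].apply(literal_eval)  # strings -> lists

# Expand the two-element lists into columns a and b, aligned on df's index,
# then attach the counts (join accepts a named Series)
ab = pd.DataFrame(df['column1'].tolist(), index=df.index, columns=['a', 'b'])
result = ab.join(df['column2'])
print(result)
```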

Or, starting again from the original string column, clean column1 with str.strip and str.replace, then create a new DataFrame with str.split and finally concat it with column2:

df['column1'] = df['column1'].str.strip('[]').str.replace("'","")
print df
     column1  column2
0   por, rus       80
1   est, fra       14
2   nld, slv        4
3   jpn, pol       48
4   ces, epo        4
5   hin, mal        4
6   oci, ron        4
7   lit, mkd        2
8   deu, ewe        2
9   cat, ron        4
10  ces, ita       18

df1 = df['column1'].str.split(", ", expand=True)
df1.columns = ['a','b']
print df1
      a    b
0   por  rus
1   est  fra
2   nld  slv
3   jpn  pol
4   ces  epo
5   hin  mal
6   oci  ron
7   lit  mkd
8   deu  ewe
9   cat  ron
10  ces  ita

print pd.concat([df1, df['column2']], axis=1)
      a    b  column2
0   por  rus       80
1   est  fra       14
2   nld  slv        4
3   jpn  pol       48
4   ces  epo        4
5   hin  mal        4
6   oci  ron        4
7   lit  mkd        2
8   deu  ewe        2
9   cat  ron        4
10  ces  ita       18
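Instead of stripping and splitting, Series.str.extract can pull both codes out of the original string in one step, using a regular expression with one capture group per output column (named groups become the column names). A Python 3 sketch, assuming every key has the exact form ['xxx', 'yyy']:

```python
import pandas as pd

# Small hypothetical sample in the question's format
pairs = {"['por', 'rus']": 80, "['est', 'fra']": 14, "['nld', 'slv']": 4}
df = pd.DataFrame(list(pairs.items()), columns=['column1', 'column2'])

# Group a captures the first quoted code, group b the second
ab = df['column1'].str.extract(r"\['(?P<a>[^']*)', '(?P<b>[^']*)'\]")
result = pd.concat([ab, df['column2']], axis=1)
print(result)
```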

Answer 2 (score: 0)

This solution uses a list comprehension with eval and the from_records function:

import pandas as pd
from collections import defaultdict

pairs = defaultdict (int,
            {"['por', 'rus']": 80,
             "['nld', 'slv']": 4,
             "['jpn', 'pol']": 48,
             "['ces', 'epo']": 4,
             "['oci', 'ron']": 4,
             "['lit', 'mkd']": 2,
             "['deu', 'ewe']": 2,
             "['cat', 'ron']": 4,
             "['ces', 'ita']": 18,
             "['est', 'fra']": 14,
             "['hin', 'mal']": 4})

rec = [ eval(x[0]) + [x[1]] for x in pairs.iteritems()]
print rec
[['por', 'rus', 80], ['est', 'fra', 14], ['nld', 'slv', 4], ['jpn', 'pol', 48], 
 ['ces', 'epo', 4],  ['hin', 'mal', 4],  ['oci', 'ron', 4], ['lit', 'mkd', 2], 
 ['deu', 'ewe', 2],  ['cat', 'ron', 4],  ['ces', 'ita', 18]]

print pd.DataFrame.from_records(rec, columns=['a','b','c'])
      a    b   c
0   por  rus  80
1   est  fra  14
2   nld  slv   4
3   jpn  pol  48
4   ces  epo   4
5   hin  mal   4
6   oci  ron   4
7   lit  mkd   2
8   deu  ewe   2
9   cat  ron   4
10  ces  ita  18
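eval executes any expression stored in a key. If the keys could ever come from untrusted input, ast.literal_eval is a safer swap here because it only accepts Python literals such as lists and strings. A Python 3 sketch of the same comprehension on a shortened dictionary:

```python
from ast import literal_eval
from collections import defaultdict
import pandas as pd

# Shortened copy of the question's dictionary
pairs = defaultdict(int, {"['por', 'rus']": 80, "['est', 'fra']": 14})

# literal_eval parses "['por', 'rus']" into ['por', 'rus'] without running arbitrary code;
# appending the count gives one [code1, code2, count] record per key
rec = [literal_eval(k) + [v] for k, v in pairs.items()]
df = pd.DataFrame.from_records(rec, columns=['a', 'b', 'c'])
print(df)
```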