I have a dictionary that I want to turn into a DataFrame, and then merge some of that DataFrame's columns into one.
My dictionary looks like this:
mydict = {'Participants': {'source': ['1', '2', '3'],
'name': ['A', 'B', 'C'],
'Entry (1)': ['Address1', 'Address2', 'Address3'],
'Entry (2)': ['Number1', 'Number2', 'Number2'],
'Entry (3)': ['Start1', 'Start2', 'Start3']},
'Countries': {'DK': ['1', '2', '3'],
'UK': ['1', '3', '2'],
'CDN': ['3', '2', '1'],
'FR': ['1', '2', '3']}}
The resulting DataFrame looks like this:
import pandas as pd
df = pd.DataFrame(mydict)
df:
Countries Participants
CDN [3, 2, 1] NaN
DK [1, 2, 3] NaN
Entry (1) NaN [Address1, Address2, Address3]
Entry (2) NaN [Number1, Number2, Number2]
Entry (3) NaN [Start1, Start2, Start3]
FR [1, 2, 3] NaN
UK [1, 3, 2] NaN
name NaN [A, B, C]
source NaN [1, 2, 3]
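I have several "Entry (n)" fields (they end up as rows in the DataFrame above) that hold the address, number, and start information for each participant (df['Participants']['name']).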
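Now I need an additional "Entries" field that combines, for each participant, the information from Entry (1), Entry (2), and Entry (3). Because the number of entries (Entry (n)) varies with the data source, I first collect the entry names like this:
import re
entries = re.findall(r'Entry \(\d\)', str(mydict['Participants'].keys()))
This leaves me with a list of all entries: ['Entry (1)', 'Entry (2)', 'Entry (3)'].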
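A possible refinement of that lookup, offered as a sketch rather than as the asker's method: testing each key on its own avoids parsing the string form of keys(), and using \d+ instead of \d would also accept two-digit labels such as 'Entry (10)' (that such labels occur is an assumption). The names participants and entry_pattern are illustrative only.

```python
import re

# Inner dictionary from the question; only its keys matter for this step.
participants = {
    'source': ['1', '2', '3'],
    'name': ['A', 'B', 'C'],
    'Entry (1)': ['Address1', 'Address2', 'Address3'],
    'Entry (2)': ['Number1', 'Number2', 'Number2'],
    'Entry (3)': ['Start1', 'Start2', 'Start3'],
}

# fullmatch checks the whole key; \d+ allows numbers with more than one digit.
entry_pattern = re.compile(r'Entry \((\d+)\)')

# Keep only the matching keys and order them by their number, not as text.
entries = sorted(
    (key for key in participants if entry_pattern.fullmatch(key)),
    key=lambda key: int(entry_pattern.fullmatch(key).group(1)),
)
print(entries)
```

Sorting by the captured number keeps 'Entry (10)' after 'Entry (9)', which a plain string sort would not.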
In the end I would like to have a DataFrame like this:
Countries Participants
CDN [3, 2, 1] NaN
DK [1, 2, 3] NaN
Entry (1) NaN [Address1, Address2, Address3]
Entry (2) NaN [Number1, Number2, Number2]
Entry (3) NaN [Start1, Start2, Start3]
Entries NaN ['Address1\nNumber1\nStart1', 'Address2\nNumber2\nStart2', 'Address3\nNumber2\nStart3'] <<-- I need this
FR [1, 2, 3] NaN
UK [1, 3, 2] NaN
name NaN [A, B, C]
source NaN [1, 2, 3]
Can someone tell me a pandas-idiomatic way to achieve this?
Answer 0 (score: 3)
It seems you need:
s = pd.DataFrame(df.filter(like='Entry', axis=0).Participants.tolist()).apply('\n'.join).tolist()
df.loc['Entries', 'Participants'] = s
df
Out[64]:
Participants Countries
CDN NaN [3, 2, 1]
DK NaN [1, 2, 3]
Entry (1) [Address1, Address2, Address3] NaN
Entry (2) [Number1, Number2, Number2] NaN
Entry (3) [Start1, Start2, Start3] NaN
FR NaN [1, 2, 3]
UK NaN [1, 3, 2]
name [A, B, C] NaN
source [1, 2, 3] NaN
Entries [Address1\nNumber1\nStart1, Address2\nNumber2\... NaN
Note that you can add sort_index at the end.
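For reference, a self-contained sketch of this approach with the sort_index step included. It deviates from the answer in one place, which is an assumption on my part: the new row is appended with pd.concat rather than by enlarging the frame through df.loc, since assigning a list to a not-yet-existing cell may behave differently across pandas versions. The names entry_rows, new_row, and result are illustrative.

```python
import pandas as pd

mydict = {'Participants': {'source': ['1', '2', '3'],
                           'name': ['A', 'B', 'C'],
                           'Entry (1)': ['Address1', 'Address2', 'Address3'],
                           'Entry (2)': ['Number1', 'Number2', 'Number2'],
                           'Entry (3)': ['Start1', 'Start2', 'Start3']},
          'Countries': {'DK': ['1', '2', '3'],
                        'UK': ['1', '3', '2'],
                        'CDN': ['3', '2', '1'],
                        'FR': ['1', '2', '3']}}
df = pd.DataFrame(mydict)

# Rows whose label contains 'Entry', restricted to the Participants column.
entry_rows = df.filter(like='Entry', axis=0)['Participants']

# One row per entry, one column per participant; join each column top to bottom.
joined = pd.DataFrame(entry_rows.tolist()).apply(lambda col: '\n'.join(col)).tolist()

# Append the combined values as a single-cell row, then sort the row labels.
new_row = pd.DataFrame({'Participants': [joined]}, index=['Entries'])
result = pd.concat([df, new_row]).sort_index()
print(result.index.tolist())
```

After sorting, the 'Entries' row sits directly before 'Entry (1)', because 'i' sorts before 'y' and uppercase labels sort before lowercase ones.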
Answer 1 (score: 2)
Let's try this:
df.at['Entries', 'Participants'] = ['\n'.join(i) for i in zip(*df.loc[['Entry (1)', 'Entry (2)', 'Entry (3)'], 'Participants'])]
Or, borrowing from @W-B's solution, use filter instead of an explicit list of index labels:
df.at['Entries','Participants'] = ['\n'.join(i) for i in (zip(*df.filter(like='Entry', axis=0)['Participants']))]
df.sort_index()
Output:
Participants Countries
CDN NaN [3, 2, 1]
DK NaN [1, 2, 3]
Entries [Address1\nNumber1\nStart1, Address2\nNumber2\... NaN
Entry (1) [Address1, Address2, Address3] NaN
Entry (2) [Number1, Number2, Number2] NaN
Entry (3) [Start1, Start2, Start3] NaN
FR NaN [1, 2, 3]
UK NaN [1, 3, 2]
name [A, B, C] NaN
source [1, 2, 3] NaN
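The key step in this answer is zip(*...), which transposes the per-entry lists into per-participant tuples before joining. A minimal pure-Python sketch of just that step, using the values from the question (the names entry_rows and per_participant are illustrative, not part of the answer):

```python
# One list per entry, as stored in the 'Entry (n)' cells of the Participants column.
entry_rows = [
    ['Address1', 'Address2', 'Address3'],  # Entry (1)
    ['Number1', 'Number2', 'Number2'],     # Entry (2)
    ['Start1', 'Start2', 'Start3'],        # Entry (3)
]

# zip(*rows) pairs up the i-th element of every list: one tuple per participant.
per_participant = list(zip(*entry_rows))

# Join each participant's fields with a newline, as the answer does.
joined = ['\n'.join(fields) for fields in per_participant]
print(joined)
```

Because zip stops at the shortest input, this sketch assumes every entry list has one value per participant.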