Merge two datasets that contain lists and keep the lists after a pandas merge

Time: 2020-06-30 13:08:04

Tags: python pandas list merge

I have two dataframes that I am struggling to merge:

    import pandas as pd

    df1 = pd.DataFrame({'id': [["001", "001"], ["001"], ["007", "001"]]})

Output:

               id
    0  [001, 001]
    1       [001]
    2  [007, 001]

    df2 = pd.DataFrame({'id': ["001", "007"], 'name': ['Name01', 'Name02']})

Output:

        id    name
    0  001  Name01
    1  007  Name02

What I want to end up with is this:

    df3 = pd.DataFrame({'id': [["001", "001"], ["001"], ["007", "001"]],
                        'name': [['Name01', 'Name01'], ['Name01'], ['Name02', 'Name01']]})

Output:

               id              name
    0  [001, 001]  [Name01, Name01]
    1       [001]          [Name01]
    2  [007, 001]  [Name02, Name01]

My problem is that I can merge the two frames, but I cannot get the result into the desired format. This is what I have so far:

    pd.DataFrame(df2.merge(df1.explode('id'), on='id')).groupby('id').agg(lambda x: x.tolist())
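
For context, here is a minimal sketch of why the attempt above does not give the desired shape. Grouping on `id` collects rows per distinct id value rather than per original row of `df1`, so the per-row list structure is lost. The frames are the ones from the question; the variable name `attempt` is only illustrative, and `explode` requires pandas 0.25 or later.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [["001", "001"], ["001"], ["007", "001"]]})
df2 = pd.DataFrame({'id': ["001", "007"], 'name': ['Name01', 'Name02']})

# The attempt from the question: explode, merge, then group on the id value.
attempt = (
    df2.merge(df1.explode('id'), on='id')
       .groupby('id')
       .agg(lambda x: x.tolist())
)

# The result has one row per distinct id, not one row per original row of df1.
print(attempt)
```

To get back to one row per original row, the grouping key has to be the original row number of `df1`, not the id itself.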

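2 answers:

Answer 0 (score: 3)

Build a dictionary from df2 and look the ids up in a list comprehension. This should be faster than explode followed by aggregating to list, but it is best to test it on your real data.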

# Build an id -> name lookup from df2
d = df2.set_index('id')['name'].to_dict()
# Map every id in each list; ids missing from df2 are skipped
df1['name'] = [[d[y] for y in x if y in d] for x in df1['id']]
print(df1)
           id              name
0  [001, 001]  [Name01, Name01]
1       [001]          [Name01]
2  [007, 001]  [Name02, Name01]
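
Since the speed claim is best checked on real data, here is a rough sketch of how one might compare the two approaches with `timeit`. The frame size `n_rows`, the randomly generated ids and the helper names `with_dict` and `with_explode` are assumptions made for illustration; timings will vary with the shape of the data and the pandas version.

```python
import random
import timeit

import pandas as pd

random.seed(0)
n_rows = 10_000  # hypothetical size, adjust to match the real data

df2 = pd.DataFrame({'id': ["001", "007"], 'name': ['Name01', 'Name02']})
ids = list(df2['id'])
# Each row holds a list of one to three ids drawn from df2.
df1 = pd.DataFrame({
    'id': [random.choices(ids, k=random.randint(1, 3)) for _ in range(n_rows)]
})


def with_dict(df1, df2):
    """Look up each id in a plain dict (the approach of this answer)."""
    d = df2.set_index('id')['name'].to_dict()
    return [[d[y] for y in x if y in d] for x in df1['id']]


def with_explode(df1, df2):
    """Explode, map, then collect back into one list per original row."""
    names = df1['id'].explode().map(df2.set_index('id')['name'])
    return names.groupby(level=0).agg(list).tolist()


# Both approaches should agree when every id is present in df2.
assert with_dict(df1, df2) == with_explode(df1, df2)

for fn in (with_dict, with_explode):
    seconds = timeit.timeit(lambda: fn(df1, df2), number=5)
    print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```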

Answer 1 (score: 3)

We can do explode + merge:

df1=df1.explode('id').reset_index().merge(df2,how='left').groupby('index').agg(list)
               id              name
index                              
0      [001, 001]  [Name01, Name01]
1           [001]          [Name01]
2      [007, 001]  [Name02, Name01]
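
To make the chain above easier to follow, this sketch splits it into its steps and leaves `df1` untouched. The intermediate names `exploded`, `merged` and `result` are illustrative only. `reset_index()` keeps the original row number in a column named `index`, which is what allows the rows to be regrouped after the merge.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [["001", "001"], ["001"], ["007", "001"]]})
df2 = pd.DataFrame({'id': ["001", "007"], 'name': ['Name01', 'Name02']})

# Step 1: one row per list element; the old row number moves to column 'index'.
exploded = df1.explode('id').reset_index()

# Step 2: a left merge keeps every exploded row, in its original order.
merged = exploded.merge(df2, on='id', how='left')

# Step 3: regroup by the original row number and rebuild the lists.
result = merged.groupby('index').agg(list)

# Optional: drop the 'index' label so the frame looks like the desired df3.
result = result.rename_axis(None)
print(result)
```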

Or just map and assign:

df1['name']=df1.id.explode().map(df2.set_index('id').name).groupby(level=0).agg(list)
0    [Name01, Name01]
1            [Name01]
2    [Name02, Name01]
Name: id, dtype: object
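
One difference between the approaches in this thread is worth checking before choosing: how ids that are missing from `df2` are treated. In the sketch below, the id `"999"` is a hypothetical value added for illustration, and `by_dict`, `by_map` and `lookup` are illustrative names. The dictionary comprehension with `if y in d` drops the unmatched id, while `map` leaves a `NaN` placeholder so each list keeps its original length.

```python
import pandas as pd

# "999" is a made-up id that has no match in df2.
df1 = pd.DataFrame({'id': [["001", "999"], ["007"]]})
df2 = pd.DataFrame({'id': ["001", "007"], 'name': ['Name01', 'Name02']})

# Dictionary lookup: unmatched ids are skipped.
d = df2.set_index('id')['name'].to_dict()
by_dict = [[d[y] for y in x if y in d] for x in df1['id']]

# explode + map: unmatched ids become NaN.
lookup = df2.set_index('id')['name']
by_map = df1['id'].explode().map(lookup).groupby(level=0).agg(list)

print(by_dict)
print(by_map.tolist())
```

Which behaviour is preferable depends on whether the positions in `name` need to stay aligned with the positions in `id`.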