Question

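I have two dataframes. Each row of dataframe A holds a list of indices that correspond to entries in dataframe B, plus a set of other values. I want to join the two dataframes so that each entry in B picks up the other values from the row of A whose index list contains that entry's index.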

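So far I have found a way to extract the rows of B for the index list in each row of A, but only row by row, based on this answer, and I am not sure where to go from there. I am also not sure whether pandas offers a better way to handle this dynamically, since the size of the index lists will vary.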

import pandas as pd
import numpy as np

# Inputs
A = pd.DataFrame.from_dict({
    "indices": [[0,1],[2,3],[4,5]],
    "a1": ["a","b","c"],
    "a2": [100,200,300]
})

print(A)
>>    indices a1   a2
>> 0  [0, 1]  a  100
>> 1  [2, 3]  b  200
>> 2  [4, 5]  c  300

B = pd.DataFrame.from_dict({
    "b": [10,20,30,40,50,60]
})

print(B)
>>     b
>> 0  10
>> 1  20
>> 2  30
>> 3  40
>> 4  50
>> 5  60

# This is the desired output
out = pd.DataFrame.from_dict({
    "b": [10,20,30,40,50,60],
    "a1": ["a","a", "b", "b", "c", "c"],
    "a2": [100,100,200,200,300,300]
})

print(out)
>>      b a1   a2
>> 0  10  a  100
>> 1  20  a  100
>> 2  30  b  200
>> 3  40  b  200
>> 4  50  c  300
>> 5  60  c  300

Answer 1

With pandas >= 0.25, you can use explode:

C = A.explode('indices')

This gives:

  indices a1   a2
0       0  a  100
0       1  a  100
1       2  b  200
1       3  b  200
2       4  c  300
2       5  c  300

Then do:

output = pd.merge(B, C, left_index = True, right_on = 'indices')
output.index = output.indices.values    
output.drop('indices', axis = 1, inplace = True)

Final output:

    b a1   a2
0  10  a  100
1  20  a  100
2  30  b  200
3  40  b  200
4  50  c  300
5  60  c  300
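
The explode-then-merge steps above can also be written as a single join. This is only a sketch using the question's sample data; the cast to int is an added precaution (explode leaves the column as object dtype), and the use of `join` with `set_index` is a substitution for the `pd.merge` call in the answer, not part of it.

```python
import pandas as pd

A = pd.DataFrame({
    "indices": [[0, 1], [2, 3], [4, 5]],
    "a1": ["a", "b", "c"],
    "a2": [100, 200, 300],
})
B = pd.DataFrame({"b": [10, 20, 30, 40, 50, 60]})

# One row per index value (requires pandas >= 0.25).
# explode leaves "indices" as object dtype, so cast it before joining.
C = A.explode("indices").astype({"indices": int})

# Use the exploded values as C's index and join on B's index.
out = B.join(C.set_index("indices"))
print(out)
```

Because `join` defaults to a left join, any row of B whose index appears in none of the lists would be kept with NaN in `a1` and `a2`.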

Answer 2

Use pd.merge:

df2 = pd.DataFrame(A.set_index(['a1','a2']).indices)

# Expand each list into its own columns, then stack to get one row per index value
df = pd.DataFrame(df2.indices.values.tolist(), index=df2.index).stack().reset_index().drop('level_2', axis=1).set_index(0)

pd.merge(B,df,left_index=True, right_index=True)

Answer 3

Here you go:

helper = A.indices.apply(pd.Series).stack().reset_index(level=1, drop=True)
A = A.reindex(helper.index).drop(columns=['indices'])
A['indices'] = helper
B = B.merge(A, left_index=True, right_on='indices').drop(columns=['indices']).reset_index(drop=True)

Result:

    b   a1  a2
0   10  a   100
1   20  a   100
2   30  b   200
3   40  b   200
4   50  c   300
5   60  c   300

Answer 4

You can also use melt instead of stack, but it is more complicated because you have to drop the columns you do not need:

import pandas as pd
import numpy as np

# Inputs
A = pd.DataFrame.from_dict({
    "indices": [[0,1],[2,3],[4,5]],
    "a1": ["a","b","c"],
    "a2": [100,200,300]
})

B = pd.DataFrame.from_dict({
    "b": [10,20,30,40,50,60]
})

AA = pd.concat([A.indices.apply(pd.Series), A], axis=1)
AA.drop(['indices'], axis=1, inplace=True)
print(AA)

   0  1 a1   a2
0  0  1  a  100
1  2  3  b  200
2  4  5  c  300

AA = AA.melt(id_vars=['a1', 'a2'], value_name='val').drop(['variable'], axis=1)
print(AA)

  a1   a2  val
0  a  100    0
1  b  200    2
2  c  300    4
3  a  100    1
4  b  200    3
5  c  300    5

pd.merge(AA.set_index(['val']), B, left_index=True, right_index=True)

Out[8]: 
  a1   a2   b
0  a  100  10
2  b  200  30
4  c  300  50
1  a  100  20
3  b  200  40
5  c  300  60
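
The merged result above comes out in melt order (0, 2, 4, 1, 3, 5) with `b` as the last column. A possible way to match the desired output from the question is to sort the index and reorder the columns afterwards. This is a sketch; the names `wide`, `tall` and `result` are illustrative and not from the answer.

```python
import pandas as pd

A = pd.DataFrame({
    "indices": [[0, 1], [2, 3], [4, 5]],
    "a1": ["a", "b", "c"],
    "a2": [100, 200, 300],
})
B = pd.DataFrame({"b": [10, 20, 30, 40, 50, 60]})

# Spread the lists into columns 0, 1, ... next to a1 and a2.
wide = pd.concat([A["indices"].apply(pd.Series), A.drop(columns="indices")], axis=1)

# Melt back to one row per index value and drop the helper column.
tall = wide.melt(id_vars=["a1", "a2"], value_name="val").drop(columns="variable")

# Merge on the index, then restore B's row order and the desired column order.
result = (
    pd.merge(tall.set_index("val"), B, left_index=True, right_index=True)
      .sort_index()[["b", "a1", "a2"]]
)
print(result)
```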

Answer 5

This solution handles index lists of different lengths.

A = pd.DataFrame.from_dict({
    "indices": [[0,1],[2,3],[4,5]],
    "a1": ["a","b","c"],
    "a2": [100,200,300]
})
A = A.indices.apply(pd.Series) \
    .merge(A, left_index = True, right_index = True) \
    .drop(["indices"], axis = 1)\
    .melt(id_vars = ['a1', 'a2'], value_name = "index")\
    .drop("variable", axis = 1)\
    .dropna()
A = A.set_index('index')
B = pd.DataFrame.from_dict({
    "b": [10,20,30,40,50,60]
})
B
B.merge(A,left_index=True,right_index=True)

Final output:

    b   a1  a2
0   10  a   100
1   20  a   100
2   30  b   200
3   40  b   200
4   50  c   300
5   60  c   300
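
The sample data above uses lists of equal length, so the following sketch runs the same pipeline on lists of length 1, 3 and 2 to show the NaN padding being dropped. The input lists are made up for illustration, and the cast back to int and the final `sort_index` are additions not present in the answer.

```python
import pandas as pd

# Hypothetical input: index lists of different lengths.
A = pd.DataFrame({
    "indices": [[0], [1, 2, 3], [4, 5]],
    "a1": ["a", "b", "c"],
    "a2": [100, 200, 300],
})
B = pd.DataFrame({"b": [10, 20, 30, 40, 50, 60]})

# Shorter lists are padded with NaN when spread into columns.
wide = A["indices"].apply(pd.Series)

tall = (
    wide.merge(A, left_index=True, right_index=True)
        .drop(columns="indices")
        .melt(id_vars=["a1", "a2"], value_name="index")
        .drop(columns="variable")
        .dropna()                   # remove the rows created by the padding
        .astype({"index": int})     # the padding turned the values into floats
        .set_index("index")
)

result = B.merge(tall, left_index=True, right_index=True).sort_index()
print(result)
```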

Add new columns to a dataframe from another dataframe, based on a list of indices in the other dataframe
