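I have two DataFrames. Each row of DataFrame A contains a list of indices that refer to entries in DataFrame B, along with a set of other values. I want to join the two DataFrames so that each entry in B picks up the other values from the row of A whose index list contains that entry's index.
So far I have found a way to extract the rows in B for the list of indices in each row of A, but only row by row (from this answer), and I am not sure where to go from here. I am also not sure whether pandas offers a better way to do this dynamically, since the size of the index lists can vary.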
import pandas as pd
import numpy as np
# Inputs
A = pd.DataFrame.from_dict({
"indices": [[0,1],[2,3],[4,5]],
"a1": ["a","b","c"],
"a2": [100,200,300]
})
print(A)
>> indices a1 a2
>> 0 [0, 1] a 100
>> 1 [2, 3] b 200
>> 2 [4, 5] c 300
B = pd.DataFrame.from_dict({
"b": [10,20,30,40,50,60]
})
print(B)
>> b
>> 0 10
>> 1 20
>> 2 30
>> 3 40
>> 4 50
>> 5 60
# This is the desired output
out = pd.DataFrame.from_dict({
"b": [10,20,30,40,50,60],
"a1": ["a","a", "b", "b", "c", "c"],
"a2": [100,100,200,200,300,300]
})
print(out)
>> b a1 a2
>> 0 10 a 100
>> 1 20 a 100
>> 2 30 b 200
>> 3 40 b 200
>> 4 50 c 300
>> 5 60 c 300
Answer 0 (score: 4)
If you have pandas >= 0.25, you can use explode:
C = A.explode('indices')
This gives:
indices a1 a2
0 0 a 100
0 1 a 100
1 2 b 200
1 3 b 200
2 4 c 300
2 5 c 300
Then do:
output = pd.merge(B, C, left_index = True, right_on = 'indices')
output.index = output.indices.values
output.drop('indices', axis = 1, inplace = True)
Final output:
b a1 a2
0 10 a 100
1 20 a 100
2 30 b 200
3 40 b 200
4 50 c 300
5 60 c 300
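As a compact variant of the steps above, here is a minimal sketch that chains explode with DataFrame.join. It assumes pandas >= 0.25 and that every list in indices is non-empty. The int64 cast is an addition not present in the answer: explode leaves the column with object dtype, so casting makes the join key match B's integer index explicitly.

```python
import pandas as pd

A = pd.DataFrame({
    "indices": [[0, 1], [2, 3], [4, 5]],
    "a1": ["a", "b", "c"],
    "a2": [100, 200, 300],
})
B = pd.DataFrame({"b": [10, 20, 30, 40, 50, 60]})

# One row per list element; the exploded column has object dtype, so cast it.
C = A.explode("indices").astype({"indices": "int64"}).set_index("indices")

# Left join on B's index: a row of B that appears in no list would get NaN.
out = B.join(C)
print(out)
```

Because this is a left join, B keeps its original row order and index, so no index reassignment or column drop is needed afterwards.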
Answer 1 (score: 2)
Use pd.merge:
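# Keep a1/a2 as the index so they survive the reshape
df2 = pd.DataFrame(A.set_index(['a1','a2']).indices)
# Expand each list into columns, stack them into rows, then index by the list value
df = pd.DataFrame(df2.indices.values.tolist(), index=df2.index).stack().reset_index().drop('level_2', axis=1).set_index(0)
pd.merge(B,df,left_index=True, right_index=True)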
Answer 2 (score: 1)
Here you go:
helper = A.indices.apply(pd.Series).stack().reset_index(level=1, drop=True)
A = A.reindex(helper.index).drop(columns=['indices'])
A['indices'] = helper
B = B.merge(A, left_index=True, right_on='indices').drop(columns=['indices']).reset_index(drop=True)
Result:
b a1 a2
0 10 a 100
1 20 a 100
2 30 b 200
3 40 b 200
4 50 c 300
5 60 c 300
Answer 3 (score: 1)
You can also use melt instead of stack, but it is more complicated because you have to drop the columns you do not need:
import pandas as pd
import numpy as np
# Inputs
A = pd.DataFrame.from_dict({
"indices": [[0,1],[2,3],[4,5]],
"a1": ["a","b","c"],
"a2": [100,200,300]
})
B = pd.DataFrame.from_dict({
"b": [10,20,30,40,50,60]
})
AA = pd.concat([A.indices.apply(pd.Series), A], axis=1)
AA.drop(['indices'], axis=1, inplace=True)
print(AA)
0 1 a1 a2
0 0 1 a 100
1 2 3 b 200
2 4 5 c 300
AA = AA.melt(id_vars=['a1', 'a2'], value_name='val').drop(['variable'], axis=1)
print(AA)
a1 a2 val
0 a 100 0
1 b 200 2
2 c 300 4
3 a 100 1
4 b 200 3
5 c 300 5
pd.merge(AA.set_index(['val']), B, left_index=True, right_index=True)
Out[8]:
a1 a2 b
0 a 100 10
2 b 200 30
4 c 300 50
1 a 100 20
3 b 200 40
5 c 300 60
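The merged frame above comes out in melt order (0, 2, 4, 1, 3, 5) with b as the last column, which differs from the desired output in the question. A minimal sketch of one way to restore the expected layout, assuming the same inputs; the names wide, merged, and result are illustrative, and building wide from tolist() is a substitution for apply(pd.Series):

```python
import pandas as pd

A = pd.DataFrame({
    "indices": [[0, 1], [2, 3], [4, 5]],
    "a1": ["a", "b", "c"],
    "a2": [100, 200, 300],
})
B = pd.DataFrame({"b": [10, 20, 30, 40, 50, 60]})

# Spread each index list across columns 0, 1, ... next to a1 and a2.
wide = pd.concat(
    [pd.DataFrame(A["indices"].tolist(), index=A.index), A.drop(columns="indices")],
    axis=1,
)

# Melt back to one row per index value.
AA = wide.melt(id_vars=["a1", "a2"], value_name="val").drop(columns="variable")
merged = pd.merge(AA.set_index("val"), B, left_index=True, right_index=True)

# Sort by the index and put b first to match the desired output.
result = merged.sort_index()[["b", "a1", "a2"]].rename_axis(None)
print(result)
```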
Answer 4 (score: 1)
This solution handles index lists of different lengths.
A = pd.DataFrame.from_dict({
"indices": [[0,1],[2,3],[4,5]],
"a1": ["a","b","c"],
"a2": [100,200,300]
})
A = A.indices.apply(pd.Series) \
.merge(A, left_index = True, right_index = True) \
.drop(["indices"], axis = 1)\
.melt(id_vars = ['a1', 'a2'], value_name = "index")\
.drop("variable", axis = 1)\
.dropna()
A = A.set_index('index')
B = pd.DataFrame.from_dict({
"b": [10,20,30,40,50,60]
})
B
B.merge(A,left_index=True,right_index=True)
Final output:
b a1 a2
0 10 a 100
1 20 a 100
2 30 b 200
3 40 b 200
4 50 c 300
5 60 c 300
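The example above uses lists of equal length, so the dropna step never has anything to remove. The following sketch runs the same melt approach on hypothetical input with lists of length 1, 3, and 2 to show where the padding NaN values come from. The input data, the int64 cast, and the names wide and long_form are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical input: the index lists have different lengths.
A = pd.DataFrame({
    "indices": [[0], [1, 2, 3], [4, 5]],
    "a1": ["a", "b", "c"],
    "a2": [100, 200, 300],
})
B = pd.DataFrame({"b": [10, 20, 30, 40, 50, 60]})

# Shorter lists are padded with NaN, which turns the columns into floats.
wide = pd.DataFrame(A["indices"].tolist(), index=A.index)

long_form = (
    wide.join(A.drop(columns="indices"))
    .melt(id_vars=["a1", "a2"], value_name="index")
    .drop(columns="variable")
    .dropna(subset=["index"])      # discard the padding rows
    .astype({"index": "int64"})    # undo the float upcast caused by NaN
    .set_index("index")
    .sort_index()
)

out = B.merge(long_form, left_index=True, right_index=True)
print(out)
```

Without the dropna call, the padding rows would stay in long_form with NaN as their index value; they match nothing in B, but they would also block the integer cast.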