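I have two DataFrames with thousands of rows each, and I need to combine them into a single DataFrame containing every pairing, without duplicates and without reversed pairs. For example:
DataFrame 1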
drug1
drug2
drug3
DataFrame 2
disease1
disease2
disease3
So the output DataFrame would be:
Output DataFrame
drug1 disease1
drug1 disease2
drug1 disease3
drug2 disease1
drug2 disease2
drug2 disease3
drug3 disease1
drug3 disease2
drug3 disease3
I do not want the output to contain combinations such as:
disease1 drug1
drug1 drug1
disease1 disease1
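I tried pd.merge, but it returns duplicates and reversed pairs, and it also takes a long time because DataFrames 1 and 2 each have thousands of rows.
Can anyone help?
Answer 0 (score: 2)
One approach that uses only pandas is to create a MultiIndex with from_product and then convert it to a DataFrame: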
>>> df1
0
0 drug1
1 drug2
2 drug3
>>> df2
0
0 disease1
1 disease2
2 disease3
df3 = (pd.MultiIndex.from_product([df1[0],df2[0]])
.to_frame()
.reset_index(drop=True))
>>> df3
0 1
0 drug1 disease1
1 drug1 disease2
2 drug1 disease3
3 drug2 disease1
4 drug2 disease2
5 drug2 disease3
6 drug3 disease1
7 drug3 disease2
8 drug3 disease3
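The same approach can also give the result named columns in one step. The following is a minimal sketch, assuming the single-column frames shown above; the column names "drug" and "disease" are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data mirroring the frames above (single column labelled 0).
df1 = pd.DataFrame(["drug1", "drug2", "drug3"])
df2 = pd.DataFrame(["disease1", "disease2", "disease3"])

# names= labels the two index levels ("drug"/"disease" are assumed names);
# to_frame(index=False) turns the levels into ordinary columns
# and gives the result a fresh 0..n-1 index.
df3 = pd.MultiIndex.from_product(
    [df1[0], df2[0]], names=["drug", "disease"]
).to_frame(index=False)

print(df3)
```

Because each drug is paired with each disease exactly once and always in the (drug, disease) order, the result contains no reversed or same-column pairs, provided the inputs themselves hold no repeated values.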
Answer 1 (score: 1)
df1 = pd.DataFrame(dict(col1=[f"drug{i}" for i in range(1, 4)]))
df2 = pd.DataFrame(dict(col2=[f"disease{i}" for i in range(1, 4)]))
merge on an assigned column
df1.assign(A=1).merge(df2.assign(A=1), on='A').drop(columns='A')
col1 col2
0 drug1 disease1
1 drug1 disease2
2 drug1 disease3
3 drug2 disease1
4 drug2 disease2
5 drug2 disease3
6 drug3 disease1
7 drug3 disease2
8 drug3 disease3
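In pandas 1.2 and later, the dummy key is not needed, because merge accepts how="cross". This is a short sketch under that version assumption, reusing the same setup frames:

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [f"drug{i}" for i in range(1, 4)]})
df2 = pd.DataFrame({"col2": [f"disease{i}" for i in range(1, 4)]})

# how="cross" (requires pandas >= 1.2) pairs every row of df1 with
# every row of df2, so no temporary key column has to be added or dropped.
crossed = df1.merge(df2, how="cross")

print(crossed)
```

The rows come out in the same order as with the assigned-column merge: all diseases for the first drug, then all diseases for the second, and so on.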
pd.DataFrame([
(i, j) for i in df1.col1
for j in df2.col2
], columns=['col1', 'col2'])
Generalized to the cross product of any two DataFrames
pandas.concat
Answer 2 (score: 1)
Try the following solution:
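import numpy as np
import pandas as pd

# Repeat each df1 label len(df2) times; cycle df2's labels len(df1) times.
i = df1.index.repeat(len(df2))
j = np.tile(df2.index, len(df1))
# Place the two expanded frames side by side on a fresh 0..n-1 index.
pd.concat([
    df1.loc[i].reset_index(drop=True),
    df2.loc[j].reset_index(drop=True)
], sort=True, axis=1)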
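The repeat/tile idea above can be wrapped in a reusable function. The sketch below is a variation, not the answer's exact code: it selects rows by position with iloc rather than by label with loc, so it also works when an index contains duplicate labels. The helper name cross_join is hypothetical, and it assumes the two frames do not share column names.

```python
import numpy as np
import pandas as pd


def cross_join(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Pair every row of `left` with every row of `right` (hypothetical helper)."""
    # Positional row numbers: each left row repeated len(right) times,
    # and the block of right rows cycled len(left) times.
    i = np.repeat(np.arange(len(left)), len(right))
    j = np.tile(np.arange(len(right)), len(left))
    return pd.concat(
        [
            left.iloc[i].reset_index(drop=True),
            right.iloc[j].reset_index(drop=True),
        ],
        axis=1,
    )


# Example with duplicate index labels on the left frame.
df1 = pd.DataFrame({"col1": ["drug1", "drug2"]}, index=["a", "a"])
df2 = pd.DataFrame({"col2": ["disease1", "disease2", "disease3"]})
out = cross_join(df1, df2)
print(out)
```

Since the function never compares values, its cost is driven by the size of the output (len(left) * len(right) rows), which is unavoidable for a full cross product.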