Combining two dataframes without duplicates or reversed pairs, efficiently in Python

Time: 2018-08-07 20:22:08

Tags: python pandas dataframe combinations

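I have two dataframes with thousands of rows each, and I need to combine them into a single dataframe that contains every pair, without duplicates and without reversed pairs. For example: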

DataFrame 1

drug1
drug2
drug3

DataFrame 2

disease1
disease2
disease3

So the output dataframe would be:

Output dataframe

drug1 disease1
drug1 disease2
drug1 disease3
drug2 disease1
drug2 disease2
drug2 disease3 
drug3 disease1
drug3 disease2
drug3 disease3

I do not want output combinations like the following:

disease1 drug1
drug1 drug1
disease1 disease1 

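I tried this with pd.merge, but it returned duplicates and reversed pairs, and it also took a very long time because dataframes 1 and 2 each contain thousands of rows.

Can anyone help?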

3 answers:

Answer 0 (score: 2)

One approach that uses only pandas is to build a MultiIndex from the product of the two columns and then convert it to a dataframe:

>>> df1
       0
0  drug1
1  drug2
2  drug3
>>> df2
          0
0  disease1
1  disease2
2  disease3

df3 = (pd.MultiIndex.from_product([df1[0],df2[0]])
       .to_frame()
       .reset_index(drop=True))

>>> df3
       0         1
0  drug1  disease1
1  drug1  disease2
2  drug1  disease3
3  drug2  disease1
4  drug2  disease2
5  drug2  disease3
6  drug3  disease1
7  drug3  disease2
8  drug3  disease3
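
As a self-contained variant of the snippet above, here is a minimal sketch that also names the output columns. The sample frames and the column names "drug" and "disease" are assumptions chosen for illustration; they are not part of the original answer.

```python
import pandas as pd

# Sample frames mirroring the question; the single column is labelled 0.
df1 = pd.DataFrame(["drug1", "drug2", "drug3"])
df2 = pd.DataFrame(["disease1", "disease2", "disease3"])

# names= labels the index levels, which become the column names.
# index=False returns a plain RangeIndex, so no reset_index call is needed.
df3 = pd.MultiIndex.from_product(
    [df1[0], df2[0]], names=["drug", "disease"]
).to_frame(index=False)

print(df3.shape)  # (9, 2)
```

Because the first level always comes from df1 and the second from df2, a reversed pair such as (disease1, drug1) cannot appear.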

Answer 1 (score: 1)

Setup

df1 = pd.DataFrame(dict(col1=[f"drug{i}" for i in range(1, 4)]))
df2 = pd.DataFrame(dict(col2=[f"disease{i}" for i in range(1, 4)]))

merge on an assigned column

df1.assign(A=1).merge(df2.assign(A=1)).drop(columns='A')

    col1      col2
0  drug1  disease1
1  drug1  disease2
2  drug1  disease3
3  drug2  disease1
4  drug2  disease2
5  drug2  disease3
6  drug3  disease1
7  drug3  disease2
8  drug3  disease3

Comprehension

pd.DataFrame([(i, j) for i in df1.col1 for j in df2.col2], columns=['col1', 'col2'])
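
For readers on pandas 1.2 or later, the constant-key merge in this answer can probably be replaced by the built-in cross join. This is a minimal sketch under that version assumption, not code from the original answer:

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [f"drug{i}" for i in range(1, 4)]})
df2 = pd.DataFrame({"col2": [f"disease{i}" for i in range(1, 4)]})

# how="cross" pairs every row of df1 with every row of df2,
# so no temporary key column has to be assigned and dropped.
out = df1.merge(df2, how="cross")

print(len(out))  # 9
```

This avoids the helper column A entirely, which keeps the call shorter and removes the risk of clashing with an existing column of the same name.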

Generalizing to the cross product of any two dataframes

pandas.concat
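
The code for this generalization is not present in the text here. One way it might look is sketched below; the helper name cross_product and the sample frames are hypothetical, and this should not be read as the answer author's original code.

```python
import numpy as np
import pandas as pd


def cross_product(left, right):
    """Hypothetical helper: pair every row of `left` with every row of `right`."""
    # Work with positional row numbers so that arbitrary index labels are fine.
    i = np.repeat(np.arange(len(left)), len(right))  # 0,0,0,1,1,1,...
    j = np.tile(np.arange(len(right)), len(left))    # 0,1,2,0,1,2,...
    return pd.concat(
        [
            left.iloc[i].reset_index(drop=True),
            right.iloc[j].reset_index(drop=True),
        ],
        axis=1,
    )


# Illustrative frames; the "dose" column is made up to show multi-column input.
a = pd.DataFrame({"drug": ["drug1", "drug2"], "dose": [10, 20]})
b = pd.DataFrame({"disease": ["disease1", "disease2", "disease3"]})
result = cross_product(a, b)

print(result.shape)  # (6, 3)
```

Note that if both frames share a column name, the result will contain that name twice, so renaming beforehand is advisable.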

Answer 2 (score: 1)

Try the following solution:

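import numpy as np
import pandas as pd

# Repeat each df1 row label len(df2) times: 0, 0, 0, 1, 1, 1, ...
i = df1.index.repeat(len(df2))
# Tile df2's row labels len(df1) times: 0, 1, 2, 0, 1, 2, ...
j = np.tile(df2.index, len(df1))

# Place the two expanded frames side by side on a fresh positional index
pd.concat([
    df1.loc[i].reset_index(drop=True),
    df2.loc[j].reset_index(drop=True)
], sort=True, axis=1)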
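
To see what the repeat/tile indices do, here is a small end-to-end run that also checks the result against the question's requirements. The sample frames, their column names and their index labels are assumptions for illustration. Since the approach selects rows with .loc, it relies on each frame having unique index labels.

```python
import numpy as np
import pandas as pd

# Illustrative frames with non-default but unique index labels.
df1 = pd.DataFrame({"drug": ["drug1", "drug2", "drug3"]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"disease": ["disease1", "disease2", "disease3"]}, index=[10, 20, 30])

i = df1.index.repeat(len(df2))    # a, a, a, b, b, b, c, c, c
j = np.tile(df2.index, len(df1))  # 10, 20, 30, 10, 20, 30, 10, 20, 30

out = pd.concat(
    [df1.loc[i].reset_index(drop=True), df2.loc[j].reset_index(drop=True)],
    axis=1,
)

# Checks matching the question: every pair present, none repeated.
assert len(out) == len(df1) * len(df2)
assert not out.duplicated().any()
```

Each output row takes its first value from df1 and its second from df2, so pairs like (drug1, drug1) or (disease1, drug1) are excluded by construction.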