我有一个数据框,我通过列EQ_NR
与另一个数据框合并。
这是第一个数据帧的结构:(行:320816)
FAK_ART FAK_DAT LEIST_DAT KD_CRM MW_BW EQ_NR \
0 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363
1 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363
2 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363
3 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363
4 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363
5 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363
6 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363
7 ZPAF 2015-12-10 2015-12-31 T-HOME ICP E 1001380594
8 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380594
MATERIAL KW_WERT NETTO_EURO TA
0 B60ETS 0.15 18.9 SDH
1 B60ETS 0.145 18.27 SDH
2 B60ETS 0.145 18.27 NaN
3 B60ETS 0.15 18.9 SDH
4 B60ETS 0.15 18.9 NaN
5 B60ETS 0.145 18.27 SDH
6 B60ETS 0.15 18.9 SDH
7 B60ETS 3.011 252.92 DSLAM/MSAN
8 B60ETS 3.412 429.91 DSLAM/MSAN
这是第二个:(行:135818)
EQ_NR TA
0 1001380363 SONSTIGES
1 1001380363 NaN
2 1001380363 Sonstiges
3 1000943704 Sonstiges
4 1000943823 Sonstiges
5 1000943985 Sonstiges
6 1000954774 FMED
7 1000954790 FMED
8 1001380363 SDH
9 1000955097 NaN
合并后我有一个数据帧与' TA'通过' EQ_NR'从第二个数据框添加到第一个数据框。
问题是我在合并之前有320816行,在合并两个数据帧之后有320871行。可能会发生的行数比基本数据多55个?
我需要数据进行一些计算,而另外55行会扭曲计算结果......
答案 0 :(得分:1)
加入专栏EQ_NR
时出现重复问题。
在示例中,重复的值为1001380363
和1001380594
。
样品:
import pandas as pd
df1 = pd.DataFrame({'EQ_NR':[1001380363,1001380363,1001380363, 1001380365],
'B':[4,5,6,7],
'C':[7,8,9,7]})
print (df1)
B C EQ_NR
0 4 7 1001380363
1 5 8 1001380363
2 6 9 1001380363
3 7 7 1001380365
df2 = pd.DataFrame({'EQ_NR':[1001380363,1001380363,1001380363,1001380363],
'B':[4,5,6,8],
'C':[7,8,9,3]})
print (df2)
B C EQ_NR
0 4 7 1001380363
1 5 8 1001380363
2 6 9 1001380363
3 8 3 1001380363
print (pd.merge(df1, df2, on=['EQ_NR']))
B_x C_x EQ_NR B_y C_y
0 4 7 1001380363 4 7
1 4 7 1001380363 5 8
2 4 7 1001380363 6 9
3 4 7 1001380363 8 3
4 5 8 1001380363 4 7
5 5 8 1001380363 5 8
6 5 8 1001380363 6 9
7 5 8 1001380363 8 3
8 6 9 1001380363 4 7
9 6 9 1001380363 5 8
10 6 9 1001380363 6 9
11 6 9 1001380363 8 3
EDIT1:
如果数据框df2
在EQ_NR
中没有重复数据,请使用:
print (df1)
FAK_ART FAK_DAT LEIST_DAT KD_CRM MW_BW EQ_NR MATERIAL \
0 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363 B60ETS
1 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363 B60ETS
2 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363 B60ETS
3 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363 B60ETS
4 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363 B60ETS
5 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363 B60ETS
6 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363 B60ETS
7 ZPAF 2015-12-10 2015-12-31 T-HOME ICP E 1001380594 B60ETS
8 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380594 B60ETS
KW_WERT NETTO_EURO TA
0 0.150 18.90 SDH
1 0.145 18.27 SDH
2 0.145 18.27 NaN
3 0.150 18.90 SDH
4 0.150 18.90 NaN
5 0.145 18.27 SDH
6 0.150 18.90 SDH
7 3.011 252.92 DSLAM/MSAN
8 3.412 429.91 DSLAM/MSAN
print (df2)
EQ_NR TA
0 1001380363 Sonstiges
1 1000943704 Sonstiges
2 1000943823 Sonstiges
3 1000943985 Sonstiges
4 1000954774 FMED
5 1000954790 FMED
6 1000955097 NaN
print (pd.merge(df1, df2, on=['EQ_NR'], how='left', suffixes=('','_new')))
FAK_ART FAK_DAT LEIST_DAT KD_CRM MW_BW EQ_NR MATERIAL \
0 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363 B60ETS
1 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363 B60ETS
2 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363 B60ETS
3 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363 B60ETS
4 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363 B60ETS
5 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363 B60ETS
6 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363 B60ETS
7 ZPAF 2015-12-10 2015-12-31 T-HOME ICP E 1001380594 B60ETS
8 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380594 B60ETS
KW_WERT NETTO_EURO TA TA_new
0 0.150 18.90 SDH Sonstiges
1 0.145 18.27 SDH Sonstiges
2 0.145 18.27 NaN Sonstiges
3 0.150 18.90 SDH Sonstiges
4 0.150 18.90 NaN Sonstiges
5 0.145 18.27 SDH Sonstiges
6 0.150 18.90 SDH Sonstiges
7 3.011 252.92 DSLAM/MSAN NaN
8 3.412 429.91 DSLAM/MSAN NaN
答案 1 :(得分:0)
如果您只想添加一列,可以使用map()方法:
In [290]: df1['TA2'] = df1.EQ_NR.map(df2.set_index('EQ_NR').TA)
In [291]: df1
Out[291]:
FAK_ART FAK_DAT LEIST_DAT KD_CRM MW_BW EQ_NR MATERIAL KW_WERT NETTO_EURO TA TA2
0 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363 B60ETS 0.150 18.90 SDH AAA
1 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363 B60ETS 0.145 18.27 SDH AAA
2 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363 B60ETS 0.145 18.27 NaN AAA
3 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363 B60ETS 0.150 18.90 SDH AAA
4 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363 B60ETS 0.150 18.90 NaN AAA
5 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363 B60ETS 0.145 18.27 SDH AAA
6 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380363 B60ETS 0.150 18.90 SDH AAA
7 ZPAF 2015-12-10 2015-12-31 T-HOME ICP E 1001380594 B60ETS 3.011 252.92 DSLAM/MSAN NaN
8 ZPAF 2015-12-10 2015-12-31 T-HOME ICP B 1001380594 B60ETS 3.412 429.91 DSLAM/MSAN NaN
其中df2:
In [288]: df2
Out[288]:
EQ_NR TA
0 1001380363 AAA
注意:df2.EQ_NR
必须是唯一的,否则您将获得InvalidIndexError: Reindexing only valid with uniquely valued Index objects
例外。 df1.EQ_NR
可能有重复...