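Hello, I want to merge two dataframes that I loaded from Excel. I converted the columns to merge on to "str". Unfortunately, the code merges the first row but then returns NaN values. The code I am using is: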
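import pandas as pd

# inpath is the path to the Excel workbook
# sheet_name replaces the removed sheetname argument; str replaces the removed np.str alias
ListA = pd.read_excel(inpath, sheet_name="Tabelle2")
ListA["Stücklistenkomponente"] = ListA["Material"].astype(str)
ListB = pd.read_excel(inpath, sheet_name="Tabelle1")
ListB["Stücklistenkomponente"] = ListB["Material"].astype(str)
print(ListA.dtypes)
print(ListB.dtypes)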
Material    object
Material    object
The two dataframes look like this:
ListA
Material
R 22B 2.0 7.72 11.0 Lo
X 127 1.5x4.64x4[G16.05.01] CL
L 431 2x6,96x5.5 Y
9999
L 431 2x5,96x5.5 p
F 631 2x6,96x5.5 a
N 431 2x6,96x5.5 v
J 431 2x6,96x5.5
O 431 2x6,96x5.5
VM 431 2x6,96x5.5 L
ListB
Material InnerDiameter OuterDiameter Length
R 22B 2.0 7.72 11.0 Lo 2 6 8
X 127 1.5x4.64x4[G16.05.01] CL 2 7 12
L 431 2x6,96x5.5 Y 5 8 13
9999 0 0 0
L 431 2x5,96x5.5 p 6 9 15
F 631 2x6,96x5.5 a 8 5 26
N 431 2x6,96x5.5 v 9 1 3
J 431 2x6,96x5.5 12 6 89
O 431 2x6,96x5.5 5 4 12
VM 431 2x6,96x5.5 L 4 12 7
The merge returns:
Material InnerDiameter OuterDiameter Length
R 22B 2.0 7.72 11.0 Lo 2 6 8
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
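So what am I doing wrong? I thought the solution was to convert both columns to the string dtype, but that does not work.
Any help is appreciated!
Answer 0 (score: 0)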
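I think the data themselves must differ somewhere, perhaps through stray (for example trailing) whitespace, because .astype(str) does convert the values to strings correctly.
If a column holds strings, dicts, sets or lists, its dtype is object, but the type of each individual value is still str, dict, and so on.
You can check this with: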
print(ListA["Stücklistenkomponente"].apply(type))
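To illustrate the difference between the column dtype and the per-value type, here is a minimal sketch on a made-up Series (the name mixed and its values are assumptions for illustration, not data from the question). Note that, depending on the pandas version, a column containing only strings may report a dedicated string dtype rather than object.

```python
import pandas as pd

# Hypothetical example values, not taken from the Excel file in the question.
mixed = pd.Series(["abc", {"k": 1}, [1, 2], {3}], dtype=object)

# The column as a whole has dtype object ...
print(mixed.dtype)

# ... while each element keeps its own Python type.
value_types = mixed.apply(type).tolist()
print(value_types)
```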
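When inspecting the data, it sometimes helps to convert the columns to lists, because the printed list shows every value in quotes, which makes stray spaces visible: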
print(ListA["Stücklistenkomponente"].tolist())
print(ListB["Stücklistenkomponente"].tolist())
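If whitespace does turn out to be the cause, one possible way to confirm and repair it is sketched below. The frames left and right and their values are invented for illustration; only the column name Material is borrowed from the question. The sketch uses the indicator argument of merge to show which rows found a partner, then strips and collapses whitespace before merging again.

```python
import pandas as pd

# Hypothetical data: two keys in `left` carry stray spaces.
left = pd.DataFrame({"Material": ["R 22B 2.0", "L 431 Y ", " 9999"]})
right = pd.DataFrame({"Material": ["R 22B 2.0", "L 431 Y", "9999"],
                      "Length": [8, 13, 0]})

# indicator=True adds a _merge column: "both" means the key matched,
# "left_only" means no partner was found in `right`.
merged_raw = left.merge(right, on="Material", how="left", indicator=True)
print(merged_raw)

def clean_key(col):
    # Convert to str, trim both ends, collapse runs of whitespace to one space.
    return col.astype(str).str.strip().str.replace(r"\s+", " ", regex=True)

left["Material"] = clean_key(left["Material"])
right["Material"] = clean_key(right["Material"])

merged_clean = left.merge(right, on="Material", how="left", indicator=True)
print(merged_clean)
```

Rows marked left_only in the first result are the ones that end up with NaN in the columns coming from the right frame, which is the symptom described in the question.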
EDIT:
I tested your data, and the result is very interesting:
df1 = pd.read_excel('Mappe3.xlsx', sheet_name="Tabelle2")
df2 = pd.read_excel('Mappe3.xlsx', sheet_name="Tabelle1")
# default inner join: rows are duplicated because the key column contains duplicate values
# the on argument can be omitted; merge then joins on the columns common to both frames (here only one)
df = pd.merge(df1, df2)
print (df.head(10))
Stücklistenkomponente Ritzel_Materialnummer \
0 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS 401.4425.13
1 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS 401.4425.13
2 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS 401.4425.13
3 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS 401.4425.13
4 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS 401.4425.13
5 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS 401.4425.13
6 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS 401.4425.13
7 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS 401.4425.13
8 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS 401.4425.13
9 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS 401.4425.13
...
...
#remove duplicates in both df
df1 = df1.drop_duplicates('Stücklistenkomponente')
df2 = df2.drop_duplicates('Stücklistenkomponente')
#default inner join - only 5 keys are common to both frames
df = pd.merge(df1, df2)
print (df)
Stücklistenkomponente Ritzel_Materialnummer \
0 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS 401.4425.13
1 RITZEL 22F 3.0 7.72 11.0 Z17 SCHWEISS 401.4425.15
2 RITZEL 22F 3.0 7.9 6.0 Z17 PRESS 401.4425.11
3 RITZEL 22F 3.0 6.0 15.0 PRESS Z8 401.4487.01
4 RITZEL 22F 4.0 7.9 6.0 Z17 PRESS 401.4425.14
Innendurchmesser Außendurchmesser Länge Material1 Material2 \
0 2 7.72 11.0 X46Cr13 -
1 3 7.72 11.0 X46Cr13 -
2 4 7.90 6.0 42CrMo4 vergütet -
3 3 6.00 15.0 42CrMo4 vergütet -
4 2 7.90 6.0 42CrMo4 vergütet -
Material3
0 -
1 -
2 -
3 -
4 -
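The row multiplication caused by duplicate keys can be reproduced on a tiny example. This is only a sketch with invented frames (small1, small2 and the columns key and val are assumptions, not the contents of Mappe3.xlsx): every duplicate on the left pairs with every duplicate on the right, so two rows on each side yield four rows after the inner join.

```python
import pandas as pd

# Hypothetical frames: the key "a" appears twice on each side.
small1 = pd.DataFrame({"key": ["a", "a", "b"]})
small2 = pd.DataFrame({"key": ["a", "a", "c"], "val": [1, 2, 3]})

# Inner join on the shared column: 2 x 2 = 4 rows for key "a",
# nothing for "b" and "c" because they exist on one side only.
with_dupes = pd.merge(small1, small2)
print(len(with_dupes))

# After dropping duplicate keys, each key can match at most once.
dedup1 = small1.drop_duplicates("key")
dedup2 = small2.drop_duplicates("key")
without_dupes = pd.merge(dedup1, dedup2)
print(without_dupes)
```

Note that drop_duplicates keeps the first occurrence of each key by default, so which row survives depends on the row order in the sheet.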