I have a dataframe (my real dataframe has 50,000 rows and 34 columns):
import pandas as pd

df = pd.DataFrame({
    'NAME': ['APPLE COMPANY A', 'BANANA COMPANY B', 'ORANGE COMPANY C', 'APPLE COMPANY A'],
    'INVESTMENTS': ['OIL LTD', 'GOLD LTD', 'GAS LTD', 'GAS LTD'],
    'STOCKS': [100, 200, 300, 400],
    'OIL LTD': [0, 0, 0, 0],
    'GOLD LTD': [0, 0, 0, 0],
    'GAS LTD': [0, 0, 0, 0],
})
               NAME  INVESTMENTS  STOCKS  OIL LTD  GOLD LTD  GAS LTD
0   APPLE COMPANY A      OIL LTD     100        0         0        0
1  BANANA COMPANY B     GOLD LTD     200        0         0        0
2  ORANGE COMPANY C      GAS LTD     300        0         0        0
3   APPLE COMPANY A      GAS LTD     400        0         0        0
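How do I look up the values in STOCKS based on the values in NAME and the column names? For example, for the first value in column OIL LTD, it searches for APPLE COMPANY A in column NAME and for OIL LTD (based on the column with the same name) in column INVESTMENTS, which gives the value 100, as can be seen below. So it looks up values from STOCKS based on the values in NAME and on the column names OIL LTD, GOLD LTD, GAS LTD, etc.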
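I want the output to look like this:
               NAME  INVESTMENTS  STOCKS  OIL LTD  GOLD LTD  GAS LTD
0   APPLE COMPANY A      OIL LTD     100      100         0      400
1  BANANA COMPANY B     GOLD LTD     200        0       200        0
2  ORANGE COMPANY C      GAS LTD     300        0         0      300
3   APPLE COMPANY A      GAS LTD     400        0         0      400
I know how to look up a value from a single key, but I am not sure whether a lookup can use two values. This can be done in Excel, but running the function takes 15 minutes per column, which is not efficient.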
Answer 0 (score: 1)
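If the last columns are filled only with 0, the solution is pivot: build the lookup table, then drop the placeholder columns and finally join: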
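# Build a NAME x INVESTMENTS table of STOCKS (keyword arguments are required in pandas 2.0+)
df1 = df.pivot(index='NAME', columns='INVESTMENTS', values='STOCKS').fillna(0).astype(int)
# Drop the zero-filled placeholder columns, then join the lookup table on NAME
df = df.drop(df1.columns, axis=1).join(df1, on='NAME')
print(df)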
               NAME  INVESTMENTS  STOCKS  GAS LTD  GOLD LTD  OIL LTD
0   APPLE COMPANY A      OIL LTD     100      400         0      100
1  BANANA COMPANY B     GOLD LTD     200        0       200        0
2  ORANGE COMPANY C      GAS LTD     300      300         0        0
3   APPLE COMPANY A      GAS LTD     400      400         0      100
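To make the intermediate step easier to follow, here is a minimal self-contained sketch of the same pivot-then-join idea on the sample data. It assumes a recent pandas version, where `pivot` takes keyword arguments; the variable names `wide` and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['APPLE COMPANY A', 'BANANA COMPANY B', 'ORANGE COMPANY C', 'APPLE COMPANY A'],
    'INVESTMENTS': ['OIL LTD', 'GOLD LTD', 'GAS LTD', 'GAS LTD'],
    'STOCKS': [100, 200, 300, 400],
    'OIL LTD': [0, 0, 0, 0],
    'GOLD LTD': [0, 0, 0, 0],
    'GAS LTD': [0, 0, 0, 0],
})

# One row per NAME, one column per INVESTMENTS value, STOCKS as the cell values.
# Pairs that never occur become NaN, so they are filled with 0.
wide = (
    df.pivot(index='NAME', columns='INVESTMENTS', values='STOCKS')
      .fillna(0)
      .astype(int)
)
print(wide)

# Drop the zero-filled placeholder columns, then attach the lookup table by NAME.
result = df.drop(columns=wide.columns).join(wide, on='NAME')
print(result)
```

Because the join is keyed on NAME alone, every row of the same company receives the same looked-up values, which is why both APPLE COMPANY A rows show 100 for OIL LTD and 400 for GAS LTD.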
If you need the same column order as in the original DataFrame:
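# Remember the order of the placeholder columns in the original DataFrame
cols = df.columns.drop(['NAME', 'INVESTMENTS', 'STOCKS'])
# Pivot as before, then reorder the result to match (keyword arguments are required in pandas 2.0+)
df1 = df.pivot(index='NAME', columns='INVESTMENTS', values='STOCKS').fillna(0).astype(int)[cols]
df = df.drop(df1.columns, axis=1).join(df1, on='NAME')
print(df)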
               NAME  INVESTMENTS  STOCKS  OIL LTD  GOLD LTD  GAS LTD
0   APPLE COMPANY A      OIL LTD     100      100         0      400
1  BANANA COMPANY B     GOLD LTD     200        0       200        0
2  ORANGE COMPANY C      GAS LTD     300        0         0      300
3   APPLE COMPANY A      GAS LTD     400      100         0      400
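One caveat that may matter on a 50,000-row frame: `pivot` raises a `ValueError` when the same NAME/INVESTMENTS pair occurs more than once. A possible workaround, sketched below, is `pivot_table` with an aggregation function. Summing duplicates is only an assumption about what the data should mean, and the frame `dup` is hypothetical sample data, not taken from the question.

```python
import pandas as pd

# Hypothetical data: APPLE COMPANY A holds OIL LTD in two separate rows.
dup = pd.DataFrame({
    'NAME': ['APPLE COMPANY A', 'APPLE COMPANY A', 'BANANA COMPANY B'],
    'INVESTMENTS': ['OIL LTD', 'OIL LTD', 'GOLD LTD'],
    'STOCKS': [100, 50, 200],
})

# pivot_table aggregates duplicate (NAME, INVESTMENTS) pairs instead of raising;
# fill_value=0 fills the pairs that never occur.
wide = dup.pivot_table(index='NAME', columns='INVESTMENTS',
                       values='STOCKS', aggfunc='sum', fill_value=0)

# Attach the aggregated lookup table to every row by NAME.
result = dup.join(wide, on='NAME')
print(result)
```

If duplicates should not be summed, another aggregation such as `'first'` or `'max'` can be passed as `aggfunc` instead.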