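I have the following DataFrame with a large number of rows. I want to take several columns and collapse them into a single column.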
Player | 0 | 1 | 2 | 3 | 4
Edgerrin James | 1st Tm All-Conf. | AP 1st Tm | FW 1st Tm | SN 1st Tm | Pro Bowl
Tony Gonzalez | 1st Tm All-Conf. | AP 1st Tm | None | None | None
... | ... | ... | ... | ... | ...
I'm trying to figure out how to restructure it so that all the awards end up in one column, giving a DataFrame like this:
Player | awardID
Edgerrin James | 1st Tm All-Conf.
Edgerrin James | AP 1st Tm
Edgerrin James | FW 1st Tm
Edgerrin James | SN 1st Tm
Edgerrin James | Pro Bowl
Tony Gonzalez | 1st Tm All-Conf.
Tony Gonzalez | AP 1st Tm
I'd be fine with the "None" cells being included too, since I know how to filter those out, but I can't figure out the first part.
Answer 0 (score: 2)
Use set_index on Player, then stack:
In [750]: df.set_index('Player').stack().reset_index(name='awardID').drop(columns='level_1')
Out[750]:
Player awardID
0 Edgerrin James 1st Tm All-Conf.
1 Edgerrin James AP 1st Tm
2 Edgerrin James FW 1st Tm
3 Edgerrin James SN 1st Tm
4 Edgerrin James Pro Bowl
5 Tony Gonzalez 1st Tm All-Conf.
6 Tony Gonzalez AP 1st Tm
7 Tony Gonzalez None
8 Tony Gonzalez None
9 Tony Gonzalez None
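To try the one-liner above outside an IPython session, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the question's table, and it assumes the empty cells hold the string "None" (as the output above suggests) rather than a real missing value. The name long_df is only illustrative.

```python
import pandas as pd

# Sample data rebuilt by hand from the question's table.
# Assumption: empty cells contain the string "None", not a missing value.
df = pd.DataFrame(
    [
        ["Edgerrin James", "1st Tm All-Conf.", "AP 1st Tm", "FW 1st Tm", "SN 1st Tm", "Pro Bowl"],
        ["Tony Gonzalez", "1st Tm All-Conf.", "AP 1st Tm", "None", "None", "None"],
    ],
    columns=["Player", 0, 1, 2, 3, 4],
)

long_df = (
    df.set_index("Player")            # keep Player as the row label
      .stack()                        # move columns 0..4 into the index
      .reset_index(name="awardID")    # back to columns: Player, level_1, awardID
      .drop(columns="level_1")        # level_1 holds the old column labels
)
print(long_df)
```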
Optionally, filter out the None values with query:
In [757]: (df.set_index('Player')
             .stack()
             .reset_index(name='awardID')
             .drop(columns='level_1')
             .query('awardID != "None"'))
Out[757]:
Player awardID
0 Edgerrin James 1st Tm All-Conf.
1 Edgerrin James AP 1st Tm
2 Edgerrin James FW 1st Tm
3 Edgerrin James SN 1st Tm
4 Edgerrin James Pro Bowl
5 Tony Gonzalez 1st Tm All-Conf.
6 Tony Gonzalez AP 1st Tm
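As a possible alternative to set_index and stack (not what this answer uses), the same reshaping can be sketched with DataFrame.melt. The sample frame is again rebuilt by hand, the "None" cells are assumed to be strings, and the column name slot is a hypothetical helper used only for sorting. Note that sorting by Player orders the players alphabetically, which happens to match the original order for these two rows.

```python
import pandas as pd

# Sample data rebuilt by hand; "None" cells are assumed to be plain strings.
df = pd.DataFrame(
    [
        ["Edgerrin James", "1st Tm All-Conf.", "AP 1st Tm", "FW 1st Tm", "SN 1st Tm", "Pro Bowl"],
        ["Tony Gonzalez", "1st Tm All-Conf.", "AP 1st Tm", "None", "None", "None"],
    ],
    columns=["Player", 0, 1, 2, 3, 4],
)

melted = (
    df.melt(id_vars="Player", var_name="slot", value_name="awardID")
      .query('awardID != "None"')        # drop the placeholder cells
      .sort_values(["Player", "slot"])   # melt goes column by column, so regroup by player
      .drop(columns="slot")              # "slot" is a hypothetical helper column
      .reset_index(drop=True)
)
print(melted)
```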
Answer 1 (score: 1)
A solution without pandas.
First, store each row in a string, such as s1 below:
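def mylist(string):
    # Split the row into cells: the player name first, then the awards.
    string = string.split('|')
    length = len(string) - 1
    for i in range(length):
        # Print the player next to each award (a one-element list slice).
        print(string[0], string[i+1:i+2], '\n')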
s1 = 'Edgerrin James | 1st Tm All-Conf. | AP 1st Tm | FW 1st Tm | SN 1st Tm | Pro Bowl'
s2 = 'Tony Gonzalez | 1st Tm All-Conf. | AP 1st Tm | None | None | None'
mylist(s1)
mylist(s2)
Output:
Edgerrin James [' 1st Tm All-Conf. ']
Edgerrin James [' AP 1st Tm ']
Edgerrin James [' FW 1st Tm ']
Edgerrin James [' SN 1st Tm ']
Edgerrin James [' Pro Bowl']
Tony Gonzalez [' 1st Tm All-Conf. ']
Tony Gonzalez [' AP 1st Tm ']
Tony Gonzalez [' None ']
Tony Gonzalez [' None ']
Tony Gonzalez [' None']
Repeat this for every player and row.
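One way to apply this to every row at once, sketched in plain Python 3. The function name to_long and its skip parameter are hypothetical, and the sketch assumes each row is a '|'-separated string with the player name first. Unlike the function above, it strips whitespace and returns (player, award) pairs rather than printing list slices.

```python
def to_long(rows, skip="None"):
    """Turn 'Player | award | award ...' strings into (player, award) pairs."""
    pairs = []
    for row in rows:
        # First cell is the player, the rest are awards; strip the padding.
        player, *awards = [cell.strip() for cell in row.split("|")]
        for award in awards:
            if award != skip:  # leave out the placeholder cells
                pairs.append((player, award))
    return pairs


rows = [
    "Edgerrin James | 1st Tm All-Conf. | AP 1st Tm | FW 1st Tm | SN 1st Tm | Pro Bowl",
    "Tony Gonzalez | 1st Tm All-Conf. | AP 1st Tm | None | None | None",
]

for player, award in to_long(rows):
    print(player, "|", award)
```

Passing skip=None keeps the "None" cells, since no stripped cell compares equal to the None object.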