My original question was posted here. My DataFrame looks like this:
ID START END SEQ
1 11 12 1
1 14 15 3
1 13 14 2
2 10 14 1
3 11 15 1
3 16 17 2
I want to convert it to this DataFrame:
ID START_1 END_1 SEQ_1 START_2 END_2 SEQ_2 START_3 END_3 SEQ_3
1 11 12 1 13 14 2 14 15 3
2 10 14 1 NA NA NA NA NA NA
3 11 15 1 16 17 2 NA NA NA
After the pivot_table transformation, I get a DataFrame with an extra blank row below the header:
test_2['SEQ1'] = test_2.SEQ
test_2 = test_2.pivot_table(index= ['ID','SEQ1']).unstack()
test_2 = test_2.sort_index(axis=1, level=1)
test_2.columns = ['_'.join((col[0], str(col[1]))) for col in test_2]
test_2
START_1 END_1 SEQ_1 START_2 END_2 SEQ_2 START_3 END_3 SEQ_3
ID
1 11 12 1 13 14 2 14 15 3
2 10 14 1 NA NA NA NA NA NA
3 11 15 1 16 17 2 NA NA NA
How can I remove that row and align all the headers? I tried dropping rows the usual way with test2[:2], but that did not remove the blank row.
Edit:
Here is an extract of a more realistic dataset:
ID INDEX START END SEQ NUM_PREV NUM_ACTUAL NUM_NEXT TIME PRE_TIME LOC_IND
079C 333334.0 2016-06-23 12:45:32 2016-06-23 12:51:05 1 1 23456 25456 29456 30 2 YES
079C 333334.0 2016-06-23 12:47:05 2016-06-23 12:51:05 2 2 29456 39458 39945 20 0 NO
Answer 0 (score: 1)
Consider resetting the index after the pivot / unstack operation:
from io import StringIO
import pandas as pd
data='''
ID START END SEQ
1 11 12 1
1 14 15 3
1 13 14 2
2 10 14 1
3 11 15 1
3 16 17 2
'''
test_2 = pd.read_table(StringIO(data), sep="\\s+")
seq = set(test_2['SEQ'].tolist())
test_2['SEQ1'] = test_2.SEQ
test_2 = test_2.pivot_table(index= ['ID','SEQ1']).unstack()
test_2 = test_2.sort_index(axis=1, level=1)
test_2.columns = ['_'.join((col[0], str(col[1]))) for col in test_2]
test_2 = test_2.reset_index()
# ID END_1 SEQ_1 START_1 END_2 SEQ_2 START_2 END_3 SEQ_3 START_3
# 0 1 12.0 1.0 11.0 14.0 2.0 13.0 15.0 3.0 14.0
# 1 2 14.0 1.0 10.0 NaN NaN NaN NaN NaN NaN
# 2 3 15.0 1.0 11.0 17.0 2.0 16.0 NaN NaN NaN
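However, as you can see, this changes the column order. To get the order you want, consider building the column list with a nested list comprehension and flattening it with sum():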
seqmax = max(seq)+1
colorder = ['ID'] + sum([['START_'+str(i),'END_'+str(i),'SEQ_'+str(i)]
for i in range(1, seqmax) if i in seq],[])
test_2 = test_2[colorder]
# ID START_1 END_1 SEQ_1 START_2 END_2 SEQ_2 START_3 END_3 SEQ_3
# 0 1 11.0 12.0 1.0 13.0 14.0 2.0 14.0 15.0 3.0
# 1 2 10.0 14.0 1.0 NaN NaN NaN NaN NaN NaN
# 2 3 11.0 15.0 1.0 16.0 17.0 2.0 NaN NaN NaN
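As a side note, the "blank row" in the question is most likely just the index name (ID) being printed on its own line, which is why slicing rows does not remove it. The sketch below checks that, and also shows one possible way to get the column order without sum(): select the two-level columns in the wanted order before flattening them. The names df, wide, fields and seqs are illustrative and not from the original post, and the field order START, END, SEQ is assumed from the desired output.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "ID":    [1, 1, 1, 2, 3, 3],
    "START": [11, 14, 13, 10, 11, 16],
    "END":   [12, 15, 14, 14, 15, 17],
    "SEQ":   [1, 3, 2, 1, 1, 2],
})

# Keep a copy of SEQ as the pivot key, as in the answer above.
wide = df.assign(SEQ1=df["SEQ"]).pivot_table(index=["ID", "SEQ1"]).unstack()

# The extra line under the header is the index name, not a data row.
index_name = wide.index.name

# Pick the (field, seq) columns in the wanted order (assumed: START, END, SEQ).
fields = ["START", "END", "SEQ"]
seqs = sorted(int(s) for s in df["SEQ"].unique())
wide = wide[[(f, s) for s in seqs for f in fields]]

# Flatten the two-level labels, then move ID from the index into a column.
wide.columns = [f"{f}_{s}" for f, s in wide.columns]
wide = wide.reset_index()
```

Because pivot_table aggregates with the mean by default, the values come back as floats, the same as in the output shown above.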