I have a large pandas DataFrame with 8 columns and several NaN
values:
0 1 2 3 4 5 6 7 8
1 Google, Inc. (Date 11/07/2016) NaN NaN NaN NaN NaN NaN NaN NaN
2 Apple Inc. (Date 07/01/2016) Amazon (Date 11/01/2016) NaN NaN NaN NaN NaN NaN NaN
3 IBM, Inc. (Date 11/08/2016) NaN NaN NaN NaN NaN NaN NaN NaN
4 Microsoft (Date 11/10/2016) Google, Inc. (Date 11/10/1990) Google, Inc. (Date 11/07/2016) Samsung (Date 05/02/2016) NaN NaN NaN NaN NaN
How can I flatten it into a single column like this:
0 companies
1 Google, Inc. (Date 11/07/2016)
2 Apple Inc. (Date 07/01/2016)
3 Amazon (Date 11/01/2016)
4 IBM, Inc. (Date 11/08/2016)
5 Microsoft (Date 11/10/2016)
6 Google, Inc. (Date 11/10/1990)
7 Google, Inc. (Date 11/07/2016)
8 Samsung (Date 05/02/2016)
I read the docs and tried:
df.iloc[:,0]
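The problem is that this keeps only the first column, so I lose the values in the other columns and their order. How can I flatten the DataFrame without losing the data in the other cells or the order in which it appears?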
Answer 0: (score: 2)
You can stack the columns and, optionally, reset the index. By default, stack drops NaNs.
df.stack()
Out:
0 0 Google, Inc. (Date 11/07/2016)
1 0 Apple Inc. (Date 07/01/2016)
1 Amazon (Date 11/01/2016)
2 0 IBM, Inc. (Date 11/08/2016)
3 0 Microsoft (Date 11/10/2016)
1 Google, Inc. (Date 11/10/1990)
2 Google, Inc. (Date 11/07/2016)
3 Samsung (Date 05/02/2016)
dtype: object
df.stack().reset_index(drop=True)
Out:
0 Google, Inc. (Date 11/07/2016)
1 Apple Inc. (Date 07/01/2016)
2 Amazon (Date 11/01/2016)
3 IBM, Inc. (Date 11/08/2016)
4 Microsoft (Date 11/10/2016)
5 Google, Inc. (Date 11/10/1990)
6 Google, Inc. (Date 11/07/2016)
7 Samsung (Date 05/02/2016)
dtype: object
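For a self-contained version of the approach above, here is a minimal sketch. The sample frame is a hypothetical reconstruction of the rows in the question, and the column name `companies` is taken from the desired output. The explicit `dropna()` is an added safeguard rather than part of the original answer: how `stack` treats missing values has changed across pandas versions, so this does not rely on the default.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame reproducing the rows from the question.
df = pd.DataFrame([
    ["Google, Inc. (Date 11/07/2016)", np.nan, np.nan, np.nan],
    ["Apple Inc. (Date 07/01/2016)", "Amazon (Date 11/01/2016)", np.nan, np.nan],
    ["IBM, Inc. (Date 11/08/2016)", np.nan, np.nan, np.nan],
    ["Microsoft (Date 11/10/2016)", "Google, Inc. (Date 11/10/1990)",
     "Google, Inc. (Date 11/07/2016)", "Samsung (Date 05/02/2016)"],
])

companies = (
    df.stack()                 # walk the frame row by row into one long Series
      .dropna()                # safeguard: drop NaN if stack kept any
      .reset_index(drop=True)  # replace the (row, column) MultiIndex with 0..n-1
      .rename("companies")     # name the Series so it becomes the column name
      .to_frame()              # turn it back into a one-column DataFrame
)
print(companies)
```

Because stack goes through the frame row by row, the values come out in the same left-to-right, top-to-bottom order as in the original frame, and duplicates such as the two Google entries are preserved.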
Answer 1: (score: 1)
This might do the trick:
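import pandas as pd

# Small sample frame with one missing cell
df = pd.DataFrame([
["Google, Inc. (Date 11/07/2016)", float("NaN")],
["Apple Inc. (Date 07/01/2016)", "Amazon (Date 11/01/2016)"]])
# Transpose, then unstack, so the values come out row by row
unstacked = df.T.unstack()
# Remove the NaN entries and renumber the index from 0
unstacked.dropna(inplace=True)
unstacked.reset_index(drop=True, inplace=True)
unstacked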
Output:
0 Google, Inc. (Date 11/07/2016)
1 Apple Inc. (Date 07/01/2016)
2 Amazon (Date 11/01/2016)
dtype: object
P.S. Please take a look at this question about providing good pandas examples in questions.