I'm having a hard time reshaping this DataFrame. I think I should use pd.pivot_table or pd.crosstab, but I'm not sure how to get it done.
Here is my DataFrame:
import pandas as pd
vicro = pd.read_csv(vicroURL)
# .ix has been removed from pandas; .loc selects the columns by label
vicro_subset = vicro.loc[:, ['P1', 'P10', 'P30', 'P71', 'P82', 'P90']]
In [6]: vicro_subset.head()
Out[6]:
P1 P10 P30 P71 P82 P90
0 - I - - - M
1 - I - V T M
2 - I - V A M
3 - I - T - M
4 - - - - A -
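For anyone who wants to follow along without access to vicroURL, the five rows printed above can be rebuilt by hand. This is only a stand-in: it assumes these rows are enough to illustrate the problem, while the real CSV has many more rows (and other symbols such as '.'), so counts computed from it will be much larger.

```python
import pandas as pd

# Hand-built copy of the five rows shown above (hypothetical stand-in
# for the real CSV at vicroURL, which has many more rows).
vicro_subset = pd.DataFrame({
    'P1':  ['-', '-', '-', '-', '-'],
    'P10': ['I', 'I', 'I', 'I', '-'],
    'P30': ['-', '-', '-', '-', '-'],
    'P71': ['-', 'V', 'V', 'T', '-'],
    'P82': ['-', 'T', 'A', '-', 'A'],
    'P90': ['M', 'M', 'M', 'M', '-'],
})
print(vicro_subset)
```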
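What I want to do is take every distinct value in this DataFrame and turn each one into a row, with the cells holding counts of how often that value appears in each column. Something like: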
Out[6]:
P1 P10 P30 P71 P82 P90
I 0 4 0 0 0 0
V 0 0 0 2 0 0
A 0 0 0 0 2 0
M 0 0 0 0 0 4
T 0 0 0 1 1 0
Any help would be greatly appreciated! Thanks.
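Since pd.crosstab was mentioned, one possible route is to melt the frame first and then cross-tabulate value against column. A minimal sketch, run on the five rows shown above so the numbers can be checked against the desired output; `melted`, `kept` and `counts` are illustrative names, and dropping '-' assumes it means "no value".

```python
import pandas as pd

# The five sample rows from the question (not the full data set).
vicro_subset = pd.DataFrame({
    'P1':  ['-', '-', '-', '-', '-'],
    'P10': ['I', 'I', 'I', 'I', '-'],
    'P30': ['-', '-', '-', '-', '-'],
    'P71': ['-', 'V', 'V', 'T', '-'],
    'P82': ['-', 'T', 'A', '-', 'A'],
    'P90': ['M', 'M', 'M', 'M', '-'],
})

# One row per cell: 'variable' is the source column, 'value' the symbol.
melted = vicro_subset.melt()

# Ignore the '-' placeholder, then count every (symbol, column) pair.
kept = melted[melted['value'] != '-']
counts = pd.crosstab(kept['value'], kept['variable'])

# Columns holding only '-' (P1, P30) vanish after filtering; add them back as zeros.
counts = counts.reindex(columns=vicro_subset.columns, fill_value=0)
print(counts)
```

The rows come out sorted alphabetically (A, I, M, T, V) rather than in the order of the desired output; `counts.reindex(['I', 'V', 'A', 'M', 'T'])` would reorder them if that matters.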
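EDIT: Working through the melt answer step by step. All of these answers helped me understand pandas better, but the melt one still leaves me with some unknowns: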
In [8]: melted_df = pd.melt(vicro_subset)
In [9]: melted_df.head()
Out[9]:
variable value
0 P1 -
1 P1 -
2 P1 -
3 P1 -
4 P1 -
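The head() output only shows P1 because melt, called without id_vars, stacks the columns one after another: every cell becomes a row, with the source column name in 'variable' and the cell contents in 'value'. A small hypothetical frame makes the ordering visible:

```python
import pandas as pd

# Hypothetical two-column, three-row frame.
small = pd.DataFrame({'P1': ['-', '.', '-'], 'P10': ['I', 'I', '-']})

# No id_vars: all of P1's cells come first, then all of P10's.
melted = pd.melt(small)
print(melted)
```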
In [13]: grouped_melt = melted_df.groupby(['variable','value'])['value'].count()
In [14]: grouped_melt.head()
Out[14]:
variable value
P1 - 797
. 269
P10 - 339
. 1
F 132
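grouped_melt is a Series indexed by a (variable, value) MultiIndex, which is why each column name appears once with its symbols listed beneath it. As a sketch on a toy frame (not the real data), groupby(...).size() gives the same numbers without selecting a column; the two only differ when there are NaN cells, which count() skips and size() includes.

```python
import pandas as pd

# Toy frame, melted as in the question.
melted = pd.melt(pd.DataFrame({'P1': ['-', '.', '-'], 'P10': ['I', 'I', '-']}))

# The question's approach: count the non-null 'value' entries per group.
by_count = melted.groupby(['variable', 'value'])['value'].count()

# Counting rows per group directly; identical here because melt produced no NaN.
by_size = melted.groupby(['variable', 'value']).size()
print(by_size)
```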
In [15]: unstacked_group = grouped_melt.unstack()
In [16]: unstacked_group.head()
Out[16]:
<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, P1 to P82
Data columns:
- 5 non-null values
. 2 non-null values
A 1 non-null values
AITV 1 non-null values
AT 2 non-null values
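unstack() moves the innermost index level ('value') into the columns, so each original column (P1, P10, ...) becomes a row and each symbol becomes a column. Any (column, symbol) pair that never occurred becomes NaN, which is where the NaNs in the transposed output come from. unstack also accepts fill_value; shown here on a toy frame:

```python
import pandas as pd

melted = pd.melt(pd.DataFrame({'P1': ['-', '.', '-'], 'P10': ['I', 'I', '-']}))
grouped = melted.groupby(['variable', 'value']).size()

# Default: missing (variable, value) pairs become NaN.
with_nan = grouped.unstack()

# With fill_value, missing pairs become 0 and the counts stay integers.
filled = grouped.unstack(fill_value=0)
print(filled)
```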
In [17]: transpose_unstack = unstacked_group.T
In [18]: transpose_unstack.head()
Out[18]:
variable P1 P10 P30 P71 P82 P90
value
- 797 339 1005 452 604 634
. 269 1 NaN NaN NaN NaN
A NaN NaN NaN NaN 282 NaN
AITV NaN NaN NaN NaN 1 NaN
AT NaN NaN NaN 1 2 NaN
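Unstacking the default level and then transposing works, but unstack can also be told which level to move. A sketch on the same toy frame: unstacking 'variable' puts the column names across the top directly, so no .T is needed, and fill_value replaces the NaNs in the same step.

```python
import pandas as pd

melted = pd.melt(pd.DataFrame({'P1': ['-', '.', '-'], 'P10': ['I', 'I', '-']}))
grouped = melted.groupby(['variable', 'value']).size()

# Pivot the 'variable' level into columns: symbols as rows, P1/P10 as columns.
table = grouped.unstack('variable', fill_value=0)
print(table)
```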
Answer 0 (score: 5)
Alternatively, something like this should work:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: df = pd.DataFrame(np.random.randint(0,5,12).reshape(3,4),
columns=list('abcd'))
In [4]: print(df)
a b c d
0 2 2 3 1
1 0 1 0 2
2 1 3 0 4
In [5]: new = pd.concat([df[col].value_counts() for col in df.columns], axis=1)
In [6]: new.columns = df.columns
In [7]: print(new)
a b c d
0 1 NaN 2 NaN
1 1 1 NaN 1
2 1 1 NaN 1
3 NaN 1 1 NaN
4 NaN NaN NaN 1
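A shorter variant of the same idea applies value_counts to every column through DataFrame.apply; pandas aligns the per-column results on their index, much as the concat call does. This sketch uses the fixed numbers from the printed df instead of random ones, and adds fillna(0) to turn the gaps into zero counts.

```python
import pandas as pd

# Same values as the printed df above, hard-coded instead of random.
df = pd.DataFrame({'a': [2, 0, 1], 'b': [2, 1, 3], 'c': [3, 0, 0], 'd': [1, 2, 4]})

# value_counts per column; results are aligned on the union of distinct values.
counts = df.apply(pd.Series.value_counts)

# Values absent from a column come back as NaN; make them 0 and restore ints.
counts = counts.fillna(0).astype(int)
print(counts)
```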
Answer 1 (score: 2)
I think the key is to use melt, followed by a bit of acrobatics. So here is your DataFrame:
In [21]: df
Out[21]:
P1 P10 P30 P71 P82 P90
0 - I - - - M
1 - I - V T M
2 - I - V A M
3 - I - T - M
4 - - - - A -
Now do the following (you may want to run it in IPython one step at a time to see the intermediate results):
In [22]: pd.melt(df).groupby(['variable', 'value'])['value'].count().unstack().T.fillna(0)
Out[22]:
variable P1 P10 P30 P71 P82 P90
value
- 5 1 5 2 2 1
A 0 0 0 0 2 0
I 0 4 0 0 0 0
M 0 0 0 0 0 4
T 0 0 0 1 1 0
V 0 0 0 2 0 0
Assuming you save that result as df2, you can then drop the '-' row:
In [25]: df2.drop('-')
Out[25]:
variable P1 P10 P30 P71 P82 P90
value
A 0 0 0 0 2 0
I 0 4 0 0 0 0
M 0 0 0 0 0 4
T 0 0 0 1 1 0
V 0 0 0 2 0 0
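Instead of dropping '-' afterwards, the placeholders can be filtered out of the melted frame before counting. The question's real data also contains '.', so this sketch drops both; whether '.' really means "no value" is an assumption, and the toy frame is hypothetical. The final reindex keeps columns that contain only placeholders, which would otherwise disappear.

```python
import pandas as pd

# Hypothetical toy frame; P30 holds only placeholders.
vicro_subset = pd.DataFrame({
    'P1':  ['-', '.', '-'],
    'P10': ['I', 'I', '-'],
    'P30': ['-', '-', '-'],
})

# Symbols to ignore. '.' is assumed (not confirmed) to be a placeholder.
placeholders = ['-', '.']

melted = vicro_subset.melt()
kept = melted[~melted['value'].isin(placeholders)]

# Count pairs, pivot columns across the top, and restore all-placeholder columns as 0.
table = (kept.groupby(['value', 'variable']).size()
             .unstack('variable', fill_value=0)
             .reindex(columns=vicro_subset.columns, fill_value=0))
print(table)
```

If you keep the drop-afterwards approach instead, df2.drop('-', errors='ignore') avoids a KeyError on data where no '-' cells occur.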