I'm fairly new to working with data and have the following problem. This is my DataFrame:
------------------------------------------------------
ErrorCD ID Freq1 Freq2 Freq3....
------------------------------------------------------
1 A 2 3 2
2 B 1 2 2
3 C 1 3 3
I want it to look like this:
---------------------
ErrorCD ID Freq
---------------------
1 A 2
2 A 3
3 A 2
.....
How can I do this in Python?
Answer 0 (score: 3)
You want stack:
df.set_index(['ErrorCD', 'ID']).stack().reset_index(name='Freq')
ErrorCD ID level_2 Freq
0 1 A Freq1 2
1 1 A Freq2 3
2 1 A Freq3 2
3 2 B Freq1 1
4 2 B Freq2 2
5 2 B Freq3 2
6 3 C Freq1 1
7 3 C Freq2 3
8 3 C Freq3 3
We can drop the level_2 column, which holds the FreqX names:
df.set_index(['ErrorCD', 'ID']).stack().reset_index(name='Freq').drop(columns='level_2')
ErrorCD ID Freq
0 1 A 2
1 1 A 3
2 1 A 2
3 2 B 1
4 2 B 2
5 2 B 2
6 3 C 1
7 3 C 3
8 3 C 3
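For anyone who wants to run the stack approach end to end, here is a minimal sketch. It assumes the frame has exactly the three Freq columns shown in the question; the variable names `stacked` and `long_df` are only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumes only Freq1..Freq3 exist).
df = pd.DataFrame({
    "ErrorCD": [1, 2, 3],
    "ID": ["A", "B", "C"],
    "Freq1": [2, 1, 1],
    "Freq2": [3, 2, 3],
    "Freq3": [2, 2, 3],
})

# Move the identifier columns into the index, then stack the remaining
# Freq columns into a single Series with one value per (row, column) pair.
stacked = (
    df.set_index(["ErrorCD", "ID"])
      .stack()
      .reset_index(name="Freq")
)

# The stacked level has no name, so reset_index calls it 'level_2'.
long_df = stacked.drop(columns="level_2")
print(long_df)
```

Because stack walks each row in turn, the values for one ID stay together in the result.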
Another approach is to rebuild the frame:
f = df.filter(regex='^Freq')
m = f.shape[1]
pd.DataFrame(dict(
ErrorCD=df.ErrorCD.values.repeat(m),
ID=df.ID.values.repeat(m),
Freq=f.values.ravel()
))
ErrorCD Freq ID
0 1 2 A
1 1 3 A
2 1 2 A
3 2 1 B
4 2 2 B
5 2 2 B
6 3 1 C
7 3 3 C
8 3 3 C
You can also use pd.DataFrame.melt:
df.melt(['ErrorCD', 'ID'], value_name='Freq').drop(columns='variable')
ErrorCD ID Freq
0 1 A 2
1 2 B 1
2 3 C 1
3 1 A 3
4 2 B 2
5 3 C 3
6 1 A 2
7 2 B 2
8 3 C 3
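Note that melt returns the values column by column (all of Freq1, then Freq2, and so on). If you want the rows grouped per ErrorCD as in the question, one option is a stable sort afterwards. A minimal sketch, assuming the sample data from the question; `melted` and `grouped` are illustrative names.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumes only Freq1..Freq3 exist).
df = pd.DataFrame({
    "ErrorCD": [1, 2, 3],
    "ID": ["A", "B", "C"],
    "Freq1": [2, 1, 1],
    "Freq2": [3, 2, 3],
    "Freq3": [2, 2, 3],
})

# melt keeps the id columns and unpivots everything else,
# producing the values in column-major order.
melted = (
    df.melt(id_vars=["ErrorCD", "ID"], value_name="Freq")
      .drop(columns="variable")
)

# mergesort is stable, so rows with the same ErrorCD keep their
# Freq1, Freq2, Freq3 order after sorting.
grouped = (
    melted.sort_values("ErrorCD", kind="mergesort")
          .reset_index(drop=True)
)
print(grouped)
```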
Answer 1 (score: 2)
You can use pd.lreshape:
In [13]: pd.lreshape(df, {'Freq':df.columns[df.columns.str.contains('Freq')]})
Out[13]:
ErrorCD ID Freq
0 1 A 2
1 2 B 1
2 3 C 1
3 1 A 3
4 2 B 2
5 3 C 3
6 1 A 2
7 2 B 2
8 3 C 3
In [14]: pd.lreshape(df, {'Freq':df.columns[df.columns.str.contains('Freq')]}) \
.sort_values('ID')
Out[14]:
ErrorCD ID Freq
0 1 A 2
3 1 A 3
6 1 A 2
1 2 B 1
4 2 B 2
7 2 B 2
2 3 C 1
5 3 C 3
8 3 C 3
The second call above sorts the result by ID. The docstring of pd.lreshape:
In [12]: pd.lreshape?
Signature: pd.lreshape(data, groups, dropna=True, label=None)
Docstring:
Reshape long-format data to wide. Generalized inverse of DataFrame.pivot
Parameters
----------
data : DataFrame
groups : dict
{new_name : list_of_columns}
dropna : boolean, default True
Examples
--------
>>> import pandas as pd
>>> data = pd.DataFrame({'hr1': [514, 573], 'hr2': [545, 526],
... 'team': ['Red Sox', 'Yankees'],
... 'year1': [2007, 2008], 'year2': [2008, 2008]})
>>> data
hr1 hr2 team year1 year2
0 514 545 Red Sox 2007 2008
1 573 526 Yankees 2007 2008
>>> pd.lreshape(data, {'year': ['year1', 'year2'], 'hr': ['hr1', 'hr2']})
team hr year
0 Red Sox 514 2007
1 Yankees 573 2007
2 Red Sox 545 2008
3 Yankees 526 2008
Returns
-------
reshaped : DataFrame
PS: I couldn't find online documentation for it, but it has a good docstring (reproduced above).
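A runnable version of the lreshape calls, as a sketch that assumes the sample data from the question. The column list is built with a plain list comprehension here instead of str.contains, and `freq_cols`, `long_df` and `by_id` are illustrative names.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumes only Freq1..Freq3 exist).
df = pd.DataFrame({
    "ErrorCD": [1, 2, 3],
    "ID": ["A", "B", "C"],
    "Freq1": [2, 1, 1],
    "Freq2": [3, 2, 3],
    "Freq3": [2, 2, 3],
})

# Collect the wide columns that should collapse into one 'Freq' column.
freq_cols = [c for c in df.columns if c.startswith("Freq")]

# lreshape concatenates the listed columns one after another and
# tiles the remaining columns (ErrorCD, ID) to match.
long_df = pd.lreshape(df, {"Freq": freq_cols})

# A stable sort groups the rows per ID without reordering within a group.
by_id = long_df.sort_values("ID", kind="mergesort")
print(by_id)
```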
Answer 2 (score: 2)
Use wide_to_long:
pd.wide_to_long(df,'Freq',i=['ErrorCD','ID'],j='age').reset_index().drop(columns='age')
Out[445]:
ErrorCD ID Freq
0 1 A 2
1 1 A 3
2 1 A 2
3 2 B 1
4 2 B 2
5 2 B 2
6 3 C 1
7 3 C 3
8 3 C 3
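Here is a self-contained sketch of the wide_to_long approach, assuming the sample data from the question. The suffix column is named `n` here (a hypothetical name, the answer above calls it age), and the explicit sort is added only so the row order does not depend on the pandas version.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumes only Freq1..Freq3 exist).
df = pd.DataFrame({
    "ErrorCD": [1, 2, 3],
    "ID": ["A", "B", "C"],
    "Freq1": [2, 1, 1],
    "Freq2": [3, 2, 3],
    "Freq3": [2, 2, 3],
})

# 'Freq' is the stub name; the numeric suffix (1, 2, 3) lands in column 'n'.
# The columns given in i must uniquely identify each row.
long_df = (
    pd.wide_to_long(df, stubnames="Freq", i=["ErrorCD", "ID"], j="n")
      .reset_index()
      .sort_values(["ErrorCD", "n"])   # make the row order explicit
      .drop(columns="n")
      .reset_index(drop=True)
)
print(long_df)
```

If you need to know which Freq column each value came from, simply leave out the drop step and keep `n`.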