Reshaping a data frame with Python?

Date: 2017-10-09 19:38:25

Tags: python pandas dataframe anaconda unpivot

I am fairly new to working with data and I have the following problem. This is my data frame:

------------------------------------------------------
ErrorCD    ID    Freq1      Freq2     Freq3....
------------------------------------------------------
1          A      2          3           2
2          B      1          2           2
3          C      1          3           3

I would like it to look like this:

---------------------
ErrorCD  ID    Freq
---------------------
1        A      2
1        A      3
1        A      2
.....

How can I do this in Python?

3 answers:

Answer 0: (score: 3)

You want stack:

df.set_index(['ErrorCD', 'ID']).stack().reset_index(name='Freq')

   ErrorCD ID level_2  Freq
0        1  A   Freq1     2
1        1  A   Freq2     3
2        1  A   Freq3     2
3        2  B   Freq1     1
4        2  B   Freq2     2
5        2  B   Freq3     2
6        3  C   Freq1     1
7        3  C   Freq2     3
8        3  C   Freq3     3

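We can drop the column that holds the FreqX labels (the axis is passed by keyword here because recent pandas versions no longer accept it positionally in drop):

df.set_index(['ErrorCD', 'ID']).stack().reset_index(name='Freq').drop(columns='level_2')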

   ErrorCD ID  Freq
0        1  A     2
1        1  A     3
2        1  A     2
3        2  B     1
4        2  B     2
5        2  B     2
6        3  C     1
7        3  C     3
8        3  C     3
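For readers who want to run the stack approach end to end, here is a minimal sketch. The sample frame is reconstructed from the question and assumes exactly three Freq columns (the question elides the rest with "...."); the name long_df is only illustrative.

```python
import pandas as pd

# Sample frame reconstructed from the question (assumes only Freq1..Freq3).
df = pd.DataFrame({
    "ErrorCD": [1, 2, 3],
    "ID": ["A", "B", "C"],
    "Freq1": [2, 1, 1],
    "Freq2": [3, 2, 3],
    "Freq3": [2, 2, 3],
})

long_df = (
    df.set_index(["ErrorCD", "ID"])   # keep the identifier columns out of the stack
      .stack()                        # one row per (ErrorCD, ID, FreqX) combination
      .reset_index(name="Freq")       # the unnamed stacked level becomes 'level_2'
      .drop(columns="level_2")        # discard the FreqX labels
)
print(long_df)
```

Because stack walks each original row in turn, the values of one ErrorCD/ID pair stay next to each other in the result.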

Another approach is to rebuild the frame from its underlying arrays:

f = df.filter(regex='^Freq')
m = f.shape[1]
pd.DataFrame(dict(
    ErrorCD=df.ErrorCD.values.repeat(m),
    ID=df.ID.values.repeat(m),
    Freq=f.values.ravel()
))

   ErrorCD  Freq ID
0        1     2  A
1        1     3  A
2        1     2  A
3        2     1  B
4        2     2  B
5        2     2  B
6        3     1  C
7        3     3  C
8        3     3  C

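You can also use pd.DataFrame.melt. Note that the result is ordered column by column (all Freq1 values first, then Freq2, and so on) rather than row by row:

df.melt(['ErrorCD', 'ID'], value_name='Freq').drop(columns='variable')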

   ErrorCD ID  Freq
0        1  A     2
1        2  B     1
2        3  C     1
3        1  A     3
4        2  B     2
5        3  C     3
6        1  A     2
7        2  B     2
8        3  C     3
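The melt output above lists all Freq1 values, then all Freq2 values, and so on. If the grouped ordering from the question is wanted, one option is a stable sort on the identifier afterwards. This is a sketch on the reconstructed sample frame (three Freq columns assumed); it relies on ErrorCD being unique per original row, which holds for the sample but is an assumption in general.

```python
import pandas as pd

# Sample frame reconstructed from the question (assumes only Freq1..Freq3).
df = pd.DataFrame({
    "ErrorCD": [1, 2, 3],
    "ID": ["A", "B", "C"],
    "Freq1": [2, 1, 1],
    "Freq2": [3, 2, 3],
    "Freq3": [2, 2, 3],
})

melted = (
    df.melt(id_vars=["ErrorCD", "ID"], value_name="Freq")
      .drop(columns="variable")
)

# mergesort is stable, so within each ErrorCD the Freq1, Freq2, Freq3 order is kept.
grouped = melted.sort_values("ErrorCD", kind="mergesort").reset_index(drop=True)
print(grouped)
```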

Answer 1: (score: 2)

You can use pd.lreshape:

In [13]: pd.lreshape(df, {'Freq':df.columns[df.columns.str.contains('Freq')]})
Out[13]:
   ErrorCD ID  Freq
0        1  A     2
1        2  B     1
2        3  C     1
3        1  A     3
4        2  B     2
5        3  C     3
6        1  A     2
7        2  B     2
8        3  C     3

Or, sorted by ID:

In [14]: pd.lreshape(df, {'Freq':df.columns[df.columns.str.contains('Freq')]}) \
           .sort_values('ID')
Out[14]:
   ErrorCD ID  Freq
0        1  A     2
3        1  A     3
6        1  A     2
1        2  B     1
4        2  B     2
7        2  B     2
2        3  C     1
5        3  C     3
8        3  C     3
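A self-contained version of the lreshape call, for anyone who wants to try it. This sketch uses the reconstructed sample frame (three Freq columns assumed) and passes the column names as a plain list instead of selecting them with str.contains; the name reshaped is illustrative.

```python
import pandas as pd

# Sample frame reconstructed from the question (assumes only Freq1..Freq3).
df = pd.DataFrame({
    "ErrorCD": [1, 2, 3],
    "ID": ["A", "B", "C"],
    "Freq1": [2, 1, 1],
    "Freq2": [3, 2, 3],
    "Freq3": [2, 2, 3],
})

# Map the new column name to the wide columns that should be concatenated into it.
reshaped = pd.lreshape(df, {"Freq": ["Freq1", "Freq2", "Freq3"]})
print(reshaped)
```

Like melt, lreshape concatenates the wide columns one after another, so a sort is still needed to bring the rows of one ID together.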

The docstring for pd.lreshape:

In [12]: pd.lreshape?
Signature: pd.lreshape(data, groups, dropna=True, label=None)
Docstring:
Reshape long-format data to wide. Generalized inverse of DataFrame.pivot

Parameters
----------
data : DataFrame
groups : dict
    {new_name : list_of_columns}
dropna : boolean, default True

Examples
--------
>>> import pandas as pd
>>> data = pd.DataFrame({'hr1': [514, 573], 'hr2': [545, 526],
...                      'team': ['Red Sox', 'Yankees'],
...                      'year1': [2007, 2008], 'year2': [2008, 2008]})
>>> data
   hr1  hr2     team  year1  year2
0  514  545  Red Sox   2007   2008
1  573  526  Yankees   2007   2008

>>> pd.lreshape(data, {'year': ['year1', 'year2'], 'hr': ['hr1', 'hr2']})
      team   hr  year
0  Red Sox  514  2007
1  Yankees  573  2007
2  Red Sox  545  2008
3  Yankees  526  2008

Returns
-------
reshaped : DataFrame

PS: I could not find online documentation for it, but the docstring above is quite good.

Answer 2: (score: 2)

Use wide_to_long:

pd.wide_to_long(df,'Freq',i=['ErrorCD','ID'],j='age').reset_index().drop(columns='age')
Out[445]: 
   ErrorCD ID  Freq
0        1  A     2
1        1  A     3
2        1  A     2
3        2  B     1
4        2  B     2
5        2  B     2
6        3  C     1
7        3  C     3
8        3  C     3
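A runnable sketch of the wide_to_long approach on the reconstructed sample frame (three Freq columns assumed). The name n for the suffix column is a hypothetical choice made here in place of 'age', since the suffix is simply the number after "Freq"; the explicit sort is added only to make the row order independent of the pandas version.

```python
import pandas as pd

# Sample frame reconstructed from the question (assumes only Freq1..Freq3).
df = pd.DataFrame({
    "ErrorCD": [1, 2, 3],
    "ID": ["A", "B", "C"],
    "Freq1": [2, 1, 1],
    "Freq2": [3, 2, 3],
    "Freq3": [2, 2, 3],
})

long_df = (
    # 'Freq' is the stub; the numeric suffix (1, 2, 3) lands in the new level 'n'
    pd.wide_to_long(df, stubnames="Freq", i=["ErrorCD", "ID"], j="n")
      .reset_index()
      .sort_values(["ErrorCD", "n"])   # explicit ordering, then drop the helper column
      .drop(columns="n")
      .reset_index(drop=True)
)
print(long_df)
```

wide_to_long needs the i columns to identify each row uniquely, which is the case here because every ErrorCD/ID pair occurs once.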