Question

My DataFrame looks like this:

import pandas as pd

d = {'Movie' : ['The Shawshank Redemption', 'The Godfather'],
        'FirstName1': ['Tim', 'Marlon'],
        'FirstName2': ['Morgan', 'Al'],
        'LastName1': ['Robbins', 'Brando'],
        'LastName2': ['Freeman', 'Pacino'],
        'ID1': ['TM', 'MB'],
        'ID2': ['MF', 'AP']
        }
df = pd.DataFrame(d)
df

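I want to rearrange it into a 4-column DataFrame by turning FirstName1, LastName1, FirstName2, LastName2, ID1 and ID2 into rows under three columns (FirstName, LastName and ID), with the Movie value repeated for each row.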

In SQL we would do it like this:

select Movie as Movie, FirstName1 as FirstName, LastName1 as LastName, ID1 as ID from table
union
select Movie as Movie, FirstName2 as FirstName, LastName2 as LastName, ID2 as ID from table

Can we do this with pandas?

Answer 1

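If the numbers in the column names can be larger than 9 (more than one digit), use Series.str.extract to split each column name into its text prefix and its integer suffix, set the two parts as a MultiIndex on the columns, and then reshape with DataFrame.stack: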

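# Keep Movie out of the reshape by making it the index
df = df.set_index('Movie')
# Split each column name into its text prefix and its numeric suffix
df1 = df.columns.to_series().str.extract(r'([a-zA-Z]+)(\d+)')
df.columns = pd.MultiIndex.from_arrays([df1[0], df1[1].astype(int)])

# Move the numeric level into the rows, drop it, and restore Movie as a column
df = df.rename_axis((None, None), axis=1).stack().reset_index(level=1, drop=True).reset_index()
print(df)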
                      Movie FirstName  ID LastName
0  The Shawshank Redemption       Tim  TM  Robbins
1  The Shawshank Redemption    Morgan  MF  Freeman
2             The Godfather    Marlon  MB   Brando
3             The Godfather        Al  AP   Pacino
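
To illustrate why the regular expression matters once a suffix has two digits, here is a small sketch. The column names ending in 10 are made up for the example and are not part of the question's data; it compares the str.extract split with simply slicing off the last character.

```python
import pandas as pd

# Hypothetical column names; the "10" suffix is an assumption for illustration
cols = pd.Index(['FirstName1', 'LastName1', 'ID1',
                 'FirstName10', 'LastName10', 'ID10'])

# Regex split: letters go to group 0, digits go to group 1
parts = cols.to_series().str.extract(r'([a-zA-Z]+)(\d+)')
prefixes = parts[0].tolist()
numbers = parts[1].astype(int).tolist()

# Naive split: drop only the last character of each name
naive = cols.str[:-1].tolist()

print(prefixes)
print(numbers)
print(naive)
```

With the regex, FirstName10 is split into FirstName and 10, whereas slicing leaves FirstName1 as the prefix, so the two-digit columns would not line up with the others.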

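If not (every suffix is a single digit), use string indexing to split each column name into its last character and everything before it, and pass both parts to MultiIndex.from_arrays: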

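# Starting again from the original df
df = df.set_index('Movie')
# Prefix = everything except the last character, suffix = the last character
df.columns = pd.MultiIndex.from_arrays([df.columns.str[:-1], df.columns.str[-1].astype(int)])
df = df.stack().reset_index(level=1, drop=True).reset_index()
print(df)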
                      Movie FirstName  ID LastName
0  The Shawshank Redemption       Tim  TM  Robbins
1  The Shawshank Redemption    Morgan  MF  Freeman
2             The Godfather    Marlon  MB   Brando
3             The Godfather        Al  AP   Pacino

Answer 2

df = df.set_index('Movie')
df.columns = pd.MultiIndex.from_tuples([(col[:-1], col[-1:]) for col in df.columns])

df.stack()

#                           FirstName  ID LastName
#Movie                                            
#The Shawshank Redemption 1       Tim  TM  Robbins
#                         2    Morgan  MF  Freeman
#The Godfather            1    Marlon  MB   Brando
#                         2        Al  AP   Pacino

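Use the power of a MultiIndex! With from_tuples you can build a DataFrame that has a single FirstName column split into sub-columns 1 and 2 for FirstName1 and FirstName2 (see below), and likewise for ID and LastName. With stack you then turn those sub-columns into rows. Before doing this, make Movie the index so it is excluded from the operation. You can use reset_index() to get everything back as columns, but I'm not sure whether you need that.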

Before stack:

#                         FirstName         LastName           ID    
#                                 1       2        1        2   1   2
#Movie                                                               
#The Shawshank Redemption       Tim  Morgan  Robbins  Freeman  TM  MF
#The Godfather               Marlon      Al   Brando   Pacino  MB  AP
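
For completeness, here is a minimal end-to-end sketch of this approach that also includes the optional reset_index() step mentioned above. The variable names stacked and flat are only illustrative, and the final column selection is there because the column order after stack can differ between pandas versions.

```python
import pandas as pd

d = {'Movie': ['The Shawshank Redemption', 'The Godfather'],
     'FirstName1': ['Tim', 'Marlon'],
     'FirstName2': ['Morgan', 'Al'],
     'LastName1': ['Robbins', 'Brando'],
     'LastName2': ['Freeman', 'Pacino'],
     'ID1': ['TM', 'MB'],
     'ID2': ['MF', 'AP']}

# Movie becomes the index so it is left out of the reshape
df = pd.DataFrame(d).set_index('Movie')

# ('FirstName', '1'), ('FirstName', '2'), ... as a two-level column index
df.columns = pd.MultiIndex.from_tuples([(col[:-1], col[-1:]) for col in df.columns])

# Move the inner level ('1' / '2') from the columns into the rows
stacked = df.stack()

# Drop the '1' / '2' level and turn Movie back into an ordinary column
flat = stacked.reset_index(level=1, drop=True).reset_index()

# Fix the column order explicitly rather than relying on what stack returns
flat = flat[['Movie', 'FirstName', 'LastName', 'ID']]
print(flat)
```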

Answer 3

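I think a simple way is to use pandas' copy function. You can copy the Movie, FirstName, LastName and ID columns for the first person into a new table, then drop the columns you no longer need from the first table. You can also create a new table for the other person.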

new = df[['Movie', 'FirstName1', 'LastName1', 'ID1']].copy()
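
One way the copy idea could be carried through to the final result is sketched below. This is an interpretation rather than the answer's exact method: the names first, second and result are illustrative, and joining the two copies with pd.concat is an assumption about the step the answer leaves out.

```python
import pandas as pd

d = {'Movie': ['The Shawshank Redemption', 'The Godfather'],
     'FirstName1': ['Tim', 'Marlon'],
     'FirstName2': ['Morgan', 'Al'],
     'LastName1': ['Robbins', 'Brando'],
     'LastName2': ['Freeman', 'Pacino'],
     'ID1': ['TM', 'MB'],
     'ID2': ['MF', 'AP']}
df = pd.DataFrame(d)

# One copy per person, each with Movie plus that person's three columns
first = df[['Movie', 'FirstName1', 'LastName1', 'ID1']].copy()
second = df[['Movie', 'FirstName2', 'LastName2', 'ID2']].copy()

# Give both copies the same column names so they line up when combined
common = ['Movie', 'FirstName', 'LastName', 'ID']
first.columns = common
second.columns = common

# Stack the two tables on top of each other with a fresh 0..n-1 index
result = pd.concat([first, second], ignore_index=True)
print(result)
```

Unlike SQL's UNION, pd.concat keeps duplicate rows; calling drop_duplicates() on the result would remove them if that matters.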

Answer 4

Try this:

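# The rename strips the last character from every column, so 'Movie' becomes 'Movi'
d1 = df.filter(regex="1$|Movie").rename(columns=lambda x: x[:-1])
d2 = df.filter(regex="2$|Movie").rename(columns=lambda x: x[:-1])
# Restore the 'Movie' name; columns= is needed, otherwise rename targets the row index
pd.concat([d1, d2]).rename(columns={'Movi':'Movie'})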
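
A possible variation on the same idea, sketched here as an assumption rather than as part of the answer: strip only trailing digits from the column names, so Movie is never altered and no rename back is needed. The helper strip_suffix and the suffixes tuple are hypothetical names.

```python
import re
import pandas as pd

d = {'Movie': ['The Shawshank Redemption', 'The Godfather'],
     'FirstName1': ['Tim', 'Marlon'],
     'FirstName2': ['Morgan', 'Al'],
     'LastName1': ['Robbins', 'Brando'],
     'LastName2': ['Freeman', 'Pacino'],
     'ID1': ['TM', 'MB'],
     'ID2': ['MF', 'AP']}
df = pd.DataFrame(d)

def strip_suffix(name):
    """Remove trailing digits only, so 'Movie' is returned unchanged."""
    return re.sub(r'\d+$', '', name)

suffixes = ('1', '2')  # single-digit suffixes, as in the question's data

# For each suffix, keep Movie plus the columns ending in that suffix
parts = [
    df.filter(regex=rf'^Movie$|{s}$').rename(columns=strip_suffix)
    for s in suffixes
]

result = pd.concat(parts, ignore_index=True)
print(result)
```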

Python DataFrame: converting columns to rows
