Question

Here is my DataFrame (the values in the authors column are comma-separated strings):

authors            book

Jim, Charles       The Greatest Book in the World
Jim                An OK book
Charlotte          A book about books
Charlotte, Jim     The last book

How can I convert it to long format, like this:

authors            book

Jim                The Greatest Book in the World
Jim                An OK book
Jim                The last book
Charles            The Greatest Book in the World
Charlotte          A book about books
Charlotte          The last book

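I tried extracting the individual authors into a list with authors = list(df['authors'].str.split(',')), flattening that list, matching each author to each book, and building a new list of dicts from every match. That doesn't seem very Pythonic to me, and I'm guessing pandas has a cleaner way to do this.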
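For reference, the manual approach described above might look roughly like the sketch below. The question does not show the actual code, so the names records and long_df and the loop structure are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    'authors': ['Jim, Charles', 'Jim', 'Charlotte', 'Charlotte, Jim'],
    'book': ['The Greatest Book in the World', 'An OK book',
             'A book about books', 'The last book'],
})

# Split each authors string, then emit one dict per (author, book) pair.
records = []
for authors, book in zip(df['authors'].str.split(','), df['book']):
    for author in authors:
        # strip() removes the space left behind after each comma
        records.append({'authors': author.strip(), 'book': book})

# Build the long-format frame from the list of dicts.
long_df = pd.DataFrame(records)
```

This works, but it loops in Python row by row, which is what the answers avoid.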

Answer 1

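You can set book as the index and then split the authors into separate columns, which gets you almost all the way there. Rename and reorder the columns to finish.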

df.set_index('book').authors.str.split(', ', expand=True).stack().reset_index('book')

                             book          0
0  The Greatest Book in the World        Jim
1  The Greatest Book in the World    Charles
0                      An OK book        Jim
0              A book about books  Charlotte
0                   The last book  Charlotte
1                   The last book        Jim

To get all the way there:

df.set_index('book')\
  .authors.str.split(', ', expand=True)\
  .stack()\
  .reset_index('book')\
  .rename(columns={0:'authors'})\
  .sort_values('authors')[['authors', 'book']]\
  .reset_index(drop=True)
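A self-contained version of the same chain, as a sketch that can be run directly. Two defensive steps are added here as assumptions and are not part of the answer above: .dropna(), in case stack() keeps the padding values that expand=True creates for single-author rows, and .str.strip(), in case the spacing after the commas is inconsistent. The name long_df is also just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'authors': ['Jim, Charles', 'Jim', 'Charlotte', 'Charlotte, Jim'],
    'book': ['The Greatest Book in the World', 'An OK book',
             'A book about books', 'The last book'],
})

long_df = (
    df.set_index('book')['authors']
      .str.split(',', expand=True)    # one column per author position
      .stack()                        # one row per (book, position) pair
      .dropna()                       # assumption: drop padding if stack() kept it
      .str.strip()                    # assumption: trim the space after each comma
      .reset_index('book')            # move book back into a column
      .rename(columns={0: 'authors'})
      .sort_values('authors')[['authors', 'book']]
      .reset_index(drop=True)
)
```

The parenthesized form is equivalent to the backslash continuations and lets each step carry its own comment.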

Answer 2

The best option is to use .str.split and then .explode the resulting lists.
- Split on ', '; otherwise the values after a comma will start with a space (e.g. ' Charles').

import pandas as pd

data = {'authors': ['Jim, Charles', 'Jim', 'Charlotte', 'Charlotte, Jim'], 'book': ['The Greatest Book in the World', 'An OK book', 'A book about books', 'The last book']}

df = pd.DataFrame(data)

# display(df)
          authors                            book
0    Jim, Charles  The Greatest Book in the World
1             Jim                      An OK book
2       Charlotte              A book about books
3  Charlotte, Jim                   The last book

# split authors
df.authors = df.authors.str.split(', ')

# explode the column
df = df.explode('authors').reset_index(drop=True)

# display(df)
     authors                            book
0        Jim  The Greatest Book in the World
1    Charles  The Greatest Book in the World
2        Jim                      An OK book
3  Charlotte              A book about books
4  Charlotte                   The last book
5        Jim                   The last book
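If the spacing around the commas is not consistent, splitting on ', ' alone will miss some cases. One possible variant, sketched below, splits on ',' and strips each value afterwards. It uses assign so the original df is left unmodified. The irregular spacing in the sample data and the name long_df are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical variant of the sample data with inconsistent spacing.
df = pd.DataFrame({
    'authors': ['Jim,Charles', 'Jim', 'Charlotte', 'Charlotte ,  Jim'],
    'book': ['The Greatest Book in the World', 'An OK book',
             'A book about books', 'The last book'],
})

long_df = (
    df.assign(authors=df['authors'].str.split(','))   # strings -> lists of names
      .explode('authors')                             # one row per list element
      .reset_index(drop=True)                         # replace the repeated index
      .assign(authors=lambda d: d['authors'].str.strip())  # trim stray spaces
)
```

Because assign returns a new frame, df still holds the original comma-separated strings afterwards.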
