pandas - Merging many rows into one

Time: 2017-09-11 17:18:14

Tags: python pandas

Using this:

dataset = pd.read_csv('lyrics.csv', delimiter = '\t', quoting = 3)

my dataset prints like this:

                                   lyrics,classification
0       "I should have known better with a girl like you
1               That I would love everything that you do
2                        And I do, hey hey hey, and I do
3                                          Whoa, whoa, I
4                    Never realized what I kiss could be
5                           This could only happen to me
6                           Can't you see, can't you see
7               That when I tell you that I love you, oh
8      You're gonna say you love me too, hoo, hoo, ho...
9                          And when I ask you to be mine
10                      You're gonna say you love me too
11          So, oh I never realized what I kiss could be
12       Whoa whoa I never realized what I kiss could be
13                                       You love me too
14                                    You love me too",0

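But what I actually need is everything between the "" in a single row. How can I do this transformation in pandas?

1 Answer:

Answer 0: (score: 1)

Solution that worked for the OP (from the comments):

Fix the problem at the source (read_csv):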

  

@nbeuchat is probably right; just try

dataset = pd.read_csv('lyrics.csv', quoting = 2)

     

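That should give you a dataframe with one row and two columns: lyrics (a single string with embedded line breaks) and classification (0).

General solution for collapsing a Series of strings:

You want to use pd.Series.str.cat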
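To see why the quoting argument matters, here is a minimal sketch that reads the same text twice. The in-memory `raw` string is a shortened, hypothetical stand-in for lyrics.csv, and the second read uses pandas' default quoting (`csv.QUOTE_MINIMAL`) rather than the `quoting = 2` suggested above, on the assumption that any mode other than `csv.QUOTE_NONE` (the `3` in the question) will honor the quotes.

```python
import csv
import io

import pandas as pd

# Hypothetical, shortened stand-in for lyrics.csv: one quoted multi-line
# lyrics field followed by a classification value.
raw = (
    'lyrics,classification\n'
    '"I should have known better with a girl like you\n'
    'That I would love everything that you do\n'
    'And I do, hey hey hey, and I do",0\n'
)

# As in the question: tab delimiter plus quoting=3 (csv.QUOTE_NONE).
# Quotes are treated as ordinary characters, so every physical line is a row.
broken = pd.read_csv(io.StringIO(raw), delimiter='\t', quoting=csv.QUOTE_NONE)

# Default comma delimiter and default quoting (csv.QUOTE_MINIMAL):
# the quoted field, line breaks included, is read as a single value.
fixed = pd.read_csv(io.StringIO(raw))

print(broken.shape)
print(fixed.shape)
print(fixed['lyrics'].iloc[0])
```

The first frame has one row per lyric line in a single column named `lyrics,classification`; the second has one row with separate `lyrics` and `classification` columns.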

import pandas as pd

dataset = pd.DataFrame({'lyrics':pd.Series(['happy birthday to you',
                                            'happy birthday to you',
                                            'happy birthday dear outkast',
                                            'happy birthday to you'])})    
dataset['lyrics'].str.cat(sep=' / ')   
# 'happy birthday to you / happy birthday to you / happy birthday dear outkast / happy birthday to you'

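The default sep is None, which would give you 'happy birthday to youhappy birthday to youhappy ...', so pick a sep value that suits you. Above I used a slash (padded with spaces) because that is what you typically see in quotations of songs and poetry.

You can also try print(dataset['lyrics'].str.cat(sep='\n')) to keep the line breaks while storing everything in one string instead of one string per line.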
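If the goal is a one-row dataframe rather than a bare string, the joined text can be wrapped back into a frame. This is only a sketch: the two-line sample and the `classification` value of 0 are placeholders, not data from the question.

```python
import pandas as pd

# Placeholder data: one string per lyric line.
dataset = pd.DataFrame({'lyrics': ['happy birthday to you',
                                   'happy birthday dear outkast']})

# Collapse the column into a single newline-separated string.
joined = dataset['lyrics'].str.cat(sep='\n')

# Wrap the result in a one-row frame; the classification value is assumed.
collapsed = pd.DataFrame({'lyrics': [joined], 'classification': [0]})

print(collapsed.shape)
print(collapsed['lyrics'].iloc[0])
```

For a plain column of strings with no missing values, `'\n'.join(dataset['lyrics'])` produces the same string.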