Using this:
dataset = pd.read_csv('lyrics.csv', delimiter = '\t', quoting = 3)
my dataset prints like this:
lyrics,classification
0 "I should have known better with a girl like you
1 That I would love everything that you do
2 And I do, hey hey hey, and I do
3 Whoa, whoa, I
4 Never realized what I kiss could be
5 This could only happen to me
6 Can't you see, can't you see
7 That when I tell you that I love you, oh
8 You're gonna say you love me too, hoo, hoo, ho...
9 And when I ask you to be mine
10 You're gonna say you love me too
11 So, oh I never realized what I kiss could be
12 Whoa whoa I never realized what I kiss could be
13 You love me too
14 You love me too",0
But what I actually need is everything between the double quotes ("") in a single row. How can I do this conversion in pandas?
Answer 0 (score: 1)
Fix the problem at its source (in read_csv):
@nbeuchat is probably right; just try
dataset = pd.read_csv('lyrics.csv', quoting = 2)
That should give you a DataFrame with one row and two columns: lyrics (a single string with embedded line breaks) and classification (0).
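As a minimal sketch of why this works: a quoted CSV field may span several lines, and read_csv keeps it as one value as long as quoting is not disabled. The in-memory sample text and the variable names below are made up for illustration, and the sketch relies on the default quoting mode rather than quoting = 2.

```python
import io

import pandas as pd

# Hypothetical stand-in for lyrics.csv: one quoted, multi-line field plus a label.
sample = 'lyrics,classification\n"line one\nline two\nline three",0\n'

# Default quoting treats the text between the double quotes as a single field,
# so the embedded newlines do not start new rows.
df = pd.read_csv(io.StringIO(sample))

print(df.shape)            # one row, two columns
print(df.loc[0, 'lyrics'])  # the whole quoted block, line breaks included
```

In the question, quoting = 3 (csv.QUOTE_NONE) tells the parser to ignore the quotes, which is why every lyric line ended up in its own row.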
If the lines have already been read in as separate rows, you want pd.Series.str.cat:
import pandas as pd
dataset = pd.DataFrame({'lyrics':pd.Series(['happy birthday to you',
'happy birthday to you',
'happy birthday dear outkast',
'happy birthday to you'])})
dataset['lyrics'].str.cat(sep=' / ')
# 'happy birthday to you / happy birthday to you / happy birthday dear outkast / happy birthday to you'
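The default sep is None, which would give you 'happy birthday to youhappy birthday to youhappy ...', so pick a sep value that suits you. Above I used a slash (padded with spaces), since that is what you usually see when songs and poems are quoted.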
You can also try print(dataset['lyrics'].str.cat(sep='\n')) to keep the line breaks while storing everything in a single string instead of one string per line.
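To get from that joined string back to the one-row layout the question asks for, something like the following sketch could work. The names joined and one_row are illustrative, and the classification value of 0 is simply taken from the question's sample.

```python
import pandas as pd

lyrics = pd.Series(['happy birthday to you',
                    'happy birthday to you',
                    'happy birthday dear outkast',
                    'happy birthday to you'])

# Collapse all rows into one string, keeping the line breaks.
joined = lyrics.str.cat(sep='\n')

# Wrap the joined text in a one-row frame next to its label
# (0 is assumed here, matching the sample in the question).
one_row = pd.DataFrame({'lyrics': [joined], 'classification': [0]})

print(one_row.shape)
```

Splitting the stored string on '\n' recovers the original lines if they are needed again later.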