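Split a dataframe column into two columns based on a delimiter

Time: 2017-09-11 23:40:19

Tags: python pandas dataframe split delimiter

I am preprocessing text for classification, and I import my dataset as follows: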

dataset = pd.read_csv('lyrics.csv', delimiter = '\t', quoting = 2)

Printing dataset in the terminal gives:

                                 lyrics,classification
0    I should have known better with a girl like yo...
1    You can shake an apple off an apple tree\nShak...
2    It's been a hard day's night\nAnd I've been wo...
3    Michelle, ma belle\nThese are words that go to...

However, when I inspect the dataset variable more closely in Spyder, I see that I have only one column instead of the two I want.


Since the lyrics themselves contain commas, using "," as the delimiter does not work.

How can I correct the data frame above so that I have:

1) a column for lyrics

2) a column for classification

with the corresponding data in each row?

1 answer:

Answer 0: (score: 1)

If your lyrics themselves did not contain commas (and they most likely do), you could simply call read_csv with delimiter=','.
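As a side note, a comma delimiter can also cope with commas inside the lyrics when the file wraps that field in double quotes. Whether lyrics.csv is written that way is an assumption here, so this is only a minimal sketch using made-up in-memory rows rather than the real file:

```python
import io

import pandas as pd

# Made-up sample standing in for lyrics.csv; the lyrics field is
# double-quoted (an assumption), so the commas inside it are not
# treated as delimiters.
raw = 'lyrics,classification\n"first line, second line",0\n"one, two, three",1\n'

df_quoted = pd.read_csv(io.StringIO(raw), delimiter=',')
print(df_quoted)
```

If the lyrics field is not quoted, the comma delimiter cannot tell lyric commas apart from the separator, which is where the approach below comes in.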

However, if that is not an option, you can use str.rsplit with n=1, which splits only on the last comma:

# Split on the last comma only: dataset.iloc[:, 0].str.rsplit(',', n=1, expand=True)
df = dataset  # the single-column frame from the question
df

                               lyrics,classification
0  I should have known better with a girl like yo...
1                              You can shake an...,0
2                  It's been a hard day's night...,0

# n must be passed by keyword in pandas 2.0+; n=1 splits only on the last comma
df = df.iloc[:, 0].str.rsplit(',', n=1, expand=True)
df.columns = ['lyrics', 'classification']
df

                                              lyrics classification
0  I should have known better with a girl like yo...              0
1                                You can shake an...              0
2                    It's been a hard day's night...              0
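The steps above can be put together into a self-contained sketch. The rows below are made-up placeholders rather than the real lyrics, and the final cast to int assumes every label is an integer, which the question does not state explicitly:

```python
import pandas as pd

# Made-up single-column frame mimicking what read_csv produced in the
# question: the whole line, commas included, ends up in one column.
dataset = pd.DataFrame({
    'lyrics,classification': [
        'first line, second line,0',
        'one, two, three,1',
    ]
})

# Without n, every comma is a split point and the lyrics get chopped up.
too_many = dataset.iloc[:, 0].str.rsplit(',', expand=True)

# With n=1, only the last comma (the one before the label) is used.
df = dataset.iloc[:, 0].str.rsplit(',', n=1, expand=True)
df.columns = ['lyrics', 'classification']

# Assumption: labels are integers, so convert them from strings.
df['classification'] = df['classification'].astype(int)

print(too_many.shape)  # more than two columns
print(df)
```

Splitting from the right is what makes this work: the label is always the last field, so commas earlier in the line stay inside the lyrics column.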