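I have a CSV file with a column that contains many sentences. I want to split those sentences on "." and put each one on its own row in a new CSV file. Here is part of my code.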
import csv
import pandas as pd
excel= pd.read_csv("file.csv", encoding = "ISO-8859-1")
excel.dropna(inplace = True)
split = pd.DataFrame(excel["months_readmore_story"].str.split('.'), columns=['sentences'])
split.to_csv('split.csv')
I tried the code above, but the new CSV file is empty. Here is a sample of the original file, file.csv:
file.csv
id date months_readmore_story
1 sep 20 England. The weather caused a lots of uproar.
2 Aug 10 Health. Health have been an issue.
This is the output I want in split.csv:
split.csv
story_id sentences_id sentences
1 1 England
1 2 The weather caused a lots of uproar
2 3 Health
2 4 Health have been an issue
Answer 0 (score: 1)
Hopefully this works. It assumes your original DataFrame is named df:
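import pandas as pd
import numpy as np

# Split each story on '.', one piece per column, indexed by id, then stack into rows
new_df = pd.DataFrame(df.months_readmore_story.str.split('.').tolist(), index=df.id).stack()
# Move only the id level into a column; the piece position stays in the index
new_df = new_df.reset_index(level=0)
new_df.columns = ['story_id', 'sentences']
# Trim whitespace, then drop the empty pieces left by the trailing '.'
new_df['sentences'] = new_df['sentences'].str.strip().replace('', np.nan)
new_df.dropna(subset=['sentences'], inplace=True)
# Number the remaining sentences consecutively across all stories
new_df.insert(1, "sentences_id", range(1, new_df.shape[0] + 1))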
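To try the approach above end to end, here is a minimal sketch. It assumes the two sample rows from the question stand in for file.csv (the in-memory `df` and the intermediate name `pieces` are illustrative, not part of the original code), and it writes the result to split.csv without the index column.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (in-memory stand-in for file.csv).
df = pd.DataFrame({
    "id": [1, 2],
    "date": ["sep 20", "Aug 10"],
    "months_readmore_story": [
        "England. The weather caused a lots of uproar.",
        "Health. Health have been an issue.",
    ],
})

# One row per story, one column per '.'-separated piece, then stack into rows.
pieces = pd.DataFrame(
    df["months_readmore_story"].str.split(".").tolist(), index=df["id"]
).stack()

# Keep the story id as a column; the piece position stays in the index.
new_df = pieces.reset_index(level=0)
new_df.columns = ["story_id", "sentences"]

# Trim whitespace and drop the empty piece left after the trailing '.'.
new_df["sentences"] = new_df["sentences"].str.strip().replace("", np.nan)
new_df = new_df.dropna(subset=["sentences"])

# Number sentences consecutively across all stories.
new_df.insert(1, "sentences_id", range(1, len(new_df) + 1))

# index=False keeps the leftover positional index out of the file.
new_df.to_csv("split.csv", index=False)
```

With a real file, the in-memory `df` would be replaced by the `pd.read_csv("file.csv", encoding="ISO-8859-1")` call from the question.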
Answer 1 (score: 0)
Try this:
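# Drop the trailing '.', then split each story into a list of sentences
df['temp'] = df['months_readmore_story'].str.rstrip('.').str.split('.')
df = df[['id', 'temp']]
# Expand each list into columns, stack into rows, and bring id back as a column
df.set_index('id')['temp'].apply(pd.Series).stack().reset_index(level=0).rename(columns={0: 'temp'})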
Output:
id temp
0 1 England
1 1 The weather caused a lots of uproar
0 2 Health
1 2 Health have been an issue
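The output above still lacks the sentences_id column the question asks for. One possible variant, sketched here as an assumption rather than as part of the answer, swaps `apply(pd.Series).stack()` for `DataFrame.explode` (available in pandas 0.25 and later) and then adds the running sentence number. The sample frame and the name `out` are illustrative.

```python
import pandas as pd

# Sample rows copied from the question (in-memory stand-in for file.csv).
df = pd.DataFrame({
    "id": [1, 2],
    "months_readmore_story": [
        "England. The weather caused a lots of uproar.",
        "Health. Health have been an issue.",
    ],
})

# Split into lists, then give every list element its own row.
out = (
    df.assign(sentences=df["months_readmore_story"].str.split("."))
      .explode("sentences")[["id", "sentences"]]
      .rename(columns={"id": "story_id"})
)

# Trim whitespace and discard the empty pieces left by the trailing '.'.
out["sentences"] = out["sentences"].str.strip()
out = out[out["sentences"] != ""].reset_index(drop=True)

# Running sentence number, placed between story_id and sentences.
out.insert(1, "sentences_id", range(1, len(out) + 1))
print(out.to_string(index=False))
```

Calling `out.to_csv("split.csv", index=False)` afterwards would write the three columns in the layout the question describes.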