Question

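I have a CSV file in which one column contains many sentences separated by ".". I want to split them so that each sentence goes on its own row in a new CSV file. Here is some of my code.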

import csv
import pandas as pd

excel= pd.read_csv("file.csv", encoding = "ISO-8859-1")
excel.dropna(inplace = True) 
split = pd.DataFrame(excel["months_readmore_story"].str.split('.'), columns=['sentences'])

split.to_csv('split.csv')

I tried the code above, but the new CSV file has nothing in it. Here is a sample from the original file, file.csv:

file.csv
id      date         months_readmore_story
1        sep 20       England. The weather caused a lots of uproar.
2        Aug 10       Health. Health have been an issue.

This is the output I want in split.csv:

split.csv
story_id        sentences_id      sentences
 1               1                 England
 1               2                 The weather caused a lots of uproar
 2               3                 Health
 2               4                 Health have been an issue

Answer 1

Hopefully this works. It assumes your original DataFrame is df.

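import pandas as pd
import numpy as np

# Split each story on '.', one column per piece, then stack into one row per piece
new_df = pd.DataFrame(df.months_readmore_story.str.split('.').tolist(), index=df.id).stack()
# Move the 'id' index level into a column; the piece position stays in the index
new_df = new_df.reset_index(level='id')
new_df.columns = ['story_id', 'sentences']
# Trim whitespace, then mark the empty pieces left after a trailing '.' as missing
new_df['sentences'] = new_df['sentences'].str.strip().replace('', np.nan)
new_df.dropna(subset=['sentences'], inplace=True)
# Number the remaining sentences 1..n across the whole file
new_df.insert(1, "sentences_id", range(1, new_df.shape[0] + 1))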
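
On pandas 0.25 or later, the same reshaping can probably be done more directly with Series.explode. The sketch below is an alternative under stated assumptions, not part of the answer above: it rebuilds the two sample rows from the question inline instead of reading file.csv, and to_sentences is a hypothetical helper name.

```python
import pandas as pd

def to_sentences(df):
    """Return one row per sentence with story_id, sentences_id, sentences."""
    # Split each story on '.', giving a list of pieces per row
    pieces = df.set_index("id")["months_readmore_story"].str.split(".")
    # explode() turns each list element into its own row, repeating the id
    out = pieces.explode().str.strip().reset_index()
    out.columns = ["story_id", "sentences"]
    # Drop empty pieces (left after a trailing '.') and missing values
    keep = out["sentences"].notna() & (out["sentences"] != "")
    out = out[keep].reset_index(drop=True)
    # Running sentence number across the whole file, starting at 1
    out.insert(1, "sentences_id", list(range(1, len(out) + 1)))
    return out

# Sample rows copied from the question (stand-in for pd.read_csv("file.csv"))
df = pd.DataFrame({
    "id": [1, 2],
    "date": ["sep 20", "Aug 10"],
    "months_readmore_story": [
        "England. The weather caused a lots of uproar.",
        "Health. Health have been an issue.",
    ],
})

result = to_sentences(df)
# index=False keeps the pandas row index out of split.csv
result.to_csv("split.csv", index=False)
```

Writing with index=False should leave only the three columns asked for in the question.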

Answer 2

Try this:

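# Drop the trailing '.' first so the split does not leave an empty last piece
df['temp'] = df['months_readmore_story'].str.rstrip('.').str.split('.').values.tolist()
df = df[['id', 'temp']]
# Expand each list into columns, then stack into one row per sentence
out = df.set_index('id')['temp'].apply(pd.Series).stack().reset_index(level=0).rename(columns={0: 'temp'})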

Output:

   id                                  temp
0   1                               England
1   1   The weather caused a lots of uproar
0   2                                Health
1   2             Health have been an issue
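
To get from this output to the split.csv layout in the question, the frame probably still needs a running sentence number, renamed columns, and trimmed whitespace. A minimal sketch, assuming the frame printed above is held in a variable (called out here, an assumed name) and rebuilt inline so the example runs on its own:

```python
import pandas as pd

# Stand-in for the frame printed above (note the repeated 0/1 index)
out = pd.DataFrame(
    {
        "id": [1, 1, 2, 2],
        "temp": ["England", " The weather caused a lots of uproar",
                 "Health", " Health have been an issue"],
    },
    index=[0, 1, 0, 1],
)

split = (
    out.rename(columns={"id": "story_id", "temp": "sentences"})
       # Pieces after the first keep the space that followed each '.'
       .assign(sentences=lambda d: d["sentences"].str.strip())
       # Replace the repeated 0/1 index with a clean 0..n-1 index
       .reset_index(drop=True)
)
# Running sentence number across all stories, starting at 1
split.insert(1, "sentences_id", list(range(1, len(split) + 1)))
split.to_csv("split.csv", index=False)
```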
