Splitting a .txt file into two columns with pandas

Time: 2019-04-15 06:24:00

Tags: pandas csv export-to-csv data-cleaning

I have a text file of a script, laid out in the following order:

0 "character one" "dialogue for character one."
1 "character two" "dialogue for character two." 
2 "character one" "dialogue for character one again"
...
etc

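My problem is that I want to analyze this text and need it in .csv format, with the character in the first column and all of the dialogue in the second column.

I have read the .txt file into pandas like this:

txt_ep_4 = pd.read_table('/Users/nathancahn/star_wars/0_data/ep_IV_script.txt')

So now I have a pandas Series (rather than a DataFrame) to work with.

I have mainly tried different ways of using Series.str.split() to split the text into multiple columns, but none of them worked. I used series_txt_ep_4.str.split(pat=" ") to split on spaces, but that splits at every space, including the spaces inside the quoted names and dialogue.

Again, my ideal output has the character name in the first column and the dialogue string associated with that character in the second column.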

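1 Answer:

Answer 0: (score: 2)

I believe you need read_csv with the sep parameter, plus names for the new column names, because in pandas 0.24.2 read_table gives:

FutureWarning: read_table is deprecated, use read_csv instead.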

from io import StringIO
import pandas as pd

temp = '''"character one" "dialogue for character one."
"character two" "dialogue for character two."
"character one" "dialogue for character one again"'''
# pd.compat.StringIO was removed in later pandas releases, so use io.StringIO.
# After testing, replace StringIO(temp) with 'filename.csv'.
df = pd.read_csv(StringIO(temp), sep=r"\s+", names=['a','b'])
# Alternative (delim_whitespace is deprecated in recent pandas releases):
# df = pd.read_csv(StringIO(temp), delim_whitespace=True, names=['a','b'])
print(df)
               a                                 b
0  character one       dialogue for character one.
1  character two       dialogue for character two.
2  character one  dialogue for character one again
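
The sample in the question also has a leading line number on each row and asks for a .csv file at the end. A minimal sketch of that full round trip follows, assuming the file really looks like the three sample lines; the column names `line`, `character`, and `dialogue` are made up for illustration, and the data is read from a string rather than the real file.

```python
from io import StringIO
import pandas as pd

# Sample in the question's layout: line number, quoted character, quoted dialogue.
raw = '''0 "character one" "dialogue for character one."
1 "character two" "dialogue for character two."
2 "character one" "dialogue for character one again"'''

# Hypothetical column names; "line" absorbs the leading number and becomes the index.
script_df = pd.read_csv(
    StringIO(raw),
    sep=r"\s+",
    header=None,
    names=["line", "character", "dialogue"],
    index_col="line",
)

# With no path argument, to_csv returns the CSV text; index=False drops the line numbers.
csv_text = script_df.to_csv(index=False)
print(csv_text)
```

To write a file instead of a string, pass a path as the first argument to to_csv, for example script_df.to_csv('script.csv', index=False), where the file name is only a placeholder.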

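Edit:

If the data also has a header (here the header row has one fewer field than the data rows, so pandas uses the first column as the index):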

temp=u""""character" "dialogue"
"1" "THREEPIO" "Did you hear that?  They've shut down the main reactor.  We'll be destroyed for sure.  This is madness!"
"2" "THREEPIO" "We're doomed!"
"3" "THREEPIO" "There'll be no escape for the Princess this time."
"4" "THREEPIO" "What's that?"
"5" "THREEPIO" "I should have known better than to trust the logic of a half-sized thermocapsulary dehousing assister..."
"6" "LUKE" "Hurry up!  Come with me!  What are you waiting for?!  Get in gear!"
"7" "THREEPIO" "Artoo! Artoo-Detoo, where are you?"
"8" "THREEPIO" "At last!  Where have you been?"
"9" "THREEPIO" "They're heading in this direction. What are we going to do?  We'll be sent to the spice mines of Kessel or smashed into who knows what!"
"10" "THREEPIO" "Wait a minute, where are you going?"
"""
from io import StringIO  # pd.compat.StringIO was removed in later pandas releases
# After testing, replace StringIO(temp) with 'filename.csv'.
df = pd.read_csv(StringIO(temp), sep=r"\s+")

print(df)

   character                                           dialogue
1   THREEPIO  Did you hear that?  They've shut down the main...
2   THREEPIO                                      We're doomed!
3   THREEPIO  There'll be no escape for the Princess this time.
4   THREEPIO                                       What's that?
5   THREEPIO  I should have known better than to trust the l...
6       LUKE  Hurry up!  Come with me!  What are you waiting...
7   THREEPIO                 Artoo! Artoo-Detoo, where are you?
8   THREEPIO                     At last!  Where have you been?
9   THREEPIO  They're heading in this direction. What are we...
10  THREEPIO                Wait a minute, where are you going?
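
Coming back to the Series.str.split attempt in the question: if the lines are already loaded as a Series of raw strings, one alternative to re-reading the file is Series.str.extract with a regular expression that captures the text between each pair of quotes. This is a different technique from the read_csv approach above, shown only as a rough sketch; the pattern assumes every line is a number followed by exactly two double-quoted fields with no embedded double quotes, and the group names `character` and `dialogue` are chosen for illustration.

```python
import pandas as pd

# Raw lines as they might sit in a Series after a naive read.
lines = pd.Series([
    '0 "character one" "dialogue for character one."',
    '1 "character two" "dialogue for character two." ',
    '2 "character one" "dialogue for character one again"',
])

# Named groups become the column names of the resulting DataFrame.
pattern = r'^\d+\s+"(?P<character>[^"]*)"\s+"(?P<dialogue>[^"]*)"'
parts = lines.str.extract(pattern)
print(parts)
```

Lines that do not match the pattern come back as NaN in both columns, which makes malformed rows easy to spot before exporting.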