I have a text file containing a script, laid out in the following order:
0 "character one" "dialogue for character one."
1 "character two" "dialogue for character two."
2 "character one" "dialogue for character one again"
...
etc
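My problem is that I want to analyze this text and need it in .csv format, with the character in the first column and all of the dialogue in the second column.
I have read the .txt file into pandas like this: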
txt_ep_4 = pd.read_table('/Users/nathancahn/star_wars/0_data/ep_IV_script.txt')
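So now I have a pandas Series (rather than a DataFrame) to work with.
I have mainly tried different ways of using Series.str.split() to split the text into multiple columns, but none of them worked. I used series_txt_ep_4.str.split(pat=" ") to split on spaces, but that splits at every space, including the spaces inside the quoted names and dialogue.
Again, my ideal output has the character name in the first column and the dialogue string associated with that character in the second column.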
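Answer 0 (score: 2)
I believe you need read_csv with the sep parameter, plus names for the new column names. Use read_csv rather than read_table, because in pandas 0.24.2 read_table gives:
FutureWarning: read_table is deprecated, use read_csv instead.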
import io
import pandas as pd

temp = '''"character one" "dialogue for character one."
"character two" "dialogue for character two."
"character one" "dialogue for character one again"'''
# after testing, replace io.StringIO(temp) with the path to your file
# sep=r"\s+" splits on whitespace; whitespace inside double quotes is kept
df = pd.read_csv(io.StringIO(temp), sep=r"\s+", names=['a','b'])
# alternative (delim_whitespace is deprecated as of pandas 2.2)
#df = pd.read_csv(io.StringIO(temp), delim_whitespace=True, names=['a','b'])
print(df)
a b
0 character one dialogue for character one.
1 character two dialogue for character two.
2 character one dialogue for character one again
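The sample in the question also has a leading row number before each quoted pair, which the example above leaves out. A minimal sketch of one way to handle that layout follows, assuming the file looks exactly like the lines quoted in the question; the column names "line", "character" and "dialogue" are illustrative choices, not names from the original data.

```python
import io
import pandas as pd

# Sample copied from the question: row number, character, dialogue.
sample = '''0 "character one" "dialogue for character one."
1 "character two" "dialogue for character two."
2 "character one" "dialogue for character one again"'''

# "line", "character" and "dialogue" are illustrative column names.
df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s+",                      # split on whitespace outside quotes
    header=None,                     # the file has no header row
    names=["line", "character", "dialogue"],
    index_col="line",                # use the leading number as the index
)
print(df)
```

With index_col set, the leading number becomes the index and the frame keeps only the two columns asked for in the question.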
Edit:
If the data also has a header row:
temp=u""""character" "dialogue"
"1" "THREEPIO" "Did you hear that? They've shut down the main reactor. We'll be destroyed for sure. This is madness!"
"2" "THREEPIO" "We're doomed!"
"3" "THREEPIO" "There'll be no escape for the Princess this time."
"4" "THREEPIO" "What's that?"
"5" "THREEPIO" "I should have known better than to trust the logic of a half-sized thermocapsulary dehousing assister..."
"6" "LUKE" "Hurry up! Come with me! What are you waiting for?! Get in gear!"
"7" "THREEPIO" "Artoo! Artoo-Detoo, where are you?"
"8" "THREEPIO" "At last! Where have you been?"
"9" "THREEPIO" "They're heading in this direction. What are we going to do? We'll be sent to the spice mines of Kessel or smashed into who knows what!"
"10" "THREEPIO" "Wait a minute, where are you going?"
"""
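import io  # pd.compat.StringIO is not available in recent pandas versions
# after testing, replace io.StringIO(temp) with the path to your file
# the header has two names but each row has three fields, so the first field becomes the index
df = pd.read_csv(io.StringIO(temp), sep=r"\s+")
print(df)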
character dialogue
1 THREEPIO Did you hear that? They've shut down the main...
2 THREEPIO We're doomed!
3 THREEPIO There'll be no escape for the Princess this time.
4 THREEPIO What's that?
5 THREEPIO I should have known better than to trust the l...
6 LUKE Hurry up! Come with me! What are you waiting...
7 THREEPIO Artoo! Artoo-Detoo, where are you?
8 THREEPIO At last! Where have you been?
9 THREEPIO They're heading in this direction. What are we...
10 THREEPIO Wait a minute, where are you going?
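Since the question asks for a .csv with the character in the first column and the dialogue in the second, the parsed frame still has to be written back out. The following is a rough sketch using DataFrame.to_csv on a shortened sample in the same layout; the output file name in the final comment is hypothetical.

```python
import io
import pandas as pd

# A few rows in the header-plus-row-label layout used above.
sample = '''"character" "dialogue"
"1" "THREEPIO" "We're doomed!"
"2" "LUKE" "Hurry up! Come with me!"
'''

df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

# index=False drops the row labels, leaving only the two columns.
csv_text = df.to_csv(index=False)
print(csv_text)

# To write a file instead, pass a path (hypothetical name):
# df.to_csv("ep_IV_script.csv", index=False)
```

to_csv quotes any field that contains a comma, so dialogue with commas in it should still read back as a single column.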