I have a text file with timestamps that looks like this:
00:25
hold it miles lunch and remember I'm
00:30
working late tonight again man you're a
00:34
total slave to that business of yours
00:36
nobody's a slave to their own dream
I'm trying to figure out how to import it into a Pandas DataFrame like this:
[Time] [Text]
00:25 hold it miles lunch and remember I'm
00:30 working late tonight again man you're a
00:34 total slave to that business of yours
00:36 nobody's a slave to their own dream
I'm embarrassed to say I don't even know where to start... Everything I know and have tried produces this result:
row1 00:25
row2 hold it miles lunch and remember I'm
row3 00:30
row4 working late tonight again man you're a
row5 00:34
row6 total slave to that business of yours
row7 00:36
row8 nobody's a slave to their own dream
I found this question, which appears to address the same problem, but I can't work out how to apply it when creating a DataFrame.
Thanks for your help!
Answer 0 (score: 4)
Here is one way to do it:
import pandas as pd
# Import the sample data
data='''00:25
hold it miles lunch and remember I'm
00:30
working late tonight again man you're a
00:34
total slave to that business of yours
00:36
nobody's a slave to their own dream'''
# Create a list containing every line
data = data.split('\n')
# Parse the data, assigning every other row to a different column
col1 = [data[i] for i in range(0,len(data),2)]
col2 = [data[i] for i in range(1,len(data),2)]
# Create the data frame
df = pd.DataFrame({'Time': col1, 'Text': col2})
print(df)
Time Text
0 00:25 hold it miles lunch and remember I'm
1 00:30 working late tonight again man you're a
2 00:34 total slave to that business of yours
3 00:36 nobody's a slave to their own dream
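The every-other-line split above assumes the input has an even number of lines and no stray blank lines. A minimal sketch of the same idea with those two cases guarded; the helper name lines_to_frame is hypothetical (not part of pandas), and the sample is shortened to two pairs:

```python
import pandas as pd

def lines_to_frame(lines):
    """Pair alternating lines into Time/Text columns (hypothetical helper)."""
    # Drop blank lines, e.g. the empty string left by a trailing newline
    lines = [line.strip() for line in lines if line.strip()]
    # Every timestamp needs a matching text line
    if len(lines) % 2 != 0:
        raise ValueError("expected an even number of lines (timestamp/text pairs)")
    # Even positions are timestamps, odd positions are the text
    return pd.DataFrame({"Time": lines[0::2], "Text": lines[1::2]})

sample = (
    "00:25\n"
    "hold it miles lunch and remember I'm\n"
    "00:30\n"
    "working late tonight again man you're a\n"
)
df = lines_to_frame(sample.split("\n"))
print(df)
```

Raising an error on an odd line count is one possible design choice; silently dropping the unmatched last line would hide a malformed file.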
Answer 1 (score: 2)
Alternatively (if the Text column contains no ":"):
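# Assumes df has a single column named "col" holding every line of the file
# Boolean mask: True for timestamp rows (those containing ":")
m = df.col.str.contains(":")
# Put the timestamp rows and the text rows side by side
df_new = pd.concat([df[m].reset_index(drop=True), df[~m].reset_index(drop=True)], axis=1)
df_new.columns = ['Time', 'Text']
print(df_new)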
Time Text
0 00:25 hold it miles lunch and remember I'm
1 00:30 working late tonight again man you're a
2 00:34 total slave to that business of yours
3 00:36 nobody's a slave to their own dream
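The snippet above presupposes an existing one-column DataFrame and breaks if a text line contains a colon. A self-contained sketch that builds that DataFrame and uses a stricter mask; the column name "col" follows the answer, the MM:SS pattern is an assumption about the timestamp format, and the last pair of sample lines is made up to show a colon inside the text:

```python
import pandas as pd

lines = [
    "00:25", "hold it miles lunch and remember I'm",
    "00:30", "working late tonight again man you're a",
    # Made-up pair (not from the original file): the text contains a colon
    "00:40", "remember: text may contain a colon",
]
df = pd.DataFrame({"col": lines})

# Assumption: a timestamp line is nothing but digits, a colon, and two digits
is_time = df["col"].str.match(r"^\d{1,2}:\d{2}$")

# Align the timestamp rows and the text rows by resetting both indexes
df_new = pd.concat(
    [df.loc[is_time, "col"].reset_index(drop=True),
     df.loc[~is_time, "col"].reset_index(drop=True)],
    axis=1,
)
df_new.columns = ["Time", "Text"]
print(df_new)
```

Matching the whole line rather than searching for ":" anywhere keeps text lines with colons in the Text column. It still relies on every timestamp being followed by exactly one text line.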
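Answer 2 (score: 2)
Another approach is to split the file into lines and assign alternating lines to separate columns, "Time" and "Text". Finally, build the DataFrame from the resulting dictionary.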
import pandas as pd

# Read your files here
files = ['text.txt']  # one file or several
data = {'Time': [], 'Text': []}
for f in files:
    with open(f, "r") as myfile:
        all_lines = myfile.read().splitlines()  # split by line
    # even-indexed lines are timestamps, odd-indexed lines are text;
    # extend (rather than assign) so earlier files are not overwritten
    data['Time'].extend(all_lines[::2])
    data['Text'].extend(all_lines[1::2])
# create the DataFrame from the dictionary
df = pd.DataFrame(data)
print(df)
Output:
Time Text
0 00:25 hold it miles lunch and remember I'm
1 00:30 working late tonight again man you're a
2 00:34 total slave to that business of yours
3 00:36 nobody's a slave to their own dream
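The slicing step can also be written as a reshape, which has the side effect of failing loudly when the line count is odd. A minimal sketch, assuming NumPy is available; io.StringIO stands in for the opened file so the example runs without text.txt, and the sample is shortened to two pairs:

```python
import io

import numpy as np
import pandas as pd

# io.StringIO stands in for an opened text file so the sketch runs as-is
fake_file = io.StringIO(
    "00:25\n"
    "hold it miles lunch and remember I'm\n"
    "00:30\n"
    "working late tonight again man you're a\n"
)
all_lines = fake_file.read().splitlines()

# reshape(-1, 2) turns the flat list into rows of (timestamp, text);
# NumPy raises ValueError if the number of lines is odd
pairs = np.array(all_lines).reshape(-1, 2)
df = pd.DataFrame(pairs, columns=["Time", "Text"])
print(df)
```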