I have a text file that contains some data, and I need to split it into a DataFrame. Here is my text file:
2012/02/03 18:55:54 SampleClass1 verb detail for id 19471668
verb detail for id 185289
verb detail for id 185289
verb detail for id 1852849
2012/03/03 18:55:54 SampleClass8 detail for id 2181536
2012/04/03 18:55:54 SampleClass1 verb detail for id 1765383670
2012/05/03 18:55:54 SampleClass9 verb detail for id 1666944491
2012/06/03 18:55:54 SampleClass8 detail for id 799914029 verb detail for id 185229
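I want to split out the date and the time separately, along with the remaining string, and then convert the result into a DataFrame.
My expected output: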
date time desc
2012/02/03 18:55:54 SampleClass9 verb detail for id 1947166588
verb detail for id 185289
verb detail for id 185289
verb detail for id 1852849
2012/03/03 18:55:54 SampleClass8 detail for id 218851536
verb detail for id 1852829
verb detail for id 185289
verb detail for id 1852849
2012/04/03 18:55:54 SampleClass1 verb detail for id 1765383670
verb detail for id 1852829
verb detail for id 1852829
verb detail for id 1852849
2012/05/03 18:55:54 SampleClass9 verb detail for id 1666944491
verb detail for id 1852829
verb detail for id 1852829
verb detail for id 18528429
2012/06/03 18:55:54 SampleClass8 detail for id 799914029 verb detail for id 1852844029
verb detail for id 1852829
verb detail for id 1852829
verb detail for id 18528429
Answer 0 (score: 1)
Based on the input data you provided, the following code does the job.
import csv
import pandas as pd

file = "/path/to/file/"

# Open the text file
with open(file, "r", newline="") as fp:
    # Read the text file and use a space delimiter
    reader = csv.reader(fp, delimiter=" ")
    rows = []
    # Loop through the rows
    for row in reader:
        # If the row is empty, skip it
        if not row:
            continue
        # If the first character of the row is a digit, join the columns after
        # column 2, as columns one and two are already separated.
        # Slicing with [:1] avoids an IndexError when a line starts with a space.
        elif row[0][:1].isdigit():
            rows.append(row[:2] + [' '.join(row[2:])])
        # Otherwise add two empty columns and then join the whole row
        else:
            rows.append(['', ''] + [' '.join(row)])

df = pd.DataFrame(rows, columns=['date', 'time', 'desc'])
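
For anyone who wants to try the idea without a file on disk, here is a minimal sketch of a variant that matches the timestamp with a regular expression instead of splitting on spaces. It is not the answer's method, only an illustration: the helper name `parse_log`, the pattern `STAMPED`, and the use of `io.StringIO` as a stand-in for the real file are all assumptions, and the sample lines are copied from the question.

```python
import io
import re

import pandas as pd

# Sample lines copied from the question; io.StringIO stands in for the real file.
SAMPLE = """\
2012/02/03 18:55:54 SampleClass1 verb detail for id 19471668
verb detail for id 185289
verb detail for id 185289
verb detail for id 1852849
2012/03/03 18:55:54 SampleClass8 detail for id 2181536
"""

# A line that starts with "YYYY/MM/DD HH:MM:SS " begins a new record.
STAMPED = re.compile(r"^(\d{4}/\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) (.*)$")


def parse_log(fp):
    """Hypothetical helper: build a date/time/desc frame from an open text file."""
    rows = []
    for line in fp:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = STAMPED.match(line)
        if match:
            # Timestamped line: date, time, and the rest of the line.
            rows.append(list(match.groups()))
        else:
            # Continuation line: leave date and time empty.
            rows.append(["", "", line])
    return pd.DataFrame(rows, columns=["date", "time", "desc"])


df = parse_log(io.StringIO(SAMPLE))
print(df)
```

Matching the whole timestamp is stricter than checking only the first character, so a continuation line that happens to begin with a digit would still be treated as a description. Whether that matters depends on what the real file contains.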