Title: Canon EF 100mm f/2.8L Macro IS USM
Price: 6�900 kr
Link: https://www.finn.no/bap/forsale/ad.html?finnkode=161065896
21-Oct-19 10:21:14 - Found:
Title: Canon EF 100mm f/2.8L Macro IS USM
Price: 7�500 kr
Link: https://www.finn.no/bap/forsale/ad.html?finnkode=155541389
21-Oct-19 10:21:14 - Found:
Title: Panasonic Lumix G 25mm F1.4 ASPH
Price: 3�200 kr
Link: https://www.finn.no/bap/forsale/ad.html?finnkode=161066674
I want to import this data and export it to Excel, in this layout:
title price link
canon 100mm 6900kr https
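Answer 0 (score: 0)
If the entries in the log file do not appear in a consistent order, you will need a different approach, because the function below simply looks for lines starting with "Title", "Price" and "Link" and appends each value to its own list. To convert the lists to a DataFrame, all three must be the same length. Let me know if it works.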
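import pandas as pd

def log_to_frame(location="./datalake/file.log"):
    title_list = []
    price_list = []
    link_list = []
    with open(location, mode='r', encoding='UTF-8') as f:
        for line in f:
            # Each field sits on its own "Key: value" line; split on the first ": " only.
            if line.startswith("Title"):
                title = line.split(": ", 1)[1].rstrip()
                title_list.append(title)
            elif line.startswith("Price"):
                # Drop the garbled thousands separator, e.g. "6�900 kr" -> "6900 kr".
                price = line.split(": ", 1)[1].replace("�", "").rstrip()
                price_list.append(price)
            elif line.startswith("Link"):
                link = line.split(": ", 1)[1].rstrip()
                link_list.append(link)
            # Any other line (such as the timestamped "Found:" lines) is ignored.
    # All three lists must have the same length for this to succeed.
    main_df = pd.DataFrame({"title": title_list, "price": price_list, "link": link_list})
    return main_df

log_df = log_to_frame()
log_df.to_excel("log.xlsx", index=False)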
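The equal-length requirement in this answer fails as soon as one listing lacks a field. As a rough sketch of one way around that (not part of the original answer), the lines can be grouped into one dictionary per listing, assuming every listing starts with a "Title" line. The names `FIELDS`, `parse_records` and `sample` are made up for illustration, and the sample is a shortened version of the log with the second price deliberately left out.

```python
FIELDS = ("Title", "Price", "Link")

def parse_records(lines):
    """Group 'Key: value' lines into one dict per listing.

    Assumption: a new listing begins at every 'Title' line.
    Fields that never appear for a listing stay None.
    """
    records = []
    current = None
    for raw in lines:
        # partition splits on the first ": " only, so URLs stay intact.
        key, sep, value = raw.strip().partition(": ")
        if not sep or key not in FIELDS:
            continue  # e.g. the timestamped "... - Found:" lines
        if key == "Title":
            current = dict.fromkeys(f.lower() for f in FIELDS)
            records.append(current)
        if current is not None:
            current[key.lower()] = value
    return records

# Hypothetical sample: the second listing has no Price line.
sample = [
    "Title: Canon EF 100mm f/2.8L Macro IS USM",
    "Price: 6 900 kr",
    "Link: https://www.finn.no/bap/forsale/ad.html?finnkode=161065896",
    "21-Oct-19 10:21:14 - Found:",
    "Title: Panasonic Lumix G 25mm F1.4 ASPH",
    "Link: https://www.finn.no/bap/forsale/ad.html?finnkode=161066674",
]
records = parse_records(sample)
```

The resulting list of dictionaries can be passed to pd.DataFrame(records) and written out with to_excel as above; the missing price then shows up as an empty cell instead of raising an error.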
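Answer 1 (score: 0)
You can load the data into a DataFrame as a plain two-column table, then use the DataFrame's loc and reset_index functions to turn each key into its own column.
This assumes that every line contains exactly one ":" character, separating the "key" column from the "value" column, and that every "record" has one line for each key.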
import pandas as pd

p = pd.read_table("table.log", sep=':', header=None)
df = pd.DataFrame()
keys = set(p[0])  # set of all unique keys
for key in keys:
    # get all values with the current key and re-index them from 0...n
    col_data = p.loc[p[0] == key][1].reset_index(drop=True)
    # put this in a new column named after the key
    df[key] = col_data
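The one-colon-per-line assumption does not hold for the log in the question: the "Link" values contain "https://" and the timestamp lines contain several colons. A possible workaround, sketched here in plain Python with the hypothetical names `columns_by_key` and `sample`, is to split each line on the first ": " only and keep just the expected keys, building one list per key in the same spirit as the loop above.

```python
from collections import defaultdict

def columns_by_key(lines, keys=("Title", "Price", "Link")):
    """Collect values into one list per key.

    Each line is split on the first ': ' only, so colons inside
    the value (such as in a URL) are left untouched.
    """
    columns = defaultdict(list)
    for raw in lines:
        key, sep, value = raw.strip().partition(": ")
        if sep and key in keys:
            columns[key].append(value)
    return dict(columns)

# Shortened sample based on the log in the question.
sample = [
    "Title: Canon EF 100mm f/2.8L Macro IS USM",
    "Price: 6 900 kr",
    "Link: https://www.finn.no/bap/forsale/ad.html?finnkode=161065896",
    "21-Oct-19 10:21:14 - Found:",
    "Title: Panasonic Lumix G 25mm F1.4 ASPH",
    "Price: 3 200 kr",
    "Link: https://www.finn.no/bap/forsale/ad.html?finnkode=161066674",
]
columns = columns_by_key(sample)
```

If every record has all three keys, the lists are the same length and pd.DataFrame(columns) should give one column per key, as in the loop above.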