I have a log file from a web scraper, and it comes out as a single vertical list.
Example:
21-Oct-19 14:46:14 - Retrieving data from https://www.finn.no/bap/forsale/search.html?category=0.93&page=1&product_category=2.93.3904.69&sub_category=1.93.3904
0 21-Oct-19 14:46:14 - Found:
1 Title: Nesten ubrukt Canon 17-40 mm vidvinkell...
2 Price: 4 900 kr
3 Link: https://www.finn.no/bap/forsale/ad.html?...
4 21-Oct-19 14:46:14 - Found:
5 Title: Nesten ubrukt Canon 17-40 mm vidvinkell...
6 Price: 4 900 kr
7 Link: https://www.finn.no/bap/forsale/ad.html?...
8 21-Oct-19 14:46:14 - Found:
9 Title: Nesten ubrukt Canon 17-40 mm vidvinkell...
10 Price: 4 900 kr
11 Link: https://www.finn.no/bap/forsale/ad.html?...
12 21-Oct-19 14:46:14 - Found:
13 Title: Nesten ubrukt Canon 17-40 mm vidvinkell...
Can I convert this into a readable pandas DataFrame?
Example of the output I want:
title price link
canon 100mm 6900kr https
canon 50mm 100r https
canon 17mm 63530kr https
My code currently looks like this:
import pandas as pd
data = pd.read_csv('finn.no-2019-10-21-.log', sep ="Line", engine='python')
df = pd.DataFrame(data)
title = 1,5,9,13,17,21
price = 2,6,10,14,18,22
link = 3,7,11,15,19,23
print(df)
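Is there anything I can do with the row numbers in the original output to turn it into a more conventional DataFrame?
Answer 0 (score: 2)
This should do it for you: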
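One way to use the row numbers directly, sketched under the assumption that the single column read from the log repeats in fixed groups of four rows (Found, Title, Price, Link) starting at row 0, as in the printed output above. It takes every fourth row with `iloc` and strips the field prefix. The in-memory `col` Series and its rows are hypothetical stand-ins for the real file, and the helper `field` is introduced here for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the single column read from the log file;
# rows repeat in groups of four: Found, Title, Price, Link.
col = pd.Series([
    "21-Oct-19 14:46:14 - Found:",
    "Title: Ad A",
    "Price: 100 kr",
    "Link: https://example.com/a",
    "21-Oct-19 14:46:14 - Found:",
    "Title: Ad B",
    "Price: 200 kr",
    "Link: https://example.com/b",
])

def field(start: int, prefix: str) -> list:
    """Take every 4th row beginning at `start` and drop the field prefix."""
    return [s[len(prefix):] for s in col.iloc[start::4]]

df = pd.DataFrame({
    "Title": field(1, "Title: "),
    "Price": field(2, "Price: "),
    "Link": field(3, "Link: "),
})
print(df)
```

This only works while every record has all four lines in order; a single missing line shifts every later row.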
import pandas as pd

with open('finn.no-2019-10-21-.log') as f:
    lines = f.readlines()
clean = [line.strip() for line in lines]
title = [j.split('Title: ', 1)[1] for j in clean if j.startswith('Title: ')]
price = [k.split('Price: ', 1)[1] for k in clean if k.startswith('Price: ')]
link = [l.split('Link: ', 1)[1] for l in clean if l.startswith('Link: ')]
# Pass one list per column; a plain list of lists would be read as rows.
df = pd.DataFrame({'Title': title, 'Price': price, 'Link': link})
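Why the column lists should not be passed as `data=[title, price, link]`: pandas treats each inner list of a list of lists as a row, so that call produces one row per field instead of one row per ad (and with `columns=['Title', 'Price', 'Link']` it fails unless there happen to be exactly three ads). A small comparison with hypothetical sample values:

```python
import pandas as pd

# Hypothetical sample values for two ads.
title = ["Ad A", "Ad B"]
price = ["100 kr", "200 kr"]
link = ["https://example.com/a", "https://example.com/b"]

# A list of lists is interpreted row by row: 3 rows (fields) x 2 columns (ads).
as_rows = pd.DataFrame(data=[title, price, link])
print(as_rows.shape)  # (3, 2)

# A dict of lists is interpreted column by column: 2 rows (ads) x 3 columns.
by_column = pd.DataFrame({"Title": title, "Price": price, "Link": link})
print(by_column.shape)  # (2, 3)
```

Zipping the lists into rows first, `pd.DataFrame(list(zip(title, price, link)), columns=['Title', 'Price', 'Link'])`, gives the same layout as the dict form.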
Answer 1 (score: 0)
With @zipa's help, I got it right:
import pandas as pd

with open('finn.no-2019-10-22-.log') as f:
    lines = f.readlines()
clean = [line.strip() for line in lines]
titles = [j.split('Title: ')[1] for j in clean if j.startswith('Title: ')]
prices = [k.split('Price: ')[1] for k in clean if k.startswith('Price: ')]
links = [l.split('Link: ')[1] for l in clean if l.startswith('Link: ')]
# Build one dict per ad; zip pairs the n-th title, price and link.
output = []
for title, price, link in zip(titles, prices, links):
    articles = {}
    articles['titles'] = title
    articles['prices'] = price
    articles['links'] = link
    output.append(articles)
df = pd.DataFrame(data=output)
print(df)
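Both answers pair fields by position, so if one ad in the log lacks, say, a `Price:` line, every later price is attached to the wrong title. A sketch of a variant that starts a new record at each `Found:` line instead, assuming every record begins with a line ending in `Found:` and fields use the `Key: value` form shown above. `parse_log` is a hypothetical helper name, and the sample lines are made up for illustration.

```python
import pandas as pd

def parse_log(lines):
    """Group log lines into one dict per 'Found:' block (hypothetical helper)."""
    records = []
    for raw in lines:
        line = raw.strip()
        if line.endswith("Found:"):
            records.append({})  # a new ad starts here
        elif records and ": " in line:
            key, value = line.split(": ", 1)
            if key in ("Title", "Price", "Link"):
                records[-1][key.lower()] = value
    # Missing fields become NaN instead of shifting later rows.
    return pd.DataFrame(records, columns=["title", "price", "link"])

# Hypothetical sample: the second ad has no Price line.
sample = [
    "21-Oct-19 14:46:14 - Found:",
    "Title: Ad A",
    "Price: 100 kr",
    "Link: https://example.com/a",
    "21-Oct-19 14:46:14 - Found:",
    "Title: Ad B",
    "Link: https://example.com/b",
]
df = parse_log(sample)
print(df)
```

With the real file, `with open('finn.no-2019-10-22-.log') as f: df = parse_log(f)` should work the same way, since iterating over a file yields its lines.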