I wrote a script to fetch some posts from Reddit:
import praw
import pandas as pd

# Backslashes are unnecessary inside parentheses, so the arguments just continue
reddit = praw.Reddit(client_id='*******',
                     client_secret='*******',
                     user_agent='**********',
                     username='********',
                     password='*******8')
subreddit1 = reddit.subreddit("Tea")
subreddit2 = reddit.subreddit("Biophysics")
top_subreddit1 = subreddit1.top(limit=500)
top_subreddit2 = subreddit2.top(limit=500)

# One list per column; each submission appends one value to every list
topics_dict = {"title": [],
               "score": [],
               "id": [], "url": [],
               "comms_num": [],
               "created": [],
               "body": []}

for submission1 in top_subreddit1:
    topics_dict["title"].append(submission1.title)
    topics_dict["score"].append(submission1.score)
    topics_dict["id"].append(submission1.id)
    topics_dict["url"].append(submission1.url)
    topics_dict["comms_num"].append(submission1.num_comments)
    topics_dict["created"].append(submission1.created)
    topics_dict["body"].append(submission1.selftext)

for submission2 in top_subreddit2:
    topics_dict["title"].append(submission2.title)
    topics_dict["score"].append(submission2.score)
    topics_dict["id"].append(submission2.id)
    topics_dict["url"].append(submission2.url)
    topics_dict["comms_num"].append(submission2.num_comments)
    topics_dict["created"].append(submission2.created)
    topics_dict["body"].append(submission2.selftext)

topics_data = pd.DataFrame(topics_dict)
topics_data  # displays the DataFrame in Jupyter
But the result only shows up in my Jupyter notebook. Now I want to save the results to a database file. All suggestions are appreciated.
Answer 0 (score: 0)
You have two options. I'll cover both; each has its own pros and cons:
1. CSV
Use DataFrame.to_csv to save the DataFrame to a .csv file:
topics_data.to_csv('path_to_file.csv')
You can then parse this CSV file in your client application, i.e., any application that wants to use the data you scraped.
Pros
Cons
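A minimal round-trip sketch, assuming pandas and a couple of made-up sample rows with the same columns as topics_dict (the rows and the temporary file path are hypothetical). Writing with index=False avoids an extra unnamed index column, and reading back with keep_default_na=False keeps the empty body of link posts as '' instead of NaN.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample rows with the same columns as topics_dict in the question
topics_data = pd.DataFrame({
    "title": ["First post", "Second post"],
    "score": [120, 45],
    "id": ["abc123", "def456"],
    "url": ["https://example.com/a", "https://example.com/b"],
    "comms_num": [10, 2],
    "created": [1600000000.0, 1600003600.0],
    "body": ["Some self-text", ""],  # link posts have an empty selftext
})

# Hypothetical output location in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "topics.csv")

# index=False stops pandas from writing the RangeIndex as an extra unnamed column
topics_data.to_csv(path, index=False)

# keep_default_na=False keeps empty strings as '' instead of converting them to NaN
loaded = pd.read_csv(path, keep_default_na=False)
print(loaded.shape)
```

Any other program that reads CSV (a spreadsheet, another language's CSV parser) can consume the same file, which is the main reason to pick this format.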
2. SQLite
You can also store the DataFrame in SQLite using DataFrame.to_sql:
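import sqlite3

db_file = 'my.db'
# This creates a new database file if it doesn't exist
db_conn = sqlite3.connect(db_file)
# This creates a new table 'topics_data'; by default to_sql raises ValueError
# if the table already exists (pass if_exists='replace' or 'append' to change that)
topics_data.to_sql('topics_data', con=db_conn)
db_conn.close()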
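A sketch of writing and later querying the table, assuming pandas and the standard sqlite3 module; the sample rows, temporary path and the score threshold of 100 are made up for illustration. It uses if_exists="replace" so re-running the scraper overwrites the table instead of failing, and reads the data back with an SQL filter via pd.read_sql_query.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

import pandas as pd

# Hypothetical sample rows standing in for the scraped topics_data
topics_data = pd.DataFrame({
    "title": ["First post", "Second post"],
    "score": [120, 45],
    "id": ["abc123", "def456"],
})

# Hypothetical database location in a fresh temporary directory
db_file = os.path.join(tempfile.mkdtemp(), "my.db")

# closing() guarantees the connection is closed even if an error occurs
with closing(sqlite3.connect(db_file)) as db_conn:
    # if_exists="replace" overwrites an existing table instead of raising ValueError;
    # index=False skips writing the DataFrame index as an extra column
    topics_data.to_sql("topics_data", con=db_conn, if_exists="replace", index=False)
    db_conn.commit()

# Later (e.g. in another script), reopen the file and query it with plain SQL
with closing(sqlite3.connect(db_file)) as db_conn:
    top = pd.read_sql_query(
        "SELECT title, score FROM topics_data WHERE score > ? ORDER BY score DESC",
        db_conn,
        params=(100,),
    )
print(top)
```

Compared with CSV, the advantage is that you can filter and sort with SQL without loading the whole file first.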
Pros
Cons
More information about SQLite: sqlite tutorial
Answer 1 (score: 0)
To save the data locally for later use within Python, you can use the built-in pickle module:
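import pickle

def save_obj(obj, name):
    # Serialize obj to '<name>.pkl' with the highest available pickle protocol
    with open(f'{name}.pkl', 'wb') as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)

def load_obj(name):
    # Load and return the object stored in '<name>.pkl'; return None on failure.
    # The message is printed before returning (code after 'return' never runs).
    try:
        with open(f'{name}.pkl', 'rb') as f:
            obj = pickle.load(f)
        print(f"loaded {name}")
        return obj
    except Exception as e:
        print(f"Error loading object '{name}': {e}")
        return None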
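If the object you want to keep is a DataFrame, pandas also ships its own pickle helpers, DataFrame.to_pickle and pd.read_pickle, which can replace the two functions above. A minimal sketch, assuming pandas and a hypothetical sample DataFrame and temporary path. As with any pickle file, only load files you created or trust, because unpickling untrusted data can execute arbitrary code.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the scraped topics_data
topics_data = pd.DataFrame({
    "title": ["First post", "Second post"],
    "score": [120, 45],
})

# Hypothetical file location in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "topics_data.pkl")

# to_pickle / read_pickle use the pickle format under the hood
topics_data.to_pickle(path)
restored = pd.read_pickle(path)

# Unlike CSV, pickle restores the dtypes and index exactly
print(restored.equals(topics_data))
```

The trade-off is that a pickle file is only practical to read from Python, whereas CSV and SQLite files can be opened by other tools.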