Before I begin, I'll just say that my knowledge of Python is not the greatest :) but I try to manage. A bit of background: the script I'm having trouble with is one I custom-built about 1.5 years ago, and now I want to use it for another (similar) purpose.
The script was originally used to collect data from hashtags (hence all the references to tags); I have now "reshaped" it to collect top-post data from locations.
The problem I'm running into is that while the top-post data is exported to the csv correctly, I also want to add the media count, and that isn't working.
I've spent a long time on Google and inserted the count parameter into various lines and combinations, but I can't seem to figure it out.
If anyone could help me solve this, I would be very happy.
So, without further ado, here is the source code of my Python script:
import csv
import requests


def get_csv_header(top_numb):
    fieldnames = ['LOCATION_ID', 'MEDIA_COUNT']
    for col in range(top_numb):
        fieldnames.append('TOP_{0}_LIKE'.format(col + 1))
        fieldnames.append('TOP_{0}_COMMENT'.format(col + 1))
    return fieldnames


def write_csv_header(filename, headers):
    with open(filename, 'w', newline='') as f_out:
        writer = csv.DictWriter(f_out, fieldnames=headers)
        writer.writeheader()
    return


def read_hash_tag(t_file):
    with open(t_file) as f:
        tags_list = f.read().splitlines()
    return tags_list
if __name__ == '__main__':
    # HERE YOU CAN SPECIFY YOUR ID FILE NAME,
    # which contains a list of location ids, BY DEFAULT <current working directory>/ids.txt
    TAGS_FILE = 'ids.txt'
    # HERE YOU CAN SPECIFY YOUR DATA FILE NAME, BY DEFAULT (data.csv), where your final result stays
    DATA_FILE = 'data.csv'
    MAX_POST = 9  # MAX POST

    """ Start scraping inst for like and comment based on location ids """
    print('Job starts, please wait until it finishes.....')
    explore_url = 'https://www.instagram.com/explore/locations/'
    tags = read_hash_tag(TAGS_FILE)

    """ Writing data to csv file """
    csv_headers = get_csv_header(MAX_POST)
    write_csv_header(DATA_FILE, csv_headers)

    for tag in tags:
        post_info = {'LOCATION_ID': tag}
        url = explore_url + tag + '/'
        params = {'__a': 1}
        try:
            response = requests.get(url, params=params).json()
        except ValueError:
            print('ValueError for location id {0}...Skipping...'.format(tag))
            continue
        media_count = response['graphql']['location']['edge_location_to_media']['count']
        top_posts = response['graphql']['location']['edge_location_to_top_posts']['edges']
        for num, post in enumerate(top_posts):
            if num + 1 <= MAX_POST:
                post_info['TOP_{0}_LIKE'.format(num + 1)] = post['node']['edge_liked_by']['count']
                post_info['TOP_{0}_COMMENT'.format(num + 1)] = post['node']['edge_media_to_comment']['count']
            else:
                break
        with open('data.csv', 'a', newline='') as data_out:
            print('Writing Data for location id {0}.....'.format(tag))
            print(media_count)
            csv_writer = csv.DictWriter(data_out, fieldnames=csv_headers)
            csv_writer.writerow(post_info)

    """ Done with the script """
    print('ALL DONE !!!! ')
Here is the output in CMD:

C:\Users\Administrator\Downloads>locations.py
Job starts, please wait until it finishes.....
Writing Data for location id 213226563.....
1346216
Writing Data for location id 919359.....
56752
ALL DONE !!!!
The values 1346216 and 56752 are the media counts for those locations, and they are in fact correct.
But whenever I open the csv the script builds, I don't see those values stored in it:
LOCATION_ID,MEDIA_COUNT,TOP_1_LIKE,TOP_1_COMMENT,TOP_2_LIKE,TOP_2_COMMENT,TOP_3_LIKE,TOP_3_COMMENT,TOP_4_LIKE,TOP_4_COMMENT,TOP_5_LIKE,TOP_5_COMMENT,TOP_6_LIKE,TOP_6_COMMENT,TOP_7_LIKE,TOP_7_COMMENT,TOP_8_LIKE,TOP_8_COMMENT,TOP_9_LIKE,TOP_9_COMMENT
213226563,,551,21,288,51,796,27,346,7,329,44,8641,181,507,32,1513,31,432,12
919359,,456,1,265,7,771,0,815,9,79,2,107,5,116,1,95,1,153,3
I know this isn't the most elegant/sexiest solution, but any help getting my media count into the second column of the csv file would be great!
Answer (score: 1)
You never add media_count to post_info, which is the dictionary that gets written to the csv.
Making this change in the writing section should fix your problem:
    with open('data.csv', 'a', newline='') as data_out:
        print('Writing Data for location id {0}.....'.format(tag))
        print(media_count)
        post_info["MEDIA_COUNT"] = media_count
        csv_writer = csv.DictWriter(data_out, fieldnames=csv_headers)
        csv_writer.writerow(post_info)