I have a problem: when I scrape the number of Instagram followers, I get an abbreviated value with a 'k' instead of the actual number.
import requests, os, time, sys
from bs4 import BeautifulSoup
import pandas as pd

def insta_info(account_name):
    html = requests.get('https://www.instagram.com/%s/' % (account_name))
    soup = BeautifulSoup(html.text, 'lxml')
    # The counts are read from the og:description meta tag of the profile page.
    data = soup.find_all('meta', attrs={'property': 'og:description'})
    text = data[0].get('content').split()
    user = '%s %s %s' % (text[-3], text[-2], text[-1])
    # The first and third whitespace-separated tokens are taken as the counts.
    followers = text[0]
    following = text[2]
    lst = []
    lst.append(followers)
    lst.append(following)
    return lst

# The account name must be passed as a string.
kellz = insta_info('kellz_ocho')
print(kellz)
This returns:
[14.2k, 608]
when I would like it to return:
[14241, 608]
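Is there a way to make this happen? I did not write the code above; I found it online and adapted it, so I don't know exactly how it works. Note that the account I want to scrape is public.
Thanks a lot!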
Answer 0 (score: 0)
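The code you posted is definitely not the right approach. Please don't use it.
As this link shows, getting user information is very simple: https://www.instagram.com/developer/endpoints/users/ . If you don't want to write the authentication code yourself, you can even get an access token here: http://instagram.pixelunion.net/ .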
Answer 1 (score: 0)
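To get what you want, you need to combine selenium with BeautifulSoup, because the full number does not appear in the meta tag of the page source; the only thing available there is the abbreviated value you already have. Try this: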
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.instagram.com/kellz_ocho/")
# Parse the page as rendered by the browser, then close the browser.
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()

for title in soup.select("._h9luf"):
    posts = title.select("._fd86t")[0].text
    # The follower count is read from the title attribute rather than the visible text.
    follower = title.select("._fd86t")[1]['title']
    following = title.select("._fd86t")[2].text
    print("Posts: {}\nFollower: {}\nFollowing: {}".format(posts, follower, following))
Result:
Posts: 59
Follower: 14,253
Following: 608
By the way, the follower count has changed in the meantime.
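The follower value printed above is still a string with a thousands separator ("14,253"), while the question asks for a number. A minimal sketch of that last conversion step, assuming the separator is always a comma; the helper name count_to_int is hypothetical and not part of the original answer:

```python
def count_to_int(text):
    """Convert a count such as '14,253' to an integer."""
    # Assumption: commas are thousands separators, so they can simply be dropped.
    return int(text.replace(",", "").strip())


print(count_to_int("14,253"))
print(count_to_int("608"))
```

Applied to the follower variable from the loop, this would give an integer that can be stored or compared directly.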
Answer 2 (score: 0)
This should work. Basically, the added code checks for a trailing 'k' and, if there is one, multiplies the rest of the value by 1000.
import requests, os, time, sys
from bs4 import BeautifulSoup
import pandas as pd

def insta_info(account_name):
    html = requests.get('https://www.instagram.com/%s/' % (account_name))
    soup = BeautifulSoup(html.text, 'lxml')
    data = soup.find_all('meta', attrs={'property': 'og:description'})
    text = data[0].get('content').split()
    user = '%s %s %s' % (text[-3], text[-2], text[-1])
    followers = text[0]
    # Expand a trailing 'k' or 'K' (thousands); otherwise parse the number as-is.
    if followers[-1].lower() == 'k':
        followers = int(float(followers[:-1]) * 1000)
    else:
        followers = int(float(followers.replace(',', '')))
    following = text[2]
    if following[-1].lower() == 'k':
        following = int(float(following[:-1]) * 1000)
    else:
        following = int(float(following.replace(',', '')))
    lst = []
    lst.append(followers)
    lst.append(following)
    return lst

# The account name must be passed as a string.
kellz = insta_info('kellz_ocho')
print(kellz)
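The suffix check described in this answer can also be pulled out into one small helper so it is not repeated for each field. A minimal sketch, assuming the abbreviated counts only ever use a 'k' or 'm' suffix; the 'm' case, the SUFFIXES table and the name parse_count are assumptions added for illustration, not part of the original answer:

```python
from decimal import Decimal

# Assumed multipliers for abbreviated counts ('m' is an assumption).
SUFFIXES = {"k": 1_000, "m": 1_000_000}


def parse_count(text):
    """Convert '14.2k', '1.5M', '14,253' or '608' to an integer."""
    cleaned = text.strip().replace(",", "").lower()
    # Look at the last character; plain numbers get a multiplier of 1.
    multiplier = SUFFIXES.get(cleaned[-1:], 1)
    if multiplier != 1:
        cleaned = cleaned[:-1]
    # Decimal avoids binary floating-point error before truncating to int.
    return int(Decimal(cleaned) * multiplier)


print(parse_count("14.2k"))
print(parse_count("608"))
```

Note that an abbreviated value only converts back to a rounded number: '14.2k' becomes 14200, not the 14241 asked for in the question. The exact figure has to be read from a place that exposes it, such as the title attribute used in Answer 1.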