I would like to use Python or R to scrape the data from the tweet-volume chart on https://bitinfocharts.com into some kind of data file. I am new to Python and not sure how to go about it. I have looked at other questions on forums but could not get them to work.
The chart I am interested in is: https://bitinfocharts.com/comparison/decred-tweets.html#1y
I am looking for a data table with each date and the corresponding number of tweets for that day as columns.
Thank you very much for any help.
Answer 0 (score: 0)
There may be a more elegant solution, but the data is embedded in a script tag. You just need to pull it out and parse it into a table:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import re

def parse_strlist(sl):
    # Strip brackets, commas and whitespace, then split on the quote
    # characters so only the bare values remain.
    clean = re.sub(r"[\[\],\s]", "", sl)
    splitted = re.split(r"[\'\"]", clean)
    values_only = [s for s in splitted if s != '']
    return values_only

url = 'https://bitinfocharts.com/comparison/decred-tweets.html#1y'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
scripts = soup.find_all('script')

for script in scripts:
    if 'd = new Dygraph(document.getElementById("container")' in script.text:
        # The chart data sits inside the Dygraph call as a JavaScript array
        # of [new Date("YYYY/MM/DD"), value] pairs; cut out just that array.
        StrList = script.text
        StrList = '[[' + StrList.split('[[')[-1]
        StrList = StrList.split(']]')[0] + ']]'
        StrList = StrList.replace("new Date(", '').replace(')', '')
        dataList = parse_strlist(StrList)

# Values alternate date, count, date, count, ... so split them by position.
date = []
tweet = []
for i, each in enumerate(dataList):
    if i % 2 == 0:
        date.append(each)
    else:
        tweet.append(each)

df = pd.DataFrame(list(zip(date, tweet)), columns=["Date", "Decred - Tweets"])
Output:
print (df)
Date Decred - Tweets
0 2018/01/08 69
1 2018/01/09 200
2 2018/01/10 163
3 2018/01/11 210
4 2018/01/12 256
5 2018/01/13 185
6 2018/01/14 147
7 2018/01/15 119
8 2018/01/16 169
9 2018/01/17 176
10 2018/01/18 209
11 2018/01/19 179
12 2018/01/20 274
13 2018/01/21 124
14 2018/01/22 185
15 2018/01/23 110
16 2018/01/24 109
17 2018/01/25 86
18 2018/01/26 49
19 2018/01/27 null
20 2018/01/28 null
21 2018/01/29 null
22 2018/01/30 null
23 2018/01/31 194
24 2018/02/01 197
25 2018/02/02 163
26 2018/02/03 73
27 2018/02/04 98
28 2018/02/05 210
29 2018/02/06 215
.. ... ...
680 2019/11/19 58
681 2019/11/20 67
682 2019/11/21 72
683 2019/11/22 79
684 2019/11/23 46
685 2019/11/24 38
686 2019/11/25 81
687 2019/11/26 57
688 2019/11/27 54
689 2019/11/28 60
690 2019/11/29 55
691 2019/11/30 40
692 2019/12/01 39
693 2019/12/02 71
694 2019/12/03 93
695 2019/12/04 44
696 2019/12/05 41
697 2019/12/06 34
698 2019/12/07 40
699 2019/12/08 44
700 2019/12/09 47
701 2019/12/10 47
702 2019/12/11 64
703 2019/12/12 61
704 2019/12/13 67
705 2019/12/14 93
706 2019/12/15 59
707 2019/12/16 86
708 2019/12/17 82
709 2019/12/18 51
[710 rows x 2 columns]
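Since you mentioned wanting the data in a data file, here is a minimal follow-up sketch that takes the df built above, converts the 'null' strings to proper missing values, parses the dtypes, and writes a CSV. The column names come from the answer; the output filename is just an example.

# Tidy the dtypes and save the table to a CSV file (assumes `df` from above).
df["Date"] = pd.to_datetime(df["Date"], format="%Y/%m/%d")
df["Decred - Tweets"] = pd.to_numeric(df["Decred - Tweets"], errors="coerce")  # 'null' -> NaN
df.to_csv("decred_tweets.csv", index=False)  # example output path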