How to parse a JSON var inside a script tag

Time: 2017-12-05 06:13:19

Tags: javascript python json web-scraping beautifulsoup

I am trying to scrape the data shown in the scatter plot at https://www.proteinatlas.org/ENSG00000167286-CD3D/pathology/tissue/renal+cancer

The data is in the following JavaScript:

<script> var plot = $('#scatter6001').scatterPlot({"Alive (n=651)":{"symbol":"circle","data":[{"x":0.407889650408,"y":12.811,"tooltip":"TCGA-KL-8324-01A<br>Female\/ Stage ii \/ Alive<br>FPKM: 0.4<br>Living days: 4676 (12.8 years)","class":"stage_ii sex_f best_low median_low"},{"x":0.587835812523,"y":8.0795,"tooltip":"TCGA-KL-8334-01A<br>Female\/ Stage iii \/ Alive<br>FPKM: 0.6<br>Living days: 2949 (8.1years)","class":"stage_iii sex_f best_low median_low"}...});
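The argument passed to `scatterPlot(...)` looks like a plain JSON object, so one way to get at it (a sketch, not verified against the live page) is to locate the text `scatterPlot(` in the script and let `json.JSONDecoder.raw_decode` parse exactly one JSON value from that point. This avoids having to guess where a regex should stop. `SCRIPT` below is a trimmed copy of the snippet above containing only its two records, with the missing closing brackets added; `extract_plot_data` is a hypothetical helper name.

```python
import json

# Trimmed copy of the question's <script> text: only the two records shown,
# with the closing "]}}" added because the original snippet is cut off.
SCRIPT = r'''var plot = $('#scatter6001').scatterPlot({"Alive (n=651)":{"symbol":"circle","data":[{"x":0.407889650408,"y":12.811,"tooltip":"TCGA-KL-8324-01A<br>Female\/ Stage ii \/ Alive<br>FPKM: 0.4<br>Living days: 4676 (12.8 years)","class":"stage_ii sex_f best_low median_low"},{"x":0.587835812523,"y":8.0795,"tooltip":"TCGA-KL-8334-01A<br>Female\/ Stage iii \/ Alive<br>FPKM: 0.6<br>Living days: 2949 (8.1years)","class":"stage_iii sex_f best_low median_low"}]}});'''


def extract_plot_data(script_text, marker="scatterPlot("):
    """Return the object literal passed to scatterPlot(...), decoded as JSON."""
    start = script_text.find(marker)
    if start == -1:
        raise ValueError("no %r call found in script" % marker)
    # raw_decode parses one complete JSON value at the start of the string and
    # reports where it ended, so the trailing ");" does not cause an error.
    rest = script_text[start + len(marker):].lstrip()
    data, _end = json.JSONDecoder().raw_decode(rest)
    return data


plot = extract_plot_data(SCRIPT)
for group, series in plot.items():
    for point in series["data"]:
        print(group, point["x"], point["y"], point["tooltip"])
```

Decoding as JSON also turns the `\/` escapes back into `/`. Judging only from the two sample records, `x` appears to hold the unrounded FPKM and `y` the survival time in years, but that is an inference, not something the page documents.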

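My question is: how can I parse out the sample ID (TCGA-XX-XXXX-XXX), sex, stage, alive/dead status, FPKM and living days? And how can I save this information to a CSV file?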
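Each point's `tooltip` packs all the requested fields into one string separated by `<br>`, so they can be split apart and written out with the standard `csv` module. A minimal sketch, assuming every tooltip has exactly the four `<br>`-separated parts seen in the sample records and that the stage text itself contains no `/`; `parse_tooltip`, `write_csv` and the column names are hypothetical choices.

```python
import csv
import io
import re

# Hypothetical column names for the output CSV.
FIELDS = ["sample", "sex", "stage", "status", "fpkm", "living_days"]


def parse_tooltip(tooltip):
    """Split a tooltip such as
    'TCGA-KL-8324-01A<br>Female/ Stage ii / Alive<br>FPKM: 0.4<br>Living days: 4676 (12.8 years)'
    into a dict keyed by FIELDS."""
    # Undo the JSON "\/" escape in case the string was cut out of the script with a regex.
    tooltip = tooltip.replace("\\/", "/")
    sample, patient, fpkm, days = tooltip.split("<br>")
    sex, stage, status = (part.strip() for part in patient.split("/"))
    return {
        "sample": sample.strip(),
        "sex": sex,
        "stage": stage,
        "status": status,
        "fpkm": float(fpkm.split(":", 1)[1]),                 # "FPKM: 0.4" -> 0.4
        "living_days": int(re.search(r"\d+", days).group()),  # "Living days: 4676 (...)" -> 4676
    }


def write_csv(rows, fileobj):
    """Write parsed rows to an open text file (open it with newline="")."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


tooltips = [
    "TCGA-KL-8324-01A<br>Female/ Stage ii / Alive<br>FPKM: 0.4<br>Living days: 4676 (12.8 years)",
    "TCGA-KL-8334-01A<br>Female/ Stage iii / Alive<br>FPKM: 0.6<br>Living days: 2949 (8.1years)",
]
rows = [parse_tooltip(t) for t in tooltips]
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

To save to disk instead of a buffer, open a file with `open("cd3d.csv", "w", newline="")` (the filename is only an example) and pass it to `write_csv`.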

Here is the code I have written:

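import re
from urllib.request import urlopen
from bs4 import BeautifulSoup

page = urlopen("https://www.proteinatlas.org/ENSG00000167286-CD3D/pathology/tissue/prostate+cancer#imid_3605750")
content = page.read()
soup = BeautifulSoup(content,'lxml')

table = soup.find('div', {'id':'scatter6001'})
print(table)

# soup is a BeautifulSoup object, not a string, so this call raises the TypeError below
p = re.search(r"var plot = (.*?);",soup).group(1)
print(p)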

The code fails with the following error:

Traceback (most recent call last):
  File "scrap2.py", line 24, in <module>
    p = re.search(r"var plot = (.*?);",soup).group(1)
  File "C:\Python34\lib\re.py", line 170, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or buffer
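The `TypeError` comes from passing `soup` itself to `re.search`, which only accepts a string or bytes; the search has to run on the text of the `<script>` element instead (with BeautifulSoup, for example, a tag's `.string`, which is a `str` subclass). So that this sketch runs without third-party packages, it swaps BeautifulSoup for the standard library's `html.parser`; `ScriptCollector` and `find_script` are hypothetical names, and the HTML in the usage lines is a made-up stand-in, not the real page.

```python
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the raw text of every <script> element in a page."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so append rather than assign.
        if self._in_script:
            self.scripts[-1] += data


def find_script(html, pattern):
    """Return the first <script> body (a str) matching the regex pattern, or None."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for text in collector.scripts:
        if re.search(pattern, text):  # text is a str, so re.search accepts it
            return text
    return None


# Hypothetical stand-in HTML, not the real proteinatlas.org page.
html = ("<div id='scatter6001'></div>"
        "<script>var other = 1;</script>"
        "<script> var plot = $('#scatter6001').scatterPlot({});</script>")
script = find_script(html, r"var plot = ")
print(re.search(r"var plot = (.*?);", script).group(1))
```

Note that the original pattern `var plot = (.*?);` captures the whole jQuery call, not just the JSON, and its non-greedy `.*?` stops at the first `;`, so it is fragile on the real data.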

How can I fix this and scrape the data I want?

Thanks

1 answer:

Answer 0 (score: 0)

This is definitely not an elegant approach, but it gets the data you're after.

Partial output:

TCGA-KK-A7B3-01A Male  Stage not reported  Alive FPKM 5.5 Living days 899 (2.5 years)
TCGA-G9-6347-01A Male  Stage not reported  Alive FPKM 14.2 Living days 2089 (5.7 years)
TCGA-KC-A4BL-01A Male  Stage not reported  Alive FPKM 3.8 Living days 934 (2.6 years)
TCGA-KK-A7AQ-01A Male  Stage not reported  Alive FPKM 2.6 Living days 1610 (4.4 years)
