从交互式图表中收集数据

时间:2019-05-22 09:13:39

标签: javascript svg web-scraping charts screen-scraping

我正在尝试检索正在此页面上的javascript中生成图表的数据:

https://www.energy-charts.de/price.htm

我找到了SVG元素,它们用它们的ML元素画线,但是我不知道从哪里开始寻找保存实际数据的javascript数组。 (我假设某处必须有一个数组)。

对于要在哪里开始寻找这些数据的提示和提示,我深表感谢。

1 个答案:

答案 0 :(得分:2)

简短回答

数据存储在https://www.energy-charts.de/price/week_2019_21.json的JSON文件中


长期回答

如果打开开发人员工具(F12),则可以从文件#include<initializer_list> #include<iostream> #include<vector> // Declarations: // Stores a collection of numbers in the form radix*RADIX_MULTIPLIER + suffix, // where 0 <= suffix < RADIX_MULTIPLIER. class Block { public: using SUFFIX_VECTOR_T = std::vector<unsigned short>; constexpr static unsigned int RADIX_MULTIPLIER = 100; private: const unsigned short radix; // Contains all the suffixes associated with this radix. SUFFIX_VECTOR_T suffixes; public: Block(unsigned short radix); unsigned short getRadix() const; void pushSuffix(const unsigned short suffix); std::size_t size() const; unsigned int operator[](std::size_t idx) const; SUFFIX_VECTOR_T::const_iterator begin() const; SUFFIX_VECTOR_T::const_iterator cbegin() const; SUFFIX_VECTOR_T::const_iterator end() const; SUFFIX_VECTOR_T::const_iterator cend() const; }; using DATA_VECTOR_T = std::vector<Block>; class MyIterator : public std::iterator<std::input_iterator_tag, unsigned int> { const DATA_VECTOR_T& data; DATA_VECTOR_T::const_iterator block_it; Block::SUFFIX_VECTOR_T::const_iterator suffix_it; public: MyIterator( const DATA_VECTOR_T& data, const DATA_VECTOR_T::const_iterator start_block_it); MyIterator& operator++(); MyIterator operator++(int); bool operator==(const MyIterator& rhs) const; bool operator!=(const MyIterator& rhs) const; unsigned int operator*() const; }; // Read-only container which stores a sorted collection of numbers // memory-efficiently in Blocks. class MyContainer { public: using const_iterator = MyIterator; private: DATA_VECTOR_T data; public: // The initializer list must be sorted in non-descending order. MyContainer(std::initializer_list<unsigned int> il); unsigned int operator[](std::size_t idx) const; MyContainer::const_iterator begin() const; MyContainer::const_iterator cbegin() const; MyContainer::const_iterator end() const; MyContainer::const_iterator cend() const; }; // Definitions: // class Block Block::Block(unsigned short radix): radix(radix) {} unsigned short Block::getRadix() const { return radix; } void Block::pushSuffix(const unsigned short suffix) { suffixes.push_back(suffix); } std::size_t Block::size() const { return suffixes.size(); } unsigned int Block::operator[](std::size_t idx) const { return (unsigned int)(radix)*RADIX_MULTIPLIER + (unsigned int)(suffixes[idx]); } Block::SUFFIX_VECTOR_T::const_iterator Block::begin() const { return suffixes.begin(); } Block::SUFFIX_VECTOR_T::const_iterator Block::cbegin() const { return begin(); } Block::SUFFIX_VECTOR_T::const_iterator Block::end() const { return suffixes.end(); } Block::SUFFIX_VECTOR_T::const_iterator Block::cend() const { return end(); } // class MyContainer // The initializer list must be sorted in non-descending order. MyContainer::MyContainer(std::initializer_list<unsigned int> il) { if (il.size() == 0) { return; } unsigned short radix = *il.begin() / Block::RADIX_MULTIPLIER; data.push_back(Block(radix)); for (const auto x : il) { radix = x / Block::RADIX_MULTIPLIER; if (data.back().getRadix() != radix) { data.push_back(Block(radix)); } unsigned short suffix = x % Block::RADIX_MULTIPLIER; data.back().pushSuffix(suffix); } } unsigned int MyContainer::operator[](std::size_t idx) const { auto data_it = data.begin(); // Similarly to std::vector::operator[], if idx is out of bounds of the // container, the behavior is undefined. while (idx >= data_it->size()) { idx -= data_it->size(); ++data_it; } return (*data_it)[idx]; } MyContainer::const_iterator MyContainer::begin() const { return MyIterator(data, data.cbegin()); } MyContainer::const_iterator MyContainer::cbegin() const { return begin(); } MyContainer::const_iterator MyContainer::end() const { return MyIterator(data, data.end()); } MyContainer::const_iterator MyContainer::cend() const { return end(); } // class MyIterator MyIterator::MyIterator( const DATA_VECTOR_T& data, const DATA_VECTOR_T::const_iterator start_block_it): data(data), block_it(start_block_it) { if (data.cend() != block_it) { suffix_it = block_it->cbegin(); } } MyIterator& MyIterator::operator++() { if (data.cend() == block_it) { return *this; } ++suffix_it; if (block_it->cend() == suffix_it) { ++block_it; if (data.cend() != block_it) { suffix_it = block_it->cbegin(); } } return *this; } MyIterator MyIterator::operator++(int) { MyIterator tmp = *this; operator++(); return tmp; } bool MyIterator::operator==(const MyIterator& rhs) const { // If this iterator has reached the end: if (data.cend() == block_it) { // Only return true if both iterators point to the end of the same // object. return data.cend() == rhs.block_it; } return block_it == rhs.block_it && suffix_it == rhs.suffix_it; } bool MyIterator::operator!=(const MyIterator& rhs) const { return !(*this == rhs); } unsigned int MyIterator::operator*() const { const std::size_t idx = suffix_it - block_it->cbegin(); return (*block_it)[idx]; } // Entry point: int main() { std::vector<unsigned int> v = { 1234, 1254, 1264, 1265, 1267, 1268, 1271, 1819, 1832, 1856, 1867, 1892, 3210, 3214, 3256, 3289 }; // Print the entire vector. for (const auto x: v) { std::cout << x << "\t"; } std::cout << std::endl; // Print a randomly accessed element from the vector. std::cout << v[10] << std::endl; MyContainer c = { 1234, 1254, 1264, 1265, 1267, 1268, 1271, 1819, 1832, 1856, 1867, 1892, 3210, 3214, 3256, 3289 }; // Print the entire container. for (const auto x: c) { std::cout << x << "\t"; } std::cout << std::endl; // Print a randomly accessed element from the container. std::cout << c[10] << std::endl; return 0; } 中看到console.log个负载。其中大多数没有用,但price.js行似乎对我们有用。

打开got chartTitle from JSON! _chartTitle: Electricity production and spot prices in Germany in week 21 2019并搜索“ got chartTitle”,我发现了一个名为price.js的函数,该函数似乎正在加载JSON文件。我假设这些将从某种API返回,而不是直接存储在JS文件中。

从“ got chartTitle”向上滚动,我注意到这一行: createChart。对我来说,这是从文件路径加载JSON。在搜索“文件路径”时,我发现它已声明为全局变量。将其输入到javascript控制台中,我看到值是“ ./price/week_2019_21.json”,因此导航至该URL(https://www.energy-charts.de/price/week_2019_21.json)应该是您要查找的数据!

此网址是通过以下代码块计算的:

d3.json(filepath, function(error, json) {

默认值在if(defaultweek < 10){ filepath = "./price/week_" + defaultyear +"_0"+ defaultweek +".json"; //default file on first-load } else{ filepath = "./price/week_" + defaultyear +"_"+ defaultweek +".json"; //default file on first-load } 中设置。

希望这会有所帮助!