Question

I am collecting all of my faculty's exam dates so that I can track changes and so on.

It works up to a point, but then it fails to detect a closing tag, and the last table entry comes out wrong: Beautiful Soup somehow cannot find where the td ends, so the HTML that follows it is pulled into that entry.

My code (it could probably get by with a single loop, but that should not be the problem):

from bs4 import BeautifulSoup
import requests
import csv


data = requests.get('https://www.wiwi.kit.edu/pruefungstermine.php')

soup = BeautifulSoup(data.text, 'lxml')


table = soup.find('tbody').find_all('tr') #finds table with relevant information and returns a list with all entries (is working)

first_row = ('Prüfung', 'Prüfer', 'Datum', 'Zeit/Ort') #header (in German but doesn't matter)

exams = []

for row in table: #looping through every tr
    content = row.find_all('td')
    exam_name = content[0].find('a').text.strip()
    lecturer = content[1].text.strip()
    date = content[2].text.strip()
    time_location = content[3].text.replace('\n', ', ').strip()

    exam = (exam_name, lecturer, date, time_location)
    exams.append(exam)


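# newline='' stops the csv module from writing blank lines on Windows;
# utf-8 keeps the German umlauts intact
with open('exams.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(first_row)
    writer.writerows(exams)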

Can anyone tell me why it works up until this entry?

Thanks in advance.

Answer 1

I expect this is caused by the malformed style tag around Neue Chemie:

<style="color:#ff0000;">Neue Chemie</style="color:#ff0000;">

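This is invalid HTML. Removing the style tag may give you the result you want. If that works, you can try keeping the style tag but making it well-formed, with no extra information in the closing tag; it should always read </style>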
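A minimal sketch of the "remove the style tag" idea, assuming the markup looks exactly like the line quoted above. The helper name strip_malformed_style is made up for illustration; it would be applied to the page text before it is handed to BeautifulSoup.

```python
import re

# Matches the malformed tags quoted above: <style="..."> and </style="...">.
# A real <style ...> element has a space after the tag name, so it is not matched.
MALFORMED_STYLE = re.compile(r'</?style="[^"]*">')

def strip_malformed_style(html: str) -> str:
    """Remove the malformed style tags but keep the text between them."""
    return MALFORMED_STYLE.sub('', html)

sample = '<style="color:#ff0000;">Neue Chemie</style="color:#ff0000;">'
print(strip_malformed_style(sample))  # prints: Neue Chemie
```

In the question's script this would be used as soup = BeautifulSoup(strip_malformed_style(data.text), 'lxml').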

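After looking at the page source, it is indeed malformed HTML: there is a closing span with no opening span. Instead, the opening tag is a style tag.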

Judging from the rest of the file, what was intended is an opening span with a style attribute, for example: <span style="something;">text</span>

Many of these need correcting. You can do that with search/replace:

Search: <style="color:#ff0000

Replace: <span style="color:#ff0000
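If you cannot edit the page itself, the same search/replace can be done in Python on the downloaded text before parsing. This is only a sketch: the function name repair_span_tags and the sample string are assumptions for illustration, and the pattern is slightly more general than the literal search above because it matches any colour value.

```python
import re

# Matches a malformed opening tag of the form <style="...
# A real <style ...> element has a space after the tag name, so it is left alone.
MALFORMED_OPEN = re.compile(r'<style="')

def repair_span_tags(html: str) -> str:
    """Rewrite <style="..."> as <span style="..."> so it pairs with </span>."""
    return MALFORMED_OPEN.sub('<span style="', html)

# Hypothetical cell, modelled on the description above
sample = '<td><style="color:#ff0000;">Neue Chemie</span></td>'
print(repair_span_tags(sample))
# prints: <td><span style="color:#ff0000;">Neue Chemie</span></td>
```

In the question's script, the repaired text would then be parsed with BeautifulSoup(repair_span_tags(data.text), 'lxml') instead of passing data.text directly.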

Beautiful Soup does not detect the end of a td tag
