I am currently using RStudio as a web-scraping tool, but I have run into a problem.
#define NUMSTU 50
#include <stdio.h>
//function prototype
void printdata();
//Global variables
int stuID[NUMSTU];
int stuCount;
int totStu;
int main ()
{
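//Reset the globals (redeclaring them locally would hide them from printdata)
stuCount = 0;
totStu = 0;
int studentID;
//Prompt user for number of students in class
printf("Please enter number of students in class:");
scanf ("%d", &totStu);
//Keep the count within the bounds of stuID
if (totStu > NUMSTU) totStu = NUMSTU;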
for (stuCount = 0; stuCount <totStu; stuCount++)
{
//Prompt user for student ID number
printf("\n Please enter student's ID number:");
scanf("%d", &studentID);
stuID[stuCount] = studentID;
}
//Call Function to print data
printdata();
return 0;
}//end main
void printdata(){
//This function will display collected data
//Input: Globals stuID[NUMSTU]
//Output: none
//Display column headers
printf("\n\n stuID\n");
//loop and display student ID numbers
for (stuCount = 0; stuCount <totStu; stuCount++){
printf("%d\n", stuID[stuCount]);
}
}
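The comparison with "A [+]" always returns false, and I don't know why. I compared my code with that of other people who use exactly the same approach and get true. Does anyone know how to fix this?
Answer 0 (score: 2)
The web page is UTF-8 encoded, and that seems to be what causes the problem.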
library(rvest)
page_html <- read_html("http://competitie.vttl.be/index.php?menu=6&sel=36665&result=1&category=1")
grade <- page_html %>% html_nodes("td:nth-child(1) :nth-child(2) :nth-child(3) .DBTable_first") %>% html_text()
grade
[1] "A [+]"
Encoding(grade)
[1] "UTF-8"
Encoding(grade) <- "unknown"
grade
[1] "AÂ [+]"
Note the extra character!
One solution is
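To see where the extra character comes from, it can help to look at the code points and bytes directly. The sketch below does not touch the web page; it uses a hypothetical stand-in string and assumes that the "space" in the scraped value is a no-break space (U+00A0), which is what the stray "Â" in the output above suggests.

```r
# Hypothetical stand-in for the scraped value (assumption, not fetched from the page):
# "A", then U+00A0 (no-break space), then "[+]"
grade <- enc2utf8("A\u00a0[+]")

# Unicode code points of each character; U+00A0 shows up as 160, not 32
code_points <- utf8ToInt(grade)
print(code_points)

# Raw UTF-8 bytes; U+00A0 is stored as the two bytes c2 a0.
# Read as Latin-1, byte c2 is the character "Â".
print(charToRaw(grade))

# A literal with an ordinary space (code point 32) is a different string
print(identical(grade, "A [+]"))
```

If the second code point printed for the real scraped value is 160 rather than 32, the comparison fails because of the no-break space and not because of the selector.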
grade <- page_html %>% html_nodes("td:nth-child(1) :nth-child(2) :nth-child(3) .DBTable_first") %>% html_text()
grade <- iconv(grade, "UTF-8", "ASCII", "")
identical(grade,"A[+]")
[1] TRUE
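NB: converting from UTF-8 to ASCII drops the space character, so the comparison is now against "A[+]".
BTW, I had to adjust the CSS selector string in html_nodes to get it to work.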
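If the space should be kept, an alternative is to replace the offending character with an ordinary space instead of stripping everything that is not ASCII. This is only a sketch on a hypothetical stand-in string, again assuming the character in question is a no-break space (U+00A0); it is not taken from the original answer.

```r
# Hypothetical stand-in for the scraped value (assumption): "A", U+00A0, "[+]"
grade <- enc2utf8("A\u00a0[+]")

# Swap the no-break space for a regular space; fixed = TRUE treats the pattern literally
grade_clean <- gsub("\u00a0", " ", grade, fixed = TRUE)

# The cleaned value now matches the literal typed with an ordinary space
print(identical(grade_clean, "A [+]"))
```

This leaves any other non-ASCII characters in the scraped text untouched, which the iconv approach would remove.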