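I am trying to collect Swedish municipal election data and need to count the number of parties in each municipal council. Because local parties exist, every major party has its seats in its own column, while the smaller local parties are all listed together in a single column.
When I scrape the table and keep only the information I need, the variable is a factor. I have run into this before and usually just convert it to character.
However, when I do that here, it destroys the information I want to keep.
Instead of showing "VÄG= 3" for Borås kommun, it shows "c(ÖVR= 9)" and drops the information I need, and the observations that I want to be NA become "c(ÖVR= 1)".
I also tried sub() to replace the empty observations with NA before converting to character, but then everything became NA.
A minimal reproducible example built on simulated data would be best, but I cannot think of a way to reproduce the problem without including the source. If anyone knows how, please tell me for future questions!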
library(rvest) #For Web scraping
library(tidyverse) #For mainly pipes and filter function
#Official Swedish Election data
url <-"https://data.val.se/val/val2006/slutlig_ovrigt/statistik/kommun/mandat_kommun_parti.html"
elections <- read_html(url) %>%
  html_table(header = TRUE, fill = TRUE)
elections <- elections[[1]]
# This is three different municipalities, one with one local party,
# one with no local party, and one with two local parties
elections <- elections %>% filter(Kommun %in% c("Borås", "Eskilstuna", "Huddinge"))
elections <- t(elections) #transpose so each municipality is a variable, and the parties are observations
elections <- elections[-nrow(elections),] #delete the total number of seats
elections <- elections[-1,] #Remove the municipalities names
elections <- data.frame(elections) #convert into a data frame
row.names(elections) <- c() #remove the row names
others <- elections[nrow(elections),] #take the other parties
others <- as.character(others) #here everything goes wrong
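The result I expect is the information as displayed in the table, stored as character rather than as factor levels, with the empty observations becoming NA or something I can convert to NA. Instead, everything turns into the "c(ÖVR= X)" format.
Any help, or pointers to where I can read up on how to solve this, would be greatly appreciated! The same goes for any criticism of how I could improve my question!
Thank you.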
Answer 0 (score: 0)
Your current approach includes a few steps that are not recommended. Consider the following alternative, which keeps the data tidy:
library(tidyr)
library(dplyr)
library(rvest)
url <-"https://data.val.se/val/val2006/slutlig_ovrigt/statistik/kommun/mandat_kommun_parti.html" #Official Swedish Election data
page <- read_html(url)
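page %>%
  html_table(header = TRUE, fill = TRUE) %>%
  first() %>%
  filter(Kommun %in% c("Borås", "Eskilstuna", "Huddinge")) %>%
  select(Kommun, x = ÖVR) %>% # renamed ÖVR as encoding was producing weird results with separate_rows()
  separate_rows(x, sep = ", ") %>% # one row per local party
  mutate(x = na_if(x, "")) %>% # empty cells become NA; recent dplyr versions reject na_if() on a whole data frame
  group_by(Kommun) %>%
  summarise(Count = sum(!is.na(x))) # number of local parties per municipality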
# A tibble: 3 x 2
Kommun Count
<chr> <int>
1 Borås 1
2 Eskilstuna 0
3 Huddinge 2
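To try the counting step without scraping the site, here is a minimal sketch on simulated data. The tibble `mock` and the party labels "AAA" and "BBB" are invented for illustration and are not taken from the election table; only "VÄG= 3" appears in the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the scraped table (values invented for illustration)
mock <- tibble(
  Kommun = c("Borås", "Eskilstuna", "Huddinge"),
  x      = c("VÄG= 3", "", "AAA= 2, BBB= 1")
)

counts <- mock %>%
  separate_rows(x, sep = ", ") %>%    # split the combined cell into one row per party
  mutate(x = na_if(x, "")) %>%        # treat empty cells as missing
  group_by(Kommun) %>%
  summarise(Count = sum(!is.na(x)))   # count the non-missing parties per municipality

print(counts)
```

A small hand-built table like this may also be one way to post a reproducible example in a question without depending on the source page.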
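The main problem with your original approach is that t() turns the data into a character matrix, and data.frame() then converts the strings to factors by default (this was the default before R 4.0.0). After that, you call as.character() on the data.frame object as a whole rather than on each variable. So you could do this instead: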
elections <- t(elections) #transpose so each municipality is a variable, and the parties are observations
elections <- elections[-nrow(elections),] #delete the total number of seats
elections <- elections[-1,] #Remove the municipalities names
elections <- data.frame(elections, stringsAsFactors = FALSE, row.names = NULL) #convert into a data frame
others <- elections[nrow(elections),] #take the other parties
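The factor behaviour can also be seen on a small simulated matrix, using base R only. This is a sketch under assumptions: the matrix `m`, its column names, and the labels "AAA" and "BBB" are hypothetical stand-ins for one row of the transposed table.

```r
# Hypothetical one-row character matrix, similar in shape to the output of t()
m <- matrix(c("VÄG= 3", "", "AAA= 2, BBB= 1"), nrow = 1,
            dimnames = list(NULL, c("Boras", "Eskilstuna", "Huddinge")))

# Strings converted to factors (the default before R 4.0.0)
as_factors <- data.frame(m, stringsAsFactors = TRUE)
# Strings kept as character
as_strings <- data.frame(m, stringsAsFactors = FALSE)

print(sapply(as_factors, class))
print(sapply(as_strings, class))

# Flatten the one-row data frame into a plain character vector
others <- unlist(as_strings[1, ], use.names = FALSE)
others[others == ""] <- NA    # empty cells become NA
print(others)

# If the columns are already factors, convert each column, not the whole data frame
recovered <- vapply(as_factors, as.character, character(1))
print(recovered)
```

Converting column by column keeps the factor labels, whereas calling as.character() on the data frame itself appears to work from the underlying integer codes, which would explain the "c(ÖVR= 9)" style output in the question.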