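I am a beginner in Python / R and have started applying it at work. Right now I am trying to solve a small problem.
Task: I have to download a .csv file (semicolon-separated), import it into Excel, and sort out the recently updated sites (there is a column W with the header Update). From the updated sites I then create two new Excel files: one for European sites and one for non-European sites.
My initial idea was this:
I managed to get it working for the case where the Update column says "New" or "Updated":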
install.packages("openxlsx")
library("openxlsx")
European_countries <- c("Andorra","Austria","Belarus","Belgium","Bosnia and Herzegovina","Bulgaria","Croatia","Czech Republic","Denmark","Estonia","Finland","France","Germany","Greece","Hungary","Iceland","Ireland","Italy","Latvia","Liechtenstein","Lithuania","Luxembourg","Malta","Moldova","Monaco","Montenegro","Netherlands","Norway","Poland","Portugal","Romania","Russia","San Marino","Serbia","Slovakia","Slovenia","Spain","Sweden","Switzerland","Ukraine","United Kingdom")
origin <- choose.files()
MyData <- read.csv(origin, sep = ";", header = TRUE)
Update_sites <- subset(MyData, Update == "Updated")
EU_site <- Update_sites[Update_sites$Country %in% European_countries,]
'%ni%' <- Negate('%in%')
Not_EU_site <- Update_sites[Update_sites$Country %ni% European_countries,]
write.xlsx(EU_site, "C:/Users/WalzthE/Downloads/European_sites.xlsx")
write.xlsx(Not_EU_site, "C:/Users/WalzthE/Downloads/Not_european_sites.xlsx")
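However, my problems are the following:
1. The Update column sometimes has values other than "New" or "Updated". Sometimes the cells are filled with "Updated/Phone/sitemanager/local", "New/Manager" or "Updated/Fax". I would like to subset simply by whether the cell has any content. I browsed various forums and found something like: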
z <- character(0)
subset(df, !(rownames(df) %in% z))
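But this did not help me...
2. I would like to be able to choose where the files are saved, rather than saving them to a predetermined folder. This is not as important as point 1; it just gives the user more options.
3. The csv file contains specific data, for example "Study No.XYXY" and "LOL-123", which together should form the end of the name of the files I save. How can I concatenate the two so that the final file name is "Study No.XYXY_LOL-123"?
Thanks in advance for your help!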
Answer 0 (score: 0)
I would start by writing the script without choose.files-style interaction. Like this:
input_file <- "file.txt"
output_eu <- "eu.xlsx"
output_noteu <- "noteu.xlsx"
url <- "http://????"
download.file(url, input_file)
eu <- c("Andorra","Austria","Belarus","Belgium","Bosnia and Herzegovina","Bulgaria","Croatia","Czech Republic","Denmark","Estonia","Finland","France","Germany","Greece","Hungary","Iceland","Ireland","Italy","Latvia","Liechtenstein","Lithuania","Luxembourg","Malta","Moldova","Monaco","Montenegro","Netherlands","Norway","Poland","Portugal","Romania","Russia","San Marino","Serbia","Slovakia","Slovenia","Spain","Sweden","Switzerland","Ukraine","United Kingdom")
d <- read.table(input_file, sep = ";", header = TRUE)
# get all cases where there is some text in the Update field
updates <- d[d$Update != "", ]
i <- updates$Country %in% eu
eu_up <- updates[i,]
noteu_up <- updates[!i,]
library(writexl)
write_xlsx(eu_up, output_eu)
write_xlsx(noteu_up, output_noteu)
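The `d$Update != ""` filter above already keeps combined values such as "Updated/Fax", because it only asks whether the cell holds any text. A slightly more defensive sketch is shown here, assuming the column may also contain NA or whitespace-only cells (an assumption; the question does not say so). The helper name `has_content` and the sample data are made up for illustration.

```r
# Hypothetical sample data mimicking the Update column described in the question
d <- data.frame(
  Country = c("France", "Brazil", "Spain", "Japan", "Italy"),
  Update  = c("Updated/Fax", "New/Manager", "", NA, "  "),
  stringsAsFactors = FALSE
)

# TRUE where the cell holds something other than NA, "" or only whitespace
has_content <- function(x) {
  !is.na(x) & nzchar(trimws(x))
}

updates <- d[has_content(d$Update), ]

# Optional stricter variant: keep only values that start with "New" or "Updated"
starts_new_or_updated <- grepl("^(New|Updated)", d$Update)
updates_strict <- d[starts_new_or_updated, ]
```

Either `updates` or `updates_strict` can then be split by country in the same way as in the script above.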
(Also, there is no such thing as a ".csv file separated by semicolon"; the c stands for comma.)
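The script above does not cover the last two points of the question: building the output name from values in the data and letting the user pick the target folder. A minimal sketch follows, assuming the two values live in columns called `Study` and `Code` (hypothetical names; the question does not give the real ones) and that the first row holds them. Because two files are written, the sketch appends a suffix to the requested "Study No.XYXY_LOL-123" name, which is also an assumption. `choose.dir()` exists only in R on Windows, so the sketch falls back to the working directory elsewhere.

```r
# Build "<study>_<code>_<suffix>.xlsx" inside the folder `dir`.
# The function name and its arguments are illustrative, not from the question.
make_output_path <- function(study, code, suffix, dir = getwd()) {
  file_name <- paste0(study, "_", code, "_", suffix, ".xlsx")
  file.path(dir, file_name)
}

# Hypothetical data: in the real script these values would come from the
# imported CSV, e.g. d$Study[1] and d$Code[1]
d <- data.frame(Study = "Study No.XYXY", Code = "LOL-123",
                stringsAsFactors = FALSE)

# Ask for a folder only in an interactive Windows session
out_dir <- if (interactive() && .Platform$OS.type == "windows") {
  utils::choose.dir(caption = "Select output folder")
} else {
  getwd()
}
# choose.dir() returns NA when the dialog is cancelled
if (is.na(out_dir)) out_dir <- getwd()

eu_path    <- make_output_path(d$Study[1], d$Code[1], "European_sites", out_dir)
noteu_path <- make_output_path(d$Study[1], d$Code[1], "Not_european_sites", out_dir)
```

The resulting paths can be passed as the second argument to `write_xlsx()` in place of the fixed output names.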