I have a large dataset containing company names and years:
2001 company 1
2002 company 1
2003 company 1
2004 company 1
2001 company 2
2002 company 2
2001 company 3
2003 company 3
2004 company 3
I need to write a function that, given years n and m, returns a list of the companies that have consecutive year values starting at year n and ending at year m.
For example, with the data above, f(2001, 2002) would show:
2001 company 1
2002 company 1
2001 company 2
2002 company 2
It could also return just the company names. f(2001, 2003) would show only companies 1 and 2, because company 3 skips 2002.
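Answer 0 (score: 1)
Try this:
# year1: start year
# year2: end year
# df: the data frame (or data.table) holding the year and company columns
companies_func <- function(year1, year2, df)
{
  # Keep only the rows whose year falls within [year1, year2].
  # The trailing comma makes this a row subset for data.frame and data.table alike.
  return(df[(df$year >= year1) & (df$year <= year2), ])
}
print(companies_func(2001, 2002, df))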
year company
1: 2001 company1
2: 2002 company1
3: 2001 company2
4: 2002 company2
5: 2001 company3
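Note that the filter above only restricts rows to the requested window, so company3 still appears with its 2001 row even though it has no 2002 entry. A minimal base R sketch, assuming the same `year` and `company` columns, that additionally drops companies missing any year in the range might look like this. The name `complete_companies` and the sample `df` are illustrative and not part of the original answer.

```r
# Hypothetical sample data mirroring the question.
df <- data.frame(
  year = c(2001:2004, 2001:2002, 2001, 2003, 2004),
  company = rep(c("company1", "company2", "company3"), times = c(4, 2, 3))
)

# complete_companies is an illustrative name, not from the original answer.
complete_companies <- function(year1, year2, df) {
  years <- year1:year2
  # Rows inside the requested window.
  in_range <- df[df$year >= year1 & df$year <= year2, ]
  # For each company, check that every requested year is present.
  has_all <- vapply(
    split(in_range$year, in_range$company),
    function(y) all(years %in% y),
    logical(1)
  )
  keep <- names(has_all)[has_all]
  in_range[in_range$company %in% keep, ]
}

complete_companies(2001, 2002, df)
```

Under this strict reading, a call for 2001 to 2003 keeps only company1, because company2 has no 2003 row and company3 has no 2002 row.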
Answer 1 (score: 1)
You can also wrap a few dplyr functions in a function to get the desired result.
library(dplyr)
company_func <- function(data, year_1, year_2){
  #filter dataset to years of interest
  data <- data %>% filter(Year >= year_1 & Year <= year_2)
  #sort by company and year
  data <- data %>% arrange(Company, Year)
  #calc difference in years for each company
  #(the pipe must end the line; a line starting with %>% is a syntax error)
  data <- data %>% group_by(Company) %>%
    mutate(year_diff = Year - lag(Year, default = min(Year)))
  #filter to only comp with consecutive years
  data.filter <- data %>% filter(year_diff == 1)
  data <- data %>% filter(Company %in% data.filter$Company) %>%
    select(Company, Year)
  return(data)
}
Result:
company_func(data, 2001, 2002)
Company Year
1 company 1 2001
2 company 1 2002
3 company 2 2001
4 company 2 2002
company_func(data, 2001, 2003)
Company Year
1 company 1 2001
2 company 1 2002
3 company 1 2003
4 company 2 2001
5 company 2 2002
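One caveat, as far as I can tell: the `year_diff == 1` filter keeps any company that has at least one pair of adjacent years in the window, so a call for 2001 to 2004 would still return company 3 (2003 and 2004 are adjacent) despite the missing 2002. A base R sketch of a stricter check, assuming the same `Company` and `Year` columns, is below. The name `gap_free_func` and the sample `data` are illustrative and not part of the original answer.

```r
# Hypothetical sample data mirroring the question (column names as in this answer).
data <- data.frame(
  Company = rep(c("company 1", "company 2", "company 3"), times = c(4, 2, 3)),
  Year = c(2001:2004, 2001:2002, 2001, 2003, 2004)
)

# gap_free_func is an illustrative name, not from the original answer.
gap_free_func <- function(data, year_1, year_2) {
  # Restrict to the requested window and sort by company and year.
  in_range <- data[data$Year >= year_1 & data$Year <= year_2, ]
  in_range <- in_range[order(in_range$Company, in_range$Year), ]
  # A company qualifies if its first year is year_1 and every step
  # between its successive years is exactly 1 (no skipped year).
  ok <- vapply(
    split(in_range$Year, in_range$Company),
    function(y) y[1] == year_1 && all(diff(y) == 1),
    logical(1)
  )
  in_range[in_range$Company %in% names(ok)[ok], c("Company", "Year")]
}

gap_free_func(data, 2001, 2004)
```

This keeps the behaviour shown in the results above (company 2 is still returned for 2001 to 2003 because its years have no gap) while dropping company 3 whenever 2002 falls inside the window.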
Answer 2 (score: 0)
Here is a data.table solution:
library("data.table")
dt <- fread(
"year company
2001 company1
2002 company1
2003 company1
2004 company1
2001 company2
2002 company2
2001 company3
2003 company3
2004 company3")
years <- 2001:2002
dt[, if (all(years %in% year)) company, company][,1]
# dt[, if (all(years %in% year)) company, company][, company] # if you want a vector of char
This gives you the names of the companies that have the complete sequence of years:
# > dt[, if (all(years %in% year)) company, company][,1]
# company
# 1: company1
# 2: company2
If you want to define a function, you can do the following:
f <- function(DT, from, to) {
  years <- from:to
  DT[, if (all(years %in% year)) company, company][,1]
}
f(dt, 2001, 2002)
Answer 3 (score: 0)
I would use the data.table package rather than a function.
Edit:
I misunderstood your question. If you want a range of years, I would do this:
years = c(2001, 2002) #vector with your years
dt <- as.data.table(df) #convert the table to a data.table
dt[year %in% years]