Get a list of names that have consecutive values in another column

Time: 2018-03-01 15:15:27

Tags: r

I have a large dataset containing company names and years:

2001  company 1
2002  company 1
2003  company 1
2004  company 1
2001  company 2
2002  company 2
2001  company 3
2003  company 3
2004  company 3

I need to write a function that, given years n and m, gives me a list of the companies whose year values run consecutively from year n through year m.

For example, in the case above, f(2001, 2002) would show the 2001 and 2002 rows of companies 1 and 2.

It could also just return the company names. f(2001, 2003) would show only companies 1 and 2, because company 3 skips 2002.

4 Answers:

Answer 0 (score: 1)

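Try this:

# year1: the start year
# year2: the end year
# df: the data frame (or data.table) that holds these values

companies_func <- function(year1, year2, df)
{
    # Keep the rows whose year lies between year1 and year2 (inclusive).
    # The trailing comma selects rows for both data.frame and data.table.
    return(df[df$year >= year1 & df$year <= year2, ])
}

print(companies_func(2001, 2002, df))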

   year  company
1: 2001 company1
2: 2002 company1
3: 2001 company2
4: 2002 company2
5: 2001 company3
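
Note that the filter above keeps every row whose year falls in the range, so company3 still appears even though it has no 2002 entry. Below is a minimal base R sketch of the stricter check the question describes, assuming columns named `year` and `company` as in the output above. The helper name `companies_complete` and the sample data frame are made up for illustration.

```r
# Sample data; the column names are an assumption taken from the output above.
df <- data.frame(
  year    = c(2001, 2002, 2003, 2004, 2001, 2002, 2001, 2003, 2004),
  company = rep(c("company1", "company2", "company3"), times = c(4, 2, 3)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of the companies that have every year in year1:year2.
companies_complete <- function(year1, year2, df) {
  wanted <- year1:year2
  # For each company, check whether all wanted years occur among its years.
  has_all <- tapply(df$year, df$company, function(y) all(wanted %in% y))
  names(has_all)[has_all]
}

companies_complete(2001, 2002, df)
companies_complete(2001, 2003, df)
```

Under this strict reading, the 2001 to 2003 call returns only company1, because company2 has no 2003 entry.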

Answer 1 (score: 1)

You can also wrap a few dplyr functions in a function to get the desired result.

library(dplyr)

company_func <- function(data, year_1, year_2){
  #filter dataset to years of interest
  data <- data %>% filter(Year >= year_1 & Year <= year_2)
  #sort by company and year
  data <- data %>% arrange(Company, Year)
  #calc difference in years for each company
  #(the pipe must end the line, otherwise R stops parsing before mutate)
  data <- data %>% group_by(Company) %>%
    mutate(year_diff = Year - lag(Year, default = min(Year)))
  #keep companies with at least one pair of consecutive years
  data.filter <- data %>% filter(year_diff == 1)
  data <- data %>% filter(Company %in% data.filter$Company) %>% 
    select(Company, Year)
  return(data)
}

Results:

company_func(data, 2001, 2002)

     Company Year
1  company 1 2001
2  company 1 2002
3  company 2 2001
4  company 2 2002

company_func(data, 2001, 2003)

     Company Year
1  company 1 2001
2  company 1 2002
3  company 1 2003
4  company 2 2001
5  company 2 2002
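
The `year_diff` approach keeps any company with at least one pair of consecutive years inside the range, which is why company 2 appears in the 2001 to 2003 result despite having no 2003 row. If every year in the range must be present, one possible alternative is a grouped filter. The sketch below assumes the same `Company` and `Year` column names; `company_func_strict` and the sample data are hypothetical.

```r
library(dplyr)

# Sample data with the column names used in this answer (an assumption).
data <- data.frame(
  Company = rep(c("company 1", "company 2", "company 3"), times = c(4, 2, 3)),
  Year    = c(2001, 2002, 2003, 2004, 2001, 2002, 2001, 2003, 2004),
  stringsAsFactors = FALSE
)

# Hypothetical variant: keep a company only if it has every year in the range.
company_func_strict <- function(data, year_1, year_2) {
  wanted <- year_1:year_2
  data %>%
    group_by(Company) %>%
    filter(all(wanted %in% Year)) %>%  # evaluated once per company
    ungroup() %>%
    filter(Year %in% wanted) %>%       # keep only the rows inside the range
    arrange(Company, Year)
}

company_func_strict(data, 2001, 2003)
```

With the sample data, the 2001 to 2003 call keeps only the three rows of company 1.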

Answer 2 (score: 0)

Here is a solution using data.table:

library("data.table")

dt <- fread(
"year company
2001  company1
2002  company1
2003  company1
2004  company1
2001  company2
2002  company2
2001  company3
2003  company3
2004  company3")

years <- 2001:2002
dt[, if (all(years %in% year)) company, company][,1]
# dt[, if (all(years %in% year)) company, company][, company] # if you want a vector of char

This gives you the names of the companies that have the complete sequence of years:

# > dt[, if (all(years %in% year)) company, company][,1]
#     company
# 1: company1
# 2: company2

If you want to define a function, you can do the following:

f <- function(DT, from, to) {
  years <- from:to
  DT[, if (all(years %in% year)) company, company][,1]
}

f(dt, 2001, 2002)
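
If the matching rows are wanted rather than only the names, one possible variation is to return each qualifying group's rows. This is a sketch under the assumption that the table has the same `year` and `company` columns; `f_rows` is a made-up name and the sample table is rebuilt here so the block runs on its own.

```r
library(data.table)

# Same sample data as above, built directly instead of via fread.
dt <- data.table(
  year    = c(2001, 2002, 2003, 2004, 2001, 2002, 2001, 2003, 2004),
  company = rep(c("company1", "company2", "company3"), times = c(4, 2, 3))
)

# Hypothetical variant of f(): return the rows, not just the company names.
f_rows <- function(DT, from, to) {
  years <- from:to
  # First keep the rows inside the range, then keep only the companies
  # that still have every requested year. Groups failing the check
  # return NULL and are dropped from the result.
  DT[year %in% years][, if (all(years %in% year)) .SD, by = company]
}

f_rows(dt, 2001, 2002)
```

The result has one row per company and year, with `company` as the first column because it is the grouping variable.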

Answer 3 (score: 0)

I would use the data.table package instead of a function.

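Edit:

I misunderstood your question. If you want a range of years, I would do this:

library(data.table)

years <- c(2001, 2002) # vector with your years

dt <- as.data.table(df) # convert the table to a data.table

dt[year %in% years] # keep the rows whose year is in the vector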