How do I use two columns to access a specific element of a data frame?

Posted: 2017-09-14 21:29:43

Tags: r dataframe

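I'm trying to use two columns to look up a value in a table and then write it to a third column. This is the function I wrote to do the lookup: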

getami <- function(bedroom, year){
  ami <- hhsplus[bedroom + 1, year - 1997]
  return(ami)
}

This is how I call the function:

df$ami <- getami(df$beds, df$year)
beds and year are just two vectors of integers.
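A likely cause, sketched under the assumption that hhsplus behaves like a numeric matrix: when `[` gets one vector of row indices and one vector of column indices, it returns every combination of those rows and columns (a sub-block), not one value per (row, column) pair. The toy matrix `m` below is hypothetical and only stands in for hhsplus.

```r
# Hypothetical 3 x 3 matrix standing in for hhsplus
m <- matrix(1:9, nrow = 3)
rows <- c(1, 2)
cols <- c(1, 3)

# Two index vectors select the whole 2 x 2 block: rows {1, 2} x columns {1, 3}
block <- m[rows, cols]
print(block)

# A pairwise lookup would instead give m[1, 1] and m[2, 3]
pairs <- c(m[1, 1], m[2, 3])
print(pairs)
```

So with n rows in df, getami(df$beds, df$year) asks for an n x n block, which is not the length-n vector that df$ami <- expects.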

Here is an excerpt of what hhsplus looks like:

    1998    1999    2000    2001    2002
-------------------------------------------
1   54050   57800   60900   61100   67200
2   61750   66100   69600   69850   76800
3   69500   74350   78300   78550   86400
4   77200   82600   87000   87300   96000
5   83400   89200   93950   94300   103700
6   89550   95800   100900  101250  111350
7   95750   102400  107900  108250  119050
8   101900  109050  114850  115250  126700

When I store the result in df$ami, the values come out in descending order. How can I store ami based on both columns?
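One way to get exactly one value per row is matrix indexing: if `[` receives a two-column matrix, each row of that matrix is treated as one (row, column) pair. A minimal sketch, assuming hhsplus can be turned into a matrix whose row names are bedroom counts and whose column names are years; the trimmed table `hhsplus_m`, the helper `getami_pairwise()`, and the sample `df` are all hypothetical.

```r
# Hypothetical excerpt of hhsplus as a matrix with bedroom/year dimnames
hhsplus_m <- matrix(
  c(54050, 61750, 69500,    # 1998
    57800, 66100, 74350,    # 1999
    60900, 69600, 78300),   # 2000
  nrow = 3,
  dimnames = list(bedroom = c("1", "2", "3"),
                  year = c("1998", "1999", "2000"))
)

# Vectorized lookup: each row of the character index matrix is one
# (bedroom, year) pair, matched against the row and column names
getami_pairwise <- function(bedroom, year) {
  idx <- cbind(as.character(bedroom), as.character(year))
  unname(hhsplus_m[idx])
}

# Hypothetical input rows
df <- data.frame(Beds = c(3, 1, 2), dc = c(1998, 2000, 1999))
df$ami <- getami_pairwise(df$Beds, df$dc)
print(df)
```

Matching on names rather than on numeric offsets sidesteps the bedroom + 1 and year - 1997 arithmetic, and the result keeps the row order of df.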

Edit: This is what df$beds and df$year (actually df$dc) look like

Edit 2: Here is an excerpt of df in CSV format:

"","Date Listed","Price Listed","Date Closed","Price Closed","Days on Market","Age","Price/SF","SF","Beds","Baths","dc","ami"
"1",2013-05-30,1538000,2013-08-08,1480000,18,0,332,4460,7,6,2013,NA
"2",2014-05-15,2799000,2014-10-08,2300000,124,3,265,8691,7,8,2014,NA
"3",2014-03-14,1199888,2014-09-19,1200000,145,9,215,5586,7,6,2014,NA
"4",2016-03-28,3195000,2016-10-07,2800000,112,14,427,6562,7,6,2016,NA
"5",2010-05-25,2350000,2011-04-01,1925000,245,33,241,8000,6,12,2011,NA
"6",2013-11-15,2295000,2014-12-19,2183000,285,8,299,7300,6,8,2014,NA
"7",2015-05-05,1550000,2015-08-04,1550000,57,11,310,4993,6,6,2015,NA
"8",2014-02-21,2595000,2014-04-23,2520000,37,11,329,7651,6,7,2014,NA
"9",2013-08-12,3750000,2015-07-15,2640000,548,12,376,7030,6,5,2015,NA
"10",2009-09-16,2750000,2009-12-10,2525000,527,9,334,7550,6,6,2009,NA
"11",2013-05-27,1299000,2014-02-07,1350000,201,21,320,4217,6,5,2014,NA
"12",2015-02-07,2299000,2015-06-23,2240000,10,28,288,7783,6,8,2015,NA
"13",2014-05-16,1760000,2015-06-02,1700000,311,28,256,6650,6,5,2015,NA
"14",2012-02-24,749950,2012-04-27,740000,29,32,183,4045,6,3,2012,NA
"15",2013-01-25,1650000,2013-03-25,1600000,11,28,511,3133,6,6,2013,NA
"16",2014-02-16,1198000,2014-04-16,1150000,11,36,388,2964,6,5,2014,NA
"17",2014-04-04,1349950,2014-08-11,1340000,59,36,273,4904,6,4,2014,NA
"18",2017-06-04,1425000,2017-06-05,1425000,1,40,450,3166,6,4,2017,NA
"19",2009-05-08,1850000,2009-12-01,1500000,188,32,250,6000,6,4,2009,NA
"20",2014-03-14,1650000,2015-03-17,1480000,335,37,318,4660,6,4,2015,NA
"21",2013-06-12,2348000,2013-10-24,2025000,300,11,397,5100,6,5,2013,NA
"22",2016-01-25,1249000,2016-02-29,1125000,14,44,403,2792,6,4,2016,NA
"23",2011-08-22,580000,2011-11-08,575000,241,40,158,3636,6,5,2011,NA
"24",2011-07-25,599000,2011-09-14,570000,4,52,221,2576,6,4,2011,NA
"25",2010-06-26,1349000,2010-09-30,1300000,56,72,260,5000,6,4,2010,NA
"26",2016-09-09,1399000,2016-11-16,1410000,4,12,357,3948,6,5,2016,NA

Edit 3: output of dput(head(df, 10)):

structure(list(`Date Listed` = structure(c(1369872000, 1400112000, 
1394755200, 1459123200, 1274745600, 1384473600, 1430784000, 1392940800, 
1376265600, 1253059200), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    `Price Listed` = c(1538000, 2799000, 1199888, 3195000, 2350000, 
    2295000, 1550000, 2595000, 3750000, 2750000), `Date Closed` = structure(c(1375920000, 
    1412726400, 1411084800, 1475798400, 1301616000, 1418947200, 
    1438646400, 1398211200, 1436918400, 1260403200), class = c("POSIXct", 
    "POSIXt"), tzone = "UTC"), `Price Closed` = c(1480000, 2300000, 
    1200000, 2800000, 1925000, 2183000, 1550000, 2520000, 2640000, 
    2525000), `Days on Market` = c(18, 124, 145, 112, 245, 285, 
    57, 37, 548, 527), Age = c(0, 3, 9, 14, 33, 8, 11, 11, 12, 
    9), `Price/SF` = c(332, 265, 215, 427, 241, 299, 310, 329, 
    376, 334), SF = c(4460, 8691, 5586, 6562, 8000, 7300, 4993, 
    7651, 7030, 7550), Beds = c(7, 7, 7, 7, 6, 6, 6, 6, 6, 6), 
    Baths = c(6, 8, 6, 6, 12, 8, 6, 7, 5, 6), dc = c(2013, 2014, 
    2014, 2016, 2011, 2014, 2015, 2014, 2015, 2009), ami = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("Date Listed", 
"Price Listed", "Date Closed", "Price Closed", "Days on Market", 
"Age", "Price/SF", "SF", "Beds", "Baths", "dc", "ami"), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

2 Answers:

Answer 0 (score: 0)

If you just want to turn the table into a flat (long) format, you can use gather from the tidyr package:

library(tidyr)
df = read.table(text=" bedroom   1998    1999    2000    2001    2002
                1   54050   57800   60900   61100   67200
                2   61750   66100   69600   69850   76800
                3   69500   74350   78300   78550   86400
                4   77200   82600   87000   87300   96000
                5   83400   89200   93950   94300   103700
                6   89550   95800   100900  101250  111350
                7   95750   102400  107900  108250  119050
                8   101900  109050  114850  115250  126700", header = TRUE)
answer = gather(data = df, key = "year", value = "hhsplus", X1998:X2002) 
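The same wide-to-long step can also be done in base R without tidyr. A rough sketch, assuming the wide table has a bedroom column followed by one column per year (the data here is trimmed for brevity, and the name `long` is illustrative). The rows come out in the same order as gather's: all bedrooms for the first year, then the next year, and so on.

```r
# Base-R sketch of the wide-to-long reshape (trimmed hypothetical data)
hhsplus <- read.table(text = "bedroom 1998 1999 2000
                               1 54050 57800 60900
                               2 61750 66100 69600", header = TRUE)

# read.table turns the numeric headers into "X1998" etc.; strip the prefix
years <- sub("^X", "", names(hhsplus)[-1])

long <- data.frame(
  bedroom = rep(hhsplus$bedroom, times = length(years)),  # 1, 2, 1, 2, ...
  year = rep(as.numeric(years), each = nrow(hhsplus)),    # 1998, 1998, 1999, ...
  hhsplus = unlist(hhsplus[-1], use.names = FALSE)        # year columns stacked
)
print(long)
```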

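Note that, because of the way I created the dataset from your sample data, all the year columns now have an "X" in front of them. Here's how to fix that: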

answer$year = as.numeric(gsub("X", "", answer$year))
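Alternatively, the "X" can be avoided when the table is read in: read.table's check.names argument controls whether column names are converted to syntactically valid R names (which is what prepends the "X"). A small sketch with trimmed data:

```r
# check.names = FALSE keeps "1998", "1999" as column names instead of "X1998"
hhsplus <- read.table(text = "bedroom 1998 1999
                               1 54050 57800
                               2 61750 66100",
                      header = TRUE, check.names = FALSE)
print(names(hhsplus))
```

Such non-syntactic names then need backticks when accessed directly, e.g. hhsplus$`1998`.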

Result:

    bedroom year hhsplus
    1       1998   54050
    2       1998   61750
    3       1998   69500
    4       1998   77200
    5       1998   83400
    6       1998   89550
    7       1998   95750
    8       1998  101900
    1       1999   57800
    ...

Answer 1 (score: 0)

I would solve this by merging the two data frames. To do that, first convert hhsplus to long format. See the code below.

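However, it's not clear to me how you want to join the two data frames. In your function you have hhsplus[bedroom + 1, year - 1997]: why do you add 1 to the bedroom count and subtract 1997 from the year?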

require("tidyr")

# From lebelinoz's answer, read in hhsplus:
hhsplus = read.table(text=" bedroom   1998    1999    2000    2001    2002
                     1   54050   57800   60900   61100   67200
                     2   61750   66100   69600   69850   76800
                     3   69500   74350   78300   78550   86400
                     4   77200   82600   87000   87300   96000
                     5   83400   89200   93950   94300   103700
                     6   89550   95800   100900  101250  111350
                     7   95750   102400  107900  108250  119050
                     8   101900  109050  114850  115250  126700", header = TRUE)

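# convert hhsplus to long format (one row per bedroom/year pair):
hhsplus_long = gather(data = hhsplus, year, hhsplus_ami, -1)
# read.table prefixed the numeric year headers with "X"; strip it and
# convert to numeric so the key type matches df$dc
hhsplus_long$year = as.numeric(gsub("X", "", hhsplus_long$year))
# mirror the "+ 1" offset used in the question's getami()
hhsplus_long$bedroom = hhsplus_long$bedroom - 1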

# merge two data frames, keeping all records from df (all.x=TRUE)
merge(df, hhsplus_long, by.x = c("Beds", "dc"), by.y=c("bedroom", "year"), all.x=TRUE)
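To see what all.x = TRUE does, here is a small self-contained sketch with hypothetical data (not the real hhsplus or df): every row of df is kept, and a (Beds, dc) pair with no match in the lookup table gets NA.

```r
# Hypothetical long-format lookup table
hhsplus_long <- data.frame(
  bedroom = c(1, 2, 1, 2),
  year = c(1998, 1998, 1999, 1999),
  hhsplus_ami = c(54050, 61750, 57800, 66100)
)

# Hypothetical listings; the 2015 row has no match in the lookup table
df <- data.frame(Beds = c(2, 1, 2), dc = c(1999, 1998, 2015))

# Left join: keep every row of df, NA where no (Beds, dc) pair matches
merged <- merge(df, hhsplus_long,
                by.x = c("Beds", "dc"), by.y = c("bedroom", "year"),
                all.x = TRUE)
print(merged)
```

One caveat: merge sorts the result on the key columns by default (and sort = FALSE gives an unspecified order), so the rows will not necessarily line up with the original df. If order matters, one option is to add a row-index column to df before merging and reorder by it afterwards.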