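Last November I asked a question about subsetting data from two different data frames (here). I wanted to select the rows of data1 that have the same latitude as the required rows in data2. My situation is now somewhat different, but the problem is similar.
My data files have the following structure:
Data file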
date time stat_id lat lon tempc
20121122 0 1 0.407353E+02 -0.165700E+00 0.798737E+01
20121122 0 2 0.406287E+02 -0.113300E+00 0.649903E+01
20121122 0 3 0.406621E+02 -0.209800E+00 0.772955E+01
20121122 0 4 0.403943E+02 -0.126100E+00 0.798837E+01
20121122 0 5 0.404532E+02 0.604000E-01 0.103548E+02
Places file
Zona Poble stat_id lat lon alt
1 Zorita 1 0.407353E+02 -0.165700E+00 691.867004
1 Morella 2 0.406287E+02 -0.113300E+00 955.718994
1 Forcall 3 0.406621E+02 -0.209800E+00 753.882019
2 Benasal 4 0.403943E+02 -0.126100E+00 848.171021
2 Cati 5 0.404532E+02 0.604000E-01 667.609985
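Both files share the stat_id field. I want to subset the data file according to the Zona of each place. For example, if I want Zona 1, three places should be extracted: stat_id = 1, 2, 3. For the subset I use this command: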
zona1=subset(data,data$stat_id == places$stat_id[places$Zona == 1])
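This does select rows of the data file for the three stations, but not all rows with stat_id 1, 2 or 3. The data file contains hourly data, yet zona1 only shows data for hours 0, 3, 6, 9, 12, 15, 18, 21 and 24. When I run the command I get this warning: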
Mensajes de aviso perdidos
In data$stat_id == places$stat_id[places$Zona == 1] :
larger object length is not a multiple of the smaller one
(Please excuse my translation of the warning message)
zona1 output
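The every-third-hour pattern is consistent with how R recycles the shorter vector in an element-wise `==` comparison. Below is a minimal sketch with made-up toy data: the 4 stations per hour, the hours 0 to 6 and the names `toy`, `wanted`, `mask_eq` and `hours_eq` are assumptions chosen for illustration, not the layout of the real file.

```r
# Toy data (hypothetical): 4 stations reporting once per hour, hours 0..6.
toy <- data.frame(
  time    = rep(0:6, each = 4),
  stat_id = rep(1:4, times = 7)
)
wanted <- c(1, 2, 3)  # station ids of the zone of interest

# `==` works element by element and recycles `wanted` along the column,
# so row i is compared only with wanted[(i - 1) %% 3 + 1].
# R warns here because 28 is not a multiple of 3; the warning is
# suppressed only to keep the sketch quiet.
mask_eq <- suppressWarnings(toy$stat_id == wanted)

# Hours that survive the recycled comparison.
hours_eq <- unique(toy$time[mask_eq])
print(hours_eq)
```

With 4 stations per hour and 3 wanted ids, the recycled vector lines up with the start of an hour block only every third hour, so only those hours are kept.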
"","date","time","stat_id","lat","lon","tempc"
"1",20121122,0,1,40.7353,-0.1657,7.98737
"2",20121122,0,2,40.6287,-0.1133,6.49903
"3",20121122,0,3,40.6621,-0.2098,7.72955
"385",20121122,30000,1,40.7353,-0.1657,7.00632
"386",20121122,30000,2,40.6287,-0.1133,4.83684
"387",20121122,30000,3,40.6621,-0.2098,6.42246
"769",20121122,60000,1,40.7353,-0.1657,6.55283
"770",20121122,60000,2,40.6287,-0.1133,4.85467
"771",20121122,60000,3,40.6621,-0.2098,5.90663
"1153",20121122,90000,1,40.7353,-0.1657,6.35216
"1154",20121122,90000,2,40.6287,-0.1133,5.66342
"1155",20121122,90000,3,40.6621,-0.2098,6.15894
This is the script I am trying:
datos=read.table("data.dat",header=T)
pobles=read.table("pobles-zona.dat",header=T)
data=as.data.frame(datos)
places=as.data.frame(pobles)
zona1=subset(data,data$stat_id == places$stat_id[places$Zona == 1])
and these are the data files:
data.dat http://ubuntuone.com/0pDaVxaBQZWZSAVr2b3n6v
pobles-zona.dat http://ubuntuone.com/753L9uFbntRc46Ah5gIZdp
I must be missing something; any help is appreciated.
Thanks in advance
Answer 0 (score: 0)
Hong Ooi's comment has already solved the problem. It was just a matter of replacing == with %in%. Strictly, the command is:
zona1=subset(data,data$stat_id %in% places$stat_id[places$Zona == 1])
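For reference, here is a small self-contained sketch of the `%in%` approach. The rows are copied from the question (the five time-0 rows plus the three time-30000 rows shown in the zona1 output); the real files have many more rows. The variable name `ids` is introduced here only for illustration.

```r
# Places table, rows copied from the question.
places <- read.table(header = TRUE, text = "
Zona Poble stat_id lat lon alt
1 Zorita 1 0.407353E+02 -0.165700E+00 691.867004
1 Morella 2 0.406287E+02 -0.113300E+00 955.718994
1 Forcall 3 0.406621E+02 -0.209800E+00 753.882019
2 Benasal 4 0.403943E+02 -0.126100E+00 848.171021
2 Cati 5 0.404532E+02 0.604000E-01 667.609985
")

# A few data rows from the question: five stations at time 0,
# then stations 1..3 at time 30000.
data <- read.table(header = TRUE, text = "
date time stat_id lat lon tempc
20121122 0 1 0.407353E+02 -0.165700E+00 0.798737E+01
20121122 0 2 0.406287E+02 -0.113300E+00 0.649903E+01
20121122 0 3 0.406621E+02 -0.209800E+00 0.772955E+01
20121122 0 4 0.403943E+02 -0.126100E+00 0.798837E+01
20121122 0 5 0.404532E+02 0.604000E-01 0.103548E+02
20121122 30000 1 40.7353 -0.1657 7.00632
20121122 30000 2 40.6287 -0.1133 4.83684
20121122 30000 3 40.6621 -0.2098 6.42246
")

# Station ids belonging to Zona 1.
ids <- places$stat_id[places$Zona == 1]

# %in% tests each stat_id for membership in `ids`; nothing is recycled,
# so every row of stations 1, 2 and 3 is kept regardless of its position.
zona1 <- data[data$stat_id %in% ids, ]
print(zona1)
```

Unlike `==`, `%in%` returns one TRUE/FALSE per row of the data no matter how long `ids` is, so no length warning is raised.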
Thanks to Hong and Carl