Multiple regression in R: variable not found in data.frame

Date: 2014-02-15 19:53:36

Tags: r regression

This is my data.frame, beef:

> head(beef)
   YEAR....PBE  CBE  PPO  CPO  PFO DINC  CFO RDINC RFP
1 1925    59.7 58.6 60.5 65.8 65.8 51.4 90.9  68.5 877
2 1926    59.7 59.4 63.3 63.3 68.0 52.6 92.1  69.6 899
3   1927    63 53.7 59.9 66.8 65.5 52.1 90.9  70.2 883
4   1928    71 48.1 56.3 69.9 64.8 52.7 90.9  71.9 884
5   1929    71 49.0 55.0 68.7 65.6 55.1 91.1  75.2 895
6 1930    74.2 48.2 59.6 66.1 62.4 48.8 90.7  68.3 874

dput(head(beef))
structure(list(YEAR....PBE = structure(1:6, .Label = c("1925    59.7", 
"1926    59.7", "1927    63", "1928    71", "1929    71", "1930    74.2", 
"1931    72.1", "1932    79", "1933    73.1", "1934    70.2", 
"1935    82.2", "1936    68.4", "1937    73", "1938    70.2", 
"1939    67.8", "1940    63.4", "1941    56"), class = "factor"), 
    CBE = c(58.6, 59.4, 53.7, 48.1, 49, 48.2), PPO = c(60.5, 
    63.3, 59.9, 56.3, 55, 59.6), CPO = c(65.8, 63.3, 66.8, 69.9, 
    68.7, 66.1), PFO = c(65.8, 68, 65.5, 64.8, 65.6, 62.4), DINC = c(51.4, 
    52.6, 52.1, 52.7, 55.1, 48.8), CFO = c(90.9, 92.1, 90.9, 
    90.9, 91.1, 90.7), RDINC = c(68.5, 69.6, 70.2, 71.9, 75.2, 
    68.3), RFP = c(877L, 899L, 883L, 884L, 895L, 874L)), .Names = c("YEAR....PBE", 
"CBE", "PPO", "CPO", "PFO", "DINC", "CFO", "RDINC", "RFP"), row.names = c(NA, 
6L), class = "data.frame")

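I want to build a multiple linear regression model for PBE based on the other variables. Following the tutorial in this link, I thought I should run the following code: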

> lm(formula = PBE ~ CBE + PBO + CPO + PFO + 
+        DINC + CFO+RDINC+RFP+YEAR, data = beef)

Error in eval(expr, envir, enclos) : object 'PBE' not found

So I decided to try the following, but both attempts also gave errors:

> lm(formula=PBE~YEAR,data=beef)
Error in eval(expr, envir, enclos) : object 'PBE' not found
> lm(formula=beef$PBE~beef$YEAR)
Error in model.frame.default(formula = beef$PBE ~ beef$YEAR, drop.unused.levels = TRUE) : 
  invalid type (NULL) for variable 'beef$PBE'

Could you give me some insight into where the typo or mistake is?

P.S.: I read the file using beef=read.table("beef.txt", header = TRUE, sep = "\t", comment.char="%"), and the file looks like this:

% http://lib.stat.cmu.edu/DASL/Datafiles/agecondat.html
% 
% Datafile Name: Agricultural Economics Studies
% Datafile Subjects: Agriculture , Economics , Consumer
% Story Names: Agricultural Economics Studies
% Reference: F.B. Waugh, Graphic Analysis in Agricultural Economics,
%   Agricultural Handbook No. 128, U.S. Department of Agriculture, 1957.
% Authorization: free use
% Description: Price and consumption per capita of beef and pork
%   annually from 1925 to 1941 together with other variables relevant to
%   an economic analysis of price and/or consumption of beef and pork
%   over the period.
% Number of cases: 17
% Variable Names:
% 
%   PBE = Price of beef (cents/lb)
%   CBE = Consumption of beef per capita (lbs)
%   PPO = Price of pork (cents/lb)
%   CPO = Consumption of pork per capita (lbs)
%   PFO = Retail food price index (1947-1949 = 100)
%   DINC = Disposable income per capita index (1947-1949 = 100)
%   CFO = Food consumption per capita index (1947-1949 = 100)
%   RDINC = Index of real disposable income per capita (1947-1949 = 100)
%   RFP = Retail food price index adjusted by the CPI (1947-1949 = 100)
% 
% The Data:
YEAR    PBE CBE PPO CPO PFO DINC    CFO RDINC   RFP
1925    59.7    58.6    60.5    65.8    65.8    51.4    90.9    68.5    877
1926    59.7    59.4    63.3    63.3    68  52.6    92.1    69.6    899
1927    63  53.7    59.9    66.8    65.5    52.1    90.9    70.2    883
1928    71  48.1    56.3    69.9    64.8    52.7    90.9    71.9    884
1929    71  49  55  68.7    65.6    55.1    91.1    75.2    895
1930    74.2    48.2    59.6    66.1    62.4    48.8    90.7    68.3    874
1931    72.1    47.9    57  67.4    51.4    41.5    90  64  791

Here is the result of View(beef), as Patrick suggested: (image not shown)

1 answer:

Answer 0 (score: 5)

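You need to go back and look at how this data file was loaded into R. The output of head() shows that the first variable is named YEAR....PBE: the PBE data have been merged with the YEAR variable, probably because of a problem with the separator used in the file you read in. Go back and check the file carefully.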
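The merged column name can be reproduced with a small sketch. The two-row excerpt below is a hypothetical stand-in for the file, assuming that YEAR and PBE are separated by spaces while every other column is separated by a tab:

```r
# Hypothetical excerpt: spaces between YEAR and PBE, tabs elsewhere.
txt <- "YEAR    PBE\tCBE\tPPO\n1925    59.7\t58.6\t60.5\n1926    59.7\t59.4\t63.3"

# Splitting on tabs only keeps "YEAR    PBE" together as one field.
bad <- read.table(text = txt, header = TRUE, sep = "\t")

names(bad)             # the first name has the spaces turned into dots
ncol(bad)              # one column fewer than the number of variables
"PBE" %in% names(bad)  # FALSE, which is why lm() cannot find PBE
```

Because read.table() makes column names syntactically valid by default, the four spaces in the header field become four dots, giving the name seen in the question.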

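One way to do this from within R is count.fields(), to which you pass the file name. Read ?count.fields, as you may need to set the sep and quote arguments to match the file you are reading. The function reports how many fields (variables) it finds on each line; compare that with the known number of variables.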
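A minimal sketch of that check follows. The temporary file and its contents are hypothetical stand-ins for beef.txt, again assuming spaces between the first two columns and tabs elsewhere. For the real file you would presumably also pass comment.char = "%", since the default comment character of count.fields() is "#".

```r
# Hypothetical temporary file standing in for beef.txt.
tmp <- tempfile(fileext = ".txt")
writeLines(c("YEAR    PBE\tCBE\tPPO",
             "1925    59.7\t58.6\t60.5",
             "1926    59.7\t59.4\t63.3"), tmp)

# Number of fields per line when only a tab counts as a separator.
n_tab <- count.fields(tmp, sep = "\t")

# Number of fields per line when any run of whitespace is a separator.
n_ws <- count.fields(tmp, sep = "")

n_tab
n_ws
```

If the tab-only count is lower than the number of variables you expect, at least one pair of columns is not separated by a tab.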

From your edit it is clear that what I described above is what happened:

> names(beef)
[1] "YEAR....PBE" "CBE"         "PPO"         "CPO"         "PFO"        
[6] "DINC"        "CFO"         "RDINC"       "RFP"

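The file does not appear to be entirely/strictly/truly tab-delimited. I was able to read the data you included by letting any run of whitespace act as the separator (sep = ""):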

beef <- read.table("file.name", header = TRUE, sep = "", comment.char = "%")

> head(beef)
  YEAR  PBE  CBE  PPO  CPO  PFO DINC  CFO RDINC RFP
1 1925 59.7 58.6 60.5 65.8 65.8 51.4 90.9  68.5 877
2 1926 59.7 59.4 63.3 63.3 68.0 52.6 92.1  69.6 899
3 1927 63.0 53.7 59.9 66.8 65.5 52.1 90.9  70.2 883
4 1928 71.0 48.1 56.3 69.9 64.8 52.7 90.9  71.9 884
5 1929 71.0 49.0 55.0 68.7 65.6 55.1 91.1  75.2 895
6 1930 74.2 48.2 59.6 66.1 62.4 48.8 90.7  68.3 874
> str(beef)
'data.frame':   7 obs. of  10 variables:
 $ YEAR : int  1925 1926 1927 1928 1929 1930 1931
 $ PBE  : num  59.7 59.7 63 71 71 74.2 72.1
 $ CBE  : num  58.6 59.4 53.7 48.1 49 48.2 47.9
 $ PPO  : num  60.5 63.3 59.9 56.3 55 59.6 57
 $ CPO  : num  65.8 63.3 66.8 69.9 68.7 66.1 67.4
 $ PFO  : num  65.8 68 65.5 64.8 65.6 62.4 51.4
 $ DINC : num  51.4 52.6 52.1 52.7 55.1 48.8 41.5
 $ CFO  : num  90.9 92.1 90.9 90.9 91.1 90.7 90
 $ RDINC: num  68.5 69.6 70.2 71.9 75.2 68.3 64
 $ RFP  : int  877 899 883 884 895 874 791
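
Once PBE is a column of its own, a formula that refers to it should resolve. The sketch below inlines the first four columns of the seven rows shown in the question (the text = argument is used here only to keep the example self-contained) and fits a deliberately small model, since seven rows cannot support all nine predictors. Note that the column is named PPO; the formula in the question spells it PBO, which would likely trigger a similar "object not found" error.

```r
# First four columns of the rows shown in the question, inlined as text.
txt <- "YEAR PBE CBE PPO
1925 59.7 58.6 60.5
1926 59.7 59.4 63.3
1927 63 53.7 59.9
1928 71 48.1 56.3
1929 71 49 55
1930 74.2 48.2 59.6
1931 72.1 47.9 57"

# sep = "" splits on any run of whitespace, so YEAR and PBE stay separate.
beef <- read.table(text = txt, header = TRUE, sep = "")

# PBE is now found in the data frame, so the model can be fitted.
fit <- lm(PBE ~ CBE + PPO, data = beef)
coef(fit)
```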