Combining mean() and subset()

Time: 2016-09-01 19:29:40

Tags: r csv

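I have a very specific problem. I have a CSV table, and I want to extract data from it based on two conditions (a year and a column) and take the mean() of the result. My code is: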

GDP <- mean(subset(World,World$Year==2013)$GDP_in_USD,na.rm=TRUE)

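World is my CSV table. It holds data for every country in the world from 1960 to 2015, spread across different columns. I want all values of the GDP_in_USD column for the year 2013 (so basically one cell per country).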

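When I run this, I get an error saying that the value is neither numeric nor logical. Strangely, a friend gave me the code and it works on his computer. When I try to reproduce it, I get the error. To read the CSV table, I use: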

World <- read.csv("World2.csv", header=TRUE, sep=delim, dec=dec, stringsAsFactors=FALSE)

What could be causing the problem? If you need more information, let me know.

    structure(list(Country.Year.Zeitraum_NR.Agriculture_value_added_percent_of_GDP.Central_government_debt_total_percent_of_GDP.Cost_to_export_USD_per_container.Cost_to_import_USD_per_container.Employment_in_agriculture_percent_of_total_employment.Employment_in_industry_percent_of_total_employment.Employment_in_services_percent_of_total_employment.Exports_of_goods_and_services_percent_of_GDP.Final_consumption_expenditure_etc_percent_of_GDP.Foreign_direct_investment_net_inflows_percent_of_GDP.Foreign_direct_investment_net_outflows_percent_of_GDP.General_government_final_consumption_expenditure_._of_GDP.GDP_growth_annual_percent.Government_expenditure_on_education_total_percent_of_GDP.Household_final_consumption_expenditure_etc_percent_of_GDP.Imports_of_goods_and_services_percent_of_GDP.Industry_value_added_percent_of_GDP.Inflation_consumer_prices_annual_percent.Lending_interest_rate_percent.Patent_applications_residents_._nonresidents.Research_and_development_expenditure_percent_of_GDP.Services_etc_value_added_percent_of_GDP.Subsidies_and_other_transfers_percent_of_expense.Tariff_rate_applied_simple_mean_all_products_percent.Taxes_on_exports_percent_of_tax_revenue.Taxes_on_goods_and_services_percent_of_revenue.Taxes_on_income_profits_and_capital_gains_percent_of_revenue.Taxes_on_international_trade_percent_of_revenue.Total_tax_rate_percent_of_commercial_profits.Trade_percent_of_GDP.Unemployment_total_percent_of_total_labor_force_national_estimate.GDP_in_USD = c("Afghanistan;1960;1;..;..;..;..;..;..;..;4.132233258;86.77685029;..;..;..;..;..;..;7.024793471;..;..;..;..;..;..;..;..;..;..;..;..;..;11.15702673;..;537777811.91", 
"Afghanistan;1961;1;..;..;..;..;..;..;..;4.453443322;87.0445247;..;..;..;..;..;..;8.097166426;..;..;..;..;..;..;..;..;..;..;..;..;..;12.55060975;..;548888894.58", 
"Afghanistan;1962;1;..;..;..;..;..;..;..;4.878051281;85.36583991;..;..;..;..;..;..;9.349593301;..;..;..;..;..;..;..;..;..;..;..;..;..;14.22764458;..;546666678.04", 
"Afghanistan;1963;1;..;..;..;..;..;..;..;9.171601205;93.49111965;..;..;..;..;..;..;16.86391035;..;..;..;..;..;..;..;..;..;..;..;..;..;26.03551156;..;751111190.76", 
"Afghanistan;1964;1;..;..;..;..;..;..;..;8.88889265;95.2777688;..;..;..;..;..;..;18.05555524;..;..;..;..;..;..;..;..;..;..;..;..;..;26.94444789;..;800000045.51", 
"Afghanistan;1965;1;..;..;..;..;..;..;..;11.25827903;98.89624551;..;..;..;..;..;..;21.41280357;..;..;..;..;..;..;..;..;..;..;..;..;..;32.6710826;..;1006666638.22"
)), .Names = "Country.Year.Zeitraum_NR.Agriculture_value_added_percent_of_GDP.Central_government_debt_total_percent_of_GDP.Cost_to_export_USD_per_container.Cost_to_import_USD_per_container.Employment_in_agriculture_percent_of_total_employment.Employment_in_industry_percent_of_total_employment.Employment_in_services_percent_of_total_employment.Exports_of_goods_and_services_percent_of_GDP.Final_consumption_expenditure_etc_percent_of_GDP.Foreign_direct_investment_net_inflows_percent_of_GDP.Foreign_direct_investment_net_outflows_percent_of_GDP.General_government_final_consumption_expenditure_._of_GDP.GDP_growth_annual_percent.Government_expenditure_on_education_total_percent_of_GDP.Household_final_consumption_expenditure_etc_percent_of_GDP.Imports_of_goods_and_services_percent_of_GDP.Industry_value_added_percent_of_GDP.Inflation_consumer_prices_annual_percent.Lending_interest_rate_percent.Patent_applications_residents_._nonresidents.Research_and_development_expenditure_percent_of_GDP.Services_etc_value_added_percent_of_GDP.Subsidies_and_other_transfers_percent_of_expense.Tariff_rate_applied_simple_mean_all_products_percent.Taxes_on_exports_percent_of_tax_revenue.Taxes_on_goods_and_services_percent_of_revenue.Taxes_on_income_profits_and_capital_gains_percent_of_revenue.Taxes_on_international_trade_percent_of_revenue.Total_tax_rate_percent_of_commercial_profits.Trade_percent_of_GDP.Unemployment_total_percent_of_total_labor_force_national_estimate.GDP_in_USD", row.names = c(NA, 
6L), class = "data.frame")


1 Answer:

Answer 0: (score: 0)

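Your data is messy: everything was read into a single column, with the column names separated by "." and the values separated by ";". If the data structure above is called World, as in the question, here is one possible solution.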
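Why the original call fails can be seen on a small stand-in before doing the repair. This is only a sketch: the data frame `bad` and its shortened column name are hypothetical, chosen to mimic the single-column shape of the dput() output. With no real Year or GDP_in_USD column, mean() ends up receiving NULL, which R reports as "argument is not numeric or logical". A friend's machine may behave differently simply because `delim` holds a different value there, though that is an assumption.

```r
# Hypothetical stand-in for a file that was read with the wrong separator:
# one column whose name and values are whole, unsplit lines.
bad <- data.frame(
  Country.Year.GDP_in_USD = c("Afghanistan;2013;100", "Albania;2013;200"),
  stringsAsFactors = FALSE
)

ncol(bad)           # 1: each whole row landed in a single column
is.null(bad$Year)   # TRUE: there is no Year column to filter on

# Same shape as the call in the question. mean() receives NULL here,
# so R warns and returns NA instead of a number.
res <- suppressWarnings(
  mean(subset(bad, bad$Year == 2013)$GDP_in_USD, na.rm = TRUE)
)
is.na(res)          # TRUE
```

Running ncol(World) and str(World) on the real data is a quick way to check whether the same thing happened.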

# your data from above
# World <- structure(list(Country.Year. ......

# get the column names by splitting the single header on "."
col_names <- strsplit(names(World), ".", fixed = TRUE)[[1]]
# 37 names are created but only 35 columns of data exist:
# two of the original names contain "_._", so each was split in two.
# Rejoin the halves and drop the leftovers.
col_names[15] <- paste0(col_names[15], col_names[16])
col_names[24] <- paste0(col_names[24], col_names[25])
col_names <- col_names[-c(16, 25)]

# now split the main body of the table on ";"
temp <- sapply(World, function(x) strsplit(x, ";", fixed = TRUE))
newdf <- as.data.frame(matrix(unlist(temp), ncol = 35, byrow = TRUE))
# rename the columns
names(newdf) <- col_names
# convert the strings to numbers (".." becomes NA, with a warning)
newdf[, 2:35] <- apply(newdf[, 2:35], 2, function(x) as.numeric(as.character(x)))

Not the most elegant code, but it should get you headed in the right direction.
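An alternative to repairing the data frame afterwards is to re-read the file with the separator spelled out. The sketch below assumes the file uses ";" between fields, "." as the decimal mark, and ".." for missing values, which is what the dput() output suggests. The three-column temporary file is a hypothetical miniature of World2.csv, not the real data.

```r
# Hypothetical miniature of World2.csv: ";" between fields, "." as the
# decimal mark, ".." for missing values (as in the dput() output).
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Country;Year;GDP_in_USD",
  "Afghanistan;2012;..",
  "Afghanistan;2013;100.5",
  "Albania;2013;200.5",
  "Algeria;2013;.."
), path)

# Explicit sep and na.strings, so columns are split and ".." becomes NA.
World <- read.csv(path, header = TRUE, sep = ";", dec = ".",
                  na.strings = "..", stringsAsFactors = FALSE)

str(World)   # three columns; Year and GDP_in_USD are numeric

# With real columns in place, the call from the question works as intended.
GDP <- mean(subset(World, Year == 2013)$GDP_in_USD, na.rm = TRUE)
GDP          # 150.5, the mean of 100.5 and 200.5
```

If this matches the real file, the original one-liner needs no other change; na.rm = TRUE already skips the countries with no 2013 value.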