Let's assume this dataset:
answer <- c("a", "b", "b", NA, "a", "b", "a", "b", "a", NA, "a", "b")
weights <- c(0.1, 0.3, 0.2, 1.1, 0.3, 0.8, 0.9, 1.5, 0.9, 0.2, 0.15, 0.13)
year <- c(2001, 2005, 2010)
data <- cbind(answer,weights,year)
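I want a time-series plot that shows the weighted frequency of the possible answers (a and b). NA should be omitted.
Any idea how to achieve this?
Thanks in advance!
If I should rewrite my question, please let me know. I'm new to the community...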
Answer 0 (score: 0)
Welcome! You can tidy the data and manage it with dplyr, then hand the result to ggplot2, all in a single chain built with the %>% (pipe) operator from magrittr. Read up on all of these packages; they are very useful.
library(dplyr)
library(ggplot2)

data %>%
  # remove NAs
  filter(!is.na(answer)) %>%
  # group by answer and year
  group_by(answer, year) %>%
  # sum the weights per year/answer; you can use other functions here
  summarise(weights = sum(weights)) %>%
  # now the plot
  ggplot(., aes(x = sprintf("%.0f", year),  # sprintf drops the decimals from the years
                y = weights,
                colour = answer,
                group = answer)) +
  geom_line() +  # add lines
  labs(          # rename the axes
    x = "year",
    y = "summed weights"
  )

PS: I used data.frame instead of cbind to store the data.
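To illustrate the data.frame point: cbind() on a character vector and numeric vectors gives a character matrix, so the weights could no longer be summed. The sketch below rebuilds the question's vectors as a data frame and cross-checks the summed weights in base R with aggregate(). The name weighted_freq is only illustrative, and using aggregate() is an assumption on my part, not part of the pipeline above.

```r
# Rebuild the question's vectors as a data frame so each column keeps its type.
answer  <- c("a", "b", "b", NA, "a", "b", "a", "b", "a", NA, "a", "b")
weights <- c(0.1, 0.3, 0.2, 1.1, 0.3, 0.8, 0.9, 1.5, 0.9, 0.2, 0.15, 0.13)
year    <- c(2001, 2005, 2010)  # length 3, recycled to 12 rows by data.frame()

data <- data.frame(answer, weights, year)

# Base-R cross-check: sum of weights per answer and year.
# The formula method of aggregate() drops rows with NA by default (na.omit).
weighted_freq <- aggregate(weights ~ answer + year, data = data, FUN = sum)
print(weighted_freq)
```

Combinations that never occur (here, answer b in 2001) simply do not appear in the result, so the line for b starts at 2005.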