Question

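I have a huge CSV file containing data from a bicycle ride. It has a time column (in seconds) and a speed column. I want to search the data for specific patterns in order to work out what happened during the ride.

For example, riding towards a traffic light: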

This is what I have so far:

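import pandas as pd

# Load only the time (in seconds) and speed columns
df = pd.read_csv('.csv', usecols=['time', 'speed'])
# True where the speed is higher than in the previous row
df['accelerating'] = df['speed'].diff() > 0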

I would like something like this (pseudocode for the expected output):

  df_traffic_light = df.loc[(df['speed'] < 15) & (df['accelerating'] == False)]  # riding towards the traffic light
    & df.loc[df['speed'] < 1]  # getting really slow or standing still
    & df.loc[(df['speed'] > 5) & (df['accelerating'] == True)]  # light switched to green and starting again

I tried dataframe.rolling, but it did not work well. Any ideas on how to solve this?

Answer 1

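This answer may not be satisfying, but the problem as you describe it cannot be solved. As long as you have no information about what a given (sub)sequence actually represents, it is not possible to divide the data into labeled classes such as "riding towards a traffic light".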

Apart from hand-tuning logical rules, visualizing the data and reasoning from that, I see two options from the field of machine learning:

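Supervised learning: you need at least some labeled data from which an algorithm can learn the characteristics of each class. That is, you need (sub)sequences for which you know the event they represent, obtained either by experiment (have 10 different people each produce 10 instances of every event/class you want to recognize) or by manually evaluating (sub)sequences, visualizing the data and judging it with your domain knowledge. The sklearn package contains some more or less simple methods for predicting the class of still-unknown data, e.g. decision trees or support vector machines.
Unsupervised learning: you simply search for characteristic subsequences without knowing a priori what they represent. sklearn.cluster provides some algorithms for this. It is then up to you and your domain knowledge to characterize the clusters and give them labels.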
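
A rough sketch of how both options could look with sklearn, not a definitive recipe. Everything concrete here is an assumption made for illustration: the synthetic speed trace, the hypothetical helper window_features, the window width of 5 samples, the two features (mean speed and mean change in speed), the number of clusters, and the label names. In real use the labels for the supervised part would come from experiments or manual annotation, not from the way the trace was built.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# Made-up speed trace, one sample per second: cruise, brake, stop, accelerate, cruise.
speed = np.concatenate([
    np.full(20, 20.0),
    np.linspace(20, 0, 10),
    np.zeros(10),
    np.linspace(0, 20, 10),
    np.full(20, 20.0),
])

def window_features(speed, width=5):
    """Describe each sliding window by its mean speed and its mean change in speed."""
    windows = np.lib.stride_tricks.sliding_window_view(speed, width)
    mean_speed = windows.mean(axis=1)
    mean_change = np.diff(windows, axis=1).mean(axis=1)
    return np.column_stack([mean_speed, mean_change])

width = 5
X = window_features(speed, width)

# Unsupervised: group similar windows. The cluster numbers mean nothing by
# themselves; someone still has to inspect them and attach labels.
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Supervised: one label per window is required. These labels are hypothetical;
# each window simply gets the phase of its centre sample in the synthetic trace.
sample_labels = np.repeat(
    ["cruise", "brake", "stop", "accelerate", "cruise"], [20, 10, 10, 10, 20]
)
y = sample_labels[width // 2 : width // 2 + len(X)]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Classify a new, unseen window (five seconds of standing still).
print(clf.predict(window_features(np.zeros(5), width)))
```

A support vector machine (sklearn.svm.SVC) could be swapped in for the decision tree with the same fit/predict calls; which features and window width work on real ride data would have to be found by experiment.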

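Both approaches take some effort to get into, but I hope this points you in the right direction. If you run into more specific questions during your research and cannot find answers, feel free to ask.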
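
For the hand-tuned route mentioned at the start of this answer, one possible sketch with pandas is to label every row with a phase, merge consecutive rows of the same phase into runs, and then look for the run sequence approach, stop, restart. The thresholds are the ones from the question; the sample data, the km/h unit, and the phase names are assumptions made up for illustration.

```python
import pandas as pd

# Made-up sample: one row per second, speed assumed to be in km/h.
speed = [20, 18, 14, 10, 6, 3, 0.5, 0, 0, 0, 6, 12, 18, 20]
df = pd.DataFrame({"time": range(len(speed)), "speed": speed})

delta = df["speed"].diff()

# Label every row with a phase, using the thresholds from the question.
df["phase"] = "other"
df.loc[(df["speed"] < 15) & (delta < 0), "phase"] = "approach"
df.loc[df["speed"] < 1, "phase"] = "stop"  # overrides "approach" at very low speed
df.loc[(df["speed"] > 5) & (delta > 0), "phase"] = "restart"

# Give consecutive rows with the same phase a common run number.
run_id = (df["phase"] != df["phase"].shift()).cumsum().rename("run")
runs = df.groupby(run_id).agg(
    phase=("phase", "first"),
    start=("time", "first"),
    end=("time", "last"),
)

# A traffic-light event is an "approach" run directly followed by "stop" and "restart".
phases = runs["phase"].tolist()
events = [
    (int(runs["start"].iloc[i]), int(runs["end"].iloc[i + 2]))
    for i in range(len(phases) - 2)
    if phases[i : i + 3] == ["approach", "stop", "restart"]
]
print(events)  # list of (start time, end time) pairs
```

Note that this matching is strict: a single row that falls between the thresholds (for example a speed of 3 while accelerating) is labeled "other" and breaks the sequence, so on real data the rules would likely need smoothing or some tolerance for short gaps.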
