Misunderstanding of time-based data preparation for an LSTM

Asked: 2017-07-11 00:50:58

Tags: python python-3.x csv machine-learning tensorflow

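I am trying to replicate Chevalier's LSTM Human Activity Recognition algorithm, and I ran into a problem when I realized that my approach does not match the one the algorithm uses. As a follow-up to this question, I was able to produce results for load_X with the following approach: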

In [0]:

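import csv
import numpy as np

def load_X(X_signals_paths):
    X_signals = []
    for signal_type_path in X_signals_paths:
        # The with block closes the file on exit, so no explicit close() is needed
        with open(signal_type_path, 'r') as csvfile:
            reader = csv.reader(csvfile)
            next(reader)  # skip the header row
            # Keep only the second column of each row: one value per CSV line
            for serie in [row[1:2] for row in reader]:
                #X_signals.append([np.array([row[1:2] for row in reader],dtype=np.float32) for row in reader])
                X_signals.append(np.array(serie, dtype=np.int32))
    # Shape is (n_rows, 1); the original double transpose produced the same array
    return np.array(X_signals)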

X_train_signals_paths = [
    DATASET_PATH + TRAIN + signal + "_train.csv" for signal in INPUT_SIGNAL_TYPES
]
X_test_signals_paths = [
    DATASET_PATH + TEST + signal + "_test.csv" for signal in INPUT_SIGNAL_TYPES
]

X_train = load_X(X_train_signals_paths)
X_test = load_X(X_test_signals_paths)
print(X_train)

Out [0]:

[[ 6]
 [ 6]
 ..., 
 [13]
 [13]
 [13]] 

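However, I looked more closely at Chevalier's approach and found something interesting when I evaluated len(X_train[0]) and len(X_train[0][0]). It seems that the way I format my x values is very different from the way Chevalier's x values are formatted. My original CSV file can be found here, and the original txt file for Chevalier's X_train can be found here. Below is Chevalier's code, for comparison with mine: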

def load_X(X_signals_paths):
    X_signals = []

    for signal_type_path in X_signals_paths:
        file = open(signal_type_path, 'r')
        # Read dataset from disk, dealing with text files' syntax
        X_signals.append(
            [np.array(serie, dtype=np.float32) for serie in [
                row.replace('  ', ' ').strip().split(' ') for row in file
            ]]
        )
        file.close()

    return np.transpose(np.array(X_signals), (1, 2, 0))

X_train_signals_paths = [
    DATASET_PATH + TRAIN + "Inertial Signals/" + signal + "train.txt" for signal in INPUT_SIGNAL_TYPES
]
X_test_signals_paths = [
    DATASET_PATH + TEST + "Inertial Signals/" + signal + "test.txt" for signal in INPUT_SIGNAL_TYPES
]

X_train = load_X(X_train_signals_paths)
X_test = load_X(X_test_signals_paths)
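
To make the resulting layout concrete, here is a minimal sketch of what the final np.transpose in Chevalier's load_X appears to do. It uses small synthetic arrays instead of the real txt files, and the toy sizes (5 series rather than 7352) are assumptions chosen only for illustration:

```python
import numpy as np

# Toy sizes (assumed for illustration): 9 signal files, 5 series per file,
# 128 values per row. The real training set has 7352 series.
n_signals, n_series, n_steps = 9, 5, 128

# Mimic what load_X builds: one list per signal file, each holding one
# array per row of that file. Signal i is filled with the constant i.
X_signals = [
    [np.full(n_steps, i, dtype=np.float32) for _ in range(n_series)]
    for i in range(n_signals)
]

stacked = np.array(X_signals)         # (signal, series, timestep)
X = np.transpose(stacked, (1, 2, 0))  # (series, timestep, signal)

print(stacked.shape)  # (9, 5, 128)
print(X.shape)        # (5, 128, 9)
print(X[0][0])        # the 9 signal values at timestep 0 of series 0
```

Under these assumptions, len(X) is the number of series, len(X[0]) is the number of timesteps per series, and len(X[0][0]) is the number of input signals per timestep.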

The following, from the "Additional Parameters" section of Chevalier's code, is the main source of my confusion:

training_data_count = len(X_train)  # 7352 training series (with 50% overlap between each serie)
test_data_count = len(X_test)  # 2947 testing series
n_steps = len(X_train[0])  # 128 timesteps per series
n_input = len(X_train[0][0])  # 9 input parameters per timestep

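What I have observed is that the 50% overlap means the time intervals that are evaluated individually overlap, such as 0-64, 32-96, 64-128, 96 and so on. One thing I know for a fact is that 7352 is the number of rows in X_train.txt. As I understand it, the [0][0][0] indexing selects the 0th entry of the X_train array, and then the 0th column and the 0th row within it, respectively. What my code currently does is convert each data point individually. That is why len(X_train[0]) returns 1, and why len(X_train[0][0]) raises this error: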

TypeError                                 Traceback (most recent call last)
<ipython-input-255-14523e544e49> in <module>()
      2 test_data_count = len(list(X_test))
      3 n_steps = len(X_train[0])
----> 4 n_input = len(list(X_train)[0][0])
      5 print(training_data_count, test_data_count, n_steps, n_input)

TypeError: object of type 'numpy.int32' has no len()
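
For illustration, here is a rough sketch of how a single-column signal like mine might be cut into overlapping windows so that len(X_train[0]) and len(X_train[0][0]) both work. It assumes 128 timesteps per series and a 50% overlap, taken from the comments in Chevalier's code. make_windows is a hypothetical helper, not something from Chevalier's repository, and the input is synthetic:

```python
import numpy as np

def make_windows(signal, n_steps=128, overlap=0.5):
    """Hypothetical helper: slice a signal into overlapping fixed-length series.

    Returns an array of shape (n_windows, n_steps, n_input).
    """
    signal = np.asarray(signal, dtype=np.float32)
    if signal.ndim == 1:
        signal = signal[:, None]  # treat a flat signal as one input column
    if len(signal) < n_steps:
        raise ValueError("signal is shorter than one window")
    # With 50% overlap, each window starts n_steps / 2 samples after the last
    stride = max(1, int(n_steps * (1 - overlap)))
    starts = range(0, len(signal) - n_steps + 1, stride)
    return np.stack([signal[s:s + n_steps] for s in starts])

raw = np.arange(1000)    # synthetic stand-in for one CSV column
X = make_windows(raw)

print(X.shape)           # (14, 128, 1)
print(len(X[0]))         # 128 timesteps per series
print(len(X[0][0]))      # 1 input parameter per timestep
```

With this layout, the second half of each window repeats as the first half of the next one, which is how I read the "50% overlap" comment. Samples left over at the end that do not fill a whole window are dropped in this sketch.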

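How can I reformat my data to match the format that Chevalier's code expects from the txt files? What do the numbers in the "Additional Parameters" section of Chevalier's git repository mean, and how can I tailor them to my current model?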

0 answers:

No answers yet.