I have a database to which new data is added on every iteration, and I'm trying to merge that data on a datetime column.
I'm using this piece of code:
# Iterate by days
for row in rows:
    i += 1
    df_name = f"{row[0]}_{row[1]}"
    print(f"Getting {df_name} {i}/{len(rows)}{spaces}", end="\r")
    if predictionPoint == row[0]:
        currentDf = pd.read_sql(f"SELECT updated_at, c as '{df_name}_c', "
                                f"v as '{df_name}_v', o FROM commons "
                                f"WHERE cid LIKE '{predictionMeasure}%' AND s = '{row[0]}' AND cid = '{row[1]}' "
                                "ORDER BY updated_at DESC", con=sqlite)
    else:
        currentDf = pd.read_sql(f"SELECT updated_at, c as '{df_name}_c', "
                                f"v as '{df_name}_v' FROM commons "
                                f"WHERE cid LIKE '{predictionMeasure}%' AND s = '{row[0]}' AND cid = '{row[1]}' "
                                "ORDER BY updated_at DESC", con=sqlite)
    currentDf["updated_at"] = currentDf["updated_at"].apply(convertDatetime)
    if not df.empty:
        df = pd.merge(left=df, right=currentDf, on="updated_at", how="inner")  # inner join: keeps only shared timestamps
    else:
        df = currentDf
if not os.path.exists(f"{dirName}/{datasetFilename}"):
    df.to_csv(f"{dirName}/{datasetFilename}", encoding="utf-8", index=False)
else:
    tempDf = pd.read_csv(f"{dirName}/{datasetFilename}", parse_dates=["updated_at"])
    df = pd.concat([tempDf, df], axis=0, sort=False)  # stacks rows; does not de-duplicate updated_at
    df.to_csv(f"{dirName}/{datasetFilename}", encoding="utf-8", index=False)
print(f"Dataset created {a}/{len(archives)}{spaces}")
df = pd.read_csv(f"{dirName}/{datasetFilename}", parse_dates=["updated_at"])
df = df.set_index("updated_at", drop=False)
print("Sorting, filling N/A, cleaning...")
df = df.sort_index(ascending=False)
df = df.ffill().bfill()
I'm running into trouble at line 27 of this code: it returns an unmerged frame that either has duplicated updated_at columns or is cut down, but I expect:
updated_at one two three four
2019-06-02 23:59:45 1 2 3 4
2019-06-02 23:59:30 2 3 4 5
2019-06-02 23:59:15 3 4 5 6
2019-06-02 23:59:00 4 5 6 7
2019-06-02 23:58:45 5 6 7 8
Because the new data is added by merging, there should be no duplicated updated_at values and no gaps. I have already tried join and other kinds of merge...
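To make the expected result concrete, here is a minimal sketch of one way it might be produced. It is only an illustration under assumptions: the helper names `merge_on_timestamp` and `append_batch` are hypothetical, the toy frames stand in for the real query results, and `how="outer"` is swapped in for the `how="inner"` used above so that timestamps missing from one series are not dropped.

```python
import pandas as pd


def merge_on_timestamp(frames):
    """Outer-merge frames on `updated_at` so that no timestamp is dropped."""
    merged = frames[0]
    for frame in frames[1:]:
        merged = pd.merge(merged, frame, on="updated_at", how="outer")
    return merged


def append_batch(existing, new):
    """Stack a new batch under the existing data, keeping one row per timestamp."""
    combined = pd.concat([existing, new], axis=0, sort=False)
    # Collapse rows that share a timestamp; first() takes the first
    # non-null value in each column within the group.
    combined = combined.groupby("updated_at", as_index=False).first()
    return combined.sort_values("updated_at", ascending=False).reset_index(drop=True)


# Toy data (an assumption, not taken from the real database).
ts = pd.to_datetime(["2019-06-02 23:59:45", "2019-06-02 23:59:30", "2019-06-02 23:59:15"])
one = pd.DataFrame({"updated_at": ts, "one": [1, 2, 3]})
two = pd.DataFrame({"updated_at": ts[:2], "two": [2, 3]})

batch1 = merge_on_timestamp([one, two])
batch2 = merge_on_timestamp([one.iloc[1:], two.iloc[1:]])  # overlaps batch1

result = append_batch(batch1, batch2)
result = result.ffill().bfill()  # fill the remaining gaps, as in the code above
print(result)
```

The idea is that the merge combines columns within a batch, while the concat plus group-by step combines batches by rows and removes repeated timestamps. Whether first-non-null is the right rule for conflicting values depends on the data.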