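I am building a database that will hold a lot of unstructured data. The database is populated from an Excel sheet, but the sheet contains some blank rows/columns (EE77, KK12) that I do not want in the database. Right now the program stops where the first blank one begins (EE77), but I still want the data from FF888888 and GG121.
Here is my code: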
from src.server.connectToDB import get_sql_conn
import pandas as pd

if __name__ == '__main__':
    cursor = get_sql_conn().cursor()
    # Raw string so the backslashes in the Windows path are not treated as escapes
    excelFile = pd.ExcelFile(r'C:\Users\dw\Source\Repos\analyse\data\Test-nordpool.xlsx')
    a = ["A1", "A2"]
    for i in a:
        df = excelFile.parse(i)
        # df.items() yields one (name, Series) pair per column of the sheet
        for key, rows in df.items():
            print("# Kolonne: ", "\n")
            columnInsertSql = "INSERT INTO DataSets (Hour, BlockBuyNet, BlockSell, RejectedBlockBuy, RejectedBlockSell, NetImports) VALUES("
            rowCounter = 1
            # Append every value of the column, separated by commas
            for index, column in rows.items():
                columnInsertSql += str(column)
                if rowCounter != len(list(rows.items())):
                    columnInsertSql += ", "
                rowCounter += 1
            columnInsertSql += ")"
            cursor.execute(columnInsertSql)
            print("SQL: " + columnInsertSql)
    cursor.commit()
Result:
AA8  BB88  CC888  D88888   EE77  FF888888  KK12  GG121
9    99    999    9999     -     999999    -     1212
10   100   10000  100000   -     1000000   -     121212
11   111   11111  111111   -     1111111   -     1212121
12   122   12222  12222    -     1222222   -     12121212
13   133   13333  13333    -     1333333   -     121212121
14   144   14444  1444444  -     1444444   -     121212121
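Answer 0 (score: 0)
I think you could use df.shape to find the number of rows, and then use df.isnull().sum() to get the number of null values in each column. You can then compare the two to find out which columns contain only null values, and skip those in your processing. The code below is not the most elegant, but I believe it gives you the general idea.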
import pandas as pd
import numpy as np

raw_data = {'hi': [20, 19, 22, 21],
            'bye': [88, 92, 95, 70],
            'why': [np.nan, np.nan, np.nan, np.nan]}
df = pd.DataFrame(raw_data, columns=['hi', 'bye', 'why'])

NumberOfRows = df.shape[0]
print(NumberOfRows)

# isnull().sum() returns the count of null values for each column
for c in df.isnull().sum():
    if c == NumberOfRows:
        print('Do Something')       # the column contains only nulls
    else:
        print('Do Something Else')  # the column contains data
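The same comparison can probably be left to pandas itself. The sketch below is only an illustration under a few assumptions: `drop_empty_columns` is a hypothetical helper name, the sample frame is made up from the column names in the question, and it assumes the blank cells arrive either as NaN or as a literal "-" string. It marks the placeholder as missing and then uses `dropna(axis=1, how="all")` to remove columns that hold no data at all.

```python
import numpy as np
import pandas as pd


def drop_empty_columns(df, placeholder="-"):
    """Return a copy of df without the columns that contain no data (hypothetical helper)."""
    # Assumption: blank cells may show up as the string "-"; turn those into NaN.
    cleaned = df.mask(df == placeholder)
    # how="all" drops a column only when every value in it is missing.
    return cleaned.dropna(axis=1, how="all")


# Made-up sample that mimics the sheet from the question.
sample = pd.DataFrame({
    "AA8": [9, 10, 11],
    "EE77": [np.nan, np.nan, np.nan],
    "FF888888": [999999, 1000000, 1111111],
    "KK12": ["-", "-", "-"],
    "GG121": [1212, 121212, 1212121],
})

kept = drop_empty_columns(sample)
print(list(kept.columns))  # ['AA8', 'FF888888', 'GG121']
```

Looping over `kept` instead of the original frame would then skip EE77 and KK12 while still reaching FF888888 and GG121.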