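I am using pandas to read an Excel file. I want to create multiple dataframes from the original dataframe. Each dataframe should be named after its header in row 1. Also, how do I skip the empty column that sits between each transaction?
Expected result: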
What I have tried:
transaction_1:
name id available capacity completed all
transaction_2:
name id available capacity completed all
transaction_3:
name id available capacity completed all
Answer 0 (score: 1)
You can try the following (works with pd.__version__ == 1.1.1):
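import pandas as pd

# Read two header rows as MultiIndex columns and the first two columns as the index.
df = (pd.read_excel(
"capacity.xlsx", sheet_name="Sprint Details", header=[0, 1], index_col=[0, 1]
)
.dropna(axis=1, how="all")  # drop the empty spacer columns between tables
.rename_axis(index=["name", "id"], columns=[None, None]))
# Slice each sub-table by its top-level header and restore "name"/"id" as columns.
transaction_1 = df["transaction_1"].reset_index()
transaction_2 = df["transaction_2"].reset_index()
transaction_3 = df["transaction_3"].reset_index()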
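Essentially, we need to read the table as a dataframe with a MultiIndex. The first two rows are our column names, hence header=[0, 1]. The first two columns are the index shared by every "sub-table", hence index_col=[0, 1].
Because there is an empty column between the tables, we end up with columns that are entirely NaN, so we drop them with .dropna(axis=1, how="all").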
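To illustrate the spacer-column step without needing the Excel file, here is a minimal sketch on an in-memory frame. The column labels (including the "Unnamed: ..." spacer label) and the values are hypothetical stand-ins, not the asker's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the parsed sheet: two blocks separated by
# one empty spacer column (all NaN).
columns = pd.MultiIndex.from_tuples([
    ("transaction_1", "available"),
    ("transaction_1", "capacity"),
    ("Unnamed: 4_level_0", "Unnamed: 4_level_1"),  # assumed spacer label
    ("transaction_2", "available"),
    ("transaction_2", "capacity"),
])
raw = pd.DataFrame(
    [[1, 2, np.nan, 3, 4],
     [5, np.nan, np.nan, 7, 8]],
    columns=columns,
)

# how="all" removes only columns where every value is NaN, so a column
# with a single missing cell (transaction_1 / capacity) is kept.
cleaned = raw.dropna(axis=1, how="all")
print(cleaned.columns.tolist())
```

Note that how="all" is what keeps real data columns with occasional gaps; the default how="any" would drop those as well.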
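Because pandas does not expect the index names and the column names to be on the same row, it incorrectly parses the index column names ["name", "id"] as the name of the second level of the column index. To fix this, we can assign the correct index names manually, and clear the column level names at the same time, with rename_axis(index=["name", "id"], columns=[None, None]).
Now that we have a well-formed table with MultiIndex columns, we can simply slice out each table and call .reset_index() on each one, so that every table has "name" and "id" as regular columns.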
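If the number of transactions is not known in advance, one possible variation is to collect the slices in a dict keyed by the top-level header instead of creating one variable per table. This is only a sketch on hypothetical in-memory data; the dict name `tables` and the sample values are assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical well-formed frame: ("name", "id") index, two-level columns.
index = pd.MultiIndex.from_tuples([("alice", 1), ("bob", 2)], names=["name", "id"])
columns = pd.MultiIndex.from_product(
    [["transaction_1", "transaction_2"], ["available", "capacity"]]
)
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], index=index, columns=columns)

# One dataframe per top-level header, each with "name" and "id" as columns.
tables = {
    name: df[name].reset_index()
    for name in df.columns.get_level_values(0).unique()
}
print(tables["transaction_1"])
```

Looking tables up as tables["transaction_2"] avoids generating variable names dynamically while still naming each dataframe after its row-1 header.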
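Edit: it seems there are parsing differences between pandas versions.
Option 1
If you can modify the Excel sheet directly, add one more row so that the column names are better separated from the index names. This gives the most reliable results.
The following code then works: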
df = (pd.read_excel(
"capacity.xlsx", sheet_name="Sprint Details", header=[0, 1], index_col=[0, 1]
)
.dropna(axis=1, how="all"))
transaction_1 = df["transaction_1"].reset_index()
transaction_2 = df["transaction_2"].reset_index()
transaction_3 = df["transaction_3"].reset_index()
Option 2
If you cannot modify the Excel file, then unfortunately we need a more roundabout approach.
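# Read without index_col, then build the ("name", "id") index by hand.
df = pd.read_excel("capacity.xlsx", header=[0,1]).dropna(axis=1, how="all")
# The first two columns hold the index; drop their top header level to get "name" and "id".
index = pd.MultiIndex.from_frame(df.iloc[:, :2].droplevel(0, axis=1))
df = df.iloc[:, 2:].set_axis(index)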
transaction_1 = df["transaction_1"].reset_index()
transaction_2 = df["transaction_2"].reset_index()
transaction_3 = df["transaction_3"].reset_index()
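To see what Option 2 does to the index, here is a small self-contained sketch that mimics the frame read_excel might return with header=[0, 1] and no index_col. The "Unnamed: ..." top-level labels and the sample rows are assumptions for illustration; the actual labels depend on the sheet and the pandas version.

```python
import pandas as pd

# Hypothetical parsed layout: the first two columns carry "name" and "id"
# in the second header level, under placeholder top-level labels.
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "name"),
    ("Unnamed: 1_level_0", "id"),
    ("transaction_1", "available"),
    ("transaction_1", "capacity"),
])
df = pd.DataFrame([["alice", 1, 10, 20], ["bob", 2, 30, 40]], columns=columns)

# Drop the placeholder level so the two columns are named "name" and "id",
# then turn them into a MultiIndex for the rows.
index = pd.MultiIndex.from_frame(df.iloc[:, :2].droplevel(0, axis=1))
# Keep only the data columns and attach the rebuilt row index.
df = df.iloc[:, 2:].set_axis(index, axis=0)

transaction_1 = df["transaction_1"].reset_index()
print(transaction_1)
```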