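I have a pandas DataFrame in which several rows share the same value in a particular column. I want to concatenate those rows into a single row. The number of rows sharing a value varies, so I also want to split the result into separate DataFrames, one for each distinct number of shared rows.
Here is an example of what I want.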
import pandas as pd
data = [['tom', 2], ['ni2ck', 2], ['j3uli', 4], ['nic4k', 4], ['jul5i', 4], ['nic6k', 7], ['ju7li', 7], ['nic8k', 7], ['ju9li', 7], ['nic1k', 8], ['car', 8]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])
df
The code above creates the original DataFrame, which looks like this:
     Name  Age
0     tom    2
1   ni2ck    2
2   j3uli    4
3   nic4k    4
4   jul5i    4
5   nic6k    7
6   ju7li    7
7   nic8k    7
8   ju9li    7
9   nic1k    8
10    car    8
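I want to put all rows that share the same Age value into a single row, and then split the result into separate DataFrames according to how many Name columns each combined row has. The result would look like this.
The first resulting DataFrame has two rows, because two Age values (2 and 8) are each shared by two rows.
    Name   Name  Age
0    tom  ni2ck    2
1  nic1k    car    8
The second resulting DataFrame:
    Name   Name   Name  Age
0  j3uli  nic4k  jul5i    4
The third resulting DataFrame:
    Name   Name   Name   Name  Age
0  nic6k  ju7li  nic8k  ju9li    7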
Answer 0 (score: 2)
Here is one approach:
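As an illustration only (not necessarily the code this answer originally contained), one possible way is to collect each Age group's names into a list and then bucket the groups by list length. The names `names_by_age` and `frames` are made up for this sketch, and the name columns come out labeled 0, 1, ... rather than as repeated `Name` headers.

```python
import pandas as pd

data = [['tom', 2], ['ni2ck', 2], ['j3uli', 4], ['nic4k', 4], ['jul5i', 4],
        ['nic6k', 7], ['ju7li', 7], ['nic8k', 7], ['ju9li', 7],
        ['nic1k', 8], ['car', 8]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# One list of names per Age value, in original row order.
names_by_age = df.groupby('Age')['Name'].apply(list)

# Bucket the Age groups by how many names each one contains.
# `frames` (hypothetical name) maps group size -> wide DataFrame.
frames = {}
for size, group in names_by_age.groupby(names_by_age.map(len)):
    wide = pd.DataFrame(group.tolist())   # one column per name: 0, 1, ...
    wide['Age'] = group.index.to_numpy()  # the shared Age value of each row
    frames[int(size)] = wide

for size in sorted(frames):
    print(frames[size], end="\n\n")
```

Under these assumptions, `frames[2]` holds the rows for ages 2 and 8, `frames[3]` the row for age 4, and `frames[4]` the row for age 7.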
Answer 1 (score: 1)
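from collections import defaultdict
d = defaultdict(list)
for age, df_ in df.groupby('Age'):
    # Key: number of rows in this Age group; value: the group's names laid out as one wide row
    d[len(df_.T.columns)].append(df_.reset_index(drop=True).T.loc[['Name']].assign(Age=age))
# Stack the wide rows for each group size into a single DataFrame
d = {k: pd.concat(v, ignore_index=True) for k, v in d.items()}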
You can then access each DataFrame by the number of names it holds. For example:
>>> d[2]
       0      1  Age
0    tom  ni2ck    2
1  nic1k    car    8
>>> d[3]
       0      1      2  Age
0  j3uli  nic4k  jul5i    4
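The question's expected output labels every name column `Name`, whereas the frames above use 0, 1, and so on. The sketch below rebuilds `d` the same way (using `len(df_)`, which equals `len(df_.T.columns)`) and then relabels the columns. The relabeling step is an addition for illustration, not part of the answer. pandas allows duplicate column labels, but `frame['Name']` will then return all of the matching columns rather than one.

```python
from collections import defaultdict
import pandas as pd

df = pd.DataFrame(
    [['tom', 2], ['ni2ck', 2], ['j3uli', 4], ['nic4k', 4], ['jul5i', 4],
     ['nic6k', 7], ['ju7li', 7], ['nic8k', 7], ['ju9li', 7],
     ['nic1k', 8], ['car', 8]],
    columns=['Name', 'Age'])

# Same grouping as the answer: one wide row per Age, bucketed by group size.
d = defaultdict(list)
for age, df_ in df.groupby('Age'):
    row = df_.reset_index(drop=True).T.loc[['Name']].assign(Age=age)
    d[len(df_)].append(row)
d = {k: pd.concat(v, ignore_index=True) for k, v in d.items()}

# Added step: relabel the positional columns 0, 1, ... as repeated 'Name' headers.
for k, frame in d.items():
    frame.columns = ['Name'] * k + ['Age']

for k in sorted(d):
    print(f"{k} names per Age:")
    print(d[k], end="\n\n")
```

The group of four names is available as `d[4]`, which the example output above does not show.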