I have a CSV file with a two-row header (the year in the first header row, the month in the second). Its contents look like this:
| a | b | 2018 | 2018 | 2019 | 2019 |
| | | jan | feb | jan | feb |
---------------------------------------
| a1 | b1 | 0 | 1 | 2 | 3 |
| a1 | b2 | 4 | 5 | 6 | 7 |
| a2 | b1 | 8 | 9 | 10 | 11 |
| a2 | b2 | 12 | 13 | 14 | 15 |
I want to read it into a pandas DataFrame and then melt it into long format. How can I do this?
Answer 0 (score: 1)
If you have a regular DataFrame (single-level column names), this should work:
import pandas as pd
# Wide frame with single-level column names of the form "YYYY mon"
df = pd.DataFrame({
    'a': ['a1', 'a1', 'a2', 'a2'],
    'b': ['b1', 'b2', 'b1', 'b2'],
    '2018 jan': [0, 4, 8, 12],
    '2018 feb': [1, 5, 9, 13],
    '2019 jan': [2, 6, 10, 14],
    '2019 feb': [3, 7, 11, 15],
})
# Unpivot every column except the identifiers into (date, value) pairs
df = df.melt(id_vars=['a', 'b'], var_name='date', value_name='value')
# Split "2018 jan" into a list ['2018', 'jan'] and pull out each part
df['date'] = df['date'].str.split(' ')
df['year'] = df['date'].str[0]
df['month'] = df['date'].str[1]
df.drop(columns='date', inplace=True)
Output:
     a   b  value  year month
0   a1  b1      0  2018   jan
1   a1  b2      4  2018   jan
2   a2  b1      8  2018   jan
3   a2  b2     12  2018   jan
4   a1  b1      1  2018   feb
5   a1  b2      5  2018   feb
6   a2  b1      9  2018   feb
7   a2  b2     13  2018   feb
8   a1  b1      2  2019   jan
9   a1  b2      6  2019   jan
10  a2  b1     10  2019   jan
11  a2  b2     14  2019   jan
12  a1  b1      3  2019   feb
13  a1  b2      7  2019   feb
14  a2  b1     11  2019   feb
15  a2  b2     15  2019   feb
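A slightly shorter variant of the split step, offered as a sketch rather than as part of the original answer: `Series.str.split` with `expand=True` returns the pieces as separate columns (labelled 0 and 1), so the intermediate list column is not needed. It uses the same sample data as the answer.

```python
import pandas as pd

# Same wide sample data, with single-level column names "YYYY mon"
df = pd.DataFrame({
    'a': ['a1', 'a1', 'a2', 'a2'],
    'b': ['b1', 'b2', 'b1', 'b2'],
    '2018 jan': [0, 4, 8, 12],
    '2018 feb': [1, 5, 9, 13],
    '2019 jan': [2, 6, 10, 14],
    '2019 feb': [3, 7, 11, 15],
})

long_df = df.melt(id_vars=['a', 'b'], var_name='date', value_name='value')

# expand=True gives a DataFrame with columns 0 (year) and 1 (month)
parts = long_df['date'].str.split(' ', expand=True)
long_df['year'] = parts[0]
long_df['month'] = parts[1]
long_df = long_df.drop(columns='date')
print(long_df)
```

The result has the same columns and row order as the answer's output.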
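If your columns have two levels (a MultiIndex), as mentioned in the comments, you can first convert them into a regular DataFrame:
# Read both header rows into a two-level column MultiIndex
df = pd.read_csv('file.csv', header=[0,1])
# Join each (level 0, level 1) tuple into one name, e.g. ('2018', 'jan') -> '2018 jan'
df.columns = [' '.join(col).strip() for col in df.columns.values]
# Empty cells in the second header row are read as "Unnamed: ..." placeholders; restore the plain names
df.rename(columns={'a Unnamed: 0_level_1': 'a', 'b Unnamed: 1_level_1': 'b'}, inplace=True)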
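To try the conversion without a file on disk, here is a minimal end-to-end sketch. It assumes the file holds the question's table as plain CSV with the year written out above every data column in the first header row (the raw file contents are not shown in the question), and it reads that text from an `io.StringIO` buffer. Instead of the hard-coded `rename`, it drops the `Unnamed: ...` placeholders while joining the levels, which assumes no real column name starts with `Unnamed`.

```python
import io

import pandas as pd

# Assumed file contents: the question's table as CSV with two header rows
csv_text = """a,b,2018,2018,2019,2019
,,jan,feb,jan,feb
a1,b1,0,1,2,3
a1,b2,4,5,6,7
a2,b1,8,9,10,11
a2,b2,12,13,14,15
"""

df = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# Flatten the MultiIndex, skipping "Unnamed: ..." placeholders:
# ('a', 'Unnamed: 0_level_1') -> 'a', ('2018', 'jan') -> '2018 jan'
df.columns = [
    ' '.join(part for part in col if not part.startswith('Unnamed'))
    for col in df.columns
]

# From here on this is the single-level approach from the answer
long_df = df.melt(id_vars=['a', 'b'], var_name='date', value_name='value')
parts = long_df['date'].str.split(' ', expand=True)
long_df['year'] = parts[0]
long_df['month'] = parts[1]
long_df = long_df.drop(columns='date')
print(long_df)
```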
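Answer 1 (score: 0)
@KOB My answer should work in general for any CSV file with a two-row header in which some columns are labelled only in the first row and others in both rows. For the data in your question, this code places all the headers as requested. Read the CSV into a DataFrame with MultiIndex columns, then melt it: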
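import pandas as pd
# Read both header rows into a two-level column MultiIndex
df_multiidx = pd.read_csv('two_levels_header_file.csv', header=[0, 1])
# Columns whose second header cell is empty (read as "Unnamed: ...") are identifiers
id_vars = [idv for idv in df_multiidx.columns if 'Unnamed' in idv[1]]
value_vars = [valv for valv in df_multiidx.columns if 'Unnamed' not in valv[1]]
# With MultiIndex columns, var_name takes one name per column level
df_multiidx = df_multiidx.melt(id_vars=id_vars, value_vars=value_vars, var_name=['year', 'month'])
# rename returns a new frame, so assign it; keep only the first level of the id column names
df_multiidx = df_multiidx.rename(columns={col_ren: col_ren[0] for col_ren in id_vars})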
Output:
     a   b  year month  value
0   a1  b1  2018   jan      0
1   a1  b2  2018   jan      4
2   a2  b1  2018   jan      8
3   a2  b2  2018   jan     12
4   a1  b1  2018   feb      1
5   a1  b2  2018   feb      5
6   a2  b1  2018   feb      9
7   a2  b2  2018   feb     13
8   a1  b1  2019   jan      2
9   a1  b2  2019   jan      6
10  a2  b1  2019   jan     10
11  a2  b2  2019   jan     14
12  a1  b1  2019   feb      3
13  a1  b2  2019   feb      7
14  a2  b1  2019   feb     11
15  a2  b2  2019   feb     15