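I'm working with a DataFrame in a Jupyter Notebook and have run into some difficulty. The DataFrame holds locations, each represented by coordinates. The points represent the route a driver took on a particular day.
There are currently 3 columns: Start, Intermediary, and End.
A driver begins the day at the start point, visits one or more intermediary points, and returns to the end point at the end of the day. The start point works like a home base, so the end point is the same as the start point.
This is very basic, but I'm having trouble visualizing the data. I was thinking of something like the following to improve my situation: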
|      Start      |  Intermediary   |       End       |
|        |        |        |        |        |        |
_______________________________________________________
| s_lat  | s_lng  | i_lat  | i_lng  | e_lat  | e_lng  |
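Or would it be better to drop the three top-level columns (Start, Intermediary, End)?
In line with the guidelines I don't want to start a discussion here, so I'm hoping to learn something new about Python Pandas and to find out whether there is a way to improve my current approach.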
Answer 0 (score: 1)
I think what you need here is a MultiIndex, created with MultiIndex.from_product:
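import pandas as pd
#data: any 2-D array-like with six columns, in the same order as mux (assumed here)
mux = pd.MultiIndex.from_product([['Start','Intermediary','End'], ['lat','lng']])
df = pd.DataFrame(data, columns=mux)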
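To show how such a frame behaves once it exists, here is a minimal sketch. The `data` array is made up for illustration (two routes, rounded coordinates) and is not the asker's data. It shows selecting a whole top-level group, pulling one coordinate across all groups, and addressing a single cell.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two routes, six values each
# (start lat/lng, intermediary lat/lng, end lat/lng).
data = np.array([
    [54.95, -7.74, 54.96, -7.75, 54.95, -7.74],
    [54.89, -7.57, 54.86, -7.65, 54.89, -7.57],
])

mux = pd.MultiIndex.from_product(
    [['Start', 'Intermediary', 'End'], ['lat', 'lng']]
)
df = pd.DataFrame(data, columns=mux)

# Selecting a top-level label returns the two columns beneath it.
start = df['Start']

# xs() takes a cross-section on the second level: every 'lat' column.
lats = df.xs('lat', axis=1, level=1)

# A single cell is addressed with a (top, bottom) tuple.
first_start_lat = df.loc[0, ('Start', 'lat')]
```

So the top level does not have to be dropped to stay usable: `df['Start']` behaves like an ordinary two-column frame.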
Edit:
Setup:
temp=u"""start    intermediary    end
('54.957055',' -7.740156')    ('54.956915136264', ' -7.753690062122')    ('54.957055','-7.740156')
('54.8913208', '-7.5740475')    ('54.864402885577', '-7.653445692445'),('54','0')    ('54.8913208','-7.5740475')
('55.2375819', '-7.2357427')    ('55.253936739337', '-7.259624609577'), ('54','2'),('54','1')    ('55.2375819','-7.2357427')
('54.5298806', '-8.1350247')    ('54.504374314741', '-8.188334960168')    ('54.5298806','-8.1350247')
('54.2810187', ' -7.896937')    ('54.303836850038', '-8.180136033695'), ('54','3')    ('54.2810187','-7.896937')
"""
from io import StringIO
#after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), sep=r"\s{3,}", engine="python")
print (df)
                          start  \
0    ('54.957055',' -7.740156')
1  ('54.8913208', '-7.5740475')
2  ('55.2375819', '-7.2357427')
3  ('54.5298806', '-8.1350247')
4  ('54.2810187', ' -7.896937')

                                        intermediary  \
0            ('54.956915136264', ' -7.753690062122')
1  ('54.864402885577', '-7.653445692445'),('54','0')
2  ('55.253936739337', '-7.259624609577'), ('54',...
3             ('54.504374314741', '-8.188334960168')
4  ('54.303836850038', '-8.180136033695'), ('54',...

                           end
0    ('54.957055','-7.740156')
1  ('54.8913208','-7.5740475')
2  ('55.2375819','-7.2357427')
3  ('54.5298806','-8.1350247')
4   ('54.2810187','-7.896937')
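import ast
#convert string values to tuples (column-wise Series.map; DataFrame.applymap is deprecated in newer pandas)
df = df.apply(lambda col: col.map(ast.literal_eval))
#wrap single pairs in a list so every intermediary cell is a list of (lat, lng) pairs
df['intermediary'] = df['intermediary'].apply(lambda x: list(x) if isinstance(x[1], tuple) else [x])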
#DataFrame from the start column
df1 = pd.DataFrame(df['start'].values.tolist(), columns=['lat','lng'])
#DataFrame from the intermediary column, reshaped to 2 columns (one row per intermediary point)
df2 = (pd.concat([pd.DataFrame(x, columns=['lat','lng']) for x in df['intermediary']], keys=df.index)
         .reset_index(level=1, drop=True)
         .add_prefix('intermediary_'))
print (df2)
#join all DataFrames together (end reuses df1 because the end point equals the start point)
df3 = df1.add_prefix('start_').join(df2).join(df1.add_prefix('end_'))
#create MultiIndex by split
df3.columns = df3.columns.str.split('_', expand=True)
print (df3)
        start                 intermediary                           end  \
          lat         lng              lat               lng         lat
0   54.957055   -7.740156  54.956915136264   -7.753690062122   54.957055
1  54.8913208  -7.5740475  54.864402885577   -7.653445692445  54.8913208
1  54.8913208  -7.5740475               54                 0  54.8913208
2  55.2375819  -7.2357427  55.253936739337   -7.259624609577  55.2375819
2  55.2375819  -7.2357427               54                 2  55.2375819
2  55.2375819  -7.2357427               54                 1  55.2375819
3  54.5298806  -8.1350247  54.504374314741   -8.188334960168  54.5298806
4  54.2810187   -7.896937  54.303836850038   -8.180136033695  54.2810187
4  54.2810187   -7.896937               54                 3  54.2810187

          lng
0   -7.740156
1  -7.5740475
1  -7.5740475
2  -7.2357427
2  -7.2357427
2  -7.2357427
3  -8.1350247
4   -7.896937
4   -7.896937
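Note that the values in df3 are still strings, because ast.literal_eval returned tuples of strings. As a rough sketch of possible follow-up steps, the snippet below uses a small hand-built stand-in for df3 (an assumption, so that it runs on its own). It casts the values to floats, counts intermediary stops per route, and optionally flattens the two header rows into single names such as `start_lat`.

```python
import pandas as pd

# Hypothetical stand-in for df3: string coordinates under two-level columns,
# with index 1 repeated because that route has two intermediary stops.
cols = pd.MultiIndex.from_product(
    [['start', 'intermediary', 'end'], ['lat', 'lng']]
)
df3 = pd.DataFrame(
    [
        ['54.957055', ' -7.740156', '54.956915136264', ' -7.753690062122',
         '54.957055', '-7.740156'],
        ['54.8913208', '-7.5740475', '54.864402885577', '-7.653445692445',
         '54.8913208', '-7.5740475'],
        ['54.8913208', '-7.5740475', '54', '0',
         '54.8913208', '-7.5740475'],
    ],
    index=[0, 1, 1],
    columns=cols,
)

# Strip stray spaces (some values look like ' -7.740156'), then cast to float.
numeric = df3.apply(lambda col: col.str.strip().astype(float))

# Rows sharing an index value belong to the same route,
# so the group size is the number of intermediary stops.
stops_per_route = numeric.groupby(level=0).size()

# Optional: collapse the two header rows into one, e.g. 'start_lat'.
flat = numeric.copy()
flat.columns = ['_'.join(pair) for pair in flat.columns]
```

Whether to keep the MultiIndex or the flat names is mostly a matter of taste. The flat version is easier to export, while the MultiIndex keeps `numeric['start']` style selection.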
Answer 1 (score: 0)
You can read an Excel file that has 2 header rows (2-level columns).
df = pd.read_excel(
    sourceFilePath,
    index_col = [0],
    header = [0, 1]
)
You can then reshape df like this so that only 1 header remains (a single header is easier to work with):
df = df.stack([0,1], dropna=False).to_frame('Valeur').reset_index()
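To make the effect of that reshape concrete, here is a small sketch on an in-memory frame instead of an Excel file. The frame, the level names (`point`, `coord`, `route`) and the column name `value` are all made up for illustration. `dropna` is left out because the sample has no missing values.

```python
import pandas as pd

# Hypothetical two-header frame, similar in shape to what
# read_excel(header=[0, 1]) returns; names are illustrative only.
cols = pd.MultiIndex.from_product(
    [['start', 'end'], ['lat', 'lng']], names=['point', 'coord']
)
wide = pd.DataFrame(
    [[54.95, -7.74, 54.95, -7.74],
     [54.89, -7.57, 54.89, -7.57]],
    index=pd.Index([0, 1], name='route'),
    columns=cols,
)

# Stacking both column levels moves them into the row index,
# leaving a single Series of values.
stacked = wide.stack([0, 1])

# One row per (route, point, coord) combination, with ordinary columns.
long_df = stacked.to_frame('value').reset_index()
```

The result is a long table with the columns `route`, `point`, `coord` and `value`, which is often easier to filter or plot than the wide layout.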
Answer 2 (score: 0)
To add a top-level column to a pd.DataFrame, run:
def add_top_column(df, top_col, inplace=False):
    if not inplace:
        df = df.copy()
    df.columns = pd.MultiIndex.from_product([[top_col], df.columns])
    return df
orig_df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
new_df = add_top_column(orig_df, "new column")
To combine 3 DataFrames, each with its own new top-level column:
new_df2 = add_top_column(orig_df, "new column2")
new_df3 = add_top_column(orig_df, "new column3")
print(pd.concat([new_df, new_df2, new_df3], axis=1))
"""
# And this is the expected output:
   new column     new column2     new column3
            a  b            a  b            a  b
0           1  2            1  2            1  2
1           3  4            3  4            3  4
"""
Note that if the DataFrames' indexes do not match, you may need to reset the indexes first.
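A short sketch of why that matters, using two made-up frames whose index labels differ: pd.concat with axis=1 aligns rows by index label, not by position, so mismatched labels produce NaN padding unless the indexes are reset.

```python
import pandas as pd

left = pd.DataFrame({'a': [1, 3]}, index=[0, 1])
right = pd.DataFrame({'b': [2, 4]}, index=[10, 11])  # different labels

# concat(axis=1) aligns on index labels, so unmatched rows are padded with NaN.
misaligned = pd.concat([left, right], axis=1)

# Resetting both indexes makes the rows line up by position instead.
aligned = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)], axis=1
)
```

Here `misaligned` has four rows, half of them empty, while `aligned` has the expected two.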