Python: computing the median across different DataFrames

Time: 2018-04-04 12:34:33

Tags: python pandas dataframe

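I have a folder containing several subfolders. Each subfolder holds a merged.txt file with the same layout, but the header names differ slightly, as shown below: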

../a/merged.txt

stat,a_a,b_a,c_a,d_a
std,1,2,3,4

../b/merged.txt

stat,a_b,b_b,c_b,d_b
std,2,3,4,5

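I want to output a single table in which every value other than the header and the row name is the median of the corresponding values across the files, like this: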

stat,a,b,c,d
std,1.5,2.5,3.5,4.5

Does anyone know how to do this? Thanks.

2 answers:

Answer 0: (score: 1)

Here is one way, using pandas and numpy.

import numpy as np
import pandas as pd
from io import StringIO

str1 = StringIO("""
stat,a_a,b_a,c_a,d_a
std,1,2,3,4""")

str2 = StringIO("""
stat,a_b,b_b,c_b,d_b
std,2,3,4,5""")

# replace str1 & str2 with 'file1.csv' and 'file2.csv'
df1 = pd.read_csv(str1)
df2 = pd.read_csv(str2)

df = pd.DataFrame(np.median([df1.iloc[:, 1:].values, df2.iloc[:, 1:].values], axis=0),
                  columns=list('abcd')).assign(stat=df1['stat'])

df = df[['stat', 'a', 'b', 'c', 'd']]

df.to_csv('file.csv', index=False)

#   stat    a    b    c    d
# 0  std  1.5  2.5  3.5  4.5

Answer 1: (score: 1)

import pandas as pd

df_a = pd.read_csv('./a/merged.txt')
df_b = pd.read_csv('./b/merged.txt')

column_names = ["stat","a","b","c","d"]

df_a.columns = column_names
df_b.columns = column_names

df_combined = pd.concat([df_a, df_b])
# numeric_only=True skips the string column "stat" (needed on pandas >= 2.0)
med = df_combined.median(numeric_only=True)

df_out = pd.DataFrame(columns = column_names)
df_out.at[0,"stat"] = "std"
for c in column_names[1:]:
    df_out.loc[0,c] = med[c]

print(df_out.to_csv(index=False))

...
I prefer @jpp's solution, though...
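
Both answers hard-code exactly two files, while the question describes a folder with an arbitrary number of subfolders. The following is a minimal sketch of how that could be generalized. It assumes every data column is named `<name>_<suffix>` (as in `a_a`, `a_b`), that the row-label column is called `stat`, and that each subfolder holds a file named `merged.txt`. The helper names `strip_suffix`, `median_across` and `median_from_folders` are made up for illustration and do not come from either answer.

```python
from io import StringIO
from pathlib import Path

import pandas as pd


def strip_suffix(name):
    """Hypothetical helper: 'a_b' -> 'a'; names without '_' (e.g. 'stat') are unchanged."""
    return name.rsplit("_", 1)[0]


def median_across(frames):
    """Median of each value across frames, matched by 'stat' label and column name."""
    # Give every frame the same column names so the values line up.
    renamed = [f.rename(columns=strip_suffix) for f in frames]
    combined = pd.concat(renamed, ignore_index=True)
    # One output row per distinct 'stat' label, in order of first appearance.
    return combined.groupby("stat", sort=False).median().reset_index()


def median_from_folders(root, filename="merged.txt"):
    """Assumed layout: <root>/<subfolder>/<filename>, one file per subfolder."""
    paths = sorted(Path(root).glob(f"*/{filename}"))
    return median_across([pd.read_csv(p) for p in paths])


# In-memory demo using the two files from the question.
df1 = pd.read_csv(StringIO("stat,a_a,b_a,c_a,d_a\nstd,1,2,3,4"))
df2 = pd.read_csv(StringIO("stat,a_b,b_b,c_b,d_b\nstd,2,3,4,5"))

print(median_across([df1, df2]).to_csv(index=False))
# stat,a,b,c,d
# std,1.5,2.5,3.5,4.5
```

With files on disk, something like `median_from_folders('..').to_csv('file.csv', index=False)` would write the result. Grouping by `stat` also covers files with more than one row (for example a `mean` row next to `std`), provided the labels match across files.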