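I have a folder that contains several subfolders, and each subfolder holds a merged.txt file. The files all share the same layout, but the header names differ slightly, as shown below: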
../a/merged.txt:
stat,a_a,b_a,c_a,d_a
std,1,2,3,4
../b/merged.txt:
stat,a_b,b_b,c_b,d_b
std,2,3,4,5
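I want to output a single table that holds, for every cell other than the header and the row names, the median across the files, like this: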
stat,a,b,c,d
std,1.5,2.5,3.5,4.5
Does anyone know how to do this? Thanks.
Answer 0 (score: 1)
Here is one way to do it with pandas and numpy.
import numpy as np
import pandas as pd
from io import StringIO
str1 = StringIO("""
stat,a_a,b_a,c_a,d_a
std,1,2,3,4""")
str2 = StringIO("""
stat,a_b,b_b,c_b,d_b
std,2,3,4,5""")
# replace str1 & str2 with 'file1.csv' and 'file2.csv'
df1 = pd.read_csv(str1)
df2 = pd.read_csv(str2)
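# Stack the numeric columns of both frames and take the element-wise median.
values = np.median([df1.iloc[:, 1:].values, df2.iloc[:, 1:].values], axis=0)
df = pd.DataFrame(values, columns=list('abcd')).assign(stat=df1['stat'])
# Move 'stat' back to the first column.
df = df[['stat', 'a', 'b', 'c', 'd']]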
df.to_csv('file.csv', index=False)
# stat a b c d
# 0 std 1.5 2.5 3.5 4.5
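The same idea extends to more than two files. The following is only a rough sketch, not part of the original answer: the helper name median_table is made up, the third frame is invented sample data, and it assumes every file has the same shape and row order and that each header ends in a single "_suffix" that can be stripped.

```python
import numpy as np
import pandas as pd
from io import StringIO


def median_table(frames):
    """Element-wise median across frames that share the same shape and row order."""
    first = frames[0]
    # Assumption: headers look like "a_a", "b_b"; keep the part before the last "_".
    names = [c.rsplit("_", 1)[0] for c in first.columns[1:]]
    # Stack the numeric parts into an array of shape (n_frames, n_rows, n_cols).
    stacked = np.stack([f.iloc[:, 1:].to_numpy(dtype=float) for f in frames])
    out = pd.DataFrame(np.median(stacked, axis=0), columns=names)
    # Put the row labels (first column) back in front.
    out.insert(0, first.columns[0], first.iloc[:, 0].to_numpy())
    return out


frames = [
    pd.read_csv(StringIO("stat,a_a,b_a,c_a,d_a\nstd,1,2,3,4")),
    pd.read_csv(StringIO("stat,a_b,b_b,c_b,d_b\nstd,2,3,4,5")),
    # Hypothetical third file, added only to show the N-file case.
    pd.read_csv(StringIO("stat,a_c,b_c,c_c,d_c\nstd,6,7,8,9")),
]
result = median_table(frames)
print(result.to_csv(index=False))
```

With real data, the list of frames could be built by calling pd.read_csv on each merged.txt path instead of on StringIO objects.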
Answer 1 (score: 1)
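import pandas as pd

df_a = pd.read_csv('./a/merged.txt')
df_b = pd.read_csv('./b/merged.txt')

# Give both frames the same column names so their rows line up.
column_names = ["stat", "a", "b", "c", "d"]
df_a.columns = column_names
df_b.columns = column_names

df_combined = pd.concat([df_a, df_b])
# numeric_only=True skips the text column "stat", which recent pandas versions would otherwise fail on.
med = df_combined.median(numeric_only=True)

# Build a one-row result: the row label plus the median of each numeric column.
df_out = pd.DataFrame(columns=column_names)
df_out.at[0, "stat"] = "std"
for c in column_names[1:]:
    df_out.loc[0, c] = med[c]
print(df_out.to_csv(index=False))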
...
I prefer @jpp's solution, though...
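Since the question mentions a whole folder of subfolders, here is a rough sketch that walks every subfolder using only the standard library. It is not taken from either answer: the function name median_of_merged and its parameters are made up for illustration, and it assumes every merged.txt has the same rows in the same order, contains no blank lines, and uses headers that end in a single "_suffix".

```python
import csv
import statistics
from pathlib import Path


def median_of_merged(root, filename="merged.txt"):
    """Return rows holding the per-cell median over every root/*/filename."""
    tables = []
    for path in sorted(Path(root).glob(f"*/{filename}")):
        with path.open(newline="") as fh:
            tables.append(list(csv.reader(fh)))
    if not tables:
        raise FileNotFoundError(f"no {filename} found under {root}")

    header, *first_rows = tables[0]
    # Assumption: "a_a" -> "a"; the first header cell ("stat") is kept as-is.
    out = [[header[0]] + [h.rsplit("_", 1)[0] for h in header[1:]]]
    for i, row in enumerate(first_rows, start=1):
        # Collect cell j of row i from every table, then take the median.
        medians = [
            statistics.median(float(t[i][j]) for t in tables)
            for j in range(1, len(row))
        ]
        out.append([row[0]] + medians)
    return out
```

Calling median_of_merged("..") from inside one of the subfolders should pick up ../a/merged.txt and ../b/merged.txt, and the returned rows can be written out with csv.writer.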