How to get two DataFrames from one CSV with multiple index columns

Time: 2017-07-25 18:27:12

Tags: python pandas

I have a CSV file like this:

Time [s],Channel 0-Analog, Time [s],Reset-Digital, Time [s],Channel 1-Digital, Time [s],Channel 2-Digital, Time [s],Channel 3-Digital
-0.002204166666667, 2048.000000000000000, -0.002204166666667, 1, -0.002204166666667, 0, -0.002204166666667, 1, -0.002204166666667, 1
-0.002204000000000, 2048.000000000000000, -0.001124000000000, 0, -0.001504666666667, 1, -0.001448500000000, 0, -0.000199666666667, 0
-0.002203833333333, 2048.000000000000000, -0.000000000000000, 1, 0.000301666666667, 0, 0.000841666666667, 1, 0.000056333333333, 1
-0.002203666666667, 2048.000000000000000, 0.000550833333333, 0, 0.000932000000000, 1, 0.003178666666667, 0, 0.002361000000000, 0
-0.002203500000000, 2048.000000000000000, 0.003259333333333, 1, 0.002538166666667, 0, 0.005142333333333, 1, 0.004062000000000, 1
-0.002203333333333, 2048.000000000000000, 0.005602833333333, 0, ...

I want a single DataFrame with only one time "row", i.e. one Time [s] column shared by all channels.

My idea was to create two DataFrames and merge them into one on the column Time [s]. So I wrote this:

df1 = pd.read_csv('untitled.csv',usecols=[2,3])
df2 = pd.read_csv('untitled.csv',usecols=[4,5])

merged = pd.merge(df1,df2,on=r'Time [s]')

But it didn't work: KeyError: 'Time [s]'

/******************************************************************************/

I found that pandas appends numbers to duplicate column names, so I changed my code:

df1 = pd.read_csv('untitled.csv',usecols=[2,3])
df2 = pd.read_csv('untitled.csv',usecols=[4,5])
df1.columns = df1.columns.str.strip('.123 ')
df2.columns = df2.columns.str.strip('.123 ')
merged = pd.merge(df1,df2,on=r'Time [s]',how='outer')
merged.set_index(r'Time [s]')

But now the rows are not in time order. The merged frame first lists all rows where both columns have a number, then the rows where only the first column is non-NaN, then the rows where only the second column is non-NaN.

What I need is the rows sorted by Time [s].
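A minimal sketch of the merge approach from the question, assuming pandas 1.0 or later and using the five complete sample rows inlined with io.StringIO in place of untitled.csv. Reading with skipinitialspace=True (an added assumption) removes the blank after each comma, so pandas' own renaming of the duplicate headers gives 'Time [s].1', 'Time [s].2', and so on. Each time column is then renamed back to 'Time [s]' before an outer merge, and the result is sorted by time explicitly, since the outer merge alone did not give time order in the question. The helper channel_frame is a hypothetical name introduced only for this example.

```python
import io

import pandas as pd

# Header and the five complete data rows from the question (the sixth row is
# truncated there, so it is left out); io.StringIO stands in for untitled.csv.
CSV = """\
Time [s],Channel 0-Analog, Time [s],Reset-Digital, Time [s],Channel 1-Digital, Time [s],Channel 2-Digital, Time [s],Channel 3-Digital
-0.002204166666667, 2048.000000000000000, -0.002204166666667, 1, -0.002204166666667, 0, -0.002204166666667, 1, -0.002204166666667, 1
-0.002204000000000, 2048.000000000000000, -0.001124000000000, 0, -0.001504666666667, 1, -0.001448500000000, 0, -0.000199666666667, 0
-0.002203833333333, 2048.000000000000000, -0.000000000000000, 1, 0.000301666666667, 0, 0.000841666666667, 1, 0.000056333333333, 1
-0.002203666666667, 2048.000000000000000, 0.000550833333333, 0, 0.000932000000000, 1, 0.003178666666667, 0, 0.002361000000000, 0
-0.002203500000000, 2048.000000000000000, 0.003259333333333, 1, 0.002538166666667, 0, 0.005142333333333, 1, 0.004062000000000, 1
"""

# skipinitialspace=True drops the blank after each comma, so pandas renames the
# duplicate headers to 'Time [s]', 'Time [s].1', 'Time [s].2', ...
df = pd.read_csv(io.StringIO(CSV), skipinitialspace=True)


def channel_frame(frame: pd.DataFrame, time_col: str, signal_col: str) -> pd.DataFrame:
    """Hypothetical helper: one (time, signal) pair, time column renamed to 'Time [s]'."""
    return frame[[time_col, signal_col]].rename(columns={time_col: "Time [s]"})


reset = channel_frame(df, "Time [s].1", "Reset-Digital")
ch1 = channel_frame(df, "Time [s].2", "Channel 1-Digital")

# The outer merge keeps every time stamp from both channels (NaN where a channel
# has no sample); sorting afterwards puts the rows in time order explicitly.
merged = pd.merge(reset, ch1, on="Time [s]", how="outer")
merged = merged.sort_values("Time [s]", ignore_index=True)
print(merged)
```

Renaming each pair to a common key avoids relying on str.strip('.123 '), which would stop working once a suffix such as '.4' or '.10' appears.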

2 Answers:

Answer 0 (score: 0)

I'd suggest a simpler approach using pd.melt:

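  • Read the CSV into a single DataFrame containing the columns you are interested in;
  • Select the column names containing Time as the keys and the column names containing Channel as the values;
  • Optionally, use df.drop("variable", axis=1) to get rid of the extra column created by melt.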

Code example:

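import pandas as pd
# skipinitialspace=True strips the blank after each comma, so every Time header starts with 'Time'
df = pd.read_csv('untitled.csv', skipinitialspace=True)
keys = [col for col in df.columns if col.startswith('Time')]
values = [col for col in df.columns if col.startswith('Channel')]
melted = pd.melt(df, id_vars=values, value_vars=keys, value_name='Time')
melted = melted.drop('variable', axis=1)  # optional: drop the helper column created by melt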
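To see what the melt call produces, here is a runnable sketch on the five complete sample rows from the question, inlined with io.StringIO instead of reading untitled.csv. The skipinitialspace=True argument is an added assumption so that the duplicate headers have no leading blank and all five of them match startswith('Time').

```python
import io

import pandas as pd

# Header and the five complete data rows from the question; io.StringIO stands
# in for untitled.csv.
CSV = """\
Time [s],Channel 0-Analog, Time [s],Reset-Digital, Time [s],Channel 1-Digital, Time [s],Channel 2-Digital, Time [s],Channel 3-Digital
-0.002204166666667, 2048.000000000000000, -0.002204166666667, 1, -0.002204166666667, 0, -0.002204166666667, 1, -0.002204166666667, 1
-0.002204000000000, 2048.000000000000000, -0.001124000000000, 0, -0.001504666666667, 1, -0.001448500000000, 0, -0.000199666666667, 0
-0.002203833333333, 2048.000000000000000, -0.000000000000000, 1, 0.000301666666667, 0, 0.000841666666667, 1, 0.000056333333333, 1
-0.002203666666667, 2048.000000000000000, 0.000550833333333, 0, 0.000932000000000, 1, 0.003178666666667, 0, 0.002361000000000, 0
-0.002203500000000, 2048.000000000000000, 0.003259333333333, 1, 0.002538166666667, 0, 0.005142333333333, 1, 0.004062000000000, 1
"""

# skipinitialspace=True (an addition) removes the blank after each comma.
df = pd.read_csv(io.StringIO(CSV), skipinitialspace=True)
keys = [col for col in df.columns if col.startswith("Time")]
values = [col for col in df.columns if col.startswith("Channel")]

# melt stacks the five Time columns into one 'Time' column: each input row is
# repeated once per Time column, and 'variable' names the column it came from.
melted = pd.melt(df, id_vars=values, value_vars=keys, value_name="Time")
print(melted.head())
```

Because the channel columns are passed as id_vars, every melted row still carries all channel readings of its source row, and Reset-Digital is not included because its name does not start with Channel.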

Note: my answer was inspired by this :-)

Answer 1 (score: 0)

This solution works if all column names are unique and each Time column comes immediately before its signal column:

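import pandas as pd

# read with skipinitialspace=True so the headers carry no leading blanks
df = pd.read_csv('untitled.csv', skipinitialspace=True)

# get all columns whose name contains 'Digital'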
d = df.columns[df.columns.str.contains('Digital')]
print (d)
Index(['Reset-Digital', 'Channel 1-Digital', 'Channel 2-Digital',
       'Channel 3-Digital'],
      dtype='object')

# get the column just before each signal column (its Time column);
# pandas renames duplicate headers to 'Time [s].1', 'Time [s].2', ..., so they are unique
td = df.columns[df.columns.get_indexer(d) - 1]
print(td)
Index(['Time [s].1', 'Time [s].2', 'Time [s].3', 'Time [s].4'], dtype='object')
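# pair each Time column with its signal column, index the signal by its own time values,
# and concat side by side (aligned on the union of all times); sort_index keeps time order
df = pd.concat([df.set_index(t)[s] for t, s in zip(td, d)], axis=1).sort_index()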
print (df)
          Reset-Digital  Channel 1-Digital  Channel 2-Digital  \
-0.002204            1.0                0.0                1.0   
-0.001505            NaN                1.0                NaN   
-0.001448            NaN                NaN                0.0   
-0.001124            0.0                NaN                NaN   
-0.000200            NaN                NaN                NaN   
-0.000000            1.0                NaN                NaN   
 0.000056            NaN                NaN                NaN   
 0.000302            NaN                0.0                NaN   
 0.000551            0.0                NaN                NaN   
 0.000842            NaN                NaN                1.0   
 0.000932            NaN                1.0                NaN   
 0.002361            NaN                NaN                NaN   
 0.002538            NaN                0.0                NaN   
 0.003179            NaN                NaN                0.0   
 0.003259            1.0                NaN                NaN   
 0.004062            NaN                NaN                NaN   
 0.005142            NaN                NaN                1.0   

           Channel 3-Digital  
-0.002204                1.0  
-0.001505                NaN  
-0.001448                NaN  
-0.001124                NaN  
-0.000200                0.0  
-0.000000                NaN  
 0.000056                1.0  
 0.000302                NaN  
 0.000551                NaN  
 0.000842                NaN  
 0.000932                NaN  
 0.002361                0.0  
 0.002538                NaN  
 0.003179                NaN  
 0.003259                NaN  
 0.004062                1.0  
 0.005142                NaN
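A self-contained version of this answer, assuming pandas 1.0 or later and the five complete sample rows from the question inlined with io.StringIO instead of untitled.csv. The explicit sort_index() is an addition: it makes the time ordering independent of how a given pandas version orders the union of the per-channel indexes.

```python
import io

import pandas as pd

# Header and the five complete data rows from the question; io.StringIO stands
# in for untitled.csv.
CSV = """\
Time [s],Channel 0-Analog, Time [s],Reset-Digital, Time [s],Channel 1-Digital, Time [s],Channel 2-Digital, Time [s],Channel 3-Digital
-0.002204166666667, 2048.000000000000000, -0.002204166666667, 1, -0.002204166666667, 0, -0.002204166666667, 1, -0.002204166666667, 1
-0.002204000000000, 2048.000000000000000, -0.001124000000000, 0, -0.001504666666667, 1, -0.001448500000000, 0, -0.000199666666667, 0
-0.002203833333333, 2048.000000000000000, -0.000000000000000, 1, 0.000301666666667, 0, 0.000841666666667, 1, 0.000056333333333, 1
-0.002203666666667, 2048.000000000000000, 0.000550833333333, 0, 0.000932000000000, 1, 0.003178666666667, 0, 0.002361000000000, 0
-0.002203500000000, 2048.000000000000000, 0.003259333333333, 1, 0.002538166666667, 0, 0.005142333333333, 1, 0.004062000000000, 1
"""

df = pd.read_csv(io.StringIO(CSV), skipinitialspace=True)

# Signal columns, and the Time column immediately to the left of each one.
signals = df.columns[df.columns.str.contains("Digital")]
times = df.columns[df.columns.get_indexer(signals) - 1]

# One Series per channel, indexed by its own time column; concat aligns them on
# the union of all time values (NaN where a channel has no sample).
result = pd.concat(
    [df.set_index(t)[s] for t, s in zip(times, signals)], axis=1
).sort_index()
print(result)
```

Each channel's time values must be unique within that channel, since concat cannot align on a duplicated index.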