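I want to sum all the values from the third column onward and save the result, together with the first and second columns, to a new csv file. I would like to use pandas, which I believe is more efficient.
The values that can be added together are between 0 and 2.
If a value or character other than 0.5, 1 or 2 is present, it should be ignored in the sum.
Example of the csv file: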
encounterId|chartTime|11885|67187|6711|6711|6710|1356|1357|1358|1359|1360|1361|1362|1366|140|140
325|2014-01-01 00:00:00|0
325|2014-01-01 01:00:00|0|0|0
325|2014-01-01 02:00:00|0
325|2014-01-01 03:00:00|0|0|0
325|2014-01-01 04:00:00|0
325|2014-01-01 05:00:00|1
325|2014-01-01 06:00:00|0|0|0
325|2014-01-01 07:00:00|1|0|0.5|1
325|2014-01-01 08:00:00|0
325|2014-01-01 09:00:00|1|0|0
325|2014-01-01 10:00:00|0
325|2014-01-01 11:00:00|1|0|0
325|2014-01-01 12:00:00|0
325|2014-01-01 13:00:00|0|0|0.5|1
325|2014-01-01 14:00:00|0
325|2014-01-01 15:00:00|0
What I am looking for:
323|2013-06-03 00:00:00|0
323|2013-06-03 01:00:00|1
323|2013-06-03 02:00:00|1.5
323|2013-06-03 03:00:00|1.5
323|2013-06-03 04:00:00|0
323|2013-06-03 05:00:00|0.5
323|2013-06-03 06:00:00|0
323|2013-06-03 07:00:00|3.5
323|2013-06-03 08:00:00|0.5
I tried it without pandas, which gave me some strange results.
Answer 0 (score: 1)
You can use sum with the parameter axis=1, as suggested in the previous answer here.
Answer 1 (score: 1)
Use this:
from io import StringIO
import pandas as pd
csvfile = StringIO("""323|2013-06-03 00:00:00|0|0|0
323|2013-06-03 01:00:00|1|
323|2013-06-03 02:00:00|1|0|0.5|86
323|2013-06-03 03:00:00|1|0|0.5|0
323|2013-06-03 04:00:00|0
323|2013-06-03 05:00:00|0|0|0.5|0
323|2013-06-03 06:00:00|0
323|2013-06-03 07:00:00|1|0|0.5|2
323|2013-06-03 08:00:00|0|0.5""")
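# Explicit column names let read_csv cope with the variable row lengths.
df = pd.read_csv(csvfile, sep='|', names=['ID','date','A','B','C','D'])
df_out = df.set_index(['ID','date'])
# Zero out everything outside (0, 2], including NaN, then sum each row.
df_out.where((df_out>0) & (df_out<=2), 0)\
      .sum(axis=1)\
      .reset_index()\
      .to_csv('outfile.csv', index=False, header=False)
# Jupyter on Windows: show the written file.
!type outfile.csv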
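Answer 2 (score: 1)
Note that pd.read_csv() throws an error when it reads a csv with a variable number of columns, unless you supply the column names in advance. This should do it: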
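import pandas as pd
import numpy as np

# Explicit names let read_csv pad short rows with NaN instead of failing.
df = pd.read_csv('sample.txt', names=['Index','Date','Val1','Val2','Val3','Val4'], sep='|')
# Blank out values above 2 so that sum() skips them.
df[df[['Val1','Val2','Val3','Val4']]>2] = np.nan
# Sum the value columns row by row.
df['Final'] = df.iloc[:,2:].sum(axis=1)
df = df[['Index','Date','Final']]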
Gives:
   Index                 Date  Final
0    323  2013-06-03 00:00:00    0.0
1    323  2013-06-03 01:00:00    1.0
2    323  2013-06-03 02:00:00    1.5
3    323  2013-06-03 03:00:00    1.5
4    323  2013-06-03 04:00:00    0.0
5    323  2013-06-03 05:00:00    0.5
6    323  2013-06-03 06:00:00    0.0
7    323  2013-06-03 07:00:00    3.5
8    323  2013-06-03 08:00:00    0.5
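To illustrate why the column names matter, here is a minimal sketch on a small made-up sample (the data and the names `ragged`, `cols`, `failed` and `padded` are assumptions for illustration, not part of the original file). Without `names`, pandas takes the column count from the first lines and rejects a later, longer row; with `names`, short rows are padded with NaN.

```python
from io import StringIO

import pandas as pd

# Illustrative ragged sample: two rows with 3 fields, then one row with 5.
ragged = (
    "323|2013-06-03 00:00:00|0\n"
    "323|2013-06-03 01:00:00|1\n"
    "323|2013-06-03 02:00:00|1|0|0.5\n"
)
cols = ['Index', 'Date', 'Val1', 'Val2', 'Val3']

try:
    pd.read_csv(StringIO(ragged), sep='|')
    failed = False
except pd.errors.ParserError:
    # The parser expected 3 fields and found 5 in the last row.
    failed = True

# With explicit names, missing trailing fields are filled with NaN.
padded = pd.read_csv(StringIO(ragged), sep='|', names=cols)
print(failed)
print(padded)
```

The list passed to `names` needs at least as many entries as the longest row in the file.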
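Here is a more concise approach (very similar to @Scott Boston's answer, but it avoids creating a separate dataframe). By setting the first two columns of the csv as the dataframe's index, you can conditionally filter the rest of the dataframe, which contains only float values: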
df = pd.read_csv('sample.txt', names=['Index','Date','Val1','Val2','Val3','Val4'], sep='|').set_index(['Index','Date'])
df['Final'] = df[(df>0) & (df<=2)].sum(axis=1)
df.reset_index()[['Index','Date','Final']].to_csv('output.csv', index=False, header=False)
Gives:
323,2013-06-03 00:00:00,0.0
323,2013-06-03 01:00:00,1.0
323,2013-06-03 02:00:00,1.5
323,2013-06-03 03:00:00,1.5
323,2013-06-03 04:00:00,0.0
323,2013-06-03 05:00:00,0.5
323,2013-06-03 06:00:00,0.0
323,2013-06-03 07:00:00,3.5
323,2013-06-03 08:00:00,0.5
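The filter above keeps any value in the range (0, 2]. If only the exact values 0.5, 1 and 2 named in the question should count, and stray characters may appear, one possible variant is sketched below. The inline sample, the `ALLOWED` list and the `values` and `final` names are assumptions made for illustration.

```python
from io import StringIO

import pandas as pd

# Illustrative sample in the same layout as the question's file,
# including an out-of-range number (86, 1.5) and a stray character (x).
sample = StringIO("""323|2013-06-03 02:00:00|1|0|0.5|86
323|2013-06-03 05:00:00|1.5|x|0.5|0
323|2013-06-03 07:00:00|1|0|0.5|2""")

ALLOWED = [0.5, 1.0, 2.0]  # the only values the question wants to count

df = pd.read_csv(
    sample, sep='|',
    names=['Index', 'Date', 'Val1', 'Val2', 'Val3', 'Val4'],
).set_index(['Index', 'Date'])

# A stray character makes its column object-typed, so coerce to numbers
# first; anything non-numeric becomes NaN.
values = df.apply(pd.to_numeric, errors='coerce').astype(float)

# Keep only the allowed values; everything else becomes NaN,
# which sum() skips by default.
final = values.where(values.isin(ALLOWED)).sum(axis=1)
print(final.reset_index())
```

Writing the result out works the same way as before, for example with `final.reset_index().to_csv('output.csv', index=False, header=False)`.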
Answer 3 (score: 0)
How about this?
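# Sum everything from the third column onward, row by row, and store it in the third column.
# (No filtering of out-of-range values is done here.)
df[df.columns[2]] = df.iloc[:, 2:].sum(axis=1)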