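I created a loop that generates some values, and I want to store those values in a DataFrame. For example, when one iteration of the loop completes, its values should be appended as the first row.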
def calculate(allFiles):
    result = pd.DataFrame(columns=['Date', 'Mid Ebb Total', 'Mid Flood Total', 'Mid Ebb Control', 'Mid Flood Control'])
    total_Mid_Ebb = 0
    total_Mid_Flood = 0
    total_Mid_EbbControl = 0
    total_Mid_FloodControl = 0
    for file_ in allFiles:
        xls = pd.ExcelFile(file_)
        df = xls.parse('General Impact')
        Mid_Ebb = df[df['Tidal Mode'] == "Mid-Ebb"]  # filter
        Mid_Ebb_control = df[df['Station'].isin(['C1', 'C2', 'C3'])]  # filter control
        Mid_Flood = df[df['Tidal Mode'] == "Mid-Flood"]  # filter
        Mid_Flood_control = df[df['Station'].isin(['C1', 'C2', 'C3', 'SR2'])]  # filter control
        total_Mid_Ebb += Mid_Ebb.Station.nunique()  # count unique stations = sample number
        total_Mid_Flood += Mid_Flood.Station.nunique()
        total_Mid_EbbControl += Mid_Ebb_control.Station.nunique()
        total_Mid_FloodControl += Mid_Flood_control.Station.nunique()
    Mid_Ebb_withoutControl = total_Mid_Ebb - total_Mid_EbbControl
    Mid_Flood_withoutControl = total_Mid_Flood - total_Mid_FloodControl
    print('Ebb Tide: The total number of sample is {}. Number of sample without control station is {}. Number of sample in control station is {}'.format(total_Mid_Ebb, Mid_Ebb_withoutControl, total_Mid_EbbControl))
    print('Flood Tide: The total number of sample is {}. Number of sample without control station is {}. Number of sample in control station is {}'.format(total_Mid_Flood, Mid_Flood_withoutControl, total_Mid_FloodControl))
The DataFrame result contains 4 columns. The date is fixed. I want to put total_Mid_Ebb, Mid_Ebb_withoutControl and total_Mid_EbbControl into the DataFrame.
Answer 0 (score: 2)
I believe you need to append the scalars to a list of tuples inside the loop and then pass that list to the DataFrame constructor. Finally, compute the count differences in the result DataFrame.
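A minimal sketch of that approach, assuming the per-file counts have already been computed. The `per_file_counts` list and its numbers are made up for illustration and stand in for the Excel parsing in the question; the two "Without Control" column names are likewise assumptions.

```python
import pandas as pd

# Hypothetical per-file counts standing in for the values computed from each
# Excel file: (ebb total, flood total, ebb control, flood control).
per_file_counts = [(10, 12, 3, 4), (8, 9, 3, 4)]

rows = []  # one tuple per loop iteration
for ebb, flood, ebb_control, flood_control in per_file_counts:
    rows.append((ebb, flood, ebb_control, flood_control))

# Build the DataFrame once, after the loop, from the list of tuples.
result = pd.DataFrame(
    rows,
    columns=['Mid Ebb Total', 'Mid Flood Total',
             'Mid Ebb Control', 'Mid Flood Control'],
)

# Derive the "without control" counts as column differences.
result['Mid Ebb Without Control'] = result['Mid Ebb Total'] - result['Mid Ebb Control']
result['Mid Flood Without Control'] = result['Mid Flood Total'] - result['Mid Flood Control']
```

Collecting plain tuples and constructing the DataFrame once is usually cheaper than growing a DataFrame row by row inside the loop.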
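Answer 1 (score: 1)
Below is an example of loading data into each column of a DataFrame after every iteration of a loop. It is not the only way to do this, but it helps in understanding the concept.
Necessary imports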
import pandas as pd
from random import randint
First, define an empty DataFrame with 5 columns to match your question
df = pd.DataFrame(columns=['A','B','C','D','E'])
Next, we iterate with a for loop, generate values with randint(), and add one value at a time to each column, from 'A' through 'E':
for i in range(5):  # add 5 rows of data
    df.loc[i, ['A']] = randint(0, 99)
    df.loc[i, ['B']] = randint(0, 99)
    df.loc[i, ['C']] = randint(0, 99)
    df.loc[i, ['D']] = randint(0, 99)
    df.loc[i, ['E']] = randint(0, 99)
We get a DataFrame populated with 5 rows.
>>> df
    A   B   C   D   E
0   4  74  71  37  90
1  41  80  77  81   8
2  14  16  82  98  89
3   1  77   3  56  91
4  34   9  85  44  19
Hope the above helps; you can tailor it to your own needs.
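As a variation on the loop above, a whole row can also be assigned in one statement instead of one column at a time. This is only a sketch of the same idea with the same made-up column names, not a replacement for the answer's code.

```python
import pandas as pd
from random import randint

df = pd.DataFrame(columns=['A', 'B', 'C', 'D', 'E'])

for i in range(5):
    # Assign all five columns of row i in a single statement.
    df.loc[i] = [randint(0, 99) for _ in df.columns]
```

The list on the right-hand side must have exactly one value per column.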
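Answer 2 (score: 0)
Note that this does not produce one row per file as requested; it is more of a comment on general pandas usage. For problems like this, it is usually easier to read in all the data and then work with the pandas functions than to write your own loops over the different cases.
I don't think you are using pandas in an idiomatic way. If you do, you will save a lot of code and get more understandable results: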
controlstations = ['C1', 'C2', 'C3', 'SR2']
# read_excel takes sheet_name; the older sheetname spelling is no longer accepted
df = pd.concat(pd.read_excel(file_, sheet_name='General Impact') for file_ in files)
df['Control'] = df.Station.isin(controlstations)
counts = df.groupby(['Control', 'Tidal Mode']).Station.agg('nunique')
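So here, you first read all the Excel files into a single DataFrame, then add a column indicating whether each row belongs to a control station, and then use groupby to count the different combinations.
counts is a Series with a two-level index (shown here for some made-up data):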
Control  Tidal Mode
False    Mid-Ebb       2
         Mid-Flood     2
True     Mid-Ebb       2
         Mid-Flood     2
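A Series of this shape can be reproduced without any Excel files. The sketch below uses a small made-up DataFrame in place of the concatenated 'General Impact' sheets; the station names S1, S2 and S3 are invented for illustration.

```python
import pandas as pd

# Made-up rows standing in for the concatenated 'General Impact' sheets.
df = pd.DataFrame({
    'Station': ['C1', 'C2', 'S1', 'S2', 'C1', 'SR2', 'S1', 'S3'],
    'Tidal Mode': ['Mid-Ebb'] * 4 + ['Mid-Flood'] * 4,
})

controlstations = ['C1', 'C2', 'C3', 'SR2']
df['Control'] = df.Station.isin(controlstations)

# Number of distinct stations for each (Control, Tidal Mode) combination.
counts = df.groupby(['Control', 'Tidal Mode']).Station.nunique()
```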
You can get the values used in your function like this:
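# The index order is (Control, Tidal Mode), so select the tide on the second level
total_Mid_Ebb = counts.xs('Mid-Ebb', level='Tidal Mode').sum()
total_Mid_Ebb_Control = counts.loc[(True, 'Mid-Ebb')]
total_Mid_Flood = counts.xs('Mid-Flood', level='Tidal Mode').sum()
total_Mid_Flood_Control = counts.loc[(True, 'Mid-Flood')]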
After that, you can easily add them to a DataFrame:
import datetime
today = datetime.datetime.today()
totals = [total_Mid_Ebb, total_Mid_Flood, total_Mid_Ebb_Control, total_Mid_Flood_Control]
result = pd.DataFrame(data=[totals],
                      columns=['Mid Ebb Total', 'Mid Flood Total', 'Mid Ebb Control', 'Mid Flood Control'],
                      index=[today])
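If one row per file is wanted after all, the same groupby idea can be extended by tagging each frame with its source before concatenating. This is only a sketch under assumptions: the file names and station data below are made up, and the `File` column is a hypothetical addition that is not part of the original answer.

```python
import pandas as pd

# Made-up data: one frame per file, keyed by a hypothetical file name.
frames = {
    'day1.xlsx': pd.DataFrame({
        'Station': ['C1', 'S1', 'C1', 'S2'],
        'Tidal Mode': ['Mid-Ebb', 'Mid-Ebb', 'Mid-Flood', 'Mid-Flood'],
    }),
    'day2.xlsx': pd.DataFrame({
        'Station': ['C2', 'S1', 'S3', 'SR2'],
        'Tidal Mode': ['Mid-Ebb', 'Mid-Ebb', 'Mid-Ebb', 'Mid-Flood'],
    }),
}

# Tag every row with the file it came from, then combine the frames.
df = pd.concat(frame.assign(File=name) for name, frame in frames.items())
df['Control'] = df.Station.isin(['C1', 'C2', 'C3', 'SR2'])

# One row per file, one column per (Tidal Mode, Control) combination;
# combinations that do not occur in a file are filled with 0.
per_file = (
    df.groupby(['File', 'Tidal Mode', 'Control']).Station.nunique()
      .unstack(['Tidal Mode', 'Control'], fill_value=0)
)
```

With real data, the dictionary would be replaced by reading each Excel file, and `per_file` would hold one row of counts for each of them.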