How to select a column position from each row if a condition matches

Date: 2016-07-17 16:55:43

Tags: python pandas


Select the first cell from the columns where the yearly bookings are greater than 0:
import pandas as pd

df = pd.DataFrame({Company : ['abc','def','ghi']} {"2010" : [0,100,230]} {"2011" : [120,0,300]} {"2012" : [130,240,0]})

Select the third column, counting from first_column:
for column_name, column in df.transpose().iterrows():
    first_column = df[column_name > 0].index[0]
    first_column_value = df.iloc[first_column]

Calculate the CAGR:

 second_column_value = df.iloc[first_GPS_index+2]

Please help me, I am getting errors. I am new to Python. The expected result is:

Company  First_Column_Value  Second_Column_Value
abc      100                 230
def      120                 300

1 Answer:

Answer 0: (score: 0)

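I fixed the code based on some assumptions about the data, which I provide as a dict together with a list of company names. Feel free to swap the years and the company names. If you do, you no longer need to transpose the DataFrame.

See the comments in the code for further explanation: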

import pandas as pd

# sample data
company_names = ['Company A','Company B','Company C']
data = {"2010" : [0,100,230], "2011" : [120,0,300], "2012" : [130,240,0]}

# create DataFrame
df = pd.DataFrame(data, index=company_names)

# since the data is not provided in the correct way (rows and columns are swapped)
# we need to get the transpose of the DataFrame before further processing
df = df.T

# sort index in order to make sure that years are sorted chronologically
df.sort_index(inplace=True)
print(df)

# iterate through all columns and get the first index element where condition applies
# and store in dict
out = {}
for col in df:
    out[col] = df[df[col] > 0].index.tolist()[0]
print(out)

This outputs:

      Company A  Company B  Company C
2010          0        100        230
2011        120          0        300
2012        130        240          0
{'Company B': '2010', 'Company A': '2011', 'Company C': '2010'}

So, for example, Company B had its first bookings in 2010.
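The same lookup can also be sketched without an explicit loop. This is an alternative, not part of the code above. It relies on `idxmax` returning the first index label at which a boolean mask is True, and it adds a guard because `idxmax` on a column with no positive value would return the first year rather than signal that nothing matched.

```python
import pandas as pd

company_names = ['Company A', 'Company B', 'Company C']
data = {"2010": [0, 100, 230], "2011": [120, 0, 300], "2012": [130, 240, 0]}

# rows are years, columns are companies (same layout as in the answer)
df = pd.DataFrame(data, index=company_names).T.sort_index()

# True wherever a company has bookings greater than 0 in that year
positive = df > 0

# first year (index label) per company where the mask is True
first_year = positive.idxmax()

# guard: keep only companies that have at least one positive year
first_year = first_year[positive.any()]
print(first_year.to_dict())
```

With the sample data every company has at least one positive year, so the guard removes nothing here.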

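To calculate a kind of CAGR dynamically, you need to know which time interval you can assume, and you need to make sure that data exists for every year. Another approach is to use a timestamp index and calculate the intervals with timedeltas.

For simplicity, I assume that you can guarantee complete data for every year, and I hard-code a time interval of one year: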
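A minimal sketch of the timestamp-based variant could look like the following. It rests on two assumptions that are not in the question: each year label is taken to mean January 1st of that year, and a year is approximated as 365.25 days. The name `growth` is only illustrative.

```python
import pandas as pd

company_names = ['Company A', 'Company B', 'Company C']
data = {"2010": [0, 100, 230], "2011": [120, 0, 300], "2012": [130, 240, 0]}
df = pd.DataFrame(data, index=company_names).T.sort_index()

# assumption: a label such as "2010" stands for 2010-01-01
df.index = pd.to_datetime(df.index, format="%Y")

# elapsed time between consecutive rows, in years
# (365.25 days per year is an approximation chosen for this sketch)
seconds_per_year = 365.25 * 24 * 3600
delta_t = df.index.to_series().diff().dt.total_seconds() / seconds_per_year

# growth from the previous row, annualised with the measured interval;
# axis=0 aligns the per-row exponents with the DataFrame index
growth = (df.div(df.shift(1)).pow(1 / delta_t, axis=0) - 1) * 100
print(growth)
```

Because the intervals come from the index itself, this version would also cope with rows that are not exactly one year apart.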

# assume to have a time interval of one year
delta_t = 1

# To divide rows we use `df.div()`, which divides one DataFrame by another element-wise.
# To divide each row by the preceding row we pass `df.shift(1)`, i.e. the same DataFrame
# shifted down by one row (see the docs on the commands used for further details).
cagr = ((df.div(df.shift(1)))**(1/delta_t) -1)*100
print(cagr)

which gives:

      Company A   Company B   Company C
2010        NaN         NaN         NaN
2011        inf -100.000000   30.434783
2012   8.333333         inf -100.000000

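From this point on, it is up to you to filter this data down to the results that apply, because from an economic point of view a CAGR of NaN (or even inf) does not make much sense.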
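One possible way to do that filtering is sketched below. It treats an infinite growth rate (the previous year was 0) as missing and then drops years in which no company has a usable value. Whether a value of -100 should be kept is a business decision that this sketch does not make; the name `cleaned` is only illustrative.

```python
import numpy as np
import pandas as pd

company_names = ['Company A', 'Company B', 'Company C']
data = {"2010": [0, 100, 230], "2011": [120, 0, 300], "2012": [130, 240, 0]}
df = pd.DataFrame(data, index=company_names).T.sort_index()

# year-over-year growth in percent, with a fixed interval of one year
delta_t = 1
cagr = (df.div(df.shift(1)) ** (1 / delta_t) - 1) * 100

# replace +/-inf by NaN so that all unusable values look the same
cleaned = cagr.replace([np.inf, -np.inf], np.nan)

# drop years in which every company is NaN (here: the first year)
cleaned = cleaned.dropna(how="all")
print(cleaned)
```

The remaining NaN cells mark company and year combinations without a meaningful growth rate, and aggregations such as `cleaned.mean()` skip them by default.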