I have a problem with an if statement on data from a DataFrame. When I use an if statement to keep only years greater than 3Y, none of the values above 9Y show up, and it is not clear why. The output looks like this:
4Y 5Y 6Y 7Y 8Y 9Y 4Y 5Y 6Y 7Y 8Y 9Y
My code looks like the following:
import pandas as pd
df = pd.DataFrame([
['2015-02-09', '1Y', 2.241],
['2015-02-09', '1Y', 2.413],
['2015-02-09', '2Y', 2.228],
['2015-02-09', '2Y', 2.289],
['2015-02-09', '3Y', 2.263],
['2015-02-09', '3Y', 2.371],
['2015-02-09', '4Y', 2.413],
['2015-02-09', '5Y', 2.487],
['2015-02-09', '6Y', 2.578],
['2015-02-09', '7Y', 2.655],
['2015-02-09', '8Y', 2.74959],
['2015-02-09', '9Y', 2.81729],
['2015-02-09', '10Y', 2.853],
['2015-02-09', '12Y', 2.942],
['2015-02-09', '15Y', 3.047],
['2015-02-09', '20Y', 3.165],
['2015-02-09', '25Y', 3.225],
['2015-02-09', '30Y', 3.225],
['2015-02-09', '1Y', 9.5],
['2015-02-09', '2Y', 8.75],
['2015-02-09', '3Y', 8.5],
['2015-02-09', '4Y', 8.13],
['2015-02-09', '5Y', 7.75],
['2015-02-09', '6Y', 7.63],
['2015-02-09', '7Y', 7.5],
['2015-02-09', '8Y', 7.45],
['2015-02-09', '9Y', 7.25],
['2015-02-09', '10Y', 7.125],
['2015-02-09', '12Y', 7.08],
['2015-02-09', '15Y', 7.04],
['2015-02-09', '20Y', 6.435],
['2015-02-09', '25Y', 5.83],
['2015-02-09', '30Y', 5.45]
], columns=['date', 'year', 'values'])
for index, row in df.iterrows():
    if row['year'] > '3Y':
        print(row['year'])
Answer 0 (score: 3)
The problem is that you are comparing strings lexicographically, so '10Y' < '3Y'. The solution is to convert the values to integers.
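A quick way to see the lexicographic comparison at work, using only plain Python strings (the labels are taken from the question's data):

```python
# Python compares str values character by character (by code point),
# so it never looks at the number a label spells.
ten_vs_three = '10Y' > '3Y'     # '1' < '3' at the first character -> False
thirty_vs_three = '30Y' > '3Y'  # '3' == '3', then '0' (48) < 'Y' (89) -> False
four_vs_three = '4Y' > '3Y'     # '4' > '3' at the first character -> True
print(ten_vs_three, thirty_vs_three, four_vs_three)  # False False True

# Sorting the labels shows the same non-numeric order
order = sorted(['1Y', '3Y', '9Y', '10Y', '30Y'])
print(order)  # ['10Y', '1Y', '30Y', '3Y', '9Y']
```

This is why only 4Y through 9Y were printed: every label from 10Y to 30Y starts with '1', '2' or "3 followed by a digit", all of which compare as smaller than '3Y'.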
# extract the digits from each label, convert them to int, then compare numerically
df['mask'] = df['year'].str.extract(r'(\d+)', expand=False).astype(int) > 3
print(df)
date year values mask
0 2015-02-09 1Y 2.24100 False
1 2015-02-09 1Y 2.41300 False
2 2015-02-09 2Y 2.22800 False
3 2015-02-09 2Y 2.28900 False
4 2015-02-09 3Y 2.26300 False
5 2015-02-09 3Y 2.37100 False
6 2015-02-09 4Y 2.41300 True
7 2015-02-09 5Y 2.48700 True
8 2015-02-09 6Y 2.57800 True
9 2015-02-09 7Y 2.65500 True
10 2015-02-09 8Y 2.74959 True
11 2015-02-09 9Y 2.81729 True
12 2015-02-09 10Y 2.85300 True
13 2015-02-09 12Y 2.94200 True
14 2015-02-09 15Y 3.04700 True
15 2015-02-09 20Y 3.16500 True
16 2015-02-09 25Y 3.22500 True
17 2015-02-09 30Y 3.22500 True
18 2015-02-09 1Y 9.50000 False
19 2015-02-09 2Y 8.75000 False
20 2015-02-09 3Y 8.50000 False
21 2015-02-09 4Y 8.13000 True
22 2015-02-09 5Y 7.75000 True
23 2015-02-09 6Y 7.63000 True
24 2015-02-09 7Y 7.50000 True
25 2015-02-09 8Y 7.45000 True
26 2015-02-09 9Y 7.25000 True
27 2015-02-09 10Y 7.12500 True
28 2015-02-09 12Y 7.08000 True
29 2015-02-09 15Y 7.04000 True
30 2015-02-09 20Y 6.43500 True
31 2015-02-09 25Y 5.83000 True
32 2015-02-09 30Y 5.45000 True
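Once the comparison is numeric, the loop is often unnecessary: the boolean mask can index the DataFrame directly. A sketch on a few rows of the question's data (the shortened frame is only for brevity):

```python
import pandas as pd

# A few rows from the question's data
df = pd.DataFrame([
    ['2015-02-09', '1Y', 2.241],
    ['2015-02-09', '3Y', 2.263],
    ['2015-02-09', '4Y', 2.413],
    ['2015-02-09', '10Y', 2.853],
    ['2015-02-09', '30Y', 3.225],
], columns=['date', 'year', 'values'])

# Same mask as above: digits -> int -> numeric comparison
mask = df['year'].str.extract(r'(\d+)', expand=False).astype(int) > 3

# Boolean indexing keeps only the rows where the mask is True
selected = df.loc[mask, 'year'].tolist()
print(selected)  # ['4Y', '10Y', '30Y']
```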
A loop solution, from @CristiFati's comment:
for index, row in df.iterrows():
    # drop the trailing 'Y' and compare the rest as an integer
    if int(row["year"][:-1]) > 3:
        print(row['year'])
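The slice `row["year"][:-1]` assumes every label ends in exactly one non-digit character, such as `Y`. Under that same assumption, the `.str` accessor can apply the slice to the whole column at once; a sketch:

```python
import pandas as pd

years = pd.Series(['1Y', '3Y', '9Y', '10Y', '25Y'])

# .str[:-1] drops the last character of each label ('10Y' -> '10'),
# mirroring row["year"][:-1] from the loop
numbers = years.str[:-1].astype(int)

over_three = years[numbers > 3].tolist()
print(over_three)  # ['9Y', '10Y', '25Y']
```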
Or with a regular expression:
import re

for index, row in df.iterrows():
    # take the first run of digits in the label, e.g. '12Y' -> 12
    if int(re.search(r'\d+', row["year"]).group()) > 3:
        print(row['year'])
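`re.search` returns `None` when a label contains no digits, and calling `.group()` on `None` raises `AttributeError`. If the real data might contain such labels (an assumption; the question's sample does not), a small guarded helper avoids the crash. `year_number` and the label `'N/A'` below are hypothetical:

```python
import re

def year_number(label):
    """Return the first run of digits in label as an int, or None if there is none."""
    match = re.search(r'\d+', label)  # None when the label has no digits
    return int(match.group()) if match else None

labels = ['2Y', '4Y', '12Y', 'N/A']
kept = []
for label in labels:
    n = year_number(label)
    # skip labels without a number instead of failing on .group()
    if n is not None and n > 3:
        kept.append(label)
print(kept)  # ['4Y', '12Y']
```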
You can also create an integer column first:
# store the numeric part of each label in its own column
df['year-int'] = df['year'].str.extract(r'(\d+)', expand=False).astype(int)
for index, row in df.iterrows():
    if row["year-int"] > 3:
        print(row['year'])
Answer 1 (score: 0)
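When you compare strings with the > operator, different rules apply: they are compared character by character, not numerically. Try converting the years to int in a new column of the DataFrame, then print the rows where that column is > 3.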
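A minimal sketch of this suggestion, assuming every label ends in the letter `Y` (true for the question's data). It strips the `Y` with `str.rstrip`, stores the number in a new column (`year_num` is a hypothetical name), and prints the labels above 3:

```python
import pandas as pd

# A few rows from the question's data
df = pd.DataFrame({'year': ['1Y', '3Y', '5Y', '10Y', '20Y'],
                   'values': [2.241, 2.263, 2.487, 2.853, 3.165]})

# New integer column: '10Y' -> '10' -> 10
df['year_num'] = df['year'].str.rstrip('Y').astype(int)

# The numeric comparison now keeps 10Y and 20Y as well
for label in df.loc[df['year_num'] > 3, 'year']:
    print(label)
```

This prints 5Y, 10Y and 20Y, one per line.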