Question

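Problem
I need to test the first digit of each number in a column.

Condition
If the first digit of checkVar is greater than 5, or the first digit of checkVar is less than 2,
then set newVar = 1.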

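Solution

One thought I had was to convert it to a string, strip the leading spaces, and then take [0], but I can't figure out the code.

Perhaps something like,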

df.ix[df.checkVar.str[0:1].str.contains('1'),'newVar']=1

This is not exactly what I want, and for some reason I get this error:

invalid index to scalar variable.

Checking my original variable, I get values that should meet the condition:

df.checkVar.value_counts()
301    62
1      15
2       5
999     3
dtype: int64

Ideally it would look like this:

            checkVar  newVar
NaN  1         nan    
     2         nan
     3         nan
     4         nan
     5       301.0
     6       301.0
     7       301.0
     8       301.0
     9       301.0
     10      301.0
     11      301.0
     12      301.0
     13      301.0
     14        1.0     1
     15        1.0     1

Update
My final solution, since the actual problem turned out to be more complex:

w = df.EligibilityStatusSP3.dropna().astype(str).str[0].astype(int)
v = df.EligibilityStatusSP2.dropna().astype(str).str[0].astype(int)
u = df.EligibilityStatusSP1.dropna().astype(str).str[0].astype(int)
t = df.EligibilityStatus.dropna().astype(str).str[0].astype(int) #get a series of the first digits of non-nan numbers
df['MCelig'] = ((t < 5)|(t == 9)|(u < 5)|(v < 5)|(w < 5)).astype(int)
df.MCelig = df.MCelig.fillna(0)
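
The repeated pattern above can be factored into a small helper. The following is only a sketch: the helper name `first_digit` and the sample values are hypothetical, and only two of the four columns are shown for brevity. Reindexing each result to the full index keeps all the series aligned, so rows with NaN simply fail every comparison and end up as 0 without a separate fillna step.

```python
import numpy as np
import pandas as pd

def first_digit(series):
    """First digit of each non-NaN value, aligned to the original index (NaN where missing)."""
    digits = series.dropna().astype(str).str[0].astype(int)
    return digits.reindex(series.index)

# Hypothetical sample data; the real frame has more columns and rows.
df = pd.DataFrame({
    "EligibilityStatus":    [101.0, 901.0, 601.0, np.nan],
    "EligibilityStatusSP1": [np.nan, np.nan, 701.0, 301.0],
})

t = first_digit(df["EligibilityStatus"])
u = first_digit(df["EligibilityStatusSP1"])

# Comparisons against NaN evaluate to False, so missing values never satisfy a condition.
df["MCelig"] = ((t < 5) | (t == 9) | (u < 5)).astype(int)
```

This assumes the values are non-negative numbers of at least 1, since a leading "-" or "0" would otherwise be picked up as the first character.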

Answer 1

t = df.checkVar.dropna().astype(str).str[0].astype(int) #get a series of the first digits of non-nan numbers
df['newVar'] = ((t > 5) | (t < 2)).astype(int)
df.newVar = df.newVar.fillna(0)
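
For reference, here is a self-contained sketch of the same idea on made-up data (the sample values are assumptions modelled on the value_counts output in the question). Instead of filling NaN afterwards, it reindexes the boolean mask to the full index with False for the missing rows, which keeps newVar as an integer column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample resembling the question's data.
df = pd.DataFrame({"checkVar": [np.nan, 301.0, 1.0, 2.0, 999.0]})

# First digit of every non-NaN value; NaN rows are absent from this series.
first_digit = df["checkVar"].dropna().astype(str).str[0].astype(int)

# Condition on the non-NaN rows, then extended to all rows (NaN rows become False).
mask = ((first_digit > 5) | (first_digit < 2)).reindex(df.index, fill_value=False)
df["newVar"] = mask.astype(int)
```

Note that this relies on the string form of each number starting with a digit, so negative values or values below 1 would need extra handling.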

The following might be slightly better, not sure; it is another, very similar way to approach it.

t = df.checkVar.dropna().astype(str).str[0].astype(int)
df['newVar'] = 0
df.newVar.update(((t > 5) | (t < 2)).astype(int))
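
One caveat: updating through attribute access like this modifies a Series pulled out of the frame, and depending on the pandas version (in particular with copy-on-write enabled) the change may not be written back to df. A sketch of an equivalent that assigns through .loc instead, again on hypothetical sample data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"checkVar": [np.nan, 301.0, 1.0, 999.0]})
t = df["checkVar"].dropna().astype(str).str[0].astype(int)

# Start with 0 everywhere, then set 1 only on the rows that meet the condition.
df["newVar"] = 0
hits = t.index[(t > 5) | (t < 2)]
df.loc[hits, "newVar"] = 1
```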

Answer 2

When you're not sure how to proceed, it helps to break the problem down into steps.

def checkvar(x):
    s = str(x)
    first_d = int(s[0])
    if first_d < 2 or first_d > 5:
        return 1
    else:
        return 0

Change the "else: return" to whatever value you want (e.g., "else: pass"). Also, if you want to create a new column:

*Update - I didn't notice the NaNs before. I see that you're still having problems even with dropna(). Does the following work for you, as it does for me?

df = pd.DataFrame({'old_col': [None, None, None, 13, 75, 22, 51, 61, 31]})
df['new_col'] = df['old_col'].dropna().apply(checkvar)
df

If so, perhaps the problem in your data is related to the dtype of 'old_col'? Have you tried converting it to float?

df['old_col'] = df['old_col'].astype('float')
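
If the NaNs keep causing trouble, another option is to make the function itself tolerate them, so that dropna() is not needed and every row gets a value. This is only a sketch: the name `first_digit_flag` is hypothetical, and returning 0 for missing values is an assumption about what you want.

```python
import pandas as pd

def first_digit_flag(x):
    """Return 1 if the first digit is < 2 or > 5, otherwise 0; missing values give 0."""
    if pd.isna(x):
        return 0
    # lstrip("-") is a guard for negative numbers, so the sign is not read as the first digit.
    first_d = int(str(x).lstrip("-")[0])
    return 1 if first_d < 2 or first_d > 5 else 0

df = pd.DataFrame({'old_col': [None, None, None, 13, 75, 22, 51, 61, 31]})
df['old_col'] = df['old_col'].astype('float')
df['new_col'] = df['old_col'].apply(first_digit_flag)
```

With this version new_col contains no NaN, so it stays an integer column.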

Check the first digit of a column using pandas
