Question

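Below is the DataFrame. PIC_1 and Wgt are strings; p.lgth and p_lgth are integers. If p_lgth is not equal to 30, I want to find 42 in PIC_1 and grab the 42 plus the 15 digits that follow it.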

                                            PIC_1  Wgt  p.lgth  p_lgth
**PARTIAL-DECODE***P / 42011721930018984390078...  112      53      53

So the output for the row above should be 42011721930018984

My code does not work:

def pic_mod(row):
 if row['p_lgth'] !=30:
    PIC_loc = row['PIC_1'].find('42')
    PIC_2 = row['PIC_1'].str[PIC_loc:PIC_loc + 15]
 elif row['p_lgth']==30:
    PIC_2=PIC_1  
 return PIC_2

row_1 is just one row from the larger df, identical to the example row given above:

 row_1 = df71[2:3]
 pic_mod(row_1)

 ValueError: The truth value of a Series is ambiguous. Use a.empty,
 a.bool(), a.item(), a.any() or a.all().

I ran type() on the variables and got:

  type(df71['PIC_1']) = pandas.core.series.Series
  type(df71['p_lgth']) = pandas.core.series.Series
  type(df71['Wgt']) = pandas.core.series.Series

I am very new to Python. Should these data types come back as int and str? df71 is a DataFrame.

Answer 1

Based on the error message in the post, maybe try this:

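def pic_mod(row):
    # row is a one-row DataFrame slice (e.g. df71[2:3]), so each column is a
    # length-1 Series; .item() pulls out the single scalar value.
    PIC_1 = row['PIC_1'].item()
    if row['p_lgth'].item() != 30:
        PIC_loc = PIC_1.find('42')
        # '42' plus the 15 digits after it is 17 characters in total
        PIC_2 = PIC_1[PIC_loc:PIC_loc + 17]
    else:
        PIC_2 = PIC_1
    return PIC_2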
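
If the function is meant to run over every row rather than a single slice, one option is a row-wise function used with DataFrame.apply(axis=1). The following is only a sketch: the sample frame is made up (its first PIC_1 value is just the visible part of the example in the question, and the second row is invented), and pic_mod_row and the PIC_2 column are hypothetical names.

```python
import pandas as pd

# Hypothetical sample data; the real PIC_1 value is truncated in the question.
df71 = pd.DataFrame({
    "PIC_1": ["**PARTIAL-DECODE***P / 42011721930018984390078",
              "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234"],
    "Wgt": ["112", "7"],
    "p_lgth": [53, 30],
})

def pic_mod_row(row):
    """Return '42' plus the next 15 characters unless p_lgth is 30."""
    pic = row["PIC_1"]
    if row["p_lgth"] != 30:
        loc = pic.find("42")
        # str.find returns -1 when '42' is absent; keep the original value then
        return pic[loc:loc + 17] if loc != -1 else pic
    return pic

# With axis=1 each row arrives as a Series, so row["p_lgth"] is a scalar
# and the comparison no longer raises the "truth value is ambiguous" error.
df71["PIC_2"] = df71.apply(pic_mod_row, axis=1)
print(df71[["p_lgth", "PIC_2"]])
```

The error in the question arises because df71[2:3] is a one-row DataFrame, so row['p_lgth'] is still a Series; inside apply(axis=1) the same lookup yields a plain value.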

However, if your data is already in a pandas DataFrame, you would not normally write an explicit function like this.

For example, the initial filter that keeps only the rows where p_lgth is not equal to 30 is a one-liner:

df_fltrd = df[df['p_lgth']!=30]

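Once that is done, you can apply an arbitrary function to the entries in the PIC_1 column. In your case, that is the substring of length 17 starting at '42':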

df_fltrd['PIC_1'].apply(lambda x: x[x.find('42'):x.find('42')+17])
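
To keep every row and write the result into a new column, the filter can also be used as a boolean mask. The sketch below swaps the lambda for a regular expression via Series.str.extract, which is a slightly different technique: it matches the first '42' that is followed by exactly 15 digits, not simply the first '42'. The sample data and the PIC_2 column name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real PIC_1 value is truncated in the question.
df = pd.DataFrame({
    "PIC_1": ["**PARTIAL-DECODE***P / 42011721930018984390078",
              "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234"],
    "p_lgth": [53, 30],
})

mask = df["p_lgth"] != 30          # rows that need the 17-character code
df["PIC_2"] = df["PIC_1"]          # default: keep PIC_1 unchanged

# Capture '42' followed by 15 digits; rows without a match come back as NaN
extracted = df.loc[mask, "PIC_1"].str.extract(r"(42\d{15})", expand=False)

# Fall back to the original string where nothing matched
df.loc[mask, "PIC_2"] = extracted.fillna(df.loc[mask, "PIC_1"])
print(df)
```

Assigning through df.loc on the original frame, rather than on the filtered copy, also avoids pandas' SettingWithCopyWarning.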
