Question

I have the following DataFrame:

         chr start_position        end_position  gene_name
0        Chr       Position                 Ref  Gene_Name
1      chr22       24128945                   G        nan
2      chr19       45867080                   G      ERCC2
3       chr3       52436341                   C       BAP1
4       chr7      151875065                   G      KMT2C
5      chr19        1206633               CGGGT      STK11

I want to convert the entire 'end_position' column so that it holds 'start_position' + len('end_position'). The result should be:

     chr start_position        end_position  gene_name
0        Chr       Position                 Ref  Gene_Name
1      chr22       24128945            24128946       nan
2      chr19       45867080            45867081      ERCC2
3       chr3       52436341            52436342       BAP1
4       chr7      151875065           151875066      KMT2C
5      chr19        1206633             1206638      STK11

I wrote the following script:

patient_vcf_to_df.apply(pd.to_numeric, errors='ignore')
patient_vcf_to_df['end_position'] = patient_vcf_to_df['end_position'].map(lambda x: patient_vcf_to_df['start_position'] + len(x))

But I get the error: TypeError: must be str, not int

Does anyone know how to solve this?

Thanks a lot!

Answer 1

First, I would read your CSV so that the second row becomes the header (the column names):

df = pd.read_csv(filename, header=1)

This gives the following DF:

     Chr   Position    Ref Gene_Name
0  chr22   24128945      G       NaN
1  chr19   45867080      G     ERCC2
2   chr3   52436341      C      BAP1
3   chr7  151875065      G     KMT2C
4  chr19    1206633  CGGGT     STK11

As a positive side effect, the header text no longer sits in the data rows, so 'Position' is parsed as a numeric column.

If you want to lowercase your columns:

In [97]: df.columns = df.columns.str.lower()

In [98]: df
Out[98]:
     chr   position    ref gene_name
0  chr22   24128945      G       NaN
1  chr19   45867080      G     ERCC2
2   chr3   52436341      C      BAP1
3   chr7  151875065      G     KMT2C
4  chr19    1206633  CGGGT     STK11

Make sure the 'position' column has a numeric dtype:

In [99]: df.dtypes
Out[99]:
chr          object
position      int64   # <--- NOTE
ref          object
gene_name    object
dtype: object

Then add the length of each 'ref' string to 'position' to get the end position.
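The steps above can be sketched end to end as follows. This is only a minimal sketch under stated assumptions: the inline `raw` string is a hypothetical stand-in for the real file (assumed comma-separated, laid out as in the question), and the last assignment is one plausible way to do the final step, using `Series.str.len()` to get the length of each 'ref' string.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: the first line holds the generic
# names from the question, the second line holds the real header.
raw = """chr,start_position,end_position,gene_name
Chr,Position,Ref,Gene_Name
chr22,24128945,G,
chr19,45867080,G,ERCC2
chr3,52436341,C,BAP1
chr7,151875065,G,KMT2C
chr19,1206633,CGGGT,STK11
"""

# header=1 uses the second line as column names and skips the line before it,
# so 'Position' contains only numbers and is parsed as an integer column.
df = pd.read_csv(io.StringIO(raw), header=1)
df.columns = df.columns.str.lower()

# Vectorized: integer position plus the length of each 'ref' string.
df["end_position"] = df["position"] + df["ref"].str.len()

print(df)
```

Because both operands are whole columns, no apply or map call is needed. The original attempt most likely failed because 'start_position' still held strings (the header row was part of the data, and the result of the to_numeric call was never assigned back), so adding an integer to it raised the TypeError.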
