Question

I have a data frame that looks like this:

Geneid  PRKCZ.exon1 PRKCZ.exon2 PRKCZ.exon3 PRKCZ.exon4 PRKCZ.exon5 PRKCZ.exon6 PRKCZ.exon7 PRKCZ.exon8 PRKCZ.exon9 PRKCZ.exon10    ... FLNA.exon31 FLNA.exon32 FLNA.exon33 FLNA.exon34 FLNA.exon35 FLNA.exon36 FLNA.exon37 FLNA.exon38 MTCP1.exon1 MTCP1.exon2
S28 22  127 135 77  120 159 49  38  409 67  ... 112 104 37  83  47  18  110 70  167 19
22  3   630 178 259 142 640 77  121 521 452 ... 636 288 281 538 276 109 242 314 790 484
S04 16  658 320 337 315 881 188 162 769 577 ... 1291    420 369 859 507 208 554 408 1172    706
56  26  663 343 390 314 1090    263 200 844 592 ... 675 243 250 472 280 133 300 275 750 473
S27 13  1525    571 1081    560 1867    427 370 1348    1530    ... 1817    926 551 1554    808 224 971 1313    1293    701
5 rows × 8297 columns

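In the data frame above, I need to add an extra column that carries information about the index. So I made a list of the healthy samples: every index in that list should be labelled h, and everything else should be d.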

So I tried the following lines:

healthy=['39','41','49','50','51','52','53','54','56']

H_type =pd.Series( ['h' for x in df.loc[healthy]  
                    else 'd' for x in df]).to_frame()

But it gives me the following error:

SyntaxError: invalid syntax

Any help would be much appreciated.

In the end, this is what I am aiming for:

Geneid  sampletype  SSX4.exon4  SSX2.exon11 DUX4.exon5  SSX2.exon3  SSX4.exon5  SSX2.exon10 SSX4.exon7  SSX2.exon9  SSX4.exon8  ... SETD2.exon21    FAT2.exon15 CASC5.exon8 FAT1.exon21 FAT3.exon9  MLL.exon31  NACA.exon7  RANBP2.exon20   APC.exon16  APOB.exon4
    S28 h   0   0   0   0   0   0   0   0   0   ... 2480    2003    2749    1760    2425    3330    4758    2508    4367    4094
    22  h   0   0   0   0   0   0   0   0   0   ... 8986    7200    10123   12422   14528   18393   9612    15325   8788    11584
    S04 h   0   0   0   0   0   0   0   0   0   ... 14518   16657   17500   15996   17367   17948   18037   19446   24179   28924
    56  h   0   0   0   0   0   0   0   0   0   ... 17784   17846   20811   17337   18135   19264   19336   22512   28318   32405
    S27 h   0   0   0   0   0   0   0   0   0   ... 10375   20403   11559   18895   18410   12754   21527   11603   16619   37679

Thanks

Answer 1

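You can use pandas isin(). First add an extra column named 'sampletype' and fill it with 'd'. Then find all the samples that are healthy and fill those with 'h'. Assuming your main data frame is named df, you would use something like: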

healthy = ['39','41','49','50','51','52','53','54','56']
df['sampletype'] = 'd'
# use .loc for the assignment; chained indexing may write to a copy
df.loc[df['Geneid'].isin(healthy), 'sampletype'] = 'h'
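
A minimal, self-contained sketch of the steps above, assuming Geneid is an ordinary column rather than the index. The small frame is made up for illustration; it only borrows the sample IDs and the first count column from the question.

```python
import pandas as pd

# Toy frame for illustration only (not the asker's full 8297-column data).
df = pd.DataFrame({
    "Geneid": ["S28", "22", "S04", "56", "S27"],
    "PRKCZ.exon1": [22, 3, 16, 26, 13],
})
healthy = ['39', '41', '49', '50', '51', '52', '53', '54', '56']

# Step 1: every sample starts out as 'd'.
df["sampletype"] = "d"

# Step 2: overwrite the rows whose Geneid is in the healthy list.
df.loc[df["Geneid"].isin(healthy), "sampletype"] = "h"

print(df)
```

With this toy data only sample 56 is in the healthy list, so it is the only row that ends up labelled h.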

Answer 2

If Geneid is a column, I think you can use numpy.where together with isin.

EDIT by comment:

The Geneid column can contain integers, so you can cast it to string with astype:

import numpy as np

healthy = ['39','41','49','50','51','52','53','54','56']
df['type'] = np.where(df['Geneid'].astype(str).isin(healthy), 'h', 'd')

# get last column to list
print(df.columns[-1].split())
['type']

# create new list from last column and all columns without last
cols = df.columns[-1].split() + df.columns[:-1].tolist()
print(cols)
['type', 'Geneid', 'PRKCZ.exon1', 'PRKCZ.exon2', 'PRKCZ.exon3', 'PRKCZ.exon4', 'PRKCZ.exon5', 'PRKCZ.exon6', 'PRKCZ.exon7', 'PRKCZ.exon8', 'PRKCZ.exon9', 'PRKCZ.exon10', 'FLNA.exon31', 'FLNA.exon32', 'FLNA.exon33', 'FLNA.exon34', 'FLNA.exon35', 'FLNA.exon36', 'FLNA.exon37', 'FLNA.exon38', 'MTCP1.exon1', 'MTCP1.exon2']

# reorder columns
print(df[cols])
  type Geneid  PRKCZ.exon1  PRKCZ.exon2  PRKCZ.exon3  PRKCZ.exon4  \
0    d    S28           22          127          135           77   
1    d     22            3          630          178          259   
2    d    S04           16          658          320          337   
3    h     56           26          663          343          390   
4    d    S27           13         1525          571         1081   

   PRKCZ.exon5  PRKCZ.exon6  PRKCZ.exon7  PRKCZ.exon8     ...       \
0          120          159           49           38     ...        
1          142          640           77          121     ...        
2          315          881          188          162     ...        
3          314         1090          263          200     ...        
4          560         1867          427          370     ...        

   FLNA.exon31  FLNA.exon32  FLNA.exon33  FLNA.exon34  FLNA.exon35  \
0          112          104           37           83           47   
1          636          288          281          538          276   
2         1291          420          369          859          507   
3          675          243          250          472          280   
4         1817          926          551         1554          808   

   FLNA.exon36  FLNA.exon37  FLNA.exon38  MTCP1.exon1  MTCP1.exon2  
0           18          110           70          167           19  
1          109          242          314          790          484  
2          208          554          408         1172          706  
3          133          300          275          750          473  
4          224          971         1313         1293          701  

[5 rows x 22 columns]
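
The question describes the sample IDs as the index, so here is a hedged variant for that case. It is only a sketch on a made-up two-column frame: it assumes Geneid is the index name and mixes string and integer labels, as the sample data appears to. DataFrame.insert is swapped in for the column-reordering step to put the new column first.

```python
import numpy as np
import pandas as pd

# Toy frame for illustration; the index mixes strings and integers on purpose.
df = pd.DataFrame(
    {"PRKCZ.exon1": [22, 3, 16, 26, 13],
     "PRKCZ.exon2": [127, 630, 658, 663, 1525]},
    index=pd.Index(["S28", 22, "S04", 56, "S27"], name="Geneid"),
)
healthy = ['39', '41', '49', '50', '51', '52', '53', '54', '56']

# Cast the index labels to str so that the integer 56 matches the string '56'.
labels = np.where(df.index.astype(str).isin(healthy), "h", "d")

# Insert the new column at position 0 instead of reordering afterwards.
df.insert(0, "sampletype", labels)

print(df)
```

Without the cast to str, the integer label 56 would not match '56' in the healthy list and every row would be labelled d.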
