Question

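My CSV file has a column in which I want to search for a list of strings and add a new 0/1 column: 1 if any value from the list is present, otherwise 0.

I have two lists:

'UC''iCD', 'Chrons disease', 'Chrons', 'IBD', 'Ulcerative colitis', 'PMC', 'P80', 'Chron disease'
Donor, healthy, non-IBD, Control.

My column also has NA values.

So far I have only tried to match the first list of strings: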

import csv
import pandas as pd

with open('biosample.csv') as csvfile:
    df = pd.read_csv('biosample.csv', delimiter = ',', dtype= 'unicode',
                     error_bad_lines=False)
    df1 = df.set_index(['Sample_Info'])
print(df1.loc['UC''iCD', 'Chrons disease', 'Chrons', 'IBD', 'Ulcerative colitis', 'PMC', 'P80', 'Chron disease'])

This gives me multiple errors, for example in _has_valid_type_error.

I have gone through the questions already posted, but none of them mention this kind of error.

Answer 1

Demo:

In [84]: df
Out[84]:
   a   b    c    new
0  1  11  aaa   True
1  2  22  bbb  False
2  3  33  ccc   True
3  4  44  ddd  False

In [85]: lst = ['aaa','ccc','xxx']

In [86]: df['new'] = df['c'].isin(lst).astype(np.int8)

In [87]: df
Out[87]:
   a   b    c  new
0  1  11  aaa    1
1  2  22  bbb    0
2  3  33  ccc    1
3  4  44  ddd    0

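PS: you don't need the csv module at all:

# dtype=str reads every column as text ('unicode' is not a valid encoding name).
# error_bad_lines was removed in pandas 2.0; on_bad_lines='skip' (pandas >= 1.3) replaces it.
df = pd.read_csv(r'/path/to/biosample.csv', delimiter=',',
                 dtype=str, on_bad_lines='skip',
                 index_col='Sample_Info')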
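
Note that isin matches whole cell values only. If the terms can appear inside longer text, a substring match may be closer to what the question asks for. The following is a minimal sketch, assuming the column holds free text and that 'UC' and 'iCD' are meant to be separate terms; the sample rows and the column name in_list1 are made up for illustration:

```python
import re
import pandas as pd

# Hypothetical sample data standing in for the real Sample_Info column
df = pd.DataFrame({'Sample_Info': ['UC patient', 'healthy donor', None,
                                   'Chrons disease, ileal']})

# Assumption: 'UC' and 'iCD' are two separate search terms
terms = ['UC', 'iCD', 'Chrons disease', 'Chrons', 'IBD',
         'Ulcerative colitis', 'PMC', 'P80', 'Chron disease']

# Escape each term and join them into a single alternation pattern
pattern = '|'.join(re.escape(t) for t in terms)

# na=False makes missing values count as "no match" instead of NaN
df['in_list1'] = df['Sample_Info'].str.contains(pattern, na=False).astype(int)
print(df)
```

The match is case-sensitive here; passing case=False to str.contains would relax that.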

Answer 2

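You don't need the csv module when loading a DataFrame from a CSV file.

As you mentioned, the new column should be added to the DataFrame.

The code for checking values against the first list could look like this: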

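import pandas as pd

# Assuming 'UC' and 'iCD' are separate items: 'UC''iCD' without a comma concatenates to 'UCiCD'
list1 = ['UC', 'iCD', 'Chrons disease', 'Chrons', 'IBD', 'Ulcerative colitis', 'PMC', 'P80', 'Chron disease']
list2 = ['Donor', 'healthy', 'non-IBD', 'Control']

def check_list(value, list2check):
    # NA values are not strings, so they count as "no match"
    if isinstance(value, str) and any(x in value for x in list2check):
        return 1
    return 0

# error_bad_lines was removed in pandas 2.0; on_bad_lines='skip' (pandas >= 1.3) replaces it
df = pd.read_csv('biosample.csv', delimiter=',', dtype=str, on_bad_lines='skip')
df['sample_from_list1'] = df['Sample_Info'].apply(lambda v: check_list(v, list1))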
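
The question also lists a second group of terms and mentions NA values. One possible reading, which is an assumption and not something the question states, is that rows matching the first list should get 1, rows matching the second list should get 0, and everything else should stay missing. A rough sketch of that reading, with made-up sample rows and a hypothetical helper contains_any:

```python
import re
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real Sample_Info column
df = pd.DataFrame({'Sample_Info': ['IBD cohort', 'non-IBD sibling',
                                   'healthy Control', None, 'unknown']})

list1 = ['UC', 'iCD', 'Chrons disease', 'Chrons', 'IBD',
         'Ulcerative colitis', 'PMC', 'P80', 'Chron disease']
list2 = ['Donor', 'healthy', 'non-IBD', 'Control']

def contains_any(series, terms):
    """Boolean mask: True where the text contains any of the terms."""
    pattern = '|'.join(re.escape(t) for t in terms)
    return series.str.contains(pattern, na=False)

is_control = contains_any(df['Sample_Info'], list2)
is_case = contains_any(df['Sample_Info'], list1)

# np.select uses the first condition that is True, so the control check
# goes first: 'non-IBD' also contains 'IBD' and would otherwise count as a case.
df['label'] = np.select([is_control, is_case], [0, 1], default=np.nan)
print(df)
```

Rows that match neither list, including NA rows, end up as NaN, so the label column is float rather than integer.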
