Question

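I have a problem and hope you can help me. I have a .csv file with 3 columns ("string", number, "string"). The file is 500 MB and has 10,000,000 rows. I also have a list of 1,500,000 strings, which are values from the first column of the data frame. I want to get a data frame containing the 1,500,000 matching rows ("string", number, "string"). I have read several articles about vectorization, but I am not a Python expert. What is the best way to do this?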

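import pandas as pd

# dataname: path to the whitespace-separated file (3 columns, no header row)
outfile = pd.read_csv(dataname, sep=r'\s+', header=None)
outfile.columns = ['picturename', 'number', 'part']

outerdata = outfile['picturename'].values

for name in all_file_names:  # this is the list with 1,500,000 entries
    # compares all 10,000,000 rows once per name, which is very slow
    puffer = outfile.loc[outerdata == name]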

Answer 1

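Use Series.isin, which checks every value in the column against the whole list in a single vectorized call instead of one comparison per name: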

puffer = outfile.loc[outfile['picturename'].isin(all_file_names)]
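To check that the single isin call returns the same rows as the loop in the question, here is a minimal sketch on a tiny made-up file. The io.StringIO contents and the file names are hypothetical stand-ins for dataname and all_file_names. The loop is adjusted to collect its per-name results with pd.concat, since the original overwrites puffer on every iteration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: whitespace-separated, no header
data = io.StringIO(
    "img_001.jpg 10 left\n"
    "img_002.jpg 20 right\n"
    "img_003.jpg 30 left\n"
    "img_004.jpg 40 right\n"
)
# Hypothetical stand-in for the list of 1,500,000 names
all_file_names = ["img_002.jpg", "img_004.jpg"]

outfile = pd.read_csv(data, sep=r"\s+", header=None)
outfile.columns = ["picturename", "number", "part"]

# Loop: one full-column comparison per name, results collected and concatenated
outerdata = outfile["picturename"].values
looped = pd.concat([outfile.loc[outerdata == name] for name in all_file_names])

# Vectorized: a single membership test over the whole column
vectorized = outfile.loc[outfile["picturename"].isin(all_file_names)]

print(looped.equals(vectorized))  # True
```

One difference to keep in mind: the loop returns rows in the order of the name list, while isin keeps the file's row order. They coincide here only because both orders agree.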

Example:

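import pandas as pd

df2 = pd.DataFrame({'C': ['b', 'h', 'r', 'cc', 'd'],
                    'D': ['a', 'bb', 'f', 'g', 'r'],
                    'E': ['l', 'h', 'b', 'a', 'dd']})
print(df2)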
    C   D   E
0   b   a   l
1   h  bb   h
2   r   f   b
3  cc   g   a
4   d   r  dd

name=['a','b','d']
df2.loc[df2['C'].isin(name)]

Output:

   C  D   E
0  b  a   l
4  d  r  dd
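If loading the whole 500 MB file at once is a concern for memory, the same isin filter can be applied chunk by chunk through read_csv's chunksize parameter. This is a minimal sketch with hypothetical in-memory data and names; chunksize=2 is only there to force several chunks, and a real file would use a much larger value.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file (whitespace-separated, no header)
data = io.StringIO(
    "img_001.jpg 10 left\n"
    "img_002.jpg 20 right\n"
    "img_003.jpg 30 left\n"
    "img_004.jpg 40 right\n"
    "img_005.jpg 50 left\n"
)
# Hypothetical stand-in for the 1,500,000-name list
all_file_names = ["img_002.jpg", "img_005.jpg"]

matches = []
# With chunksize, read_csv yields DataFrames of at most 2 rows instead of one big frame
for chunk in pd.read_csv(data, sep=r"\s+", header=None,
                         names=["picturename", "number", "part"], chunksize=2):
    # Keep only the rows of this chunk whose name is in the list
    matches.append(chunk.loc[chunk["picturename"].isin(all_file_names)])

puffer = pd.concat(matches, ignore_index=True)
print(puffer)
```

Only the matching rows of each chunk are kept, so peak memory is roughly one chunk plus the accumulated results.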
