Question

I have a CSV file like the following:

h1,h2,h3
1 year,homo sapiens,fibrous tissue
3 minutes,homo sapiens,fibrous tissue
2 hours,homo sapiens,epithelial tissue

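I am trying to get the column that contains a string I provide. For example, if I give "year", the whole column should be collected into a list such as ['1 year', '3 minutes', '2 hours']. I am completely lost on how to proceed and would really appreciate any help.

Edit: the catch is that the data can be in any column.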

Answer 1

We can use a list comprehension together with any and str.contains:

In [183]:
# filter the columns for only those that contain our text of interest
cols_of_interest = [col for col in df if any(df[col].str.contains('year'))]
cols_of_interest
Out[183]:
['h1']
In [184]:
# use the list as a column filter
df[cols_of_interest]
Out[184]:
          h1
0     1 year
1  3 minutes
2    2 hours

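So each column is tested by calling the vectorised str.contains method on its string values and passing the resulting booleans to any, which is True as soon as one cell contains the text of interest.

It is easy to wrap the list comprehension in a function that returns the list: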

In [185]:

def cols_contains(text):
    return [col for col in df if any(df[col].str.contains(text))]

df[cols_contains('year')]
Out[185]:
          h1
0     1 year
1  3 minutes
2    2 hours
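The question asks for the column as a plain list rather than a DataFrame, so the idea above can be extended as in the following sketch. It is only an illustration under some assumptions: the helper name columns_as_lists is made up here, the sample data is pasted inline through io.StringIO instead of being read from a real file, every column is loaded as text with dtype=str so the .str accessor works on all of them, and the search text is matched literally (regex=False) rather than as a regular expression.

```python
import io

import pandas as pd

# Sample data copied from the question; in practice pass a file path to read_csv.
csv_text = """h1,h2,h3
1 year,homo sapiens,fibrous tissue
3 minutes,homo sapiens,fibrous tissue
2 hours,homo sapiens,epithelial tissue
"""

# dtype=str keeps every column as text, so .str works even on numeric-looking data.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)


def columns_as_lists(frame, text):
    """Map each column that has a cell containing `text` to a list of its values.

    Hypothetical helper, not part of pandas.
    """
    return {
        col: frame[col].tolist()
        for col in frame.columns
        # regex=False matches `text` literally; na=False treats empty cells as non-matches.
        if frame[col].str.contains(text, regex=False, na=False).any()
    }


result = columns_as_lists(df, "year")
print(result)
```

Returning a dict keyed by column name keeps the result usable when more than one column happens to contain the text; taking result['h1'] then gives the list asked for in the question.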

Answer 2

Try this:

# read every line of the file into a list, dropping the trailing newline
with open('your_file.csv', 'r') as f:
    x = [line.rstrip('\n') for line in f]

# first column
for line in x:
    print(line.split(',')[0])

Output:

h1
1 year
3 minutes
2 hours

# second column
for line in x:
    print(line.split(',')[1])

Output:

h2
homo sapiens
homo sapiens
homo sapiens
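The loops above print a column whose position is already known, while the question says the matching data can be in any column. A possible way to close that gap with only the standard library is sketched below. The function name column_containing is invented for this example, the first row is assumed to be a header and is left out of the result, and the sample rows are supplied through io.StringIO in place of an opened file.

```python
import csv
import io


def column_containing(lines, text):
    """Return the first column (header excluded) with a cell containing `text`.

    `lines` is any iterable of CSV lines, such as an open file object.
    Returns None when no column matches. Hypothetical helper for illustration.
    """
    rows = list(csv.reader(lines))
    if not rows:
        return None
    header, data = rows[0], rows[1:]
    for index in range(len(header)):
        # Skip rows that are too short to have a cell in this column.
        column = [row[index] for row in data if index < len(row)]
        if any(text in cell for cell in column):
            return column
    return None


# Sample data copied from the question.
sample = io.StringIO(
    "h1,h2,h3\n"
    "1 year,homo sapiens,fibrous tissue\n"
    "3 minutes,homo sapiens,fibrous tissue\n"
    "2 hours,homo sapiens,epithelial tissue\n"
)

matches = column_containing(sample, "year")
print(matches)
```

With a real file, the same call would look like column_containing(open('your_file.csv', newline=''), 'year'). Using csv.reader instead of split(',') also keeps quoted fields that contain commas in one piece.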
