Question

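I am trying to access the contents of a CSV file and parse it. I only need two columns out of the whole CSV file. I can access the CSV and its contents, but I need to restrict it to the columns I need so that I can use the details in those columns.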

import os
import boto3
import pandas as pd
import sys
from io import StringIO # Python 3.x
session = boto3.session.Session(profile_name="rli-prod",region_name="us-east-1")
client = session.client("s3")
bucket_name = 'bucketname'
object_key = 'XX/YY/ZZ.csv'
csv_obj = client.get_object(Bucket=bucket_name, Key=object_key)
body = csv_obj['Body']
csv_string = body.read().decode('utf-8-sig')
df = pd.read_csv(StringIO(csv_string))
print(df)

Right now I am getting the entire CSV. Here is the output:

0  63a2a854-a136-4bb1-a89b-a4e638b2be14  8128639b-a163-4e8e-b1f8-22e3dcd2b655  ...                123  63a2a854-a136-4bb1-a89b-a4e638b2be14
1  63a2a854-a136-4bb1-a89b-a4e638b2be14  8d6bdc73-f908-45d8-8d8a-c3ac0bee3b29  ...                123  63a2a854-a136-4bb1-a89b-a4e638b2be14
2  63a2a854-a136-4bb1-a89b-a4e638b2be14  1312e6f6-4c5f-4fa5-babd-93a3c0d3b502  ...                234  63a2a854-a136-4bb1-a89b-a4e638b2be14
3  63a2a854-a136-4bb1-a89b-a4e638b2be14  bfec5ccc-4449-401d-9898-9c523b1e1230  ...                456  63a2a854-a136-4bb1-a89b-a4e638b2be14
4  63a2a854-a136-4bb1-a89b-a4e638b2be14  522a72f0-2746-417c-9a59-fae4fb1e07d7  ...                567  63a2a854-a136-4bb1-a89b-a4e638b2be14

[5 rows x 9 columns]

My CSV does not have any headers, so my only option is to grab the columns by column number, but I don't know how to do that. Can anyone help?

Answer 1

Option 1:

If you have already read the CSV and want to drop the other columns partway through your calculation, pass the indices of the columns you want to keep to df.iloc.

Example:

>>> df  # sample dataframe; I want to get the first 2 columns only
        Artist  Count  Test
0  The Beatles      4     1
1  Some Artist      2     1
2  Some Artist      2     1
3  The Beatles      4     1
4  The Beatles      4     1
5  The Beatles      4     1
>>> df3 = df.iloc[:, [0, 1]]
>>> df3
        Artist  Count
0  The Beatles      4
1  Some Artist      2
2  Some Artist      2
3  The Beatles      4
4  The Beatles      4
5  The Beatles      4

Option 2:

While reading the file itself, specify which columns to use with the usecols parameter of read_csv().
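
Applied to a headerless CSV like the one in the question, the two options might look like the sketch below. The sample CSV text and the choice of columns 0 and 2 are made up for illustration, and header=None assumes the file really has no header row.

```python
from io import StringIO

import pandas as pd

# Made-up stand-in for the decoded S3 object body (no header row).
csv_string = "a1,b1,c1,d1\na2,b2,c2,d2\na3,b3,c3,d3\n"

# Option 1: read everything, then keep columns 0 and 2 by position.
df_all = pd.read_csv(StringIO(csv_string), header=None)
by_iloc = df_all.iloc[:, [0, 2]]

# Option 2: have read_csv parse only columns 0 and 2 in the first place.
by_usecols = pd.read_csv(StringIO(csv_string), header=None, usecols=[0, 2])

print(by_iloc)
print(by_usecols)
```

Option 2 avoids loading the unused columns at all, which may matter for large files; Option 1 is handy when the full frame is needed elsewhere.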

Answer 2

Use the read_csv method from the pandas library:

import pandas as pd

data = pd.read_csv('file.csv', usecols=[2, 4])   
print(data.head())

The usecols parameter accepts a list of column names or column indices.
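
For a file with no header row, selecting by name only works once names have been assigned. A minimal sketch, assuming a three-column file; the column names "id", "other" and "count" and the sample data are hypothetical:

```python
from io import StringIO

import pandas as pd

# Made-up headerless CSV text standing in for the real file.
csv_string = "u1,x1,123\nu2,x2,234\nu3,x3,456\n"

# names= labels every column of the file; usecols= then picks a subset by label.
data = pd.read_csv(
    StringIO(csv_string),
    header=None,                      # assumption: the file has no header row
    names=["id", "other", "count"],   # hypothetical labels
    usecols=["id", "count"],
)
print(data.head())
```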

Answer 3

Since you are already using the pandas library, you should be able to do this by passing the header= argument to the read_csv method.

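From the docs: ... The header can be a list of integers that specify row locations for a multi-index on the columns, e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped) ...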
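
To make the quoted passage concrete, here is a small sketch with made-up data in which two rows are used as column labels. As far as I can tell, header= only decides which rows supply the labels; it does not drop columns on its own, so for a headerless file it would likely be combined with header=None and usecols.

```python
from io import StringIO

import pandas as pd

# Made-up CSV text: the first two rows are meant to become column labels.
csv_string = "a,b\nx,y\n1,2\n3,4\n"

# A list of row positions builds a two-level MultiIndex on the columns.
df = pd.read_csv(StringIO(csv_string), header=[0, 1])

print(df.columns.tolist())
print(df)
```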

Answer 4

In [15]: import pandas as pd

In [16]: d1 = {"col1" : "value11", "col2": "value21", "col3": "value31"}

In [17]: d2 = {"col1" : "value12", "col2": "value22", "col3": "value32"}

In [18]: d3 = {"col1" : "value13", "col2": "value23", "col3": "value33"}

In [19]: df = df.append(d1, ignore_index=True, verify_integrity=True, sort=False)

In [20]: df = df.append(d2, ignore_index=True, verify_integrity=True, sort=False)

In [21]: df = df.append(d3, ignore_index=True, verify_integrity=True, sort=False)

In [22]: df
Out[22]:
      col1     col2     col3
0  value11  value21  value31
1  value12  value22  value32
2  value13  value23  value33
3  value11  value21  value31
4  value12  value22  value32
5  value13  value23  value33

In [23]: # Selecting only col1 and col3

In [24]: df_new = df[["col1", "col3"]]

In [25]: df_new
Out[25]:
      col1     col3
0  value11  value31
1  value12  value32
2  value13  value33
3  value11  value31
4  value12  value32
5  value13  value33

In [26]:
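
Note that DataFrame.append was removed in pandas 2.0, and the session above also assumes df already existed before In [19]. A minimal sketch of the same column selection that should run on current pandas, building the frame directly from the three dictionaries:

```python
import pandas as pd

# Same sample rows as in the session above.
rows = [
    {"col1": "value11", "col2": "value21", "col3": "value31"},
    {"col1": "value12", "col2": "value22", "col3": "value32"},
    {"col1": "value13", "col2": "value23", "col3": "value33"},
]

# Build the frame in one step instead of appending row by row.
df = pd.DataFrame(rows)

# Selecting only col1 and col3 by passing a list of labels.
df_new = df[["col1", "col3"]]
print(df_new)
```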

Unable to get specific columns from a CSV file
