I have an Excel document that looks like this:
cluster load_date budget actual fixed_price
A 1/1/2014 1000 4000 Y
A 2/1/2014 12000 10000 Y
A 3/1/2014 36000 2000 Y
B 4/1/2014 15000 10000 N
B 4/1/2014 12000 11500 N
B 4/1/2014 90000 11000 N
C 7/1/2014 22000 18000 N
C 8/1/2014 30000 28960 N
C 9/1/2014 53000 51200 N
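I'd like to return the contents of the first column, cluster, as a list, so that I can run a for loop over it and create an Excel worksheet for each cluster.
Is it also possible to return the contents of a whole row as a list? E.g.
list = [], list[column1] or list[df.ix(row1)]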
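Answer 0 (score: 312)
Pandas DataFrame columns are Pandas Series when you pull them out, and you can then call x.tolist() on them to turn them into a Python list. Alternatively, you can cast one with list(x).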
import pandas as pd
d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print("Starting with this dataframe\n", df)
print("The first column is a", type(df['one']), "\nconsisting of\n", df['one'])
dfToList = df['one'].tolist()
dfList = list(df['one'])
dfValues = df['one'].values
print("dfToList is", dfToList, "and it's a", type(dfToList))
print("dfList is ", dfList, "and it's a", type(dfList))
print("dfValues is", dfValues, "and it's a", type(dfValues))
The last three lines print:
dfToList is [1.0, 2.0, 3.0, nan] and it's a <class 'list'>
dfList is [1.0, 2.0, 3.0, nan] and it's a <class 'list'>
dfValues is [ 1. 2. 3. nan] and it's a <class 'numpy.ndarray'>
This question may be helpful. The Pandas docs are actually quite good once you get used to their style.
So in your case you could do:
my_list = df["cluster"].tolist()
and go from there.
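To connect this back to the question, here is a minimal sketch of looping over the clusters and of pulling one whole row out as a list. The frame below is a small subset of the question's rows, and the names `clusters`, `frames`, and `first_row` are only illustrative.

```python
import pandas as pd

# A few rows copied from the question's table (illustrative subset).
df = pd.DataFrame({
    "cluster": ["A", "A", "B", "B", "C"],
    "budget": [1000, 12000, 15000, 12000, 22000],
    "actual": [4000, 10000, 10000, 11500, 18000],
})

# Unique cluster names, in order of first appearance.
clusters = df["cluster"].unique().tolist()

# One sub-frame per cluster. Each one could then be written to its own
# worksheet, e.g. with DataFrame.to_excel(writer, sheet_name=name)
# inside a pd.ExcelWriter context.
frames = {name: df[df["cluster"] == name] for name in clusters}

# A whole row as a plain Python list.
first_row = df.iloc[0].tolist()

print(clusters)
print(first_row)
```

Writing the sheets is left as a comment so that the sketch runs without an Excel engine installed.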
Answer 1 (score: 30)
This returns a numpy array:
my_list = df["cluster"].values
This returns a numpy array of the unique values:
import numpy as np
my_list = df["cluster"].values
uniqueVals = np.unique(my_list)
Or:
uniqueVals = df["cluster"].unique()
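The two approaches can differ in ordering: `np.unique` returns the unique values sorted, while `Series.unique()` keeps the order of first appearance. A small sketch on made-up cluster labels:

```python
import numpy as np
import pandas as pd

# Hypothetical, deliberately unsorted cluster labels.
clusters = pd.Series(["B", "A", "B", "C", "A"])

# np.unique sorts the unique values.
sorted_unique = np.unique(clusters.values).tolist()

# Series.unique keeps the order in which values first appear.
ordered_unique = clusters.unique().tolist()

print(sorted_unique)
print(ordered_unique)
```

If the sheets should be created in the order the clusters appear in the file, the second form is probably the more convenient one.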
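Answer 2 (score: 3)
If your column has only one value, something like pd.Series.tolist() will produce an error. To make sure it works in all cases, use the following code: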
(
df
.filter(['column_name'])
.values
.reshape(1, -1)
.ravel()
.tolist()
)
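As a rough check, the chain above can be run on a hypothetical one-row frame and compared with a plain `tolist()` call. On recent pandas versions both appear to give the same one-element list, so the error described may come from a selection that yields a scalar rather than a Series; treat that as an assumption, not something the answer states.

```python
import pandas as pd

# Hypothetical frame whose column holds a single value.
df = pd.DataFrame({"column_name": [42]})

# The chain from the answer: always goes through a 2-D array first.
via_chain = (
    df
    .filter(["column_name"])
    .values
    .reshape(1, -1)
    .ravel()
    .tolist()
)

# Selecting the column as a Series and converting it directly.
via_tolist = df["column_name"].tolist()

print(via_chain, via_tolist)
```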
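Answer 3 (score: 1)
Numpy array -> Pandas DataFrame -> list from a Pandas column
Numpy array
import numpy as np
import pandas as pd
data = np.array([[10,20,30], [20,30,60], [30,60,90]])
Convert the numpy array to a Pandas DataFrame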
data = np.array([[10,20,30], [20,30,60], [30,60,90]])
dataPd = pd.DataFrame(data = data)
print(dataPd)
0 1 2
0 10 20 30
1 20 30 60
2 30 60 90
pdToList = list(dataPd[2])  # default column labels are integers, so use 2 rather than '2'
Iterate over the list as proof
for counter, value in enumerate(pdToList):
    print(counter, value)
0 30
1 60
2 90
Answer 4 (score: 0)
Here is another example, combined from a few references found on the web:
import pandas as pd
def readcolumn(filename, column):
    # Read the first sheet; `column` is the zero-based position of the column to return
    df = pd.read_excel(filename, sheet_name=0)
    headername = list(df)
    print(headername)
    column_data = df[headername[column]].tolist()
    return column_data
Answer 5 (score: 0)
Suppose the DataFrame is named df after reading the Excel sheet. Take an empty list (e.g. dataList), iterate over the DataFrame row by row, and append each row to the list, like this:
dataList = [] #empty list
for index, row in df.iterrows():
    mylist = [row.cluster, row.load_date, row.budget, row.actual, row.fixed_price]
    dataList.append(mylist)
Or:
dataList = [] #empty list
for row in df.itertuples():
    mylist = [row.cluster, row.load_date, row.budget, row.actual, row.fixed_price]
    dataList.append(mylist)
Now, if you print dataList, you will get each row as a list inside dataList.
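When every column is wanted, the column names do not have to be spelled out. A minimal sketch on two rows from the question, showing two shorter routes to the same nested list (the variable names are illustrative):

```python
import pandas as pd

# Two rows copied from the question's table.
df = pd.DataFrame({
    "cluster": ["A", "B"],
    "load_date": ["1/1/2014", "4/1/2014"],
    "budget": [1000, 15000],
    "actual": [4000, 10000],
    "fixed_price": ["Y", "N"],
})

# index=False leaves the row index out of each tuple.
rows_from_tuples = [list(row) for row in df.itertuples(index=False)]

# Converting the underlying array gives the same nested list.
rows_from_values = df.values.tolist()

print(rows_from_tuples)
```

Both avoid hard-coding the column names, which helps if the sheet layout changes.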
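Answer 6 (score: 0)
Since this question has attracted a lot of attention and there are several ways to accomplish your task, let me present several options.
By the way, these are all one-liners ;)
The operations listed further down produce this output: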
ser_aggCol (collapse each column to a list)
cluster [A, A, A, B, B, B, C, C, C]
load_date [1/1/2014, 2/1/2014, 3/1/2...
budget [1000, 12000, 36000, 15000...
actual [4000, 10000, 2000, 10000,...
fixed_price [Y, Y, Y, N, N, N, N, N, N]
dtype: object
ser_aggRows (collapse each row to a list)
0 [A, 1/1/2014, 1000, 4000, Y]
1 [A, 2/1/2014, 12000, 10000...
2 [A, 3/1/2014, 36000, 2000, Y]
3 [B, 4/1/2014, 15000, 10000...
4 [B, 4/1/2014, 12000, 11500...
5 [B, 4/1/2014, 90000, 11000...
6 [C, 7/1/2014, 22000, 18000...
7 [C, 8/1/2014, 30000, 28960...
8 [C, 9/1/2014, 53000, 51200...
dtype: object
df_gr (here you get lists for each cluster)
load_date budget actual fixed_price
cluster
A [1/1/2014, 2/1/2014, 3/1/2... [1000, 12000, 36000] [4000, 10000, 2000] [Y, Y, Y]
B [4/1/2014, 4/1/2014, 4/1/2... [15000, 12000, 90000] [10000, 11500, 11000] [N, N, N]
C [7/1/2014, 8/1/2014, 9/1/2... [22000, 30000, 53000] [18000, 28960, 51200] [N, N, N]
a list of separate dataframes for each cluster
df for cluster A
cluster load_date budget actual fixed_price
0 A 1/1/2014 1000 4000 Y
1 A 2/1/2014 12000 10000 Y
2 A 3/1/2014 36000 2000 Y
df for cluster B
cluster load_date budget actual fixed_price
3 B 4/1/2014 15000 10000 N
4 B 4/1/2014 12000 11500 N
5 B 4/1/2014 90000 11000 N
df for cluster C
cluster load_date budget actual fixed_price
6 C 7/1/2014 22000 18000 N
7 C 8/1/2014 30000 28960 N
8 C 9/1/2014 53000 51200 N
just the values of column load_date
0 1/1/2014
1 2/1/2014
2 3/1/2014
3 4/1/2014
4 4/1/2014
5 4/1/2014
6 7/1/2014
7 8/1/2014
8 9/1/2014
Name: load_date, dtype: object
just the values of column number 2
0 1000
1 12000
2 36000
3 15000
4 12000
5 90000
6 22000
7 30000
8 53000
Name: budget, dtype: object
just the values of row number 7
cluster C
load_date 8/1/2014
budget 30000
actual 28960
fixed_price N
Name: 7, dtype: object
============================== JUST FOR COMPLETENESS ==============================
you can convert a series to a list
['C', '8/1/2014', '30000', '28960', 'N']
<class 'list'>
you can convert a dataframe to a nested list
[['A', '1/1/2014', '1000', '4000', 'Y'], ['A', '2/1/2014', '12000', '10000', 'Y'], ['A', '3/1/2014', '36000', '2000', 'Y'], ['B', '4/1/2014', '15000', '10000', 'N'], ['B', '4/1/2014', '12000', '11500', 'N'], ['B', '4/1/2014', '90000', '11000', 'N'], ['C', '7/1/2014', '22000', '18000', 'N'], ['C', '8/1/2014', '30000', '28960', 'N'], ['C', '9/1/2014', '53000', '51200', 'N']]
<class 'list'>
the content of a dataframe can be accessed as a numpy.ndarray
[['A' '1/1/2014' '1000' '4000' 'Y']
['A' '2/1/2014' '12000' '10000' 'Y']
['A' '3/1/2014' '36000' '2000' 'Y']
['B' '4/1/2014' '15000' '10000' 'N']
['B' '4/1/2014' '12000' '11500' 'N']
['B' '4/1/2014' '90000' '11000' 'N']
['C' '7/1/2014' '22000' '18000' 'N']
['C' '8/1/2014' '30000' '28960' 'N']
['C' '9/1/2014' '53000' '51200' 'N']]
<class 'numpy.ndarray'>
Overview of the potential operations:
# prefix ser refers to pd.Series object
# prefix df refers to pd.DataFrame object
# prefix lst refers to list object
import pandas as pd
import numpy as np
df=pd.DataFrame([
['A', '1/1/2014', '1000', '4000', 'Y'],
['A', '2/1/2014', '12000', '10000', 'Y'],
['A', '3/1/2014', '36000', '2000', 'Y'],
['B', '4/1/2014', '15000', '10000', 'N'],
['B', '4/1/2014', '12000', '11500', 'N'],
['B', '4/1/2014', '90000', '11000', 'N'],
['C', '7/1/2014', '22000', '18000', 'N'],
['C', '8/1/2014', '30000', '28960', 'N'],
['C', '9/1/2014', '53000', '51200', 'N']
], columns=['cluster', 'load_date', 'budget', 'actual', 'fixed_price'])
print('df',df, sep='\n', end='\n\n')
ser_aggCol=df.aggregate(lambda x: [x.tolist()], axis=0).map(lambda x:x[0])
print('ser_aggCol (collapse each column to a list)',ser_aggCol, sep='\n', end='\n\n\n')
ser_aggRows=pd.Series(df.values.tolist())
print('ser_aggRows (collapse each row to a list)',ser_aggRows, sep='\n', end='\n\n\n')
df_gr=df.groupby('cluster').agg(lambda x: list(x))
print('df_gr (here you get lists for each cluster)',df_gr, sep='\n', end='\n\n\n')
lst_dfFiltGr=[ df.loc[df['cluster']==val,:] for val in df['cluster'].unique() ]
print('a list of separate dataframes for each cluster', sep='\n', end='\n\n')
for dfTmp in lst_dfFiltGr:
    print('df for cluster '+str(dfTmp.loc[dfTmp.index[0],'cluster']),dfTmp, sep='\n', end='\n\n')
ser_singleColLD=df.loc[:,'load_date']
print('just the values of column load_date',ser_singleColLD, sep='\n', end='\n\n\n')
ser_singleCol2=df.iloc[:,2]
print('just the values of column number 2',ser_singleCol2, sep='\n', end='\n\n\n')
ser_singleRow7=df.iloc[7,:]
print('just the values of row number 7',ser_singleRow7, sep='\n', end='\n\n\n')
print('='*30+' JUST FOR COMPLETENESS '+'='*30, end='\n\n\n')
lst_fromSer=ser_singleRow7.tolist()
print('you can convert a series to a list',lst_fromSer, type(lst_fromSer), sep='\n', end='\n\n\n')
lst_fromDf=df.values.tolist()
print('you can convert a dataframe to a nested list',lst_fromDf, type(lst_fromDf), sep='\n', end='\n\n')
arr_fromDf=df.values
print('the content of a dataframe can be accessed as a numpy.ndarray',arr_fromDf, type(arr_fromDf), sep='\n', end='\n\n')
As cs95 points out, other methods should be preferred over the pandas .values attribute from pandas version 0.24 onward (see here). I use it here because most people (as of 2019) will still have an older version that does not support the new recommendations. You can check your version with print(pd.__version__).
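For readers on pandas 0.24 or later, `to_numpy()` is the method usually suggested in place of `.values`. A minimal sketch on a two-row frame (the frame and variable names are illustrative):

```python
import pandas as pd

# Small illustrative frame with mixed column types.
df = pd.DataFrame({"cluster": ["A", "B"], "budget": [1000, 15000]})

# Whole frame as a numpy.ndarray.
arr = df.to_numpy()

# Whole frame as a nested list.
nested = df.to_numpy().tolist()

# A single column as a numpy.ndarray.
column = df["budget"].to_numpy()

print(type(arr).__name__)
print(nested)
```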
Answer 7 (score: 0)
# Flatten every value in the DataFrame into a single list, column by column
amount = list()
for col in df.columns:
    val = list(df[col])
    for v in val:
        amount.append(v)
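The nested loop above walks the frame column by column. As a sketch, the same flat list can also be produced by flattening the underlying array in column-major order; the frame here is made up for illustration.

```python
import pandas as pd

# Illustrative two-column frame.
df = pd.DataFrame({"budget": [1000, 12000], "actual": [4000, 10000]})

# Loop version: extend the list one column at a time.
amount = []
for col in df.columns:
    amount.extend(df[col].tolist())

# Array version: order="F" flattens column by column.
flat = df.to_numpy().ravel(order="F").tolist()

print(amount)
print(flat)
```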