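I have a table in Excel that stores numbers in its header row and parameters in the rows below. It looks like this, and I only need to use the cells in columns A to E (ignoring all other cells). As you can see, F is also part of the header row, but I need to select only the specific cells to iterate over (as described above).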
     A               B     C           D     E                          F
1    50              30    10          5     1                          String
2    Oval, Round     NaN   Irregular   NaN   NaN                        String2
3    Circumscribed   NaN   NaN         NaN   Obscured, Microlobulated
4    High density    NaN   Equal       NaN   Fat-containing
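For each row I need to build two arrays: one from the column headers and one from the cell values. For example, for the second row the output should be these two arrays: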
prob_arr = [50, 50, 10]
val_arr = ['Oval', 'Round', 'Irregular']
For the third row it should be:
prob_arr = [50, 1, 1]
val_arr = ['Circumscribed', 'Obscured', 'Microlobulated']
Right now I have this function:
def concatvals(row, col, width, start, stop):
    prob_head = list(df)[start:stop]
    for i in range(width):
        value_temp = df.iloc[row, col]
        if isinstance(value_temp, float) is False:
            value = [x.strip() for x in value_temp.split(',')]
            len_val = len(value)
            prob_arr = [prob_head[i] for _ in range(len_val)]
            val_arr = [value[x] for x in range(len_val)]
        col += 1
    randparameter = random.choices(val_arr, prob_arr, k=1)
    return randparameter
It just doesn't build the arrays correctly. Any suggestions?
Answer 0 (score: 1)
import pandas as pd


def concatvals(df, row_idx, col_start_idx, col_end_idx):
    """
    Input parameter `df` is table data as `pd.DataFrame`.
    Input parameter `row_idx` is index of requested dataframe row as `int`.
    Input parameter `col_start_idx` is index of first requested column as `int`.
    Input parameter `col_end_idx` is index of last requested column as `int` (inclusive).
    """
    # Initialize return variables as empty lists
    prob_arr = []
    val_arr = []
    # Extract slice from a single dataframe row as Series object
    row = df.iloc[row_idx, col_start_idx:col_end_idx + 1]
    # Iterate through all header-value pairs of the row Series
    for header, value in row.items():
        # If value is a string (empty cells are read as float NaN and skipped)
        if isinstance(value, str):
            # Split string value upon commas
            subs = [x.strip() for x in value.split(',')]
            # Append current header to return list
            # (as many times as there are strings in `subs`)
            prob_arr += len(subs) * [header]
            # Append comma-delimited strings to return list
            val_arr += subs
    return prob_arr, val_arr


if __name__ == '__main__':
    # Read excel worksheet into dataframe
    # (the first worksheet row becomes the column headers)
    df = pd.read_excel('test.xlsx')
    # Convert first data row (which has row index 0)
    prob_arr1, val_arr1 = concatvals(df, row_idx=0, col_start_idx=0, col_end_idx=4)
    print(prob_arr1)
    print(val_arr1)
    # Convert second data row (which has row index 1)
    prob_arr2, val_arr2 = concatvals(df, row_idx=1, col_start_idx=0, col_end_idx=4)
    print(prob_arr2)
    print(val_arr2)
This gives the output:
[50, 50, 10]
['Oval', 'Round', 'Irregular']
[50, 1, 1]
['Circumscribed', 'Obscured', 'Microlobulated']
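The function in the question went on to pass both lists to `random.choices`, so here is a minimal sketch of that last step, assuming the headers are meant to be used as relative weights. To keep it runnable without an Excel file, it rebuilds the sample table as an in-memory DataFrame; that DataFrame and the NaN values filled into column F are assumptions for illustration, not part of the original worksheet.

```python
import random

import numpy as np
import pandas as pd


def concatvals(df, row_idx, col_start_idx, col_end_idx):
    """Return (prob_arr, val_arr) for one row, columns start..end inclusive."""
    prob_arr, val_arr = [], []
    row = df.iloc[row_idx, col_start_idx:col_end_idx + 1]
    for header, value in row.items():
        # Empty cells are float NaN, so only string cells are processed
        if isinstance(value, str):
            subs = [x.strip() for x in value.split(',')]
            # Accumulate across columns instead of overwriting per column
            prob_arr += len(subs) * [header]
            val_arr += subs
    return prob_arr, val_arr


# Hypothetical in-memory stand-in for the worksheet: the first Excel row
# becomes the column headers, the remaining rows become the data.
df = pd.DataFrame(
    [
        ['Oval, Round', np.nan, 'Irregular', np.nan, np.nan, 'String2'],
        ['Circumscribed', np.nan, np.nan, np.nan, 'Obscured, Microlobulated', np.nan],
        ['High density', np.nan, 'Equal', np.nan, 'Fat-containing', np.nan],
    ],
    columns=[50, 30, 10, 5, 1, 'String'],
)

prob_arr, val_arr = concatvals(df, row_idx=0, col_start_idx=0, col_end_idx=4)

# Weighted draw: each value is picked with probability proportional to
# the header of the column it came from.
randparameter = random.choices(val_arr, weights=prob_arr, k=1)[0]
print(randparameter)
```

Note that `random.choices` raises an error when the row contains no string cells, since both lists are then empty, so a caller may want to check `val_arr` before drawing.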