Question

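I have a table in Excel that stores numbers in its header row and parameters in the cells below. It looks like the table below, and I only need to use the cells in columns A through E (ignoring all other cells). As you can see, F is also in the row with the headers, but I need to select the specific cells to iterate over (as described above).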

    A              B    C          D    E                         F
1   50             30   10         5    1                         String
2   Oval, Round    NaN  Irregular  NaN  NaN                       String2
3   Circumscribed  NaN  NaN        NaN  Obscured, Microlobulated
4   High density   NaN  Equal      NaN  Fat-containing

I need to create two arrays keyed to the column headers. For example, for the second row I need the output to be these two arrays:

prob_arr = [50, 50, 10]
val_arr = ['Oval', 'Round', 'Irregular']

For the third row it should be:

prob_arr = [50, 1, 1]
val_arr = ['Circumscribed', 'Obscured', 'Microlobulated']

Right now I have this function:

def concatvals(row, col, width, start, stop):
    prob_head = list(df)[start:stop]
    for i in range(width):
        value_temp = df.iloc[row, col]
        if isinstance(value_temp, float) is False:
            value = [x.strip() for x in value_temp.split(',')]
            len_val = len(value)
            prob_arr = [prob_head[i] for _ in range(len_val)]
            val_arr =  [value[x] for x in range(len_val)]
        col += 1

    randparameter = random.choices(val_arr, prob_arr, k=1)
    return randparameter

It just doesn't build the arrays correctly. Any suggestions?

Answer 1

import pandas as pd


def concatvals(df, row_idx, col_start_idx, col_end_idx):
    """
    Input parameter `df` is table data as `pd.DataFrame`.
    Input parameter `row_idx` is index of requested dataframe row as `int`.
    Input parameter `col_start_idx` is index of first requested column as `int`.
    Input parameter `col_end_idx` is index of last requested column as `int`.
    """

    # Initialize return variables as empty lists
    prob_arr = []
    val_arr = []

    # Extract slice from a single dataframe row as Series object
    row = df.iloc[row_idx, col_start_idx: col_end_idx + 1]

    # Iterate through all header-value pairs of the row Series
    for header, value in row.items():
        # If value is a string
        if isinstance(value, str):
            # Split string value upon commas
            subs = [x.strip() for x in value.split(',')]

            # Append current header to return list
            # (as many times as there are strings in `subs`)
            prob_arr += len(subs) * [header]

            # Append comma-delimited strings to return list
            val_arr += subs

    return prob_arr, val_arr


if __name__ == '__main__':
    # Read excel worksheet into dataframe
    df = pd.read_excel('test.xlsx')

    # Convert first row (which has row index 0)
    prob_arr1, val_arr1 = concatvals(df, row_idx=0, col_start_idx=0, col_end_idx=4)
    print(prob_arr1)
    print(val_arr1)

    # Convert second row (which has row index 1)
    prob_arr2, val_arr2 = concatvals(df, row_idx=1, col_start_idx=0, col_end_idx=4)
    print(prob_arr2)
    print(val_arr2)

This gives the output:

[50, 50, 10]
['Oval', 'Round', 'Irregular']
[50, 1, 1]
['Circumscribed', 'Obscured', 'Microlobulated']
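
The function in the question ends by drawing one value with `random.choices`, so the two lists can be fed straight into that call. The following is a minimal sketch of that last step, not a definitive implementation. It assumes the headers were read as integers, and it replaces the Excel file with a hypothetical in-memory DataFrame holding columns A to E; the names `build_arrays` and `weighted_pick` are made up for illustration.

```python
import random

import pandas as pd

# Hypothetical in-memory stand-in for columns A to E of the worksheet
# (assumption: the headers 50, 30, 10, 5, 1 are read as integers).
df = pd.DataFrame(
    [
        ["Oval, Round", None, "Irregular", None, None],
        ["Circumscribed", None, None, None, "Obscured, Microlobulated"],
        ["High density", None, "Equal", None, "Fat-containing"],
    ],
    columns=[50, 30, 10, 5, 1],
)


def build_arrays(df, row_idx, col_start_idx, col_end_idx):
    """Return (prob_arr, val_arr) for one row, skipping non-string cells."""
    prob_arr, val_arr = [], []
    row = df.iloc[row_idx, col_start_idx:col_end_idx + 1]
    for header, value in row.items():
        if isinstance(value, str):
            subs = [x.strip() for x in value.split(",")]
            prob_arr += len(subs) * [header]  # repeat the header once per value
            val_arr += subs
    return prob_arr, val_arr


def weighted_pick(df, row_idx, col_start_idx, col_end_idx):
    """Draw one value from the row, weighted by its column header."""
    prob_arr, val_arr = build_arrays(df, row_idx, col_start_idx, col_end_idx)
    return random.choices(val_arr, weights=prob_arr, k=1)[0]


prob_arr, val_arr = build_arrays(df, row_idx=0, col_start_idx=0, col_end_idx=4)
choice = weighted_pick(df, row_idx=0, col_start_idx=0, col_end_idx=4)
print(prob_arr, val_arr, choice)
```

`random.choices` treats the weights as relative, so the headers do not have to sum to 100. A row with no string cells would produce empty lists and make `random.choices` raise an error, so such rows would need to be handled separately.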
