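Convert a dictionary column into a DataFrame

Time: 2020-05-27 08:46:33

Tags: python python-3.x

I have a CSV file in which one column holds the results I am interested in and another holds the index: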

,Province, Constituency Name, Party Affiliation, segments
0,Ben Slimane, Ain Tizgha, UND, "{'UND': {""I don't know yet"": 16, 'No': 3, 'Yes': 5, 'total': 24, 'intention_rate': 20.83}, 'ABS': {""I don't know yet"": 1, 'No': 10, 'Yes': 1, 'total': 12, 'intention_rate': 8.33}, 'PJD': {""I don't know yet"": 1, 'Yes': 3, 'total': 4, 'intention_rate': 75}}"
1,Ben Slimane, Ain Tizgha, ABS, "{'UND': {""I don't know yet"": 16, 'No': 3, 'Yes': 5, 'total': 24, 'intention_rate': 20.83}, 'ABS': {""I don't know yet"": 1, 'No': 10, 'Yes': 1, 'total': 12, 'intention_rate': 8.33}, 'PJD': {""I don't know yet"": 1, 'Yes': 3, 'total': 4, 'intention_rate': 75}}"
2,Ben Slimane, Ain Tizgha, PJD, "{'UND': {""I don't know yet"": 16, 'No': 3, 'Yes': 5, 'total': 24, 'intention_rate': 20.83}, 'ABS': {""I don't know yet"": 1, 'No': 10, 'Yes': 1, 'total': 12, 'intention_rate': 8.33}, 'PJD': {""I don't know yet"": 1, 'Yes': 3, 'total': 4, 'intention_rate': 75}}"
3,Ben Slimane, Ahlaf, UND, "{'UND': {""I don't know yet"": 16, 'No': 3, 'Yes': 5, 'total': 24, 'intention_rate': 20.83}, 'ABS': {""I don't know yet"": 1, 'No': 10, 'Yes': 1, 'total': 12, 'intention_rate': 8.33}, 'PJD': {""I don't know yet"": 1, 'Yes': 3, 'total': 4, 'intention_rate': 75}}"
4,Ben Slimane, Ahlaf, ABS, "{'UND': {""I don't know yet"": 16, 'No': 3, 'Yes': 5, 'total': 24, 'intention_rate': 20.83}, 'ABS': {""I don't know yet"": 1, 'No': 10, 'Yes': 1, 'total': 12, 'intention_rate': 8.33}, 'PJD': {""I don't know yet"": 1, 'Yes': 3, 'total': 4, 'intention_rate': 75}}"
5,Ben Slimane, Ahlaf, PJD, "{'UND': {""I don't know yet"": 16, 'No': 3, 'Yes': 5, 'total': 24, 'intention_rate': 20.83}, 'ABS': {""I don't know yet"": 1, 'No': 10, 'Yes': 1, 'total': 12, 'intention_rate': 8.33}, 'PJD': {""I don't know yet"": 1, 'Yes': 3, 'total': 4, 'intention_rate': 75}}"
6,Khouribga,Ain Kaicher,UND, "{'UND': {""I don't know yet"": 46, 'No': 12, 'Yes': 13, 'total': 71, 'intention_rate': 18.31}, 'ABS': {""I don't know yet"": 4, 'No': 79, 'Yes': 1, 'total': 84, 'intention_rate': 1.19}, 'PJD': {""I don't know yet"": 14, 'No': 1, 'Yes': 4, 'total': 19, 'intention_rate': 21.05}}"
7,Khouribga,Ain Kaicher,ABS, "{'UND': {""I don't know yet"": 46, 'No': 12, 'Yes': 13, 'total': 71, 'intention_rate': 18.31}, 'ABS': {""I don't know yet"": 4, 'No': 79, 'Yes': 1, 'total': 84, 'intention_rate': 1.19}, 'PJD': {""I don't know yet"": 14, 'No': 1, 'Yes': 4, 'total': 19, 'intention_rate': 21.05}}"
8,Khouribga,Ain Kaicher,PJD, "{'UND': {""I don't know yet"": 46, 'No': 12, 'Yes': 13, 'total': 71, 'intention_rate': 18.31}, 'ABS': {""I don't know yet"": 4, 'No': 79, 'Yes': 1, 'total': 84, 'intention_rate': 1.19}, 'PJD': {""I don't know yet"": 14, 'No': 1, 'Yes': 4, 'total': 19, 'intention_rate': 21.05}}"
9, Khouribga,Bni Bataou,UND, "{'UND': {""I don't know yet"": 46, 'No': 12, 'Yes': 13, 'total': 71, 'intention_rate': 18.31}, 'ABS': {""I don't know yet"": 4, 'No': 79, 'Yes': 1, 'total': 84, 'intention_rate': 1.19}, 'PJD': {""I don't know yet"": 14, 'No': 1, 'Yes': 4, 'total': 19, 'intention_rate': 21.05}}"
10, Khouribga,Bni Bataou,ABS, "{'UND': {""I don't know yet"": 46, 'No': 12, 'Yes': 13, 'total': 71, 'intention_rate': 18.31}, 'ABS': {""I don't know yet"": 4, 'No': 79, 'Yes': 1, 'total': 84, 'intention_rate': 1.19}, 'PJD': {""I don't know yet"": 14, 'No': 1, 'Yes': 4, 'total': 19, 'intention_rate': 21.05}}"
11, Khouribga,Bni Bataou,PJD, "{'UND': {""I don't know yet"": 46, 'No': 12, 'Yes': 13, 'total': 71, 'intention_rate': 18.31}, 'ABS': {""I don't know yet"": 4, 'No': 79, 'Yes': 1, 'total': 84, 'intention_rate': 1.19}, 'PJD': {""I don't know yet"": 14, 'No': 1, 'Yes': 4, 'total': 19, 'intention_rate': 21.05}}"

Admittedly, there are duplicates. I would like to end up with something like this:

Constituency,UND, ABS, PJD
Ain Tizgha,20.83,8.33,75
Ahlaf,20.83,8.33,75
Ain Kaicher,18.31, 1.19, 21.05
Bni Bataou,18.31, 1.19, 21.05

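The numbers would be the intention_rate of each entry in the dictionaries in the segments column.

How can I convert the dictionary column into a DataFrame?

So far I have tried: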

>>> for row in df.iterrows():
...     preceding_row = row
...     if row['segments'] == preceding_row:
...         break
...     saved_things = [row['Constituency'],row['segments']]
...

I know those ""I don't know yet"" keys might be a problem.
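The doubled quotes are most likely just CSV escaping for a double quote inside a quoted field, so they should disappear once the file is parsed. Below is a minimal sketch using only the standard library, with one shortened row in the same layout as the data above. It assumes the file really has a space after each comma, which is why `skipinitialspace=True` is passed; the shortened row is illustrative, not real data.

```python
import ast
import csv
import io

# One shortened row in the same layout as the posted file (illustrative only).
raw = '''\
,Province, Constituency Name, Party Affiliation, segments
0,Ben Slimane, Ain Tizgha, UND, "{'UND': {""I don't know yet"": 16, 'intention_rate': 20.83}}"
'''

# skipinitialspace=True ignores the blank after each comma, so the quoted
# field is recognised and each "" inside it is unescaped to a single ".
reader = csv.DictReader(io.StringIO(raw), skipinitialspace=True)
row = next(reader)

# The cell is now the plain text of a Python dict literal.
segment = ast.literal_eval(row['segments'])
rate = segment['UND']['intention_rate']
dont_know = segment['UND']["I don't know yet"]
print(rate, dont_know)
```

After parsing, the key is an ordinary string containing an apostrophe, which `ast.literal_eval` handles without special treatment.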

Update

I tried to adapt 林宾加林's answer to make it dynamic, so that it does not depend on the party names:

def parse_segment(row):
    segment = row['segments']
    segment = ast.literal_eval(segment)
    results = []
    for party in df['Party Affiliation'].unique():
        if party in segment.keys():
            v_i = segment[party]['intention_rate']
            results.append(v_i)
        else:
            v_i = 0
    return results


if __name__ == '__main__':
    # main()
    # Load data
    df = pd.read_csv('constituencies_with_segments.csv', header=0, index_col=0)
    parties = [party for party in df['Party Affiliation'].unique()]
    df.drop_duplicates(subset=['Constituency Name', 'segments'], inplace=True)

    df[parties] = df.apply(parse_segment, axis=1, result_type='expand')
    df.drop(columns=['Province', 'Party Affiliation', 'segments'], inplace=True)
    print(df.head())

However, I get the following error:

(campaign_manager) C:\Users\antoi\Documents\Programming\electoral-prediction-model-pk\data\Morocco>python3 geojson_file_updater.py
Traceback (most recent call last):
  File "geojson_file_updater.py", line 75, in <module>
    df[parties] = df.apply(parse_segment, axis=1, result_type='expand')
  File "C:\Users\antoi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\core\frame.py", line 2935, in __setitem__
    self._setitem_array(key, value)
  File "C:\Users\antoi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\core\frame.py", line 2961, in _setitem_array
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key

1 answer:

Answer 0 (score: 1)

Is there a typo in the first two rows? Is ABS actually ' ABS' and PJD actually ' PJD' (with a leading space)?
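One way to check this is to compare the distinct party labels before and after stripping whitespace. The sketch below uses a hypothetical stand-in for the 'Party Affiliation' column, mixing labels with and without a leading space as in the posted rows. If the real column behaves the same way, the list of unique parties is longer than the list of values returned per row, which would be consistent with "Columns must be same length as key".

```python
import pandas as pd

# Hypothetical stand-in for the 'Party Affiliation' column: the same labels
# appear both with and without a leading space.
labels = pd.Series([' UND', ' ABS', ' PJD', 'UND', 'ABS', 'PJD'],
                   name='Party Affiliation')

# unique() treats ' UND' and 'UND' as different labels.
raw_parties = list(labels.unique())

# Stripping whitespace first collapses them into the three real parties.
clean_parties = list(labels.str.strip().unique())

print(len(raw_parties))
print(clean_parties)
```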

If they are indeed typos, you can try the following (leaving out the first two rows):

import ast

import pandas as pd


def parse_segment(row):
    # The segments cell holds the text of a dict literal; parse it safely.
    segment = ast.literal_eval(row['segments'])
    v_1 = segment['UND']['intention_rate']
    v_2 = segment['ABS']['intention_rate']
    v_3 = segment['PJD']['intention_rate']
    return [v_1, v_2, v_3]


# Load data
df = pd.read_csv('your_csv_file.csv', header=0, index_col=0)
# Keep one row per constituency (the segments dict is repeated for each party).
df.drop_duplicates(subset=['Constituency Name', 'segments'], inplace=True)

# Expand the returned list into one column per party.
df[['UND', 'ABS', 'PJD']] = df.apply(parse_segment, axis=1, result_type='expand')
df.drop(columns=['Province', 'Party Affiliation', 'segments'], inplace=True)

Update (based on the data posted above)

# Load data
df = pd.read_csv('your_csv_file.csv', header=0, index_col=0)
# Must be computed before the duplicates are dropped. Stripping whitespace
# keeps ' UND' and 'UND' from being counted as two different parties.
parties = sorted(df['Party Affiliation'].str.strip().unique())
df.drop_duplicates(subset=['Constituency Name', 'segments'], inplace=True)


def parse_segment(row):
    segment = ast.literal_eval(row['segments'])

    # Return exactly one value per party (0 if the party or its rate is
    # missing) so the result always has the same length as `parties`.
    return [segment.get(party, {}).get('intention_rate', 0)
            for party in parties]


df[parties] = df.apply(parse_segment, axis=1, result_type='expand')
df.drop(columns=['Province', 'Party Affiliation', 'segments'], inplace=True)
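As an alternative to `apply(..., result_type='expand')`, the table can be built directly from a list of dicts, so the column names come from the keys of each segment and never have to match a separate list of parties. This is only a sketch under assumptions: the inline rows are shortened stand-ins for the posted data, `rates` is a hypothetical helper, and `skipinitialspace=True` assumes the file has a space after the commas as shown in the question.

```python
import ast
import io

import pandas as pd

# Shortened rows in the same layout as the posted file (illustrative only).
raw = '''\
,Province, Constituency Name, Party Affiliation, segments
0,Ben Slimane, Ain Tizgha, UND, "{'UND': {""I don't know yet"": 16, 'intention_rate': 20.83}, 'ABS': {'intention_rate': 8.33}}"
1,Ben Slimane, Ain Tizgha, ABS, "{'UND': {""I don't know yet"": 16, 'intention_rate': 20.83}, 'ABS': {'intention_rate': 8.33}}"
2,Khouribga,Ain Kaicher,UND, "{'UND': {'intention_rate': 18.31}, 'ABS': {'intention_rate': 1.19}}"
3,Khouribga,Ain Kaicher,ABS, "{'UND': {'intention_rate': 18.31}, 'ABS': {'intention_rate': 1.19}}"
'''

# skipinitialspace=True strips the blank after each comma, which also lets
# the quoted segments field be read as a single cell.
df = pd.read_csv(io.StringIO(raw), index_col=0, skipinitialspace=True)
df = df.drop_duplicates(subset=['Constituency Name', 'segments'])


def rates(segment_text):
    """Map each party in one segments cell to its intention_rate (0 if absent)."""
    segment = ast.literal_eval(segment_text)
    return {party: values.get('intention_rate', 0)
            for party, values in segment.items()}


# One dict per constituency; pandas aligns the keys into columns.
result = pd.DataFrame(df['segments'].map(rates).tolist(),
                      index=df['Constituency Name'].tolist())
result.index.name = 'Constituency'
# Parties missing from a constituency's dict would show up as NaN; use 0 instead.
result = result.fillna(0)
print(result)
```

Because the columns are derived from the dictionaries themselves, stray whitespace in the 'Party Affiliation' column does not affect the result.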

I hope this helps.