I have a CSV file in which one column holds the results I am interested in and another holds the index:
,Province, Constituency Name, Party Affiliation, segments
0,Ben Slimane, Ain Tizgha, UND, "{'UND': {""I don't know yet"": 16, 'No': 3, 'Yes': 5, 'total': 24, 'intention_rate': 20.83}, 'ABS': {""I don't know yet"": 1, 'No': 10, 'Yes': 1, 'total': 12, 'intention_rate': 8.33}, 'PJD': {""I don't know yet"": 1, 'Yes': 3, 'total': 4, 'intention_rate': 75}}"
1,Ben Slimane, Ain Tizgha, ABS, "{'UND': {""I don't know yet"": 16, 'No': 3, 'Yes': 5, 'total': 24, 'intention_rate': 20.83}, 'ABS': {""I don't know yet"": 1, 'No': 10, 'Yes': 1, 'total': 12, 'intention_rate': 8.33}, 'PJD': {""I don't know yet"": 1, 'Yes': 3, 'total': 4, 'intention_rate': 75}}"
2,Ben Slimane, Ain Tizgha, PJD, "{'UND': {""I don't know yet"": 16, 'No': 3, 'Yes': 5, 'total': 24, 'intention_rate': 20.83}, 'ABS': {""I don't know yet"": 1, 'No': 10, 'Yes': 1, 'total': 12, 'intention_rate': 8.33}, 'PJD': {""I don't know yet"": 1, 'Yes': 3, 'total': 4, 'intention_rate': 75}}"
3,Ben Slimane, Ahlaf, UND, "{'UND': {""I don't know yet"": 16, 'No': 3, 'Yes': 5, 'total': 24, 'intention_rate': 20.83}, 'ABS': {""I don't know yet"": 1, 'No': 10, 'Yes': 1, 'total': 12, 'intention_rate': 8.33}, 'PJD': {""I don't know yet"": 1, 'Yes': 3, 'total': 4, 'intention_rate': 75}}"
4,Ben Slimane, Ahlaf, ABS, "{'UND': {""I don't know yet"": 16, 'No': 3, 'Yes': 5, 'total': 24, 'intention_rate': 20.83}, 'ABS': {""I don't know yet"": 1, 'No': 10, 'Yes': 1, 'total': 12, 'intention_rate': 8.33}, 'PJD': {""I don't know yet"": 1, 'Yes': 3, 'total': 4, 'intention_rate': 75}}"
5,Ben Slimane, Ahlaf, PJD, "{'UND': {""I don't know yet"": 16, 'No': 3, 'Yes': 5, 'total': 24, 'intention_rate': 20.83}, 'ABS': {""I don't know yet"": 1, 'No': 10, 'Yes': 1, 'total': 12, 'intention_rate': 8.33}, 'PJD': {""I don't know yet"": 1, 'Yes': 3, 'total': 4, 'intention_rate': 75}}"
6,Khouribga,Ain Kaicher,UND, "{'UND': {""I don't know yet"": 46, 'No': 12, 'Yes': 13, 'total': 71, 'intention_rate': 18.31}, 'ABS': {""I don't know yet"": 4, 'No': 79, 'Yes': 1, 'total': 84, 'intention_rate': 1.19}, 'PJD': {""I don't know yet"": 14, 'No': 1, 'Yes': 4, 'total': 19, 'intention_rate': 21.05}}"
7,Khouribga,Ain Kaicher,ABS, "{'UND': {""I don't know yet"": 46, 'No': 12, 'Yes': 13, 'total': 71, 'intention_rate': 18.31}, 'ABS': {""I don't know yet"": 4, 'No': 79, 'Yes': 1, 'total': 84, 'intention_rate': 1.19}, 'PJD': {""I don't know yet"": 14, 'No': 1, 'Yes': 4, 'total': 19, 'intention_rate': 21.05}}"
8,Khouribga,Ain Kaicher,PJD, "{'UND': {""I don't know yet"": 46, 'No': 12, 'Yes': 13, 'total': 71, 'intention_rate': 18.31}, 'ABS': {""I don't know yet"": 4, 'No': 79, 'Yes': 1, 'total': 84, 'intention_rate': 1.19}, 'PJD': {""I don't know yet"": 14, 'No': 1, 'Yes': 4, 'total': 19, 'intention_rate': 21.05}}"
9, Khouribga,Bni Bataou,UND, "{'UND': {""I don't know yet"": 46, 'No': 12, 'Yes': 13, 'total': 71, 'intention_rate': 18.31}, 'ABS': {""I don't know yet"": 4, 'No': 79, 'Yes': 1, 'total': 84, 'intention_rate': 1.19}, 'PJD': {""I don't know yet"": 14, 'No': 1, 'Yes': 4, 'total': 19, 'intention_rate': 21.05}}"
10, Khouribga,Bni Bataou,ABS, "{'UND': {""I don't know yet"": 46, 'No': 12, 'Yes': 13, 'total': 71, 'intention_rate': 18.31}, 'ABS': {""I don't know yet"": 4, 'No': 79, 'Yes': 1, 'total': 84, 'intention_rate': 1.19}, 'PJD': {""I don't know yet"": 14, 'No': 1, 'Yes': 4, 'total': 19, 'intention_rate': 21.05}}"
11, Khouribga,Bni Bataou,PJD, "{'UND': {""I don't know yet"": 46, 'No': 12, 'Yes': 13, 'total': 71, 'intention_rate': 18.31}, 'ABS': {""I don't know yet"": 4, 'No': 79, 'Yes': 1, 'total': 84, 'intention_rate': 1.19}, 'PJD': {""I don't know yet"": 14, 'No': 1, 'Yes': 4, 'total': 19, 'intention_rate': 21.05}}"
Yes, there are duplicates. I would like to end up with something like this:
Constituency,UND, ABS, PJD
Ain Tizgha,20.83,8.33,75
Ahlaf,20.83,8.33,75
Ain Kaicher,18.31, 1.19, 21.05
Bni Bataou,18.31, 1.19, 21.05
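The numbers would be the intention_rate of each element in the dictionary stored in the segments column.
How can I convert a column of dictionaries into a DataFrame?
So far I have tried: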
>>> for row in df.iterrows():
...     preceding_row = row
...     if row['segments'] == preceding_row:
...         break
...     saved_things = [row['Constituency'],row['segments']]
...
I know that those ""I don't know yet"" keys might be a problem.
I tried to adapt 林宾加林's answer to make it dynamic, so that it does not depend on the party names:
def parse_segment(row):
    segment = row['segments']
    segment = ast.literal_eval(segment)
    results = []
    for party in df['Party Affiliation'].unique():
        if party in segment.keys():
            v_i = segment[party]['intention_rate']
            results.append(v_i)
        else:
            v_i = 0
    return results

if __name__ == '__main__':
    # main()
    # Load data
    df = pd.read_csv('constituencies_with_segments.csv', header=0, index_col=0)
    parties = [party for party in df['Party Affiliation'].unique()]
    df.drop_duplicates(subset=['Constituency Name', 'segments'], inplace=True)
    df[parties] = df.apply(parse_segment, axis=1, result_type='expand')
    df.drop(columns=['Province', 'Party Affiliation', 'segments'], inplace=True)
    print(df.head())
However, I get the following error:
(campaign_manager) C:\Users\antoi\Documents\Programming\electoral-prediction-model-pk\data\Morocco>python3 geojson_file_updater.py
Traceback (most recent call last):
  File "geojson_file_updater.py", line 75, in <module>
    df[parties] = df.apply(parse_segment, axis=1, result_type='expand')
  File "C:\Users\antoi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\core\frame.py", line 2935, in __setitem__
    self._setitem_array(key, value)
  File "C:\Users\antoi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\core\frame.py", line 2961, in _setitem_array
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
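Answer 0 (score: 1)
Are there typos in the first two rows? Should ABS be 'ABS' and PJD be 'PJD'?
If they are indeed typos, you can try the following (leaving out the first two rows):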
import ast
import pandas as pd

def parse_segment(row):
    # The segments cell is a string; turn it into a real dict first
    segment = row['segments']
    segment = ast.literal_eval(segment)
    v_1 = segment['UND']['intention_rate']
    v_2 = segment['ABS']['intention_rate']
    v_3 = segment['PJD']['intention_rate']
    return [v_1, v_2, v_3]

# Load data
df = pd.read_csv('your_csv_file.csv', header=0, index_col=0)
# Keep one row per constituency
df.drop_duplicates(subset=['Constituency Name', 'segments'], inplace=True)
df[['UND', 'ABS', 'PJD']] = df.apply(parse_segment, axis=1, result_type='expand')
df.drop(columns=['Province', 'Party Affiliation', 'segments'], inplace=True)
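On the worry about the ""I don't know yet"" keys: the doubled quotes are ordinary CSV escaping inside a quoted field, so the reader should collapse them back to a single " before ast.literal_eval ever sees the string. Here is a small sketch that checks this on a hypothetical two-row sample shortened from the posted data. If the real file has a space after every comma, as the listing in the question does, passing skipinitialspace=True to pd.read_csv may also be needed so that the quoted field is recognised. I have not seen the actual file, so treat that as an assumption.

```python
import ast
import io

import pandas as pd

# Hypothetical two-row sample, shortened from the data in the question.
SAMPLE = ''',Province,Constituency Name,Party Affiliation,segments
0,Ben Slimane,Ain Tizgha,UND,"{'UND': {""I don't know yet"": 16, 'intention_rate': 20.83}, 'ABS': {'intention_rate': 8.33}, 'PJD': {'intention_rate': 75}}"
1,Ben Slimane,Ain Tizgha,ABS,"{'UND': {""I don't know yet"": 16, 'intention_rate': 20.83}, 'ABS': {'intention_rate': 8.33}, 'PJD': {'intention_rate': 75}}"
'''

df = pd.read_csv(io.StringIO(SAMPLE), header=0, index_col=0)

raw = df.loc[0, 'segments']      # plain string; "" has been collapsed to "
segment = ast.literal_eval(raw)  # now a real dict of dicts

print(type(raw).__name__, type(segment).__name__)
print(segment['UND']["I don't know yet"], segment['PJD']['intention_rate'])
```

The apostrophe in "I don't know yet" is not a problem either, because that key is delimited by double quotes in the dictionary text.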
Update (based on the data posted above)
import ast
import pandas as pd

# Load data
df = pd.read_csv('your_csv_file.csv', header=0, index_col=0)
# Collect the party names before duplicates are dropped; unique() keeps first-seen order
parties = list(df['Party Affiliation'].unique())
df.drop_duplicates(subset=['Constituency Name', 'segments'], inplace=True)

def parse_segment(row):
    segment = ast.literal_eval(row['segments'])
    # One value per party (0 when a party or its intention_rate is missing),
    # so every row has the same length as `parties`
    return [segment.get(party, {}).get('intention_rate', 0) for party in parties]

df[parties] = df.apply(parse_segment, axis=1, result_type='expand')
df.drop(columns=['Province', 'Party Affiliation', 'segments'], inplace=True)
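As for the "Columns must be same length as key" error in your version: with the posted data, drop_duplicates keeps only the first row of each constituency, and those are all UND rows. Calling df['Party Affiliation'].unique() inside parse_segment after that point therefore yields a single party, while parties still holds three names. A possible alternative is to build the wide table directly from the parsed dictionaries, as in the sketch below. The sample is hypothetical: it is shortened from the posted data, and PJD has been removed from the second constituency on purpose to show the default of 0. The helper name intention_rates is mine.

```python
import ast
import io

import pandas as pd

# Hypothetical sample: shortened from the question, with PJD deliberately
# missing from the last row to exercise the default value.
SAMPLE = ''',Province,Constituency Name,Party Affiliation,segments
0,Ben Slimane,Ain Tizgha,UND,"{'UND': {'intention_rate': 20.83}, 'ABS': {'intention_rate': 8.33}, 'PJD': {'intention_rate': 75}}"
1,Ben Slimane,Ain Tizgha,ABS,"{'UND': {'intention_rate': 20.83}, 'ABS': {'intention_rate': 8.33}, 'PJD': {'intention_rate': 75}}"
2,Ben Slimane,Ain Tizgha,PJD,"{'UND': {'intention_rate': 20.83}, 'ABS': {'intention_rate': 8.33}, 'PJD': {'intention_rate': 75}}"
3,Khouribga,Ain Kaicher,UND,"{'UND': {'intention_rate': 18.31}, 'ABS': {'intention_rate': 1.19}}"
'''

df = pd.read_csv(io.StringIO(SAMPLE), header=0, index_col=0)

# Party names must be collected before duplicates are dropped.
parties = list(df['Party Affiliation'].unique())

dedup = df.drop_duplicates(subset=['Constituency Name', 'segments'])


def intention_rates(text):
    """Parse one segments cell and return {party: intention_rate}."""
    segment = ast.literal_eval(text)
    return {party: segment.get(party, {}).get('intention_rate', 0)
            for party in parties}


# One dict per constituency becomes one row of the result.
rates = pd.DataFrame(
    [intention_rates(text) for text in dedup['segments']],
    index=pd.Index(dedup['Constituency Name'], name='Constituency'),
    columns=parties,
)
print(rates)
```

Because every row is built from the same parties list, the number of values always matches the number of columns, whichever parties appear in a given dictionary.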
Hope this helps.