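I have a DataFrame with a Reviews column in which each value is a multi-dimensional array (a nested list), and I want to extract the first element of each value.
Suppose df['Reviews'] contains the rows below, and I want the output in a separate column.
Here are 3 sample values from the column:
df['Reviews'] =
[['Just like home', 'A Warm Welcome to Wintry Amsterdam'], ['01/03/2018', '01/01/2018']]
[['Great food and staff', 'just perfect'], ['01/06/2018', '01/04/2018']]
[['Satisfaction', 'Delicious old school restaurant'], ['01/04/2018', '01/04/2018']]
Please help.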
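Answer 0 (score: 0)
Add the following to your code. It creates a new column named output that holds the first element of each entry in Reviews.
Either the apply function or the map function can be used for this.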
For example, with apply (map accepts the same lambda):
df['output'] = df.Reviews.apply(lambda x: x[0])
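A minimal sketch of the two variants side by side, using the sample rows from the question. The column name output_map is only an illustrative assumption so that both results can be compared; it does not appear in the original answer.

```python
import pandas as pd

# Sample rows from the question: each cell is [titles, dates].
df = pd.DataFrame({
    "Reviews": [
        [["Just like home", "A Warm Welcome to Wintry Amsterdam"], ["01/03/2018", "01/01/2018"]],
        [["Great food and staff", "just perfect"], ["01/06/2018", "01/04/2018"]],
        [["Satisfaction", "Delicious old school restaurant"], ["01/04/2018", "01/04/2018"]],
    ]
})

# apply calls the lambda once per cell and keeps the first inner list.
df["output"] = df["Reviews"].apply(lambda x: x[0])

# map with a function does the same element-wise work on a Series.
# "output_map" is a hypothetical column name used only for comparison.
df["output_map"] = df["Reviews"].map(lambda x: x[0])

print(df["output"].tolist())
```

For a plain element-wise lambda like this one, both calls should give the same column, so the choice is mostly a matter of taste.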
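Answer 1 (score: 0)
I think this should help. It worked for me.
df['Reviews'] = df['Reviews'].apply(lambda c: str(c[0]).strip('[]'))
This works well if you run it once. If you run the same code again, it cuts the text down further, because the column now holds strings and c[0] returns only the first character. So I suggest commenting the line out after use, or writing the result to a new column instead.
P.S. You should include code rather than screenshots, so that it can be tested first.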
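The rerun problem described above can be reproduced with a small sketch. The helper name first_as_text and the column name first_review are assumptions made for illustration; only the lambda body comes from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Reviews": [
        [["Just like home", "A Warm Welcome to Wintry Amsterdam"], ["01/03/2018", "01/01/2018"]],
        [["Great food and staff", "just perfect"], ["01/06/2018", "01/04/2018"]],
    ]
})

def first_as_text(cell):
    # Same logic as the answer's lambda: take element 0, then strip the brackets.
    return str(cell[0]).strip("[]")

# Writing to a new (hypothetical) column leaves Reviews untouched,
# so running this line again gives the same result.
df["first_review"] = df["Reviews"].apply(first_as_text)

# Applying the function to the already-converted text keeps only its first character.
second_pass = df["first_review"].apply(first_as_text)

print(df["first_review"][1])
print(repr(second_pass[1]))
```

Note that the strip only removes the outer brackets, so the quotes around each title remain in the resulting string.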
Answer 2 (score: 0)
If you need the first list, index with str[0]:
import ast
df['Reviews'] = df['Reviews'].apply(ast.literal_eval).str[0]
If you need to join that list into a single string with a , separator, add Series.str.join:
import ast
df['Reviews'] = df['Reviews'].apply(ast.literal_eval).str[0].str.join(',')
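As a rough illustration of both variants, the sketch below assumes the Reviews cells are stored as text (as they would be after reading a CSV), which is why ast.literal_eval is needed. The names parsed, first_list and joined are hypothetical and used only to keep the intermediate results visible.

```python
import ast

import pandas as pd

# Assumed input: each cell is the text form of a nested list.
df = pd.DataFrame({
    "Reviews": [
        "[['Just like home', 'A Warm Welcome to Wintry Amsterdam'], ['01/03/2018', '01/01/2018']]",
        "[['Great food and staff', 'just perfect'], ['01/06/2018', '01/04/2018']]",
    ]
})

# Parse the text back into real Python lists.
parsed = df["Reviews"].apply(ast.literal_eval)

# .str[0] indexes into each list and keeps the first inner list.
df["first_list"] = parsed.str[0]

# .str.join(",") then joins each inner list into one string.
df["joined"] = parsed.str[0].str.join(",")

print(df["joined"].tolist())
```

If the column already holds real lists rather than text, the ast.literal_eval step would fail and should be skipped.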
Answer 3 (score: 0)
If you get an error, there may be some empty values in Reviews. If those rows are of no use to you, you can drop them:
df.dropna(subset=['Reviews'], inplace=True)
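A small sketch of the drop-then-extract approach, assuming one of the cells is NaN. The variable name cleaned and the reset_index call are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Reviews": [
        [["Satisfaction", "Delicious old school restaurant"], ["01/04/2018", "01/04/2018"]],
        np.nan,  # an empty cell; indexing it with [0] would raise a TypeError
        [["Great food and staff", "just perfect"], ["01/06/2018", "01/04/2018"]],
    ]
})

# Drop rows whose Reviews cell is missing, then renumber the remaining rows.
cleaned = df.dropna(subset=["Reviews"]).reset_index(drop=True)

# Every remaining cell is a list, so taking element 0 is now safe.
cleaned["Review"] = cleaned["Reviews"].apply(lambda x: x[0])

print(cleaned["Review"].tolist())
```

Returning a new DataFrame instead of using inplace=True keeps the original rows available in case they are needed later.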
Or add a type check for the data:
import pandas as pd
a = [
    [['Just like home', 'A Warm Welcome to Wintry Amsterdam'], ['01/03/2018', '01/01/2018']],
    [['Great food and staff', 'just perfect'], ['01/06/2018', '01/04/2018']],
    [['Satisfaction', 'Delicious old school restaurant'], ['01/04/2018', '01/04/2018']],
]
df = pd.DataFrame(columns=['Reviews', 'Review'])
df['Reviews'] = a
df
Reviews Review
0 [[Just like home, A Warm Welcome to Wintry Ams... NaN
1 [[Great food and staff, just perfect], [01/06/... NaN
2 [[Satisfaction, Delicious old school restauran... NaN
def get_review(reviews):
    # Return the first inner list; anything that is not a list (e.g. NaN) becomes None.
    if isinstance(reviews, list):
        return reviews[0]
    else:
        return None
df['Review'] = df['Reviews'].apply(get_review)
df
Reviews Review
0 [[Just like home, A Warm Welcome to Wintry Ams... [Just like home, A Warm Welcome to Wintry Amst...
1 [[Great food and staff, just perfect], [01/06/... [Great food and staff, just perfect]
2 [[Satisfaction, Delicious old school restauran... [Satisfaction, Delicious old school restaurant]
If you do not want the Review column to hold lists, convert each list to a delimited string:
def get_review(reviews):
    # Join the first inner list into one string; non-list cells become ''.
    if isinstance(reviews, list):
        return ', '.join(reviews[0])
    else:
        return ''
df['Review'] = df['Reviews'].apply(get_review)
df
Reviews Review
0 [[Just like home, A Warm Welcome to Wintry Ams... Just like home, A Warm Welcome to Wintry Amste...
1 [[Great food and staff, just perfect], [01/06/... Great food and staff, just perfect
2 [[Satisfaction, Delicious old school restauran... Satisfaction, Delicious old school restaurant
If your input data is not of list type (for example, because you read it from a CSV), you need to convert it to a list first:
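import ast
def get_review(reviews):
    # Cells read from a CSV are strings, so parse them back into lists first.
    if pd.notna(reviews) and reviews != '':
        r_list = ast.literal_eval(reviews)[0]
        if len(r_list) > 0:
            return ', '.join(r_list)
        else:
            return ''
    else:
        return ''
# df2 is assumed to have been loaded from a CSV file.
df2['Review'] = df2['Reviews'].apply(get_review)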
df2
Reviews Review
0 [['Just like home', 'A Warm Welcome to Wintry ... Just like home, A Warm Welcome to Wintry Amste...
1 [['Great food and staff', 'just perfect'], ['0... Great food and staff, just perfect
2 [['Satisfaction', 'Delicious old school restau... Satisfaction, Delicious old school restaurant
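To see the CSV case end to end, here is a hedged sketch that writes a small frame to an in-memory CSV and reads it back, so the Reviews cells arrive as strings. The id column, the original variable, and the use of io.StringIO in place of a real file are all assumptions made to keep the example self-contained.

```python
import ast
import io

import pandas as pd

# Hypothetical source data; the middle row has no review at all.
original = pd.DataFrame({
    "id": [1, 2, 3],
    "Reviews": [
        [["Just like home", "A Warm Welcome to Wintry Amsterdam"], ["01/03/2018", "01/01/2018"]],
        None,
        [["Great food and staff", "just perfect"], ["01/06/2018", "01/04/2018"]],
    ],
})

# Round-trip through CSV text: lists are written as strings, None becomes an empty field.
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)
df2 = pd.read_csv(buffer)

def get_review(reviews):
    # Skip missing or empty cells, otherwise parse the text and join the first inner list.
    if pd.notna(reviews) and reviews != "":
        r_list = ast.literal_eval(reviews)[0]
        if len(r_list) > 0:
            return ", ".join(r_list)
    return ""

df2["Review"] = df2["Reviews"].apply(get_review)
print(df2["Review"].tolist())
```

ast.literal_eval only evaluates Python literals, which makes it a safer choice than eval for text that comes from a file.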