Extract the first element of a multi-dimensional array from a pandas DataFrame

Time: 2020-06-01 07:31:13

Tags: python pandas

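I have a DataFrame with a Reviews column whose cells are multi-dimensional arrays (nested lists), and I want to extract the first element of each cell, as shown below.

Suppose df['Reviews'] consists of the following rows:

Reviews Data

I would like the output in a separate column, like this:

Output

Here are three sample values from the column:

df['Reviews'] =
[['Just like home', 'A Warm Welcome to Wintry Amsterdam'], ['01/03/2018', '01/01/2018']]
[['Great food and staff', 'just perfect'], ['01/06/2018', '01/04/2018']]
[['Satisfaction', 'Delicious old school restaurant'], ['01/04/2018', '01/04/2018']]

Please help.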

4 answers:

Answer 0: (score: 0)

Add the following to access the data in the DataFrame. It creates a new column named output containing the required values.

Using apply:

df['output'] = df['Reviews'].apply(lambda x: x[0])

Using map:

df['output'] = df['Reviews'].map(lambda x: x[0])
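As a minimal sketch of this approach, the snippet below rebuilds the three sample rows from the question and takes the first inner list of each cell. The column name output_map is only an illustrative name for the second variant; Series.apply and Series.map both call the function once per cell here, so the two columns should match.

```python
import pandas as pd

# Sample rows reconstructed from the question: each cell is [titles, dates].
df = pd.DataFrame({
    "Reviews": [
        [["Just like home", "A Warm Welcome to Wintry Amsterdam"], ["01/03/2018", "01/01/2018"]],
        [["Great food and staff", "just perfect"], ["01/06/2018", "01/04/2018"]],
        [["Satisfaction", "Delicious old school restaurant"], ["01/04/2018", "01/04/2018"]],
    ]
})

# Element-wise: keep the first inner list (the titles) of every cell.
df["output"] = df["Reviews"].apply(lambda x: x[0])

# Same result with Series.map ("output_map" is a hypothetical column name).
df["output_map"] = df["Reviews"].map(lambda x: x[0])

print(df[["output", "output_map"]])
```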

Answer 1: (score: 0)

I think this should help. It worked for me.

df['Reviews']=df['Reviews'].apply(lambda c: str(c[0]).strip('[]'))

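This works well if you run it once. If you run the same code again, it splits the text further, because the column has been overwritten with strings. So I suggest commenting the line out after use, or writing the result to a new column.

P.S.: You should include code rather than screenshots so that it can be tested first.

Edit: It looks fine to me. Please try again, and remember that if you run it twice (without creating a separate column), it will not return anything useful.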
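The run-twice behaviour can be reproduced in a small sketch. The helper name first_as_text and the Review column are assumptions made for illustration; the expression inside the helper is the one from this answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Reviews": [
        [["Great food and staff", "just perfect"], ["01/06/2018", "01/04/2018"]],
    ]
})

def first_as_text(cell):
    # Same expression as in the answer: stringify the first element, trim brackets.
    return str(cell[0]).strip("[]")

# Writing to a new column leaves the nested lists in 'Reviews' untouched,
# so this line can be re-run safely.
df["Review"] = df["Reviews"].apply(first_as_text)

# Feeding the already-converted text through the same function shows the pitfall:
# cell[0] is now the first character of a string, not the first inner list.
second_pass = df["Review"].apply(first_as_text)

print(df["Review"].iloc[0])   # 'Great food and staff', 'just perfect'
print(second_pass.iloc[0])    # '
```

Keeping the original column intact is what makes the conversion repeatable.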

Answer 2: (score: 0)

If you need the first list of each cell, index with str[0]:

import ast

df['Reviews'] = df['Reviews'].apply(ast.literal_eval).str[0]

If you need to join each list into a string separated by ',', add Series.str.join:

import ast

df['Reviews'] = df['Reviews'].apply(ast.literal_eval).str[0].str.join(',')
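A short sketch of the chain above, assuming the cells are stored as text (which is why ast.literal_eval is needed first). The intermediate names parsed, first_lists and joined are illustrative only.

```python
import ast
import pandas as pd

# Assumption: each cell holds the nested list as a string, not as a list object.
df = pd.DataFrame({
    "Reviews": [
        "[['Just like home', 'A Warm Welcome to Wintry Amsterdam'], ['01/03/2018', '01/01/2018']]",
        "[['Great food and staff', 'just perfect'], ['01/06/2018', '01/04/2018']]",
    ]
})

parsed = df["Reviews"].apply(ast.literal_eval)  # text -> nested lists
first_lists = parsed.str[0]                     # first inner list of each cell
joined = first_lists.str.join(",")              # list of strings -> one string

print(joined.tolist())
```

The .str accessor indexes any sequence stored in an object column, not only strings, which is what lets str[0] pick the first inner list.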

Answer 3: (score: 0)

If you get an error, there may be some missing values in 'Reviews'. If those rows are of no use to you, you can drop them:

df.dropna(subset=['Reviews'], inplace=True)
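To illustrate the missing-value case, here is a minimal sketch with one valid cell and one NaN cell (the data and the name clean are made up for the example). Indexing NaN with x[0] raises a TypeError, so the row is dropped before extraction.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Reviews": [
        [["Satisfaction", "Delicious old school restaurant"], ["01/04/2018", "01/04/2018"]],
        np.nan,  # missing cell: x[0] on a float would raise TypeError
    ]
})

# Drop rows whose 'Reviews' cell is missing, then extract as usual.
clean = df.dropna(subset=["Reviews"]).copy()
clean["Review"] = clean["Reviews"].apply(lambda x: x[0])

print(clean)
```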

Or add a type check on the data:

import pandas as pd

a = [
    [['Just like home', 'A Warm Welcome to Wintry Amsterdam'], ['01/03/2018', '01/01/2018']],
    [['Great food and staff', 'just perfect'], ['01/06/2018', '01/04/2018']],
    [['Satisfaction', 'Delicious old school restaurant'], ['01/04/2018', '01/04/2018']],
]

df = pd.DataFrame(columns=['Reviews', 'Review'])
df['Reviews'] = a
df
Reviews Review
0   [[Just like home, A Warm Welcome to Wintry Ams...   NaN
1   [[Great food and staff, just perfect], [01/06/...   NaN
2   [[Satisfaction, Delicious old school restauran...   NaN

def get_review(reviews):
    # Only list cells are indexed; NaN and other non-list cells become None.
    if isinstance(reviews, list):
        return reviews[0]
    return None

df['Review'] = df['Reviews'].apply(get_review)
df
    Reviews Review
0   [[Just like home, A Warm Welcome to Wintry Ams...   [Just like home, A Warm Welcome to Wintry Amst...
1   [[Great food and staff, just perfect], [01/06/...   [Great food and staff, just perfect]
2   [[Satisfaction, Delicious old school restauran...   [Satisfaction, Delicious old school restaurant]

If you don't want the Review column to hold lists, just convert each one to a delimiter-separated string:

def get_review(reviews):
    # Join the first inner list into one string; non-list cells become ''.
    if isinstance(reviews, list):
        return ', '.join(reviews[0])
    return ''

df['Review'] = df['Reviews'].apply(get_review)
df
    Reviews Review
0   [[Just like home, A Warm Welcome to Wintry Ams...   Just like home, A Warm Welcome to Wintry Amste...
1   [[Great food and staff, just perfect], [01/06/...   Great food and staff, just perfect
2   [[Satisfaction, Delicious old school restauran...   Satisfaction, Delicious old school restaurant

If your input data is not of list type (for example, because you read it from a CSV file), you need to convert it to a list first:

import ast

def get_review(reviews):
    # Cells read from CSV are strings; skip missing or empty ones.
    if pd.notna(reviews) and reviews != '':
        r_list = ast.literal_eval(reviews)[0]  # parse the text, keep the first inner list
        return ', '.join(r_list)  # an empty list joins to ''
    return ''

df2['Review'] = df2['Reviews'].apply(get_review)
df2

Reviews Review
0   [['Just like home', 'A Warm Welcome to Wintry ...   Just like home, A Warm Welcome to Wintry Amste...
1   [['Great food and staff', 'just perfect'], ['0...   Great food and staff, just perfect
2   [['Satisfaction', 'Delicious old school restau...   Satisfaction, Delicious old school restaurant
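The CSV case can be checked end to end with a small round trip. In this sketch io.StringIO stands in for a real file, and the names original and csv_text are assumptions for the example; df2 mirrors the name used above.

```python
import ast
import io
import pandas as pd

original = pd.DataFrame({
    "Reviews": [
        [["Just like home", "A Warm Welcome to Wintry Amsterdam"], ["01/03/2018", "01/01/2018"]],
        [["Great food and staff", "just perfect"], ["01/06/2018", "01/04/2018"]],
    ]
})

# Round trip through CSV text held in memory instead of a file on disk.
csv_text = original.to_csv(index=False)
df2 = pd.read_csv(io.StringIO(csv_text))

# After reading, each cell is text, so indexing it would return single characters.
cell_type = type(df2["Reviews"].iloc[0])

# Parse the text back into nested lists, then join the first inner list.
df2["Review"] = df2["Reviews"].apply(lambda s: ", ".join(ast.literal_eval(s)[0]))

print(cell_type)
print(df2["Review"].tolist())
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for text that comes from a file.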