Question

我有一个包含公司创新数据的数据集，通过使用一些正则表达式，我想检索许可证数据

company licences/patents
1       UX226, licence-pp-zz, licence-zz-pp, licence-xx-tt
2       VV3346E, SS345
3       licence-dd-zz
4       UT223, licence, ss
5       XBTYU, licence-tt-kk, licence-ss-tt
6       xc, zz
7       licence-xb-xz

所需的输出：

company licences/patents                                    licence
1       UX226, licence-pp-zz, licence-zz-pp, licence-xx-tt  licence-pp-zz, licence-zz-pp, licence-xx-tt 
2       VV3346E, SS345
3       licence-dd-zz                                       licence-dd-zz
4       UT223, licence, ss
5       XBTYU, licence-tt-kk, licence-ss-tt                 licence-tt-kk, licence-ss-tt
6       xc, zz
7       licence-xb-xz                                       licence-xb-xz

Answer 1

您可以尝试：

df['licence'] = df['licences/patents'].str.extractall('(licence-\w{2}-\w{2})')\
  .unstack().apply(lambda x: ', '.join(x.dropna()), axis=1)

输出：

   company                                   licences/patents                                      licence
0        1  UX226, licence-pp-zz, licence-zz-pp, licence-x...  licence-pp-zz, licence-zz-pp, licence-xx-tt
1        2                                     VV3346E, SS345                                          NaN
2        3                                      licence-dd-zz                                licence-dd-zz
3        4                                 UT223, licence, ss                                          NaN
4        5                XBTYU, licence-tt-kk, licence-ss-tt                 licence-tt-kk, licence-ss-tt
5        6                                             xc, zz                                          NaN
6        7                                      licence-xb-xz                                licence-xb-xz

Answer 2

另一种方法，使用Series.str.findall和Series.str.join：

$this->db->where('amount', 'discount', false);

[出]

using ExcelDataReader;
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Linq;

namespace Driven
{
    public class ExcelLib
    {
        public static DataTable ExcelToDataTable(string fileName)
        {
            //open file and returns as Stream
            FileStream stream = File.Open(fileName, FileMode.Open, FileAccess.Read);
            {
                using (var reader = ExcelReaderFactory.CreateReader(stream))
                {
                    var result = reader.AsDataSet(new ExcelDataSetConfiguration()
                    {
                        ConfigureDataTable = (data) => new ExcelDataTableConfiguration()
                        {
                            UseHeaderRow = true
                        }
                    });

                    //Get all the Tables
                    DataTableCollection table = result.Tables;
                    //Store it in DataTable
                    DataTable resultTable = table["Sheet1"];

                    //return
                    return resultTable;
                }
            }
        }

            static List<Datacollection> dataCol = new List<Datacollection>();

        public static void PopulateInCollection(string fileName)
        {
            DataTable table = ExcelToDataTable(fileName);

            //Iterate through the rows and columns of the Table
            for (int row = 1; row <= table.Rows.Count - 1; row++)
            {
                for (int col = 0; col <= table.Columns.Count; col++)
                {
                    Datacollection dtTable = new Datacollection()
                    {
                        rowNumber = row,
                        colName = table.Columns[col].ColumnName,
                        colValue = table.Rows[row - 1][col].ToString()
                    };
                    //Add all the details for each row
                    dataCol.Add(dtTable);
                }
            }
        }

        public static string ReadData(int rowNumber, string columnName)
        {
            try
            {
                //Retriving Data using LINQ to reduce much of iterations
                string data = (from colData in dataCol
                               where colData.colName == columnName && colData.rowNumber == rowNumber
                               select colData.colValue).SingleOrDefault();

                //var datas = dataCol.Where(x => x.colName == columnName && x.rowNumber == rowNumber).SingleOrDefault().colValue;
                return data.ToString();
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
                return null;
            }
        }
        public class Datacollection
        {
            public int rowNumber { get; set; }
            public string colName { get; set; }
            public string colValue { get; set; }
        }
    }
}

Answer 3

This RegEx可能会帮助您创建一个组$1，其中所需的输出可能是：

(licence-[a-z]{2}-[a-z]{2})

Answer 4

function formikCustomValidation({ field, form }: FieldProps, ownerCmp: Component<any, any>) { return { ...field, onBlur: e => { if (field.value !== form.initialValues[field.name]) { ownerCmp.setState({ changedFields: { ...ownerCmp.state.changedFields, [field.name]: true } }); } field.onBlur(e); setTimeout(form.validateForm); }, onChange: e => { field.onChange(e); if (ownerCmp.state.changedFields[field.name]) { setTimeout(form.validateForm); } } }; }

它将创建新的列许可证，并从原始列中剥离许可证。

Answer 5

尝试以下代码：

df['licences/patents'].str.findall('(licence-[\w\-]+)').apply(", ".join)

使用正则表达式识别熊猫列中的模式并清除数据

5 个答案: