Converting a list of strings into columns with custom rules in pandas

Time: 2018-03-15 09:48:06

Tags: python pandas

The columns of my DataFrame contain lists of strings:

import pandas as pd

data = [{'column A': '3 item X; 4 item Y; item E of size 7', 'column B': 'item I of size 10; item X has 5 specificities; characteristic W'},
        {'column A': '13 item X; item F of size 0; 9 item Y', 'column B': 'item J of size 11; item Y has 8 specificities'}]
df = pd.DataFrame(data)

df

For each row, I want to extract the numeric information from the strings that contain an integer. For example, I need to create a new column named Size item E that takes the value 7 for the first row, because the list in column A contains item E of size 7.
If a value in the list of strings contains no number, I just want to encode it as 1 or 0 (depending on whether it is present in the original list).

Here is a summary of the output I want:

df2

Here is what I have written so far, applying only one rule:

item E of size 7

This returned the following DataFrame:

df3

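As you can see, I cannot apply my feature extraction row by row; it updates the entire pandas Series. Is there a way to update the new column's value one row at a time?

1 answer:

Answer 0: (score: 0)

Don't reach for complicated functions; pandas has great string manipulation functions. Check this code to get the desired output.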
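As a quick illustration of why no row-by-row loop is needed, here is a minimal sketch of how `Series.str.extract` behaves on a small Series. The sample strings and the variable names (`s`, `size_e`, `has_e`) are made up for this example; only the patterns are taken from the question.

```python
import pandas as pd

# Two sample rows; only the first one mentions "item E".
s = pd.Series(["3 item X; item E of size 7", "13 item X; item F of size 0"])

# With one capture group and expand=False, str.extract returns a Series.
# Each row is matched independently; rows without a match become NaN.
size_e = s.str.extract(r"item E of size ([0-9]+)", expand=False)

# For values without a number, a plain substring test gives a 1/0 flag.
has_e = s.str.contains("item E", regex=False).astype(int)

print(size_e)
print(has_e)
```

Note that the extracted values are strings, so they still need a numeric conversion if arithmetic is required later.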

import pandas as pd

data = [{'column A': '3 item X; 4 item Y; item E of size 7', 'column B': 'item I of size 10; item X has 5 specificities; characteristic W'},
        {'column A': '13 item X; item F of size 0; 9 item Y', 'column B': 'item J of size 11; item Y has 8 specificities'}]

df = pd.DataFrame(data)

#joining 2 columns with ';'
df['All Columns joined'] = df[['column A','column B']].apply(lambda x: ';'.join(x), axis=1)

#creating empty dataframe
df_new = pd.DataFrame([])

#Desired output logic using string extract function
df_new['Nb item X'] = df['All Columns joined'].str.extract(r'([0-9]+) item X',expand = False)
df_new['Nb item Y'] = df['All Columns joined'].str.extract(r'([0-9]+) item Y',expand = False)
df_new['Nb specificities item X'] = df['All Columns joined'].str.extract(r'item X has ([0-9]+) specificities',expand = False)
df_new['Nb specificities item Y'] = df['All Columns joined'].str.extract(r'item Y has ([0-9]+) specificities',expand = False)
df_new['Size item E'] = df['All Columns joined'].str.extract(r'item E of size ([0-9]+)',expand = False)
df_new['Size item F'] = df['All Columns joined'].str.extract(r'item F of size ([0-9]+)',expand = False)
df_new['Size item I'] = df['All Columns joined'].str.extract(r'item I of size ([0-9]+)',expand = False)
df_new['Size item J'] = df['All Columns joined'].str.extract(r'item J of size ([0-9]+)',expand = False)
df_new['characteristic W'] = df['All Columns joined'].str.extract(r'(characteristic W)',expand = False).notnull().astype(int)

df_new

   Nb item X  Nb item Y  Nb specificities item X  Nb specificities item Y  Size item E  Size item F  Size item I  Size item J  characteristic W
0          3          4                        5                      NaN            7          NaN           10          NaN                 1
1         13          9                      NaN                        8          NaN            0          NaN           11                 0

This is the output of the df_new DataFrame.
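If the number of rules grows, the repeated `str.extract` lines can be driven from a table of rules instead. The following is only a sketch of that idea, not part of the original answer: the names `numeric_rules`, `flag_rules`, `joined`, and `features` are hypothetical, and `pd.to_numeric` is added here on the assumption that real numbers are wanted rather than the strings `str.extract` returns.

```python
import pandas as pd

data = [
    {"column A": "3 item X; 4 item Y; item E of size 7",
     "column B": "item I of size 10; item X has 5 specificities; characteristic W"},
    {"column A": "13 item X; item F of size 0; 9 item Y",
     "column B": "item J of size 11; item Y has 8 specificities"},
]
df = pd.DataFrame(data)

# Join both columns so every rule sees the whole row.
joined = df["column A"] + ";" + df["column B"]

# Hypothetical rule tables: output column name -> regex with one capture group.
numeric_rules = {
    "Nb item X": r"([0-9]+) item X",
    "Nb item Y": r"([0-9]+) item Y",
    "Nb specificities item X": r"item X has ([0-9]+) specificities",
    "Nb specificities item Y": r"item Y has ([0-9]+) specificities",
    "Size item E": r"item E of size ([0-9]+)",
    "Size item F": r"item F of size ([0-9]+)",
    "Size item I": r"item I of size ([0-9]+)",
    "Size item J": r"item J of size ([0-9]+)",
}
# Output column name -> literal text whose presence is encoded as 1 or 0.
flag_rules = {"characteristic W": "characteristic W"}

features = pd.DataFrame(index=df.index)
for name, pattern in numeric_rules.items():
    extracted = joined.str.extract(pattern, expand=False)
    # Convert the extracted strings to numbers; missing matches stay NaN.
    features[name] = pd.to_numeric(extracted)
for name, text in flag_rules.items():
    features[name] = joined.str.contains(text, regex=False).astype(int)

print(features)
```

Because columns that contain NaN are stored as floats, the numeric columns print as 3.0, 13.0 and so on; a nullable integer dtype could be used instead if whole numbers are preferred.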