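Removing words from a list of sentences

Time: 2019-11-06 17:03:32

Tags: python pandas

I have a list of channel names and I want to remove certain words from them. I tried the approach from the discussion "Removing words from list in python", but it did not work for me. This is what I have: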

['Housekeeping.XTX_heater-0_Switch_Status'
 'Housekeeping.PDM_1__SW11_Status'
 'Housekeeping.Slim6_Imager-1_Switch_Status'
 'Power.BCM1_Battery_Cell_Temperature_degC'
 'Power.BCM2_Battery_Cell_Temperature_degC'
 'Power.BCR1__Battery_Discharge_Current_A'
 'Power.BCR0__Array_Temperature_degC'
 'Power.BCM0_Battery_Interface_Plate_Temp_degC'
 'Power.PDM_2__PDM_Current_A' 'Power.PDM_1__PDM_Temperature_degC'
 'Power.PDM_1__PDM_Current_A' 'Power.PDM_0__PDM_Temperature_degC'
 'Power.PDM_0__PDM_Current_A' 'Power.BCR2__BCR_Temperature_degC'
 'Power.BCR2__Battery_Discharge_Current_A'
 'Power.BCR2__Battery_Charge_Current_mA' 'Power.BCR2__Array_Voltage_V'
 'Power.BCR2__Array_Temperature_degC' 'Power.BCR2__Array_Current_mA'
 'Power.BCR1__BCR_Temperature_degC'
 'Power.BCR1__Battery_Charge_Current_mA' 'Power.BCR1__Array_Voltage_V'
 'Power.BCR1__Array_Temperature_degC' 'Power.BCR1__Array_Current_mA'
 'Power.BCR0__Overvoltage_Clamp_Current_A'
 'Power.BCR0__BCR_Temperature_degC' 'Power.BCR0__Battery_Voltage_V'
 'Power.BCR0__Battery_Charge_Current_mA' 'Power.BCR0__Array_Voltage_V'
 'Power.BCR0__Array_Current_mA' 'Thermal.WHL1_Measured_Current_mA'
 'Thermal.WHL0_Measured_Current_mA' 'Thermal.WHL1_IF_Temp_degC'
 'Thermal.WHL2_IF_Temp_degC'
 'Thermal.Prop_controller_-Y_panel__temperature_degC'
 'Thermal.WHL3_IF_Temp_degC' 'Thermal.WHL0_IF_Temp_degC'
 'Thermal.WHL3_Measured_Current_mA' 'Thermal.WHL2_Measured_Current_mA'
 'Thermal.SS1_Temperature_degC'
 'Thermal.Imager_flat_plate_EFF__temperature_degC'
 'Thermal.OBC_Temp_PPC750FL_degC' 'Thermal.OBC_Temp_PCB_degC'
 'Thermal.MTM-0_Temperature_degC' 'Thermal.AIM_Module_Temperature_degC'
 'Thermal.Sep_system_panel_-Z_+X__temperature_degC'
 'Thermal.OBDH_cardframe_-X_panel__temperature_degC'
 'Thermal.SS0_Temperature_degC' 'LIN.LIN_Failed_Nodes_Count'
 'LIN.LIN_BCM_Fail' 'LIN.LIN_Bus_Fail' 'LIN.LIN_Passive'
 'LIN.LIN_Master_1_State_Of_Health' 'LIN.LIN_Master_Up_Time'
 'LIN.LR_PA_Temperature_degC' 'LIN.My_IP_Packets' 'LIN.Switch_Error'
 'LIN.PA_Current_mA' 'LIN.S-Band_Power_Amplifier_ONOFF_State'
 'LIN.STRx0_Uplink_Reset_Count' 'LIN.STRx1_Uplink_Reset_Count'
 'LIN.Switch_Transaction_Fail_Count' 'LIN.Switch_Transaction_OK_Count'
 'LIN.TTC_0_Current_mA' 'LIN.TTC_1_Current_mA' 'LIN.TTC_Reset_Cause'
 'LIN.RSSI_dBm' 'LIN.TTC0_Temperature_degC' 'LIN.LIN_SPARE_STATUS'
 'LIN.LIN_Master_Reset' 'LIN.COUNT_FPGA_RX_STRx0' 'LIN.Lifetime_Cold_Boot'
 'LIN.Lifetime_Warm_Boot' 'LIN.LIN_Comms_Error_Count'
 'LIN.LIN_Node_Resets_Count' 'LIN.LIN_Bus_Reset'
 'LIN.LIN_Failed_Switches_Count' 'LIN.LIN_Master_0_State_Of_Health'
 'LIN.TTC1_Temperature_degC' 'LIN.UDP_Error_STRx0'
 'LIN.UDP_IPS_size_errors_STRx0' 'LIN.UDP_IPS_STRx0' 'LIN.UDP_Total_STRx0'
 'LIN.UDP_Valid_STRx0' 'LIN.UPD_IPS_errors_STRx0' 'LIN.Warm_Resets'
 'LIN.Cold_Resets' 'LIN.CAN_Reset_Count']

and I want to remove these parts of each name:

['Housekeeping.(including period)', 'Power.', 'Thermal.', 'LIN.']

The expected output is:

'XTX_heater-0_Switch_Status'
 'PDM_1__SW11_Status'
 'Slim6_Imager-1_Switch_Status'
 'BCM1_Battery_Cell_Temperature_degC'
 'BCM2_Battery_Cell_Temperature_degC'
 'BCR1__Battery_Discharge_Current_A'

and so on.

3 answers:

Answer 0 (score: 0)

You could do it like this:

import re

abc = ['Housekeeping.XTX_heater-0_Switch_Status',
       'Housekeeping.PDM_1__SW11_Status',
       'Housekeeping.Slim6_Imager-1_Switch_Status',
       'Power.BCM1_Battery_Cell_Temperature_degC']
stop = ['Housekeeping.', 'Power.', 'Thermal.', 'LIN.']
# Escape each prefix so that '.' is a literal period, and anchor at the start of the name.
pattern = re.compile('^(?:' + '|'.join(re.escape(s) for s in stop) + ')')
print([pattern.sub('', x) for x in abc])

This is adapted from the link you provided; I tested it and it works. Give it a try.

Answer 1 (score: 0)

It can also be solved without regular expressions:

new_list = [w.partition('.')[2] for w in old_list]
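
One caveat with `partition('.')`: it cuts at the first period whatever comes before it, and it returns an empty string for a name that contains no period at all. If only the four prefixes from the question should be removed, a minimal sketch along these lines may be safer. The names `PREFIXES` and `strip_known_prefix` are hypothetical and not part of the answer above.

```python
# Hypothetical helper: strip a prefix only if it is one of the known ones.
PREFIXES = ('Housekeeping.', 'Power.', 'Thermal.', 'LIN.')

def strip_known_prefix(name, prefixes=PREFIXES):
    """Return name without its leading prefix, or unchanged if none matches."""
    for prefix in prefixes:
        if name.startswith(prefix):
            return name[len(prefix):]
    return name

old_list = ['Power.BCM1_Battery_Cell_Temperature_degC',
            'LIN.LIN_Bus_Fail',
            'NoPrefixHere']
new_list = [strip_known_prefix(w) for w in old_list]
print(new_list)
```

With this version a name such as 'NoPrefixHere' is kept as it is, whereas `'NoPrefixHere'.partition('.')[2]` would give an empty string.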

Answer 2 (score: 0)

Something like the following might work:

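import copy

def remove_bad_words(in_stryngs, bad_words):
    # Take the next bad word; once none are left, return the strings unchanged.
    bad_words = iter(bad_words)
    try:
        bad_word = next(bad_words)
    except StopIteration:
        return in_stryngs
    in_stryngs = iter(in_stryngs)
    out_strings = list()
    for stryng in in_stryngs:
        # Splitting on the bad word drops every occurrence of it.
        split_string = stryng.split(bad_word)
        # Copy the iterator so every string is checked against the same remaining words.
        blah = remove_bad_words(split_string, copy.copy(bad_words))
        out_strings.append("".join(blah))
    return out_strings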

Here it is in use:

bad_words = ["hello", "world"]

channel_names = [
    "Nationahellol Broadcahellosting Company (NBC)",
    "worldCworldBworldS (formerly world known asworld the Columbia world Broadcasting System)",
    "the Americaworldworldn Broadcashelloting Company (ABC)",
    "the Fox Broadchelloasting Coworldworldmpany (Fox)",
    "the ChelloW Televiworldsion Network.",
    "public broadcworldasting serhellovice (PBS)"
]

clean_channel_names = remove_bad_words(channel_names, bad_words)

print("\n".join(clean_channel_names))

The output is:

National Broadcasting Company (NBC)
CBS (formerly  known as the Columbia  Broadcasting System)
the American Broadcasting Company (ABC)
the Fox Broadcasting Company (Fox)
the CW Television Network.
public broadcasting service (PBS)
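
For comparison, here is a shorter sketch that chains `str.replace` instead of recursing. The function name `remove_words` is hypothetical. It gives the same result on the sample data, but it is not strictly equivalent: removing one word can join two fragments into a new occurrence of a later word, which this version then removes as well, while the recursive split/join version would leave it in place.

```python
def remove_words(strings, words):
    """Remove every occurrence of each word from each string, one word at a time."""
    cleaned = []
    for s in strings:
        for w in words:
            s = s.replace(w, "")
        cleaned.append(s)
    return cleaned

sample = [
    "Nationahellol Broadcahellosting Company (NBC)",
    "the ChelloW Televiworldsion Network.",
]
print("\n".join(remove_words(sample, ["hello", "world"])))

# Edge case where the two approaches differ: removing "hello" first
# leaves "world", which the second pass then removes too.
print(repr(remove_words(["wohellorld"], ["hello", "world"])[0]))
```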