Boolean value error in main program (Python)

Date: 2018-10-29 18:00:25

Tags: python string list csv

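I'm trying to write this simple code in Python: if the second element of a line in the csv file is one of the families specified in the "malware_list" list, the main program should print "True". However, the program always prints "False".

Each line in the file has the following format: "NAME,FAMILY"

Here is the code: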

malware_list = ["FakeInstaller","DroidKungFu", "Plankton",
            "Opfake", "GingerMaster", "BaseBridge",
            "Iconosys", "Kmin", "FakeDoc", "Geinimi",
            "Adrd", "DroidDream", "LinuxLotoor", "GoldDream",
            "MobileTx", "FakeRun", "SendPay", "Gappusin",
            "Imlog", "SMSreg"]

def is_malware (line):
    line_splitted = line.split(",")
    family = line_splitted[1]
    if family in malware_list:
        return True
    return False

def main():
    with open("datset_small.csv", "r") as f:
        for i in range(1,100):
            line = f.readline()
            print(is_malware(line))

if __name__ == "__main__": 
    main()

4 Answers:

Answer 0 (score: 4)

line = f.readline()

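readline does not strip the trailing newline from its result, so line most likely looks like "STEVE,FakeDoc\n". family then becomes "FakeDoc\n", which is not a member of malware_list, so your function returns False.

Try stripping the whitespace after reading: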

line = f.readline().strip()
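
To see the effect concretely, here is a minimal sketch. It assumes a shortened list and simulates the file with io.StringIO; the sample rows are made up for illustration and are not from the asker's dataset.

```python
import io

# Shortened list, for illustration only
malware_list = ["FakeDoc", "Plankton", "DroidKungFu"]

def is_malware(line):
    # The family is the second comma-separated field
    family = line.split(",")[1]
    return family in malware_list

# Hypothetical file contents standing in for the real CSV
f = io.StringIO("STEVE,FakeDoc\nBOB,Harmless\n")

raw = f.readline()               # readline keeps the trailing "\n"
print(repr(raw))                 # shows the newline explicitly
print(is_malware(raw))           # "FakeDoc\n" is not in the list
print(is_malware(raw.strip()))   # "FakeDoc" is in the list
```

Printing repr(line) is a quick way to spot invisible characters like this when a comparison unexpectedly fails.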

Answer 1 (score: 0)

Python has a package called pandas. With pandas, we can read a CSV file into a DataFrame.

import pandas as pd
df = pd.read_csv("datset_small.csv")

Please post the contents of your CSV file so that I can help you.

Answer 2 (score: 0)

This is easy to do with a DataFrame. Sample code is below.

import pandas as pd

malware_list = ["FakeInstaller","DroidKungFu", "Plankton",
            "Opfake", "GingerMaster", "BaseBridge",
            "Iconosys", "Kmin", "FakeDoc", "Geinimi",
            "Adrd", "DroidDream", "LinuxLotoor", "GoldDream",
            "MobileTx", "FakeRun", "SendPay", "Gappusin",
            "Imlog", "SMSreg"]
# read csv into dataframe
df = pd.read_csv('datset_small.csv')
print(df['FAMILY'].isin(malware_list))

The output is

0    True
1    True
2    True

The sample csv used is

NAME,FAMILY
090b5be26bcc4df6186124c2b47831eb96761fcf61282d63e13fa235a20c7539,Plankton
bedf51a5732d94c173bcd8ed918333954f5a78307c2a2f064b97b43278330f54,DroidKungFu
149bde78b32be3c4c25379dd6c3310ce08eaf58804067a9870cfe7b4f51e62fe,Plankton

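Answer 3 (score: 0)

I would use a set instead of a list for speed, and pandas is definitely better for its speed and ease of coding. You can use the x in y logic to get the result ;)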

import io #not needed in your case
import pandas as pd

data = io.StringIO('''090b5be26bcc4df6186124c2b47831eb96761fcf61282d63e13fa235a20c7539,Plankton 
bedf51a5732d94c173bcd8ed918333954f5a78307c2a2f064b97b43278330f54,DroidKungFu 
149bde78b32be3c4c25379dd6c3310ce08eaf58804067a9870cfe7b4f51e62fe,Plankton''')

df = pd.read_csv(data,sep=',',header=None)
malware_set = {"FakeInstaller","DroidKungFu", "Plankton",
            "Opfake", "GingerMaster", "BaseBridge",
            "Iconosys", "Kmin", "FakeDoc", "Geinimi",
            "Adrd", "DroidDream", "LinuxLotoor", "GoldDream",
            "MobileTx", "FakeRun", "SendPay", "Gappusin",
            "Imlog", "SMSreg"}



df.columns = ['id','software']

df['malware'] = df['software'].apply(lambda x: x.strip() in malware_set)

print(df)
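
If pandas is not available, the same x in y idea can be sketched with only the standard library. This is a rough sketch rather than the approach used above: the function name find_malware is hypothetical, the set is shortened, and io.StringIO with made-up rows stands in for the real file.

```python
import csv
import io

# Shortened set for illustration; membership tests on a set are O(1) on average
malware_set = {"Plankton", "DroidKungFu", "FakeDoc"}

def find_malware(f):
    """Return a list of (name, is_malware) pairs, one per NAME,FAMILY row."""
    results = []
    for row in csv.reader(f):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        name, family = row[0], row[1].strip()  # strip stray spaces
        results.append((name, family in malware_set))
    return results

# Hypothetical sample data standing in for the CSV file
data = io.StringIO("a1,Plankton \nb2,DroidKungFu\nc3,Unknown\n")
for name, flag in find_malware(data):
    print(name, flag)
```

Iterating over the reader also avoids a fixed range(1, 100) loop, which raises an IndexError once readline returns an empty string at the end of a shorter file.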