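I'm trying to write this simple piece of code in Python: if the second element of a line in a CSV file is one of the families listed in "malware_list", the main program should print "True". However, the program always prints "False".
Every line in the file has the following format: "NAME,FAMILY"
Here is the code: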
malware_list = ["FakeInstaller","DroidKungFu", "Plankton",
                "Opfake", "GingerMaster", "BaseBridge",
                "Iconosys", "Kmin", "FakeDoc", "Geinimi",
                "Adrd", "DroidDream", "LinuxLotoor", "GoldDream"
                "MobileTx", "FakeRun", "SendPay", "Gappusin",
                "Imlog", "SMSreg"]

def is_malware (line):
    line_splitted = line.split(",")
    family = line_splitted[1]
    if family in malware_list:
        return True
    return False

def main():
    with open("datset_small.csv", "r") as f:
        for i in range(1,100):
            line = f.readline()
            print(is_malware(line))

if __name__ == "__main__":
    main()
Answer 0 (score: 4)
line = f.readline()
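readline does not remove the trailing newline from its result, so line most likely looks like "STEVE,FakeDoc\n". family then becomes "FakeDoc\n", which is not a member of malware_list, so your function returns False.
Try stripping the whitespace after reading: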
line = f.readline().strip()
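To make the effect of the trailing newline concrete, here is a minimal sketch. It uses io.StringIO as a stand-in for the real file, and the sample rows and the shortened family set are invented for illustration; only the strip() fix itself comes from the answer above.

```python
import io

# Hypothetical subset of the families, for illustration only.
malware_families = {"FakeDoc", "Plankton", "DroidKungFu"}


def is_malware(line):
    """Return True if the second comma-separated field is a known family."""
    fields = line.strip().split(",")  # strip() drops the trailing "\n"
    return len(fields) > 1 and fields[1] in malware_families


# Stand-in for open("datset_small.csv"); the rows are invented sample data.
sample = io.StringIO("STEVE,FakeDoc\nBOB,Benign\n")

first = sample.readline()
raw_family = first.split(",")[1]            # still carries the newline
clean_family = first.strip().split(",")[1]  # newline removed

print(repr(raw_family))    # 'FakeDoc\n'
print(repr(clean_family))  # 'FakeDoc'

sample.seek(0)
# Iterating over the file object yields one line at a time and stops at
# end of file, so no fixed range(1, 100) is needed.
results = [is_malware(line) for line in sample]
print(results)  # [True, False]
```

Looping over the file object directly also avoids calling is_malware on the empty strings that readline returns once the file is exhausted, which would otherwise raise an IndexError in the original function.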
Answer 1 (score: 0)
Python has a package called pandas. With pandas, we can read a CSV file into a DataFrame.
import pandas as pd
df = pd.read_csv("datset_small.csv")
Please post the contents of your CSV file so that I can help you.
Answer 2 (score: 0)
This is easy to do with a DataFrame. Sample code follows:
import pandas as pd

malware_list = ["FakeInstaller", "DroidKungFu", "Plankton",
                "Opfake", "GingerMaster", "BaseBridge",
                "Iconosys", "Kmin", "FakeDoc", "Geinimi",
                # note the comma after "GoldDream": adjacent string literals
                # would otherwise be concatenated into "GoldDreamMobileTx"
                "Adrd", "DroidDream", "LinuxLotoor", "GoldDream",
                "MobileTx", "FakeRun", "SendPay", "Gappusin",
                "Imlog", "SMSreg"]

# read csv into dataframe
df = pd.read_csv('datset_small.csv')
# element-wise membership test on the FAMILY column
print(df['FAMILY'].isin(malware_list))
The output is
0 True
1 True
2 True
The sample CSV used is
NAME,FAMILY
090b5be26bcc4df6186124c2b47831eb96761fcf61282d63e13fa235a20c7539,Plankton
bedf51a5732d94c173bcd8ed918333954f5a78307c2a2f064b97b43278330f54,DroidKungFu
149bde78b32be3c4c25379dd6c3310ce08eaf58804067a9870cfe7b4f51e62fe,Plankton
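If the FAMILY column might carry stray whitespace, one possible variation is to strip the column before calling isin. This is an assumption about the data, since the real file was not posted. The sketch below reads from an in-memory string instead of datset_small.csv, uses short placeholder names in place of the hashes, and a shortened family list.

```python
import io

import pandas as pd

# Shortened, illustrative family list.
malware_list = ["Plankton", "DroidKungFu", "FakeDoc"]

# Invented sample rows; the second one has spaces around the family name.
csv_text = (
    "NAME,FAMILY\n"
    "sample_a,Plankton\n"
    "sample_b, DroidKungFu \n"
    "sample_c,Benign\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# Strip surrounding whitespace, then test membership row by row.
df["is_malware"] = df["FAMILY"].str.strip().isin(malware_list)

flags = df["is_malware"].tolist()
print(flags)  # [True, True, False]
```

Storing the result in a new column keeps the flag next to the row it belongs to, so it can be used later for filtering, for example df[df["is_malware"]].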
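Answer 3 (score: 0)
I would use a set rather than a list for speed, and pandas is definitely better here for both speed and ease of coding. You can use the "x in y" logic to get the result ;)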
import io  # not needed in your case
import pandas as pd

data = io.StringIO('''090b5be26bcc4df6186124c2b47831eb96761fcf61282d63e13fa235a20c7539,Plankton
bedf51a5732d94c173bcd8ed918333954f5a78307c2a2f064b97b43278330f54,DroidKungFu
149bde78b32be3c4c25379dd6c3310ce08eaf58804067a9870cfe7b4f51e62fe,Plankton''')
df = pd.read_csv(data, sep=',', header=None)

# a set literal (curly braces), so membership tests are O(1) on average
malware_set = {"FakeInstaller", "DroidKungFu", "Plankton",
               "Opfake", "GingerMaster", "BaseBridge",
               "Iconosys", "Kmin", "FakeDoc", "Geinimi",
               # note the comma after "GoldDream": adjacent string literals
               # would otherwise be concatenated into "GoldDreamMobileTx"
               "Adrd", "DroidDream", "LinuxLotoor", "GoldDream",
               "MobileTx", "FakeRun", "SendPay", "Gappusin",
               "Imlog", "SMSreg"}

df.columns = ['id', 'software']
# strip whitespace from each family name, then test it against the set
df['malware'] = df['software'].apply(lambda x: x.strip() in malware_set)
print(df)
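The same "x in y" membership test also works without pandas. The following is a minimal sketch using the standard csv module and a set; the helper name flag_malware, the shortened family set, and the in-memory sample rows are assumptions made for illustration.

```python
import csv
import io

# Shortened, illustrative family set.
malware_set = {"Plankton", "DroidKungFu", "FakeDoc"}


def flag_malware(rows, families):
    """Yield (name, family, is_malware) for each two-column CSV row."""
    for row in csv.reader(rows):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        name, family = row[0], row[1].strip()
        yield name, family, family in families


# Invented sample data, including one blank line.
sample = io.StringIO(
    "sample_a,Plankton\nsample_b,Benign\n\nsample_c,DroidKungFu\n"
)

flagged = list(flag_malware(sample, malware_set))
for name, family, is_malware in flagged:
    print(name, family, is_malware)
```

A set gives constant-time membership checks on average, whereas a list is scanned element by element. For twenty families the difference is small, but it grows with the size of the list and the number of rows.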