Python Pandas - Determine whether the values in column 0 are repeated in every subsequent column

Posted: 2016-10-25 15:35:12

Tags: python pandas dataframe tuples

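I have a number of internet-connected devices scattered around various locations. I have a DataFrame with seven rows, one for each day of the past week. Each row contains the serial number of every device that did not connect to my server that day. I'm trying to compile a report that creates an eighth row containing the serial number of every device that failed to communicate for seven consecutive days. Here is a simplified model of my DataFrame: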

2016-10-01, AAAA, BBBB, CCCC, EEEE
2016-10-02, AAAA, BBBB, EEEE,
2016-10-03, AAAA, BBBB, CCCC, EEEE
2016-10-04, AAAA, BBBB, CCCC, EEEE
2016-10-05, BBBB, CCCC, DDDD, EEEE
2016-10-06, AAAA, BBBB, CCCC, EEEE
2016-10-07, AAAA, BBBB, CCCC, FFFF
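The goal can be phrased as a set operation: a device has been missing for seven consecutive days exactly when its serial number appears in every one of the seven rows. Below is a minimal sketch, assuming the mock-up is available as comma-separated text. The `MOCK` string is simply the seven lines above pasted in for illustration. It is not how the real frame is built.

```python
import io

import pandas

# The simplified mock-up from above: one row per day, date in column 0,
# serial numbers of the devices missing that day in the remaining columns
MOCK = """\
2016-10-01, AAAA, BBBB, CCCC, EEEE
2016-10-02, AAAA, BBBB, EEEE,
2016-10-03, AAAA, BBBB, CCCC, EEEE
2016-10-04, AAAA, BBBB, CCCC, EEEE
2016-10-05, BBBB, CCCC, DDDD, EEEE
2016-10-06, AAAA, BBBB, CCCC, EEEE
2016-10-07, AAAA, BBBB, CCCC, FFFF
"""

# header=None: there is no header row, so the columns are labelled 0..4;
# skipinitialspace strips the blank after each comma; the trailing comma on
# 2016-10-02 becomes a missing value
df = pandas.read_csv(io.StringIO(MOCK), header=None, skipinitialspace=True)

# One set of serials per day (date column dropped, padding removed)
daily_sets = [set(row.dropna()) for _, row in df.iloc[:, 1:].iterrows()]

# Serials present on every day = intersection of all the daily sets
missing_all_week = sorted(set.intersection(*daily_sets))
print(missing_all_week)  # ['BBBB']
```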

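Here is the block of code that is giving me trouble. I'm trying to compare the values in the first column with the values in each of the other columns. If I get 6 Trues, I add the serial number to a new list and then try to add that list to the DataFrame.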

import datetime

import pandas

# localConnection, allInstalledQuery, missingReportQuery, fac and todayDate
# are defined elsewhere in the script (not shown)
masterList = []  # collects one list per day

# Count the rows returned by allInstalledQuery (the installed devices)
cursor = localConnection.cursor()
cursor.execute(allInstalledQuery % ('', fac[0]))
cursor.fetchall()
totalDevices = cursor.rowcount
cursor.close()

# One row per day, from seven days ago up to yesterday
for i in range(7, 0, -1):
    loopCursor = localConnection.cursor()
    print(i)
    sns = []
    d = datetime.datetime.strptime(todayDate, "%Y-%m-%d") + datetime.timedelta(days=-i)
    sns.append(d.strftime("%Y-%m-%d"))
    if i != 1:
        loopCursor.execute(missingReportQuery % (i, 1, i, 1, '', fac[0]))
    else:
        loopCursor.execute(missingReportQuery % (i, 0, i, 0, '', fac[0]))
        rows = loopCursor.fetchall()
        numMissing = loopCursor.rowcount
        missingSummary = "%d / %d devices missing"
        sns.append(missingSummary % (numMissing, totalDevices))
        for row in rows:
            sns.append(row[4])
    masterList.append(sns)
    loopCursor.close()

df = pandas.DataFrame(masterList)
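Because each day has a different number of missing devices, `pandas.DataFrame(masterList)` pads the shorter rows with missing values, and a given serial can land in a different column on each day. One possible way around that is to reshape to a long table of (date, serial) pairs and count distinct days per serial. The sketch below assumes each inner list holds only the date followed by the serials. The hard-coded `masterList` is hypothetical and mirrors the mock-up; the real code also stores a summary string in position 1, which is left out here.

```python
import pandas

# Hypothetical masterList shaped like the mock-up: date first, then the
# serials missing that day
masterList = [
    ["2016-10-01", "AAAA", "BBBB", "CCCC", "EEEE"],
    ["2016-10-02", "AAAA", "BBBB", "EEEE"],
    ["2016-10-03", "AAAA", "BBBB", "CCCC", "EEEE"],
    ["2016-10-04", "AAAA", "BBBB", "CCCC", "EEEE"],
    ["2016-10-05", "BBBB", "CCCC", "DDDD", "EEEE"],
    ["2016-10-06", "AAAA", "BBBB", "CCCC", "EEEE"],
    ["2016-10-07", "AAAA", "BBBB", "CCCC", "FFFF"],
]

# Ragged rows are padded with missing values; columns are labelled 0..4
df = pandas.DataFrame(masterList)

# Long format: one (date, serial) pair per row, padding dropped
long_df = (
    df.rename(columns={0: "date"})
    .melt(id_vars="date", value_name="serial")
    .dropna(subset=["serial"])
)

# Number of distinct days on which each serial was reported missing
days_missing = long_df.groupby("serial")["date"].nunique()
missing_all_week = days_missing[days_missing == len(df)].index.tolist()
print(missing_all_week)  # ['BBBB']
```

In the long layout, the position of a serial within a row no longer matters, which is what makes the per-serial count straightforward.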

EDIT: The following code block has been removed:

firstDayData = df.iloc[[0], :].values
missingSevenDays = []

for s in firstDayData:
    print(s)
    a = pandas.Series(df[1])
    b = pandas.Series(df[2])
    c = pandas.Series(df[3])
    d = pandas.Series(df[4])
    e = pandas.Series(df[5])
    sn = list(str(s))
    if a.isin(sn) is True and b.isin(sn) is True and c.isin(sn) is True and d.isin(sn) is True \
            and e.isin(sn) is True:
        missingSevenDays.append(sn)
df.append(missingSevenDays)
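The comparison described earlier (check each first-day serial against every other day and keep it if all six checks come back True) can be expressed with `DataFrame.isin`. Note that `isin` returns a boolean frame or Series, not a single `True`/`False`. The sketch below assumes the date is in column 0 and the serials fill the remaining columns, as in the mock-up. `masterList` here is a hypothetical stand-in for the real data.

```python
import pandas

# Hypothetical stand-in for masterList, mirroring the mock-up
masterList = [
    ["2016-10-01", "AAAA", "BBBB", "CCCC", "EEEE"],
    ["2016-10-02", "AAAA", "BBBB", "EEEE"],
    ["2016-10-03", "AAAA", "BBBB", "CCCC", "EEEE"],
    ["2016-10-04", "AAAA", "BBBB", "CCCC", "EEEE"],
    ["2016-10-05", "BBBB", "CCCC", "DDDD", "EEEE"],
    ["2016-10-06", "AAAA", "BBBB", "CCCC", "EEEE"],
    ["2016-10-07", "AAAA", "BBBB", "CCCC", "FFFF"],
]
df = pandas.DataFrame(masterList)

serials = df.iloc[:, 1:]              # drop the date column
first_day = serials.iloc[0].dropna()  # serials missing on day 1
later_days = serials.iloc[1:]         # the six remaining days

missingSevenDays = []
for sn in first_day:
    # One boolean per later day: does sn appear anywhere in that day's row?
    hits = later_days.isin([sn]).any(axis=1)
    if hits.sum() == len(later_days):  # six Trues for a seven-day frame
        missingSevenDays.append(sn)

print(missingSevenDays)  # ['BBBB']
```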

and replaced with:

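# Number of times each value (date or serial) occurs across all days
counts = df.stack().value_counts()
# Keep only the values that occur seven times
seven_day = counts[counts == 7]
# Select the columns of df whose labels are in seven_day.index
filtered_df = df[seven_day.index]
missingSevenDays = []
for neat in filtered_df.values:
    print(neat)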
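On the mock-up data, the counting idea works as intended. The serials that meet the threshold are the index labels of `seven_day`, while `df[seven_day.index]` looks them up as column labels of `df`. Below is a sketch using the same kind of hypothetical stand-in for `masterList`.

```python
import pandas

# Hypothetical stand-in for masterList, mirroring the mock-up
masterList = [
    ["2016-10-01", "AAAA", "BBBB", "CCCC", "EEEE"],
    ["2016-10-02", "AAAA", "BBBB", "EEEE"],
    ["2016-10-03", "AAAA", "BBBB", "CCCC", "EEEE"],
    ["2016-10-04", "AAAA", "BBBB", "CCCC", "EEEE"],
    ["2016-10-05", "BBBB", "CCCC", "DDDD", "EEEE"],
    ["2016-10-06", "AAAA", "BBBB", "CCCC", "EEEE"],
    ["2016-10-07", "AAAA", "BBBB", "CCCC", "FFFF"],
]
df = pandas.DataFrame(masterList)

# Flatten to one long Series and count how many times each value occurs;
# value_counts() skips the missing-value padding
counts = df.stack().value_counts()

# Values present on every day in the frame
seven_day = counts[counts == len(df)]

# The serial numbers are the index labels of seven_day
missingSevenDays = seven_day.index.tolist()
for neat in missingSevenDays:
    print(neat)  # BBBB
```

Comparing against `len(df)` rather than the literal 7 keeps the threshold in step with the number of days in the frame. Both versions assume a serial is listed at most once per day.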

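I expect this to print the serial number of every device that has been missing for all seven days. As it stands, it just prints []. I suspect I've confused myself about how to work with these data structures.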

0 answers