Question

I have many files in this folder.

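I am trying to implement a dictionary for the data. I want to create it with two keys: the first is the http address and the second is the third field (the plugin used), such as adblock. The values refer to different metrics, so my goal is, once the dictionary is built, to compute the mean, median, and variance of each metric for each site. For example, for the mean, I want to take into account all the values of the 4th field across the files, and so on. I tried to write the code for this but, first of all, I am not sure it is correct.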

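I have read other posts, but none of them solved my problem: they either deal with a single key only, or they do not show how to access the different values in the dictionary in order to compute the mean, median, and variance. The question is simple: assuming the dictionary implementation is fine, how do I access the different values for key1: www.google.it -> key2: adblock? Any kind of help is welcome, and I am open to any other approach.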

Answer 1

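You can do what you need with a dictionary, but you should really consider using the Pandas library. This library is centered around a tabular data structure called "DataFrame" that excels at column-wise and row-wise calculations such as the ones you seem to need.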
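If you do want to stay with plain dictionaries, a minimal sketch could look like the following. The rows are made-up sample data, and the field order (http address, count, plugin, metric value) is an assumption about your files, not something I can see from the question. The standard-library statistics module does the actual calculations:

```python
from collections import defaultdict
from statistics import mean, median, variance

# Hypothetical rows: (http address, count, plugin, metric value).
# The field order is an assumption about the file layout.
rows = [
    ("www.google.it", 1, "adblock", 2.0),
    ("www.google.it", 2, "adblock", 4.0),
    ("www.google.it", 3, "adblock", 6.0),
    ("www.google.it", 1, "ghostery", 5.0),
    ("www.google.it", 2, "ghostery", 7.0),
]

# Outer key: site. Inner key: plugin. Value: list of metric samples.
metrics = defaultdict(lambda: defaultdict(list))
for site, _count, plugin, value in rows:
    metrics[site][plugin].append(value)

# Access the values for key1 = www.google.it, key2 = adblock:
samples = metrics["www.google.it"]["adblock"]

stats = {
    "mean": mean(samples),
    "median": median(samples),
    "variance": variance(samples),  # sample variance, needs >= 2 values
}
print(stats)
```

Note that statistics.variance() computes the sample variance (dividing by n - 1), which matches the default of the Pandas var() method, and that it raises an error if a list holds fewer than two values.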

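To get you started with Pandas, here is code that reads one text file using the read_fwf() method. It also displays the mean and variance of the fourth column: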

# import the Pandas library:
import pandas as pd

# Read the file 'table.txt' into a DataFrame object. Assume
# a header-less, fixed-width file like in your example:
df = pd.read_fwf("table.txt", header=None)

# Show the content of the DataFrame object:
print(df)

# Print the fourth column (zero-indexed):
print(df[3])

# Print the mean for the fourth column:
print(df[3].mean())

# Print the variance for the fourth column:
print(df[3].var())

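There are different ways of selecting columns and rows from a DataFrame object. The square brackets [ ] in the previous examples select a column of the data frame by column number. If you want to calculate the mean of the fourth column using only those rows that contain adblock in the third column, you can do it like so: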

# Print those rows from the data frame that have the value 'adblock'
# in the third column (zero-indexed):
print(df[df[2] == "adblock"])

# Print only the fourth column (zero-indexed) from that data frame:
print(df[df[2] == "adblock"][3])

# Print the mean of the fourth column from that data frame:
print(df[df[2] == "adblock"][3].mean())
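The same filter can be applied to every (site, plugin) pair at once with groupby(). The following is only a sketch: the small data frame stands in for your parsed file, and I am assuming that column 0 holds the http address, column 2 the plugin, and column 3 the metric.

```python
import pandas as pd

# Hypothetical data standing in for the parsed file. Assumed layout:
# column 0 = http address, column 2 = plugin, column 3 = metric value.
df = pd.DataFrame([
    ["www.google.it", 1, "adblock", 2.0],
    ["www.google.it", 2, "adblock", 4.0],
    ["www.google.it", 3, "adblock", 6.0],
    ["www.google.it", 1, "ghostery", 5.0],
    ["www.google.it", 2, "ghostery", 7.0],
])

# Group by site and plugin, then aggregate the fourth column (zero-indexed):
summary = df.groupby([0, 2])[3].agg(["mean", "median", "var"])
print(summary)

# Look up one (site, plugin) pair, much like a dictionary with two keys:
adblock_mean = summary.loc[("www.google.it", "adblock"), "mean"]
print(adblock_mean)
```

The result is indexed by both grouping columns, so summary.loc[(site, plugin)] plays the role of the two-key dictionary lookup from the question.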

Edit: You can also calculate the mean or variance of several columns at the same time:

# Use a list of column numbers to calculate the mean for all of them
# at the same time:
l = [3, 4, 5]
print(df[l].mean())

End of edit

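If you want to read the data from multiple files and do the calculations on the concatenated data, you can use the concat() method. This method takes a list of DataFrame objects and concatenates them (row-wise by default). Use the following lines to create a DataFrame from all *.txt files in a directory: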

import glob
df = pd.concat([pd.read_fwf(file, header=None) for file in glob.glob("*.txt")], ignore_index=True)
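A slightly more defensive variant of the same idea is sketched below. The function name load_tables is my own, not part of Pandas. It sorts the file names so the row order is reproducible, and it fails with a readable message when the pattern matches nothing (pd.concat() raises an error on an empty list):

```python
import glob

import pandas as pd


def load_tables(pattern):
    """Read every file matching `pattern` with read_fwf and stack the rows.

    Hypothetical helper: assumes all files are header-less, fixed-width
    tables with the same column layout.
    """
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    frames = [pd.read_fwf(path, header=None) for path in paths]
    # ignore_index=True renumbers the rows 0..n-1 across all files.
    return pd.concat(frames, ignore_index=True)
```

You would then call it as df = load_tables("*.txt") and continue with the column selections shown earlier.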

If I have a dictionary with 2 keys in Python, how can I implement functions such as mean, median, and variance?
