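I have a list of doubles, possibly containing duplicate values and sorted in ascending order, that I need to split into X partitions (X is supplied by the user) such that:
Given the requirement to keep duplicate values in the same partition, is there an efficient way to do this?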
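Answer 0: (score: 0)
This code has no group-related intelligence; it simply splits the list into fixed-size chunks:
Assume the list has length L.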
X = 3; Chunk Size = X;
data1 = Take[data, Chunk Size]
data2 = Skip chunk size members and take next X members;
repeat;
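    using System.Collections.Generic;
    using System.Linq;

    public static IEnumerable<List<List<double>>> GetSubList()
    {
        // Sample input: sorted ascending, with one duplicated value (20.0).
        List<double> values = new List<double> { 10.0, 15.0, 20.0, 20.0, 21.0 };
        List<List<double>> subPartition = new List<List<double>>();
        var X = 2;          // number of elements per chunk
        int chunkSize = X;  // running offset: how many elements have been consumed so far
        int length = values.Count;

        // Fewer elements than one chunk: return the whole list as a single chunk.
        if (length < X)
        {
            subPartition.Add(values);
            yield return subPartition;
            yield break;
        }

        // First chunk.
        subPartition.Add(values.Take(chunkSize).ToList());

        // Remaining chunks: skip what has been consumed and take the next X elements.
        while (values.Skip(chunkSize).Any())
        {
            subPartition.Add(values.Skip(chunkSize).Take(X).ToList());
            chunkSize += X;
        }

        // Note: chunks are cut purely by position, so duplicates can be split across chunks.
        yield return subPartition;
    }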
Answer 1: (score: 0)
Assuming it isn't bad form to answer my own question, this is the approach I ended up using:
1) Calculate the "ideal" partition size:
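2) The first partition starts at index 0.
3) Calculate successive potential breakpoint indices as:
valuesCount / numPartitions
4) A breakpoint must fall on the first occurrence of a value. If it does not, move it to the first occurrence of that value or to the first occurrence of the next value, whichever is closer.
5) Use the sum of squared differences between each partition's size and the ideal size as a quality metric.
6) As each additional breakpoint is added, try adjusting each earlier breakpoint in turn by moving it forward and backward by one "value change" and recalculating the quality metric. If the metric is lower, keep the change and try again.
Some special-case checks are needed, for example when there are fewer value changes than requested partitions. There may also be edge cases I haven't considered. But for the datasets I've tried, this seems to give reasonable results quickly.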
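The first five steps can be sketched roughly as follows. This is a minimal illustration, not the code actually used in this answer: the names `DuplicateAwarePartitioner`, `FindBreakpoints` and `Quality` are hypothetical, and the assumption that the p-th potential breakpoint sits at `p * idealSize` is mine. The refinement pass from step 6 is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateAwarePartitioner
{
    // Returns the start index of each partition for a list sorted in ascending order.
    // May return fewer partitions than requested if the value changes run out.
    public static List<int> FindBreakpoints(IReadOnlyList<double> values, int numPartitions)
    {
        // Indices where a distinct value first appears (index 0 included).
        var firstOccurrences = new List<int>();
        for (int i = 0; i < values.Count; i++)
        {
            if (i == 0 || values[i] != values[i - 1])
                firstOccurrences.Add(i);
        }

        // Special case: fewer distinct values than requested partitions.
        int partitions = Math.Min(numPartitions, firstOccurrences.Count);
        var breakpoints = new List<int>();
        if (partitions <= 0) return breakpoints;

        breakpoints.Add(0); // step 2: the first partition starts at index 0
        double idealSize = (double)values.Count / partitions; // step 1

        for (int p = 1; p < partitions; p++)
        {
            double target = p * idealSize; // step 3 (assumed formula)
            int last = breakpoints[breakpoints.Count - 1];

            // Step 4: snap to the closest first occurrence after the previous breakpoint.
            int best = -1;
            foreach (int idx in firstOccurrences)
            {
                if (idx <= last) continue;
                if (best < 0 || Math.Abs(idx - target) < Math.Abs(best - target))
                    best = idx;
            }
            if (best < 0) break; // no value changes left
            breakpoints.Add(best);
        }
        return breakpoints;
    }

    // Step 5: sum of squared differences between each partition size and the ideal size.
    public static double Quality(IReadOnlyList<int> breakpoints, int valuesCount, double idealSize)
    {
        double sum = 0;
        for (int i = 0; i < breakpoints.Count; i++)
        {
            int end = i + 1 < breakpoints.Count ? breakpoints[i + 1] : valuesCount;
            double diff = (end - breakpoints[i]) - idealSize;
            sum += diff * diff;
        }
        return sum;
    }
}
```

For the sample list 10.0, 15.0, 20.0, 20.0, 21.0 and two partitions, this sketch places the breakpoints at indices 0 and 2, so both 20.0 values land in the second partition. A lower `Quality` value means the partition sizes are closer to the ideal size.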