I have a dataset with two columns, and I need to change it from this format:
10 1
10 5
10 3
11 5
11 4
12 6
12 2
to this:
10 1 5 3
11 5 4
12 6 2
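I need each unique value in the first column to be on its own line.
I'm a beginner with Python, and beyond reading in my text file I don't know how to proceed.
Answer 0 (score: 3)
You can use a Pandas DataFrame.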
import pandas as pd
df = pd.DataFrame({'A':[10,10,10,11,11,12,12],'B':[1,5,3,5,4,6,2]})
print(df)
Output:
    A  B
0  10  1
1  10  5
2  10  3
3  11  5
4  11  4
5  12  6
6  12  2
Let's use groupby and join:
df.groupby('A')['B'].apply(lambda x:' '.join(x.astype(str)))
Output:
A
10    1 5 3
11      5 4
12      6 2
Name: B, dtype: object
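Answer 1 (score: 1)
An example using only itertools.groupby; this is all in the Python standard library (although the pandas version is more concise!).
Assuming the keys you want to group by are adjacent, this can all be done lazily (without ever holding all of the data in memory at once):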
from io import StringIO
from itertools import groupby
text = '''10 1
10 5
10 3
11 5
11 4
12 6
12 2'''
# read and group data:
with StringIO(text) as file:
    keys = []
    res = {}
    data = (line.strip().split() for line in file)
    for k, g in groupby(data, key=lambda x: x[0]):
        keys.append(k)
        res[k] = [item[1] for item in g]
print(keys)  # ['10', '11', '12']
print(res)   # {'10': ['1', '5', '3'], '11': ['5', '4'], '12': ['6', '2']} (key order may differ before Python 3.7)
# write grouped data:
with StringIO() as out_file:
    for key in keys:
        out_file.write('{:3s}'.format(key))
        out_file.write(' '.join(['{:3s}'.format(item) for item in res[key]]))
        out_file.write('\n')
    print(out_file.getvalue())
    # 10 1   5   3
    # 11 5   4
    # 12 6   2
You can then replace with StringIO(text) as file: with something like with open('infile.txt', 'r') as file: so that the program reads your actual file (and similarly use open('outfile.txt', 'w') for the output file).
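As a rough sketch of that substitution, the two steps can be wrapped in a function that takes real paths. The function name group_file is hypothetical, the temporary files only stand in for infile.txt and outfile.txt, and a single space is used as the separator instead of the '{:3s}' padding above:

```python
import os
import tempfile
from itertools import groupby


def group_file(in_path, out_path):
    """Read 'key value' lines from in_path and write one line per key to out_path.

    Like the code above, this assumes equal keys sit on adjacent lines.
    """
    keys = []
    res = {}
    with open(in_path, 'r') as file:
        # skip blank lines so a trailing newline does not break the split
        data = (line.split() for line in file if line.strip())
        for k, g in groupby(data, key=lambda x: x[0]):
            keys.append(k)
            res[k] = [item[1] for item in g]
    with open(out_path, 'w') as out_file:
        for key in keys:
            out_file.write(' '.join([key] + res[key]) + '\n')


# Demonstration: temporary files stand in for infile.txt / outfile.txt.
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, 'infile.txt')
    out_path = os.path.join(tmp, 'outfile.txt')
    with open(in_path, 'w') as f:
        f.write('10 1\n10 5\n10 3\n11 5\n11 4\n12 6\n12 2\n')
    group_file(in_path, out_path)
    with open(out_path) as f:
        result = f.read()

print(result)
```

With your own data you would call group_file('infile.txt', 'outfile.txt') directly and drop the temporary-directory part.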
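Of course, you can also write directly to the output file each time a key is found; that way you never need to hold all of the data in memory at once: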
with StringIO(text) as file, StringIO() as out_file:
    data = (line.strip().split() for line in file)
    for k, g in groupby(data, key=lambda x: x[0]):
        out_file.write('{:3s}'.format(k))
        out_file.write(' '.join(['{:3s}'.format(item[1]) for item in g]))
        out_file.write('\n')
    print(out_file.getvalue())
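The adjacency assumption stated at the top of this answer matters because itertools.groupby only groups consecutive items with equal keys. The sketch below uses made-up rows (not the data from the question) to show what happens when a key appears in two separate runs. It also shows that sorting by key first fixes this, but only by giving up the lazy, low-memory behaviour:

```python
from itertools import groupby

# Hypothetical input in which key '10' appears in two separate runs.
rows = [('10', '1'), ('11', '5'), ('10', '3')]

# Without sorting, each run of equal keys becomes its own group.
unsorted_groups = [(k, [v for _, v in g])
                   for k, g in groupby(rows, key=lambda r: r[0])]

# Sorting by key first brings equal keys together (sorted() is stable,
# so values keep their original relative order), but it needs all rows in memory.
sorted_rows = sorted(rows, key=lambda r: r[0])
sorted_groups = [(k, [v for _, v in g])
                 for k, g in groupby(sorted_rows, key=lambda r: r[0])]

print(unsorted_groups)  # [('10', ['1']), ('11', ['5']), ('10', ['3'])]
print(sorted_groups)    # [('10', ['1', '3']), ('11', ['5'])]
```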
Answer 2 (score: 1)
import collections
with open('yourfile.txt', 'r') as f:
    d = collections.defaultdict(list)
    for k,v in (l.split() for l in f.read().splitlines()):  # processing each line
        d[k].append(v)  # accumulating values for the same 1st column
    for k,v in sorted(d.items()):  # outputting grouped sequences
        print('%s %s' % (k,' '.join(v)))
Output:
10 1 5 3
11 5 4
12 6 2
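One thing to keep in mind with this approach: the keys are strings, so sorted(d.items()) orders them as text. That happens to match numeric order for 10, 11 and 12, but not when the keys have different numbers of digits. A minimal sketch with made-up keys (the lines list is hypothetical), assuming every key parses as an integer:

```python
import collections

# Hypothetical lines whose keys have different numbers of digits.
lines = ['9 1', '10 2', '9 3', '100 4']

d = collections.defaultdict(list)
for k, v in (l.split() for l in lines):
    d[k].append(v)

# Default sort compares the keys as strings: '10' < '100' < '9'.
text_order = [k for k, _ in sorted(d.items())]

# Sorting on int(key) gives numeric order instead.
numeric_order = [k for k, _ in sorted(d.items(), key=lambda kv: int(kv[0]))]

print(text_order)     # ['10', '100', '9']
print(numeric_order)  # ['9', '10', '100']
```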
Answer 3 (score: 0)
Using pandas may be easier. You can use the pandas read_csv function to read a txt file in which the data is separated by one or more spaces.
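import pandas as pd
df = pd.read_csv("input.txt", header=None, delimiter=r"\s+")
# setting column names
df.columns = ['col1', 'col2']
df
This reads the txt file into a dataframe. The output of df looks like:
   col1  col2
0    10     1
1    10     5
2    10     3
3    11     5
4    11     4
5    12     6
6    12     2
After reading the txt file into a dataframe, similar to the apply used in the other answer above, you can also use aggregate and join:
df_combine = df.groupby('col1')['col2'].agg(lambda col: ' '.join(col.astype('str'))).reset_index()
df_combine
The result has one row per unique value of col1, with that value's col2 entries joined by spaces.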
Answer 4 (score: 0)
I found this solution using dictionaries:
with open("data.txt", encoding='utf-8') as data:
    file = data.readlines()
    dic = {}
    for line in file:
        list1 = line.split()
        try:
            dic[list1[0]] += list1[1] + ' '
        except KeyError:
            dic[list1[0]] = list1[1] + ' '
    for k,v in dic.items():
        print(k,v)
Output
10 1 5 3
11 5 4
12 6 2
A more practical version, as a function:
def getdata(datafile):
    with open(datafile, encoding='utf-8') as data:
        file = data.readlines()
    dic = {}
    for line in file:
        list1 = line.split()
        try:
            dic[list1[0]] += list1[1] + ' '
        except KeyError:
            dic[list1[0]] = list1[1] + ' '
    for k,v in dic.items():
        v = v.split()
        print(k, ':',v)
getdata("data.txt")
Output
11 : ['5', '4']
12 : ['6', '2']
10 : ['1', '5', '3']
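The versions above build each value as a string and then split it again, and a blank line in the file (for example a trailing empty line) would raise an IndexError at list1[0]. A possible variation, offered only as a sketch, collects lists directly with dict.setdefault, skips lines that do not have two fields, and returns the dictionary instead of printing it. The name group_lines and the sample list are hypothetical:

```python
def group_lines(lines):
    """Group the second field of each 'key value' line under its key."""
    groups = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        groups.setdefault(parts[0], []).append(parts[1])
    return groups


# Hypothetical sample; with a real file, pass the open file object instead.
sample = ['10 1', '10 5', '10 3', '', '11 5', '11 4', '12 6', '12 2']
grouped = group_lines(sample)
for k, v in grouped.items():
    print(k, ' '.join(v))
```

Because the function accepts any iterable of lines, it can be called as group_lines(open_file) inside a with open(...) block.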