Given the dataset from another question:
                                           user                             item  \
    0  b80344d063b5ccb3212f76538f3d9e43d87dca9e          The Cove - Jack Johnson
    1  b80344d063b5ccb3212f76538f3d9e43d87dca9e  Entre Dos Aguas - Paco De Lucia
    2  b80344d063b5ccb3212f76538f3d9e43d87dca9e            Stronger - Kanye West
    3  b80344d063b5ccb3212f76538f3d9e43d87dca9e    Constellations - Jack Johnson
    4  b80344d063b5ccb3212f76538f3d9e43d87dca9e      Learn To Fly - Foo Fighters

       rating
    0       1
    1       2
    2       1
    3       1
    4       1
Is there any way to load this kind of data in the expected format without manually moving everything onto the same line?
Answer 0 (score: 1)
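One way is to split the text on "\n\n" (the blank line between the wrapped blocks), create a separate DataFrame from each block, and then concatenate them, i.e.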
    # Bit of code from https://stackoverflow.com/questions/45740537/copying-multiindex-dataframes-with-pd-read-clipboard
    from io import StringIO

    import pandas as pd
    from pandas.io.clipboard import clipboard_get


    def read_clipboard_split(index_names_row=None, **kwargs):
        encoding = kwargs.pop('encoding', 'utf-8')
        # only utf-8 is valid for passed value because that's what clipboard
        # supports
        if encoding is not None and encoding.lower().replace('-', '') != 'utf8':
            raise NotImplementedError(
                'reading from clipboard only supports utf-8 encoding')

        data = clipboard_get()
        # each blank-line-separated block holds one wrapped group of columns
        items = data.split("\n\n")
        # parse every block as fixed-width text, then join them side by side
        k = [pd.read_fwf(StringIO(i), **kwargs) for i in items]
        df = pd.concat(k, axis=1)
        return df


    read_clipboard_split()
Sample run:
       user \
    0  b80344d063b5ccb3212f76538f3d9e43d87dca9e
    1  b80344d063b5ccb3212f76538f3d9e43d87dca9e
    2  b80344d063b5ccb3212f76538f3d9e43d87dca9e
    3  b80344d063b5ccb3212f76538f3d9e43d87dca9e
    4  b80344d063b5ccb3212f76538f3d9e43d87dca9e

       rating
    0       1
    1       2
    2       1
    3       1
    4       1
Output:
       Unnamed: 0                                    user \  Unnamed: 0  rating
    0           0  b80344d063b5ccb3212f76538f3d9e43d87dca9e           0       1
    1           1  b80344d063b5ccb3212f76538f3d9e43d87dca9e           1       2
    2           2  b80344d063b5ccb3212f76538f3d9e43d87dca9e           2       1
    3           3  b80344d063b5ccb3212f76538f3d9e43d87dca9e           3       1
    4           4  b80344d063b5ccb3212f76538f3d9e43d87dca9e           4       1
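The output above still carries the printed row index as duplicated "Unnamed: 0" columns, plus the trailing "\" wrap marker in the column name. The following is a minimal sketch of the same split-and-concatenate idea that works on a plain string instead of the clipboard and tidies both up. `read_wrapped_text` and `SAMPLE` are hypothetical names introduced here, and the exact spacing of `SAMPLE` is an assumption, since the original whitespace of the sample run is not preserved.

```python
import re
from io import StringIO

import pandas as pd

# Sample text modelled on the run above; the column spacing is an assumption.
SAMPLE = r"""
   user \
0  b80344d063b5ccb3212f76538f3d9e43d87dca9e
1  b80344d063b5ccb3212f76538f3d9e43d87dca9e
2  b80344d063b5ccb3212f76538f3d9e43d87dca9e
3  b80344d063b5ccb3212f76538f3d9e43d87dca9e
4  b80344d063b5ccb3212f76538f3d9e43d87dca9e

   rating
0       1
1       2
2       1
3       1
4       1
"""


def read_wrapped_text(text, **kwargs):
    """Hypothetical helper: parse blank-line-separated blocks of columns."""
    # Strip only newlines so each header keeps its leading spaces, which the
    # fixed-width column inference depends on.
    blocks = re.split(r"\n(?:[ \t]*\n)+", text.strip("\n"))
    frames = []
    for block in blocks:
        frame = pd.read_fwf(StringIO(block), **kwargs)
        # The first parsed column is the printed row index of the block.
        frame = frame.set_index(frame.columns[0]).rename_axis(None)
        # Remove the trailing "\" that marks a wrapped line in the printout.
        frame.columns = [str(c).rstrip("\\").strip() for c in frame.columns]
        frames.append(frame)
    # Blocks share the same row index, so they line up side by side.
    return pd.concat(frames, axis=1)


df = read_wrapped_text(SAMPLE)
print(df)
```

Because `read_fwf` infers column boundaries from runs of whitespace, values that themselves contain spaces (such as the `item` column in the question) may not split as intended; passing explicit `colspecs` through `**kwargs` is one way to handle that case.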