Vectorized implementation of a pseudo pivot table in Python

Posted: 2016-05-23 19:56:46

Tags: python pandas dataframe vectorization

I have a data frame that lists vehicles and the components of each vehicle, with a vehicle column and a Component column.

I want to build a second data frame, a pseudo pivot table, with one row per vehicle and one column per component: 1 where the vehicle has that component and 0 otherwise.

I implemented this with the solution posted below, but it is very inefficient. Any help is appreciated.

2 answers:

Answer 0 (score: 1)

You can use pd.crosstab to create a frequency table:

import pandas as pd

df = pd.DataFrame(
    {'Component': ['Air conditioner', 'Air conditioner', 'airbag', 'engine with 150 H/P', 'airbag',
                   '1-year concierge assistance', 'ABS breaks', 'ABS breaks', 'airbag', 
                   'air conditioner', 'engine with 250 H/P'], 
     'Vehicle': ['Ford', 'Ford', 'Ford', 'Ford', 'Toyota', 'Toyota', 'Toyota',
                 'Chrysler', 'Chrysler', 'Chrysler', 'Chrysler']})

result = pd.crosstab(index=[df['Vehicle']], columns=[df['Component']]).clip(upper=1)
print(result)

yields

Component  1-year concierge assistance  ABS breaks  Air conditioner  \
Vehicle                                                               
Chrysler                             0           1                0   
Ford                                 0           0                1   
Toyota                               1           1                0   

Component  air conditioner  airbag  engine with 150 H/P  engine with 250 H/P  
Vehicle                                                                       
Chrysler                 1       1                    0                    1  
Ford                     0       1                    1                    0  
Toyota                   0       1                    0                    0  

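If df contains duplicate rows, the frequency table may contain values greater than 1; clip(upper=1) reduces those values back to 1. In the example above, the pair ('Ford', 'Air conditioner') appears twice, so its count of 2 is clipped to 1.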
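
To make the effect of clip(upper=1) concrete, here is a minimal sketch on a toy frame. The data and the names toy, counts, indicator and alt are made up for illustration and are not from the question. It also shows that, under the assumption that only exact duplicate rows cause counts above 1, dropping duplicates before the crosstab gives the same table.

```python
import pandas as pd

# Hypothetical toy data: the ('Ford', 'airbag') pair appears twice.
toy = pd.DataFrame({
    'Vehicle': ['Ford', 'Ford', 'Toyota'],
    'Component': ['airbag', 'airbag', 'engine'],
})

# Raw frequency table: the duplicated pair is counted twice.
counts = pd.crosstab(toy['Vehicle'], toy['Component'])

# Clipping turns the counts into a 0/1 indicator table.
indicator = counts.clip(upper=1)

# Alternative: remove duplicate rows first, so every count is already 0 or 1.
deduped = toy.drop_duplicates()
alt = pd.crosstab(deduped['Vehicle'], deduped['Component'])

print(counts)
print(indicator)
```

Either route should work here; clip is a one-liner on the result, while drop_duplicates shrinks the input before counting.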

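Answer 1 (score: 0)

This is the code I built. It is very inefficient because it uses nested loops. If anyone posts a more elegant implementation, I would appreciate it.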

import pandas as pd

data = pd.read_csv('data.csv')

# Drop rows with a missing vehicle or component, then treat both as strings.
data = data.dropna(subset=['vehicle', 'Component'])
data['vehicle'] = data['vehicle'].astype(str)
data['Component'] = data['Component'].astype(str)

# Sorted unique vehicles (rows) and components (columns).
vhs = sorted(data['vehicle'].unique())
components = sorted(data['Component'].unique())

# One row per vehicle, one column per component, initialised to 0.
# The same column name 'vehicle' is used throughout.
my_df = pd.DataFrame(0, index=range(len(vhs)), columns=['vehicle'] + components)
my_df['vehicle'] = vhs

for vh in vhs:
    sub_data = data[data['vehicle'] == vh]
    for comp in sub_data['Component'].unique():
        # .loc avoids chained assignment, which may write to a copy.
        my_df.loc[my_df['vehicle'] == vh, comp] = 1

my_df.to_csv('my_vhs.csv', index=False)
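
For comparison, the nested loop can be replaced with a single pd.crosstab call that produces the same layout (a vehicle column followed by one 0/1 column per component). This is only a sketch: the inline frame below is a hypothetical stand-in for data.csv, assumed to have the vehicle and Component columns used in the loop version.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('data.csv'); one row has no component.
data = pd.DataFrame({
    'vehicle': ['Ford', 'Ford', 'Toyota', 'Toyota'],
    'Component': ['airbag', 'airbag', 'engine', None],
})

# Drop incomplete rows, as the loop version does.
data = data.dropna(subset=['vehicle', 'Component'])

my_df = (
    pd.crosstab(data['vehicle'], data['Component'])  # counts per pair
    .clip(upper=1)                                   # counts -> 0/1 flags
    .reset_index()                                   # 'vehicle' becomes a column
)
my_df.columns.name = None  # drop the leftover 'Component' axis label

print(my_df)
```

Writing the result with my_df.to_csv('my_vhs.csv', index=False) would then match the last step of the loop version.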