Count delimiters in a CSV row with pandas

Time: 2018-12-20 05:15:06

Tags: python python-3.x pandas csv dataframe

I have a CSV file that looks like this:

name,age
something
tom,20

When I read it into a DataFrame, it looks like this:

df = pd.read_csv('file', header=None)

     0           1
1    name        age
2    something   NaN
3    tom         20

How can I get the number of commas in each raw line? For example, the answer should look like this:

# in pseudocode
df['_count_separators'] = len(df.raw_value.count(','))

     0           1      _count_separators
1    name        age   1
2    something   NaN   0
3    tom         20    1

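4 answers:

Answer 0: (score: 10)

This is quite simple: read the data as a single-column Series, then split on commas and concatenate the result with the separator count.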

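import pandas as pd
# '|' is assumed not to occur in the data, so each raw line is read into a single cell
# s = pd.read_csv(io.StringIO(text), sep='|', header=None).squeeze('columns')
s = pd.read_csv('/path/to/file.csv', sep='|', header=None).squeeze('columns')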

pd.concat([
    s.str.split(',', expand=True),
    s.str.count(',').rename('_count_sep')
], axis=1)

           0     1  _count_sep
0       name   age           1
1  something  None           0
2        tom    20           1

Another alternative to concatenation is to join on the index with join (this is a neat one-liner).
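The join-on-index variant mentioned above might look like the following. This is a minimal sketch rather than the answer's original code: the in-memory `text` stands in for the file, and `'|'` is assumed never to appear in the data.

```python
import io
import pandas as pd

# Stand-in for the CSV file from the question (an assumption for illustration).
text = "name,age\nsomething\ntom,20\n"

# Read each raw line into a single cell, then squeeze the one column into a Series.
s = pd.read_csv(io.StringIO(text), sep="|", header=None).squeeze("columns")

# Split into columns and attach the comma count by joining on the index.
result = s.str.split(",", expand=True).join(s.str.count(",").rename("_count_sep"))
print(result)
```

Because both pieces are derived from the same Series, their indexes match exactly and the join cannot misalign rows.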

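Answer 1: (score: 8)

Do it like this: read the file a second time with a different separator so that each line ends up in a single cell, then count the commas in that cell.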

import pandas as pd

df = pd.read_csv('file', header=None)
# Read the CSV again with a separator that is not in the data, so each line lands in one cell
df2 = pd.read_csv('file', header=None, sep='|')

# With header=None the single column is labeled with the integer 0, not the string '0'
df2[0].str.findall(',').str.len()
0    1
1    0
2    1
3    5
Name: 0, dtype: int64

df['_count_separators'] = df2[0].str.findall(',').str.len()

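Answer 2: (score: 1)

You can use the csv module to count delimiters. This is a two-pass solution, but it is not necessarily less efficient than the other solutions.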

from io import StringIO
import csv, pandas as pd, numpy as np

x = """name,age
something
tom,20"""

# replace StringIO(x) with open('file.csv', 'r')
with StringIO(x) as fin:
    delim_counts = np.fromiter(map(len, csv.reader(fin)), dtype=int)

# replace StringIO(x) with 'file.csv'
df = pd.read_csv(StringIO(x), header=None)
df['_count_separators'] = delim_counts - 1

print(df)

           0    1  _count_separators
0       name  age                  1
1  something  NaN                  0
2        tom   20                  1
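One consequence of counting fields with the csv module is that it counts delimiters, not raw comma characters, so the two can differ when a quoted field contains a comma. The sketch below illustrates this with made-up sample data (the `"Smith, John"` row is an assumption, not part of the question's file).

```python
import csv
import io

# Hypothetical data with a quoted field that contains a comma.
text = 'name,age\n"Smith, John",42\ntom,20\n'

# Raw comma characters per line.
raw_commas = [line.count(",") for line in text.splitlines()]

# Delimiters per line as seen by the csv parser (number of fields minus one).
with io.StringIO(text) as fin:
    delimiters = [len(row) - 1 for row in csv.reader(fin)]

print(raw_commas)
print(delimiters)
```

Which number is wanted depends on whether "separator count" means characters in the raw line or field boundaries after parsing.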

Answer 3: (score: 0)

This can be done in one line of code.
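As a hedged illustration of what such a one-liner might look like (this is an assumption, not the answer's own code), the raw text can be counted line by line and attached with `assign`. The sketch assumes the input has no blank lines and no quoted fields spanning several lines, so that raw lines and parsed rows line up one to one.

```python
import io
import pandas as pd

# Stand-in for the file contents (an assumption for illustration).
text = "name,age\nsomething\ntom,20\n"

# Parse the CSV and attach the per-line comma count in a single expression.
df = pd.read_csv(io.StringIO(text), header=None).assign(
    _count_separators=[line.count(",") for line in text.splitlines()]
)
print(df)
```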