Question

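Basically, I have an Excel file that I load into Python, and I created a new column that records whether each row contains a certain word: True if the row contains it, False if not. Now that I have this new column, I'm trying to find the percentage of True and False values. Later I'll try to build a table that separates all the True rows from the False rows. I need help with the first part first. I'm a beginner who only started last week.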

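For the percentage problem, I decided to first write code that counts the occurrences of the words "true" and "false" in the column, and then do some math to get the percentage, but I haven't gotten past counting the occurrences. The code below outputs 0, which is not what it should show.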

import pandas as pd
import xlrd
df = pd.read_excel (r'C:\New folder\CrohnsD.xlsx')
print (df)
df['has_word_icd'] = df.apply(lambda row: True if 
row.str.contains('ICD').any() else False, axis=1)
print(df['has_word_icd'])
#df.to_excel(r'C:\New folder\done.xlsx')
test_str = "df['has_word_icd']"
counter = test_str.count('true')
print (str(counter))

Here is an updated version that still gives me 0. I can't change df['has_word_icd'], because that is how the variable was originally introduced.

import pandas as pd
import xlrd
df = pd.read_excel (r'C:\New folder\CrohnsD.xlsx')
print (df)
df['has_word_icd'] = df.apply(lambda row: True if 
row.str.contains('ICD').any() else False, axis=1)
print(df['has_word_icd'])
#df.to_excel(r'C:\New folder\done.xlsx')

test_str = (df['has_word_icd'])

count = 0
for i in range(len(test_str)):
    if test_str[i] == 'true':
        count += 1
    i += 1

print(count)

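Both give me the same result.

Please help: both versions output 0, which shouldn't be the case. Could someone help me with code that directly gives me the percentage of True and False values?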

Answer 1

Here is an approach that uses a list comprehension. For the percentage, you can use the np.mean() function:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['hello icd', 'bob', 'bob icd', 'hello'],
                   'b': ['bye', 'you', 'bob is icd better', 'bob is young']})

# True if any column in the row contains the substring 'icd'
df['contains_word_icd'] = df.apply(
    lambda row: any(['icd' in row[x] for x in df.columns]), axis=1)

# The mean of a boolean column is the fraction of True values
percentage = np.mean(df['contains_word_icd'])
# 0.5

Output:

           a                  b  contains_word_icd
0  hello icd                bye               True
1        bob                you              False
2    bob icd  bob is icd better               True
3      hello       bob is young              False
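
If you want both percentages at once, one option is value_counts(normalize=True), which returns the share of each distinct value. This is only a minimal sketch: the boolean column below is made up for illustration and is not the asker's spreadsheet.

```python
import pandas as pd

# Made-up boolean column (an assumption, not the asker's data)
df = pd.DataFrame({'contains_word_icd': [True, False, True, False]})

# normalize=True gives fractions that sum to 1; multiply by 100 for percent
shares = df['contains_word_icd'].value_counts(normalize=True) * 100

# Convert to a plain dict so each percentage can be looked up by True / False
percentages = shares.to_dict()
print(percentages[True], percentages[False])
```

The same numbers could also be reached with the mean shown above; value_counts is just convenient when you want the True and False shares side by side.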

Answer 2

The main problem is here: "df['has_word_icd']". You put the expression in quotes, which to Python means a plain string rather than the column. The correct form is test_str = df['has_word_icd']
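
To make the difference concrete, here is a small sketch of why both attempts in the question print 0. The four-row column is invented for illustration; only the name has_word_icd comes from the question.

```python
import pandas as pd

# Made-up stand-in for the asker's column (an assumption, not the real data)
df = pd.DataFrame({'has_word_icd': [True, False, True, True]})

quoted = "df['has_word_icd']"            # plain text, not the column
count_in_text = quoted.count('true')     # searches the text itself -> 0

column = df['has_word_icd']              # the actual Series of booleans
# A boolean never equals the string 'true', so this loop also counts 0
string_matches = sum(1 for value in column.tolist() if value == 'true')

true_count = int(column.sum())           # True counts as 1, False as 0 -> 3

print(count_in_text, string_matches, true_count)
```

So the first attempt counted a substring inside a piece of text, and the second compared booleans against a string. Summing the boolean column is enough to count the True values.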

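Then loop over test_str like this. Note that the column holds the booleans True and False, not the string 'true', so the test has to check the boolean value itself:

count = 0
for i in range(len(test_str)):
    # .iloc looks the value up by position; it is already True or False
    if test_str.iloc[i]:
        count += 1

print(count)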

Then get the percentage by dividing the count by the number of rows:

percent = count / len(df['has_word_icd']) * 100
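
The question also mentions wanting to separate the True rows from the False rows later. One way to do that is boolean masking, sketched below. The 'note' column and its contents are hypothetical; only has_word_icd is taken from the question.

```python
import pandas as pd

# Hypothetical data; only the column name has_word_icd comes from the question
df = pd.DataFrame({
    'note': ['ICD 10 code', 'no code', 'see ICD list', 'blank'],
    'has_word_icd': [True, False, True, False],
})

mask = df['has_word_icd']

with_icd = df[mask]        # rows where the flag is True
without_icd = df[~mask]    # rows where the flag is False (~ negates the mask)

# Same count / total * 100 formula, using the size of each part
percent_true = len(with_icd) / len(df) * 100
percent_false = len(without_icd) / len(df) * 100
print(percent_true, percent_false)
```

Each of the two frames could then be written out separately, for example with to_excel as in the commented-out line in the question.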
