I have a list of dictionaries like this:
[{'mississippi': 1, 'worth': 1, 'reading': 1}, {'commonplace': 1, 'river': 1, 'contrary': 1, 'ways': 1, 'remarkable': 1}, {'considering': 1, 'missouri': 1, 'main': 1, 'branch': 1, 'longest': 1, 'river': 1, 'world--four': 1}, {'seems': 1, 'safe': 1, 'crookedest': 1, 'river': 1, 'part': 1, 'journey': 1, 'uses': 1, 'cover': 1, 'ground': 1, 'crow': 1, 'fly': 1, 'six': 1, 'seventy-five': 1}, {'discharges': 1, 'water': 1, 'st': 1}, {'lawrence': 1, 'twenty-five': 1, 'rhine': 1, 'thirty-eight': 1, 'thames': 1}, {'river': 1, 'vast': 1, 'drainage-basin:': 1, 'draws': 1, 'water': 1, 'supply': 1, 'twenty-eight': 1, 'states': 1, 'territories': 1, 'delaware': 1, 'atlantic': 1, 'seaboard': 1, 'country': 1, 'idaho': 1, 'pacific': 1, 'slope--a': 1, 'spread': 1, 'forty-five': 1, 'degrees': 1, 'longitude': 1}, {'mississippi': 1, 'receives': 1, 'carries': 1, 'gulf': 1, 'water': 1, 'fifty-four': 1, 'subordinate': 1, 'rivers': 1, 'navigable': 1, 'steamboats': 1, 'hundreds': 1, 'flats': 1, 'keels': 1}, {'area': 1, 'drainage-basin': 1, 'combined': 1, 'areas': 1, 'england': 1, 'wales': 1, 'scotland': 1, 'ireland': 1, 'france': 1, 'spain': 1, 'portugal': 1, 'germany': 1, 'austria': 1, 'italy': 1, 'turkey': 1, 'almost': 1, 'wide': 1, 'region': 1, 'fertile': 1, 'mississippi': 1, 'valley': 1, 'proper': 1, 'exceptionally': 1}]
I want to turn it into the output shown below, which I will use to compute a similarity score between two target words:
river 4
ground: 1
journey: 1
longitude: 1
main: 1
world--four: 1
contrary: 1
cover: 1
delaware: 1
remarkable: 1
vast: 1
forty-five: 1
crookedest: 1
territories: 1
spread: 1
country: 1
longest: 1
fly: 1
atlantic: 1
crow: 1
supply: 1
seems: 1
idaho: 1
seaboard: 1
states: 1
ways: 1
degrees: 1
part: 1
twenty-eight: 1
pacific: 1
branch: 1
water: 1
considering: 1
six: 1
safe: 1
commonplace: 1
draws: 1
drainage-basin: 1
uses: 1
seventy-five: 1
slope--a: 1
missouri: 1
mississippi 3
area: 1
steamboats: 1
germany: 1
reading: 1
france: 1
proper: 1
fifty-four: 1
turkey: 1
exceptionally: 1
areas: 1
carries: 1
combined: 1
flats: 1
receives: 1
england: 1
italy: 1
scotland: 1
wales: 1
almost: 1
navigable: 1
austria: 1
region: 1
wide: 1
spain: 1
subordinate: 1
drainage-basin: 1
hundreds: 1
keels: 1
portugal: 1
water: 1
gulf: 1
ireland: 1
rivers: 1
valley: 1
fertile: 1
worth: 1
water 3
steamboats: 1
spread: 1
country: 1
states: 1
longitude: 1
fifty-four: 1
pacific: 1
vast: 1
subordinate: 1
carries: 1
keels: 1
flats: 1
supply: 1
receives: 1
atlantic: 1
forty-five: 1
river: 1
rivers: 1
idaho: 1
mississippi: 1
seaboard: 1
navigable: 1
discharges: 1
degrees: 1
twenty-eight: 1
drainage-basin: 1
hundreds: 1
st: 1
gulf: 1
draws: 1
delaware: 1
territories: 1
slope--a: 1
drainage-basin 2
area: 1
spread: 1
country: 1
states: 1
mississippi: 1
longitude: 1
france: 1
proper: 1
vast: 1
turkey: 1
forty-five: 1
areas: 1
combined: 1
germany: 1
exceptionally: 1
valley: 1
supply: 1
fertile: 1
atlantic: 1
italy: 1
river: 1
idaho: 1
wales: 1
almost: 1
seaboard: 1
spain: 1
austria: 1
region: 1
degrees: 1
twenty-eight: 1
wide: 1
england: 1
portugal: 1
water: 1
ireland: 1
pacific: 1
draws: 1
delaware: 1
territories: 1
scotland: 1
slope--a: 1
area 1
germany: 1
austria: 1
mississippi: 1
france: 1
proper: 1
england: 1
turkey: 1
exceptionally: 1
areas: 1
combined: 1
scotland: 1
italy: 1
spain: 1
wales: 1
almost: 1
fertile: 1
region: 1
wide: 1
drainage-basin: 1
portugal: 1
ireland: 1
valley: 1
journey 1
ground: 1
seems: 1
part: 1
cover: 1
crow: 1
crookedest: 1
six: 1
safe: 1
uses: 1
seventy-five: 1
river: 1
fly: 1
seems 1
ground: 1
journey: 1
part: 1
cover: 1
crow: 1
crookedest: 1
six: 1
safe: 1
uses: 1
seventy-five: 1
river: 1
fly: 1
states 1
spread: 1
country: 1
degrees: 1
longitude: 1
twenty-eight: 1
drainage-basin: 1
vast: 1
forty-five: 1
water: 1
seaboard: 1
pacific: 1
draws: 1
delaware: 1
territories: 1
atlantic: 1
supply: 1
slope--a: 1
river: 1
idaho: 1
slope--a 1
spread: 1
country: 1
states: 1
degrees: 1
longitude: 1
drainage-basin: 1
vast: 1
forty-five: 1
water: 1
seaboard: 1
pacific: 1
draws: 1
delaware: 1
territories: 1
atlantic: 1
supply: 1
twenty-eight: 1
river: 1
idaho: 1
remarkable 1
contrary: 1
river: 1
commonplace: 1
ways: 1
vast 1
spread: 1
country: 1
states: 1
degrees: 1
longitude: 1
twenty-eight: 1
drainage-basin: 1
pacific: 1
forty-five: 1
water: 1
seaboard: 1
draws: 1
delaware: 1
territories: 1
atlantic: 1
supply: 1
slope--a: 1
river: 1
idaho: 1
forty-five 1
spread: 1
longitude: 1
country: 1
states: 1
degrees: 1
slope--a: 1
drainage-basin: 1
vast: 1
pacific: 1
water: 1
seaboard: 1
draws: 1
delaware: 1
territories: 1
atlantic: 1
supply: 1
twenty-eight: 1
river: 1
idaho: 1
crookedest 1
ground: 1
journey: 1
seems: 1
part: 1
cover: 1
crow: 1
six: 1
safe: 1
uses: 1
seventy-five: 1
river: 1
fly: 1
carries 1
mississippi: 1
steamboats: 1
navigable: 1
fifty-four: 1
keels: 1
hundreds: 1
subordinate: 1
water: 1
gulf: 1
flats: 1
rivers: 1
receives: 1
germany 1
area: 1
austria: 1
mississippi: 1
france: 1
proper: 1
exceptionally: 1
turkey: 1
england: 1
areas: 1
combined: 1
scotland: 1
italy: 1
spain: 1
wales: 1
almost: 1
fertile: 1
region: 1
wide: 1
drainage-basin: 1
portugal: 1
ireland: 1
valley: 1
longest 1
main: 1
river: 1
world--four: 1
branch: 1
missouri: 1
considering: 1
flats 1
mississippi: 1
steamboats: 1
navigable: 1
carries: 1
fifty-four: 1
keels: 1
hundreds: 1
subordinate: 1
water: 1
gulf: 1
rivers: 1
receives: 1
supply 1
spread: 1
longitude: 1
country: 1
states: 1
degrees: 1
slope--a: 1
drainage-basin: 1
vast: 1
forty-five: 1
water: 1
seaboard: 1
pacific: 1
draws: 1
delaware: 1
territories: 1
atlantic: 1
twenty-eight: 1
river: 1
idaho: 1
receives 1
mississippi: 1
steamboats: 1
navigable: 1
carries: 1
fifty-four: 1
keels: 1
hundreds: 1
subordinate: 1
water: 1
gulf: 1
flats: 1
rivers: 1
crow 1
ground: 1
journey: 1
seems: 1
part: 1
cover: 1
crookedest: 1
six: 1
safe: 1
uses: 1
seventy-five: 1
river: 1
fly: 1
scotland 1
area: 1
germany: 1
austria: 1
mississippi: 1
france: 1
proper: 1
exceptionally: 1
turkey: 1
england: 1
areas: 1
combined: 1
spain: 1
italy: 1
wales: 1
almost: 1
fertile: 1
region: 1
wide: 1
drainage-basin: 1
portugal: 1
ireland: 1
valley: 1
country 1
spread: 1
idaho: 1
states: 1
degrees: 1
longitude: 1
twenty-eight: 1
drainage-basin: 1
vast: 1
forty-five: 1
water: 1
seaboard: 1
pacific: 1
draws: 1
delaware: 1
territories: 1
atlantic: 1
supply: 1
slope--a: 1
river: 1
thames 1
thirty-eight: 1
rhine: 1
lawrence: 1
twenty-five: 1
england 1
area: 1
germany: 1
austria: 1
mississippi: 1
france: 1
proper: 1
exceptionally: 1
turkey: 1
region: 1
areas: 1
combined: 1
scotland: 1
italy: 1
spain: 1
wales: 1
almost: 1
fertile: 1
wide: 1
drainage-basin: 1
portugal: 1
ireland: 1
valley: 1
navigable 1
mississippi: 1
steamboats: 1
carries: 1
fifty-four: 1
keels: 1
hundreds: 1
subordinate: 1
water: 1
gulf: 1
flats: 1
rivers: 1
receives: 1
austria 1
area: 1
germany: 1
mississippi: 1
france: 1
proper: 1
region: 1
turkey: 1
england: 1
areas: 1
combined: 1
exceptionally: 1
scotland: 1
italy: 1
spain: 1
wales: 1
almost: 1
fertile: 1
wide: 1
drainage-basin: 1
portugal: 1
ireland: 1
valley: 1
rhine 1
thirty-eight: 1
thames: 1
lawrence: 1
twenty-five: 1
part 1
ground: 1
journey: 1
seems: 1
cover: 1
crow: 1
crookedest: 1
six: 1
safe: 1
uses: 1
seventy-five: 1
river: 1
fly: 1
twenty-eight 1
spread: 1
country: 1
states: 1
degrees: 1
longitude: 1
drainage-basin: 1
vast: 1
forty-five: 1
water: 1
seaboard: 1
pacific: 1
draws: 1
delaware: 1
territories: 1
atlantic: 1
supply: 1
slope--a: 1
river: 1
idaho: 1
branch 1
main: 1
longest: 1
river: 1
world--four: 1
missouri: 1
considering: 1
hundreds 1
mississippi: 1
steamboats: 1
navigable: 1
carries: 1
fifty-four: 1
keels: 1
subordinate: 1
water: 1
gulf: 1
flats: 1
rivers: 1
receives: 1
st 1
water: 1
discharges: 1
considering 1
main: 1
longest: 1
river: 1
world--four: 1
branch: 1
missouri: 1
six 1
ground: 1
journey: 1
seems: 1
part: 1
cover: 1
crow: 1
crookedest: 1
fly: 1
safe: 1
uses: 1
seventy-five: 1
river: 1
gulf 1
mississippi: 1
steamboats: 1
navigable: 1
carries: 1
fifty-four: 1
keels: 1
hundreds: 1
subordinate: 1
water: 1
flats: 1
rivers: 1
receives: 1
ireland 1
area: 1
germany: 1
austria: 1
mississippi: 1
france: 1
proper: 1
exceptionally: 1
turkey: 1
england: 1
areas: 1
combined: 1
scotland: 1
italy: 1
spain: 1
wales: 1
almost: 1
fertile: 1
region: 1
wide: 1
drainage-basin: 1
portugal: 1
valley: 1
safe 1
ground: 1
journey: 1
seems: 1
part: 1
cover: 1
crow: 1
crookedest: 1
six: 1
uses: 1
seventy-five: 1
river: 1
fly: 1
commonplace 1
contrary: 1
river: 1
remarkable: 1
ways: 1
draws 1
spread: 1
longitude: 1
country: 1
states: 1
degrees: 1
slope--a: 1
drainage-basin: 1
vast: 1
forty-five: 1
water: 1
seaboard: 1
pacific: 1
supply: 1
delaware: 1
territories: 1
atlantic: 1
twenty-eight: 1
river: 1
idaho: 1
delaware 1
spread: 1
longitude: 1
country: 1
states: 1
degrees: 1
slope--a: 1
drainage-basin: 1
vast: 1
forty-five: 1
water: 1
seaboard: 1
pacific: 1
draws: 1
territories: 1
atlantic: 1
supply: 1
twenty-eight: 1
river: 1
idaho: 1
thirty-eight 1
thames: 1
rhine: 1
lawrence: 1
twenty-five: 1
longitude 1
spread: 1
country: 1
states: 1
degrees: 1
slope--a: 1
drainage-basin: 1
vast: 1
forty-five: 1
water: 1
seaboard: 1
pacific: 1
draws: 1
delaware: 1
territories: 1
atlantic: 1
supply: 1
twenty-eight: 1
river: 1
idaho: 1
world--four 1
main: 1
longest: 1
river: 1
branch: 1
missouri: 1
considering: 1
lawrence 1
thirty-eight: 1
thames: 1
rhine: 1
twenty-five: 1
ground 1
journey: 1
seems: 1
part: 1
cover: 1
crow: 1
crookedest: 1
six: 1
safe: 1
uses: 1
seventy-five: 1
river: 1
fly: 1
steamboats 1
mississippi: 1
navigable: 1
carries: 1
fifty-four: 1
keels: 1
hundreds: 1
subordinate: 1
water: 1
gulf: 1
flats: 1
rivers: 1
receives: 1
spread 1
seaboard: 1
country: 1
states: 1
degrees: 1
longitude: 1
twenty-eight: 1
drainage-basin: 1
vast: 1
forty-five: 1
water: 1
pacific: 1
draws: 1
delaware: 1
territories: 1
atlantic: 1
supply: 1
slope--a: 1
river: 1
idaho: 1
idaho 1
spread: 1
country: 1
states: 1
degrees: 1
longitude: 1
twenty-eight: 1
drainage-basin: 1
vast: 1
forty-five: 1
water: 1
seaboard: 1
pacific: 1
draws: 1
delaware: 1
territories: 1
atlantic: 1
supply: 1
slope--a: 1
river: 1
reading 1
mississippi: 1
worth: 1
almost 1
area: 1
germany: 1
austria: 1
france: 1
proper: 1
england: 1
turkey: 1
exceptionally: 1
areas: 1
combined: 1
scotland: 1
italy: 1
spain: 1
wales: 1
mississippi: 1
fertile: 1
region: 1
wide: 1
drainage-basin: 1
portugal: 1
ireland: 1
valley: 1
contrary 1
river: 1
remarkable: 1
commonplace: 1
ways: 1
cover 1
ground: 1
journey: 1
seems: 1
part: 1
crow: 1
crookedest: 1
six: 1
safe: 1
uses: 1
seventy-five: 1
river: 1
fly: 1
france 1
area: 1
germany: 1
austria: 1
mississippi: 1
proper: 1
england: 1
turkey: 1
exceptionally: 1
areas: 1
combined: 1
scotland: 1
italy: 1
spain: 1
wales: 1
almost: 1
fertile: 1
region: 1
wide: 1
drainage-basin: 1
portugal: 1
ireland: 1
valley: 1
spain 1
area: 1
germany: 1
austria: 1
mississippi: 1
france: 1
proper: 1
exceptionally: 1
turkey: 1
england: 1
areas: 1
combined: 1
scotland: 1
italy: 1
wales: 1
almost: 1
fertile: 1
region: 1
wide: 1
drainage-basin: 1
portugal: 1
ireland: 1
valley: 1
pacific 1
spread: 1
longitude: 1
country: 1
states: 1
degrees: 1
slope--a: 1
drainage-basin: 1
vast: 1
forty-five: 1
water: 1
seaboard: 1
draws: 1
delaware: 1
territories: 1
atlantic: 1
supply: 1
twenty-eight: 1
river: 1
idaho: 1
turkey 1
area: 1
germany: 1
austria: 1
mississippi: 1
france: 1
proper: 1
exceptionally: 1
england: 1
areas: 1
combined: 1
scotland: 1
italy: 1
spain: 1
wales: 1
almost: 1
fertile: 1
region: 1
wide: 1
drainage-basin: 1
portugal: 1
ireland: 1
valley: 1
fifty-four 1
mississippi: 1
steamboats: 1
navigable: 1
carries: 1
hundreds: 1
keels: 1
subordinate: 1
water: 1
gulf: 1
flats: 1
rivers: 1
receives: 1
subordinate 1
mississippi: 1
steamboats: 1
navigable: 1
carries: 1
fifty-four: 1
keels: 1
hundreds: 1
water: 1
gulf: 1
flats: 1
rivers: 1
receives: 1
territories 1
spread: 1
idaho: 1
states: 1
degrees: 1
longitude: 1
twenty-eight: 1
drainage-basin: 1
vast: 1
forty-five: 1
water: 1
seaboard: 1
pacific: 1
draws: 1
delaware: 1
supply: 1
atlantic: 1
slope--a: 1
river: 1
country: 1
combined 1
area: 1
germany: 1
austria: 1
mississippi: 1
france: 1
proper: 1
exceptionally: 1
turkey: 1
england: 1
areas: 1
scotland: 1
italy: 1
spain: 1
wales: 1
almost: 1
fertile: 1
region: 1
wide: 1
drainage-basin: 1
portugal: 1
ireland: 1
valley: 1
exceptionally 1
area: 1
germany: 1
austria: 1
mississippi: 1
france: 1
proper: 1
england: 1
turkey: 1
region: 1
areas: 1
combined: 1
scotland: 1
italy: 1
spain: 1
wales: 1
almost: 1
fertile: 1
wide: 1
drainage-basin: 1
portugal: 1
ireland: 1
valley: 1
region 1
area: 1
germany: 1
austria: 1
mississippi: 1
france: 1
proper: 1
exceptionally: 1
turkey: 1
england: 1
areas: 1
combined: 1
scotland: 1
italy: 1
spain: 1
wales: 1
almost: 1
fertile: 1
wide: 1
drainage-basin: 1
portugal: 1
ireland: 1
valley: 1
twenty-five 1
thirty-eight: 1
thames: 1
lawrence: 1
rhine: 1
rivers 1
mississippi: 1
steamboats: 1
navigable: 1
carries: 1
fifty-four: 1
keels: 1
hundreds: 1
subordinate: 1
water: 1
gulf: 1
flats: 1
receives: 1
fly 1
ground: 1
journey: 1
seems: 1
part: 1
cover: 1
crow: 1
crookedest: 1
six: 1
safe: 1
uses: 1
seventy-five: 1
river: 1
atlantic 1
spread: 1
longitude: 1
country: 1
states: 1
degrees: 1
slope--a: 1
drainage-basin: 1
vast: 1
forty-five: 1
water: 1
seaboard: 1
pacific: 1
draws: 1
delaware: 1
territories: 1
river: 1
supply: 1
twenty-eight: 1
idaho: 1
italy 1
area: 1
germany: 1
austria: 1
mississippi: 1
france: 1
proper: 1
exceptionally: 1
turkey: 1
england: 1
areas: 1
combined: 1
scotland: 1
spain: 1
wales: 1
almost: 1
fertile: 1
region: 1
wide: 1
drainage-basin: 1
portugal: 1
ireland: 1
valley: 1
main 1
world--four: 1
longest: 1
river: 1
branch: 1
missouri: 1
considering: 1
areas 1
area: 1
germany: 1
austria: 1
mississippi: 1
france: 1
proper: 1
england: 1
turkey: 1
exceptionally: 1
combined: 1
scotland: 1
italy: 1
spain: 1
wales: 1
almost: 1
fertile: 1
region: 1
wide: 1
drainage-basin: 1
portugal: 1
ireland: 1
valley: 1
seaboard 1
spread: 1
country: 1
states: 1
degrees: 1
longitude: 1
twenty-eight: 1
drainage-basin: 1
vast: 1
forty-five: 1
water: 1
pacific: 1
draws: 1
delaware: 1
territories: 1
atlantic: 1
supply: 1
slope--a: 1
river: 1
idaho: 1
fertile 1
area: 1
germany: 1
austria: 1
mississippi: 1
france: 1
proper: 1
exceptionally: 1
turkey: 1
england: 1
areas: 1
combined: 1
scotland: 1
italy: 1
spain: 1
wales: 1
almost: 1
region: 1
wide: 1
drainage-basin: 1
portugal: 1
ireland: 1
valley: 1
ways 1
contrary: 1
river: 1
remarkable: 1
commonplace: 1
discharges 1
water: 1
st: 1
degrees 1
spread: 1
country: 1
states: 1
longitude: 1
twenty-eight: 1
drainage-basin: 1
vast: 1
forty-five: 1
water: 1
seaboard: 1
pacific: 1
draws: 1
delaware: 1
territories: 1
atlantic: 1
supply: 1
slope--a: 1
river: 1
idaho: 1
wide 1
area: 1
germany: 1
austria: 1
mississippi: 1
france: 1
proper: 1
exceptionally: 1
turkey: 1
england: 1
areas: 1
combined: 1
scotland: 1
italy: 1
spain: 1
wales: 1
almost: 1
fertile: 1
region: 1
drainage-basin: 1
portugal: 1
ireland: 1
valley: 1
proper 1
area: 1
germany: 1
austria: 1
mississippi: 1
france: 1
england: 1
turkey: 1
exceptionally: 1
areas: 1
combined: 1
scotland: 1
italy: 1
spain: 1
wales: 1
almost: 1
fertile: 1
region: 1
wide: 1
drainage-basin: 1
portugal: 1
ireland: 1
valley: 1
keels 1
mississippi: 1
steamboats: 1
navigable: 1
water: 1
fifty-four: 1
hundreds: 1
subordinate: 1
carries: 1
gulf: 1
flats: 1
rivers: 1
receives: 1
portugal 1
area: 1
germany: 1
austria: 1
mississippi: 1
france: 1
proper: 1
exceptionally: 1
turkey: 1
england: 1
areas: 1
combined: 1
scotland: 1
italy: 1
spain: 1
wales: 1
almost: 1
fertile: 1
region: 1
wide: 1
drainage-basin: 1
ireland: 1
valley: 1
worth 1
mississippi: 1
reading: 1
uses 1
ground: 1
journey: 1
seems: 1
part: 1
cover: 1
crow: 1
crookedest: 1
six: 1
safe: 1
fly: 1
seventy-five: 1
river: 1
seventy-five 1
ground: 1
journey: 1
seems: 1
part: 1
cover: 1
crow: 1
crookedest: 1
six: 1
safe: 1
uses: 1
river: 1
fly: 1
valley 1
area: 1
germany: 1
austria: 1
mississippi: 1
france: 1
proper: 1
exceptionally: 1
turkey: 1
england: 1
areas: 1
combined: 1
scotland: 1
italy: 1
spain: 1
wales: 1
almost: 1
fertile: 1
region: 1
wide: 1
drainage-basin: 1
portugal: 1
ireland: 1
missouri 1
main: 1
longest: 1
river: 1
branch: 1
world--four: 1
considering: 1
wales 1
area: 1
germany: 1
austria: 1
mississippi: 1
france: 1
proper: 1
exceptionally: 1
turkey: 1
england: 1
areas: 1
combined: 1
scotland: 1
italy: 1
spain: 1
almost: 1
fertile: 1
region: 1
wide: 1
drainage-basin: 1
portugal: 1
ireland: 1
valley: 1
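The first line of each block is a target word and its frequency across the whole list of dictionaries. The lines below it are the related words and how often each appears in the same sentence as the target word. For example, from the first dictionary, the profile for "mississippi" will contain references to "worth" and "reading", each with an in-sentence frequency of 1, while the frequency of "mississippi" across the whole list is 3. I want to sort the target words by their frequency in descending order. Can anyone help?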
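The transformation described above can be sketched directly on the list of dictionaries, without going back to the text files. This is a minimal sketch, not code from the question or the answers: it assumes each dictionary is one sentence, it strips a trailing colon so that 'drainage-basin:' and 'drainage-basin' count as the same word (which the expected output appears to do), and the function names cooccurrence_profiles and print_profiles are hypothetical.

```python
from collections import Counter, defaultdict


def cooccurrence_profiles(sentences):
    """Return (totals, profiles) for a list of per-sentence word-count dicts.

    totals[word] is the word's count summed over all sentences;
    profiles[target][other] sums the counts of `other` in every sentence
    that also contains `target`.
    """
    totals = Counter()
    profiles = defaultdict(Counter)
    for sentence in sentences:
        # Assumption: a trailing ':' is punctuation, so 'drainage-basin:' -> 'drainage-basin'
        words = Counter()
        for word, count in sentence.items():
            words[word.rstrip(":")] += count
        totals.update(words)
        # Every word in the sentence is credited to every other word's profile
        for target in words:
            for other, count in words.items():
                if other != target:
                    profiles[target][other] += count
    return totals, profiles


def print_profiles(sentences):
    """Print each target with its total, highest total first."""
    totals, profiles = cooccurrence_profiles(sentences)
    # Sort by descending total; ties are broken alphabetically so the order is stable
    for target, total in sorted(totals.items(), key=lambda kv: (-kv[1], kv[0])):
        print(target, total)
        for other, count in profiles[target].most_common():
            print("{}: {}".format(other, count))
```

Called as print_profiles(data) on the list from the question, the output should start with river 4, followed by mississippi 3, water 3 and drainage-basin 2; every other word occurs in only one sentence, so the remaining targets all have a total of 1.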
Answer 0 (score: 1)
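Neither your desired output nor your code makes it very clear what exactly you want to achieve, but if you just want to count the words within individual sentences, the strategy should be:
1. Read common.txt into a set for fast lookups.
2. Read sample.txt and split it on . to get the individual sentences.
3. Clean up all non-word characters (you'll have to define them, or use the regex \b to capture word boundaries) and replace them with spaces.
4. Split each sentence on whitespace and count the words that are not present in the set from step 1.
So: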
import collections

with open("common.txt", "r") as f:  # open the `common.txt` for reading
    common_words = {l.strip().lower() for l in f}  # read each line and add it to a set

interpunction = ";,'\""  # define word separating characters and create a translation table
trans_table = str.maketrans(interpunction, " " * len(interpunction))

sentences_counter = []  # a list to hold a word count for each sentence
with open("sample.txt", "r") as f:  # open the `sample.txt` for reading
    # read the whole file to include linebreaks and split on `.` to get individual sentences
    sentences = [s for s in f.read().split(".") if s.strip()]  # ignore empty sentences
    for sentence in sentences:  # iterate over each sentence
        sentence = sentence.translate(trans_table)  # replace the interpunction with spaces
        word_counter = collections.defaultdict(int)  # a string:int default dict for counting
        for word in sentence.split():  # split the sentence and iterate over the words
            if word.lower() not in common_words:  # count only words not in the common.txt
                word_counter[word.lower()] += 1
        sentences_counter.append(word_counter)  # add the current sentence word count
Note: on Python 2.x, use string.maketrans() instead of str.maketrans().
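This produces sentences_counter, a list holding one word-count dictionary per sentence in sample.txt, where each key is a word and its value is that word's count. You can print the result as: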
for i, v in enumerate(sentences_counter):
    print("Sentence #{}:".format(i+1))
    print("\n".join("\t{}: {}".format(w, c) for w, c in v.items()))
Which will produce (for your sample data):
Sentence #1:
	area: 1
	drainage-basin: 1
	great: 1
	combined: 1
	areas: 1
	england: 1
	wales: 1
	wide: 1
	region: 1
	fertile: 1
Sentence #2:
	mississippi: 1
	valley: 1
	proper: 1
	exceptionally: 1
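Keep in mind that (English) language is more complex than this. For example, "A cat will wag its tail when it's angry, so stay away from it." will be split very differently depending on how you treat the apostrophe. Also, a dot does not necessarily mark the end of a sentence. If you want to do serious linguistic analysis, you should look into NLP.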
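The strategy list above mentions regular expressions as an alternative to the translation table. A minimal sketch of that variant using the standard re module, assuming a "word" is a run of lowercase letters optionally joined by hyphens or apostrophes (so world--four and it's each stay one token). The pattern and the name count_sentence_words are illustrative choices, not part of the answer, and sentences are still split naively on '.', so the caveats about dots still apply.

```python
import re
from collections import Counter

# Assumed word pattern: letters, optionally joined by runs of '-' or "'"
WORD_RE = re.compile(r"[a-z]+(?:['-]+[a-z]+)*")


def count_sentence_words(text, common_words):
    """Return one Counter per '.'-separated sentence, skipping common words."""
    counters = []
    for sentence in text.split("."):
        if not sentence.strip():
            continue  # ignore empty fragments, as the answer's code does
        words = WORD_RE.findall(sentence.lower())  # punctuation such as ',' or ':' is dropped
        counters.append(Counter(w for w in words if w not in common_words))
    return counters
```

Unlike the translation table, the pattern keeps "it's" together instead of splitting it into "it" and "s", which matters if common.txt lists contractions.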
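Update: although I don't see the point of repeating the data for every word (the counts won't change within a sentence), if you want to print each word with all the others nested under it, you can add an inner loop when printing: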
for i, v in enumerate(sentences_counter):
    print("Sentence #{}:".format(i+1))
    for word, count in v.items():
        print("\t{} {}".format(word, count))
        print("\n".join("\t\t{}: {}".format(w, c) for w, c in v.items() if w != word))
Which will give you:
Sentence #1:
	area 1
		drainage-basin: 1
		great: 1
		combined: 1
		areas: 1
		england: 1
		wales: 1
		wide: 1
		region: 1
		fertile: 1
	drainage-basin 1
		area: 1
		great: 1
		combined: 1
		areas: 1
		england: 1
		wales: 1
		wide: 1
		region: 1
		fertile: 1
	great 1
		area: 1
		drainage-basin: 1
		combined: 1
		areas: 1
		england: 1
		wales: 1
		wide: 1
		region: 1
		fertile: 1
	combined 1
		area: 1
		drainage-basin: 1
		great: 1
		areas: 1
		england: 1
		wales: 1
		wide: 1
		region: 1
		fertile: 1
	areas 1
		area: 1
		drainage-basin: 1
		great: 1
		combined: 1
		england: 1
		wales: 1
		wide: 1
		region: 1
		fertile: 1
	england 1
		area: 1
		drainage-basin: 1
		great: 1
		combined: 1
		areas: 1
		wales: 1
		wide: 1
		region: 1
		fertile: 1
	wales 1
		area: 1
		drainage-basin: 1
		great: 1
		combined: 1
		areas: 1
		england: 1
		wide: 1
		region: 1
		fertile: 1
	wide 1
		area: 1
		drainage-basin: 1
		great: 1
		combined: 1
		areas: 1
		england: 1
		wales: 1
		region: 1
		fertile: 1
	region 1
		area: 1
		drainage-basin: 1
		great: 1
		combined: 1
		areas: 1
		england: 1
		wales: 1
		wide: 1
		fertile: 1
	fertile 1
		area: 1
		drainage-basin: 1
		great: 1
		combined: 1
		areas: 1
		england: 1
		wales: 1
		wide: 1
		region: 1
Sentence #2:
	mississippi 1
		valley: 1
		proper: 1
		exceptionally: 1
	valley 1
		mississippi: 1
		proper: 1
		exceptionally: 1
	proper 1
		mississippi: 1
		valley: 1
		exceptionally: 1
	exceptionally 1
		mississippi: 1
		valley: 1
		proper: 1
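Feel free to remove the sentence-number print and drop one level of tab indentation to get closer to the output in your question. Instead of printing everything to STDOUT, you could also build a tree-like dictionary if that is closer to what you want.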
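A minimal sketch of such a tree-like dictionary, assuming sentences_counter was built as shown above. The function name build_tree and the (count, {other_word: count}) layout are hypothetical choices; any nesting that suits the later similarity computation would do.

```python
def build_tree(sentences_counter):
    """Nest per-sentence counts as {sentence_no: {word: (count, {other_word: count})}}."""
    tree = {}
    for i, counts in enumerate(sentences_counter, start=1):  # 1-based, like "Sentence #1"
        tree[i] = {
            # each word keeps its own count plus the counts of its sentence neighbours
            word: (count, {w: c for w, c in counts.items() if w != word})
            for word, count in counts.items()
        }
    return tree
```

pprint.pprint(build_tree(sentences_counter)) shows the whole structure; it also works with the defaultdict(int) counters, since only .items() is used.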
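Update 2: you don't have to use a set for common_words if you don't want to. Here it is almost interchangeable with a list, so you could use a list comprehension instead of a set comprehension (i.e. replace the curly braces with square brackets), but a membership lookup in a list is an O(n) operation while a lookup in a set is O(1), so a set is preferred here. Not to mention the side benefit that a set automatically de-duplicates any repeated words.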
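The O(n) versus O(1) difference is easy to observe with the standard timeit module. A rough sketch, assuming a hypothetical 10,000-word stand-in for common.txt; absolute timings depend on the machine, only the gap between them is the point.

```python
import timeit

# Hypothetical stand-in for common.txt: 10,000 distinct words
words_list = ["word{}".format(i) for i in range(10000)]
words_set = set(words_list)

# Look up the last word: the worst case for a list, which is scanned front to back
list_time = timeit.timeit(lambda: "word9999" in words_list, number=1000)
set_time = timeit.timeit(lambda: "word9999" in words_set, number=1000)
print("list: {:.5f}s  set: {:.5f}s".format(list_time, set_time))

# Side benefit: building a set drops duplicate entries automatically
print(len(set(["river", "river", "water"])))  # → 2
```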
As for collections.defaultdict(), it's only there to save some coding and checking: it automatically initializes a dictionary key the first time that key is requested. Without it you'd have to do that manually, as in this version:
with open("common.txt", "r") as f:  # open the `common.txt` for reading
    common_words = {l.strip().lower() for l in f}  # read each line and add it to a set

interpunction = ";,'\""  # define word separating characters and create a translation table
trans_table = str.maketrans(interpunction, " " * len(interpunction))

sentences_counter = []  # a list to hold a word count for each sentence
with open("sample.txt", "r") as f:  # open the `sample.txt` for reading
    # read the whole file to include linebreaks and split on `.` to get individual sentences
    sentences = [s for s in f.read().split(".") if s.strip()]  # ignore empty sentences
    for sentence in sentences:  # iterate over each sentence
        sentence = sentence.translate(trans_table)  # replace the interpunction with spaces
        word_counter = {}  # initialize a word counting dictionary
        for word in sentence.split():  # split the sentence and iterate over the words
            word = word.lower()  # turn the word to lowercase
            if word not in common_words:  # count only words not in the common.txt
                word_counter[word] = word_counter.get(word, 0) + 1  # increase the last count
        sentences_counter.append(word_counter)  # add the current sentence word count
Update 3: if you only want the raw word counts across all sentences, as in the last update to your question, you don't even need to deal with sentences yourself: just add a dot to the interpunction list, read the file line by line, split on whitespace and count the words as before.
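The steps of Update 3 can be sketched as follows, assuming the same common_words set built above. count_all_words is a hypothetical name; it uses collections.Counter so that most_common() directly gives the descending order asked for in the question.

```python
from collections import Counter


def count_all_words(lines, common_words):
    """Count non-common words over all lines, ignoring sentence boundaries."""
    interpunction = ";,'\"."  # the dot is now just another separator
    trans_table = str.maketrans(interpunction, " " * len(interpunction))
    counter = Counter()
    for line in lines:  # an open file also works here, yielding one line at a time
        for word in line.translate(trans_table).lower().split():
            if word not in common_words:  # skip words listed in common.txt
                counter[word] += 1
    return counter
```

Usage, highest counts first:

with open("sample.txt", "r") as f:
    for word, count in count_all_words(f, common_words).most_common():
        print(word, count)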
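Answer 1 (score: -1)
Hopefully the code below works the way you need:
# Open both files for reading and lowercase their contents
with open('sample.txt', 'r') as file, open('common.txt', 'r') as file_1:
    data = file.read().lower()  # lower() works on strings, so call it before splitting
    common_data = set(file_1.read().lower().split())  # a set gives fast membership tests

# Treat these characters as word separators
for char in ',;.\n':
    data = data.replace(char, ' ')

counts = {}  # avoids the name `dict`, which would shadow the built-in type
for word in data.split():
    if word not in common_data:  # skip words listed in common.txt
        counts[word] = counts.get(word, 0) + 1  # count each appearance

print(counts)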