How to calculate the frequency of words in a dictionary?

Time: 2018-05-14 16:12:28

Tags: python dictionary

I have a dictionary like the following:

[{'mississippi': 1, 'worth': 1, 'reading': 1}, {'commonplace': 1, 'river': 1, 'contrary': 1, 'ways': 1, 'remarkable': 1}, {'considering': 1, 'missouri': 1, 'main': 1, 'branch': 1, 'longest': 1, 'river': 1, 'world--four': 1}, {'seems': 1, 'safe': 1, 'crookedest': 1, 'river': 1, 'part': 1, 'journey': 1, 'uses': 1, 'cover': 1, 'ground': 1, 'crow': 1, 'fly': 1, 'six': 1, 'seventy-five': 1}, {'discharges': 1, 'water': 1, 'st': 1}, {'lawrence': 1, 'twenty-five': 1, 'rhine': 1, 'thirty-eight': 1, 'thames': 1}, {'river': 1, 'vast': 1, 'drainage-basin:': 1, 'draws': 1, 'water': 1, 'supply': 1, 'twenty-eight': 1, 'states': 1, 'territories': 1, 'delaware': 1, 'atlantic': 1, 'seaboard': 1, 'country': 1, 'idaho': 1, 'pacific': 1, 'slope--a': 1, 'spread': 1, 'forty-five': 1, 'degrees': 1, 'longitude': 1}, {'mississippi': 1, 'receives': 1, 'carries': 1, 'gulf': 1, 'water': 1, 'fifty-four': 1, 'subordinate': 1, 'rivers': 1, 'navigable': 1, 'steamboats': 1, 'hundreds': 1, 'flats': 1, 'keels': 1}, {'area': 1, 'drainage-basin': 1, 'combined': 1, 'areas': 1, 'england': 1, 'wales': 1, 'scotland': 1, 'ireland': 1, 'france': 1, 'spain': 1, 'portugal': 1, 'germany': 1, 'austria': 1, 'italy': 1, 'turkey': 1, 'almost': 1, 'wide': 1, 'region': 1, 'fertile': 1, 'mississippi': 1, 'valley': 1, 'proper': 1, 'exceptionally': 1}]

I want to turn it into the output shown below, which I will use to calculate the similarity score between two target words:

river 4
    ground: 1
    journey: 1
    longitude: 1
    main: 1
    world--four: 1
    contrary: 1
    cover: 1
    delaware: 1
    remarkable: 1
    vast: 1
    forty-five: 1
    crookedest: 1
    territories: 1
    spread: 1
    country: 1
    longest: 1
    fly: 1
    atlantic: 1
    crow: 1
    supply: 1
    seems: 1
    idaho: 1
    seaboard: 1
    states: 1
    ways: 1
    degrees: 1
    part: 1
    twenty-eight: 1
    pacific: 1
    branch: 1
    water: 1
    considering: 1
    six: 1
    safe: 1
    commonplace: 1
    draws: 1
    drainage-basin: 1
    uses: 1
    seventy-five: 1
    slope--a: 1
    missouri: 1
mississippi 3
    area: 1
    steamboats: 1
    germany: 1
    reading: 1
    france: 1
    proper: 1
    fifty-four: 1
    turkey: 1
    exceptionally: 1
    areas: 1
    carries: 1
    combined: 1
    flats: 1
    receives: 1
    england: 1
    italy: 1
    scotland: 1
    wales: 1
    almost: 1
    navigable: 1
    austria: 1
    region: 1
    wide: 1
    spain: 1
    subordinate: 1
    drainage-basin: 1
    hundreds: 1
    keels: 1
    portugal: 1
    water: 1
    gulf: 1
    ireland: 1
    rivers: 1
    valley: 1
    fertile: 1
    worth: 1
water 3
    steamboats: 1
    spread: 1
    country: 1
    states: 1
    longitude: 1
    fifty-four: 1
    pacific: 1
    vast: 1
    subordinate: 1
    carries: 1
    keels: 1
    flats: 1
    supply: 1
    receives: 1
    atlantic: 1
    forty-five: 1
    river: 1
    rivers: 1
    idaho: 1
    mississippi: 1
    seaboard: 1
    navigable: 1
    discharges: 1
    degrees: 1
    twenty-eight: 1
    drainage-basin: 1
    hundreds: 1
    st: 1
    gulf: 1
    draws: 1
    delaware: 1
    territories: 1
    slope--a: 1
drainage-basin 2
    area: 1
    spread: 1
    country: 1
    states: 1
    mississippi: 1
    longitude: 1
    france: 1
    proper: 1
    vast: 1
    turkey: 1
    forty-five: 1
    areas: 1
    combined: 1
    germany: 1
    exceptionally: 1
    valley: 1
    supply: 1
    fertile: 1
    atlantic: 1
    italy: 1
    river: 1
    idaho: 1
    wales: 1
    almost: 1
    seaboard: 1
    spain: 1
    austria: 1
    region: 1
    degrees: 1
    twenty-eight: 1
    wide: 1
    england: 1
    portugal: 1
    water: 1
    ireland: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    scotland: 1
    slope--a: 1
area 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    england: 1
    turkey: 1
    exceptionally: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
journey 1
    ground: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
seems 1
    ground: 1
    journey: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
states 1
    spread: 1
    country: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
slope--a 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    twenty-eight: 1
    river: 1
    idaho: 1
remarkable 1
    contrary: 1
    river: 1
    commonplace: 1
    ways: 1
vast 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    pacific: 1
    forty-five: 1
    water: 1
    seaboard: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
forty-five 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    pacific: 1
    water: 1
    seaboard: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    twenty-eight: 1
    river: 1
    idaho: 1
crookedest 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
carries 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
germany 1
    area: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
longest 1
    main: 1
    river: 1
    world--four: 1
    branch: 1
    missouri: 1
    considering: 1
flats 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    rivers: 1
    receives: 1
supply 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    twenty-eight: 1
    river: 1
    idaho: 1
receives 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
crow 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
scotland 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    spain: 1
    italy: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
country 1
    spread: 1
    idaho: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
thames 1
    thirty-eight: 1
    rhine: 1
    lawrence: 1
    twenty-five: 1
england 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    region: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
navigable 1
    mississippi: 1
    steamboats: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
austria 1
    area: 1
    germany: 1
    mississippi: 1
    france: 1
    proper: 1
    region: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    exceptionally: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
rhine 1
    thirty-eight: 1
    thames: 1
    lawrence: 1
    twenty-five: 1
part 1
    ground: 1
    journey: 1
    seems: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
twenty-eight 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
branch 1
    main: 1
    longest: 1
    river: 1
    world--four: 1
    missouri: 1
    considering: 1
hundreds 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
st 1
    water: 1
    discharges: 1
considering 1
    main: 1
    longest: 1
    river: 1
    world--four: 1
    branch: 1
    missouri: 1
six 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    fly: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
gulf 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    flats: 1
    rivers: 1
    receives: 1
ireland 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    valley: 1
safe 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
commonplace 1
    contrary: 1
    river: 1
    remarkable: 1
    ways: 1
draws 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    supply: 1
    delaware: 1
    territories: 1
    atlantic: 1
    twenty-eight: 1
    river: 1
    idaho: 1
delaware 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    territories: 1
    atlantic: 1
    supply: 1
    twenty-eight: 1
    river: 1
    idaho: 1
thirty-eight 1
    thames: 1
    rhine: 1
    lawrence: 1
    twenty-five: 1
longitude 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    twenty-eight: 1
    river: 1
    idaho: 1
world--four 1
    main: 1
    longest: 1
    river: 1
    branch: 1
    missouri: 1
    considering: 1
lawrence 1
    thirty-eight: 1
    thames: 1
    rhine: 1
    twenty-five: 1
ground 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
steamboats 1
    mississippi: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
spread 1
    seaboard: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
idaho 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
reading 1
    mississippi: 1
    worth: 1
almost 1
    area: 1
    germany: 1
    austria: 1
    france: 1
    proper: 1
    england: 1
    turkey: 1
    exceptionally: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    mississippi: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
contrary 1
    river: 1
    remarkable: 1
    commonplace: 1
    ways: 1
cover 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
france 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    proper: 1
    england: 1
    turkey: 1
    exceptionally: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
spain 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
pacific 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    twenty-eight: 1
    river: 1
    idaho: 1
turkey 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
fifty-four 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    hundreds: 1
    keels: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
subordinate 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
territories 1
    spread: 1
    idaho: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    supply: 1
    atlantic: 1
    slope--a: 1
    river: 1
    country: 1
combined 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
exceptionally 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    england: 1
    turkey: 1
    region: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
region 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
twenty-five 1
    thirty-eight: 1
    thames: 1
    lawrence: 1
    rhine: 1
rivers 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    receives: 1
fly 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
atlantic 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    river: 1
    supply: 1
    twenty-eight: 1
    idaho: 1
italy 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
main 1
    world--four: 1
    longest: 1
    river: 1
    branch: 1
    missouri: 1
    considering: 1
areas 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    england: 1
    turkey: 1
    exceptionally: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
seaboard 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
fertile 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
ways 1
    contrary: 1
    river: 1
    remarkable: 1
    commonplace: 1
discharges 1
    water: 1
    st: 1
degrees 1
    spread: 1
    country: 1
    states: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
wide 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
proper 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    england: 1
    turkey: 1
    exceptionally: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
keels 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    water: 1
    fifty-four: 1
    hundreds: 1
    subordinate: 1
    carries: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
portugal 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    ireland: 1
    valley: 1
worth 1
    mississippi: 1
    reading: 1
uses 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    fly: 1
    seventy-five: 1
    river: 1
seventy-five 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    river: 1
    fly: 1
valley 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
missouri 1
    main: 1
    longest: 1
    river: 1
    branch: 1
    world--four: 1
    considering: 1
wales 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1

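The first line of each block is a target word and its frequency across the whole dictionary. The indented lines below it are the words that appear in the same sentence as the target word, together with their frequencies in that sentence. For example, going by the first dictionary, the profile associated with "mississippi" contains references to "worth" and "reading", each with a frequency of 1 in that sentence, while the frequency of "mississippi" across the whole dictionary is 3. I want to sort the target words by frequency in descending order. Can anyone help?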

2 Answers:

Answer 0: (score: 1)

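Neither your desired output nor your code makes it very clear what exactly you want to achieve, but if it is just counting words in individual sentences, the strategy should be:

  1. Read common.txt into a set for fast lookups.
  2. Read sample.txt and split on . to get individual sentences.
  3. Clear all non-word characters (you'll have to define them, or use the regex \b to capture word boundaries) and replace them with spaces.
  4. Split on whitespace and count the words that are not present in the set from step 1.
  5. So: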

    import collections
    
    with open("common.txt", "r") as f:  # open the `common.txt` for reading
        common_words = {l.strip().lower() for l in f}  # read each line and add it to a set
    
    interpunction = ";,'\""  # define word separating characters and create a translation table
    trans_table = str.maketrans(interpunction, " " * len(interpunction))
    
    sentences_counter = []  # a list to hold a word count for each sentence
    with open("sample.txt", "r") as f:  # open the `sample.txt` for reading
        # read the whole file to include linebreaks and split on `.` to get individual sentences
        sentences = [s for s in f.read().split(".") if s.strip()]  # ignore empty sentences
        for sentence in sentences:  # iterate over each sentence
            sentence = sentence.translate(trans_table)  # replace the interpunction with spaces
            word_counter = collections.defaultdict(int)  # a string:int default dict for counting
            for word in sentence.split():  # split the sentence and iterate over the words
                if word.lower() not in common_words:  # count only words not in the common.txt
                    word_counter[word.lower()] += 1
            sentences_counter.append(word_counter)  # add the current sentence word count
    

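    Note: on Python 2.x, use string.maketrans() instead of str.maketrans().

    This will produce sentences_counter, which contains a word-count dictionary for each sentence in sample.txt, where each key is the actual word and its associated value is the word count. You can print the result as: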

    for i, v in enumerate(sentences_counter):
        print("Sentence #{}:".format(i+1))
        print("\n".join("\t{}: {}".format(w, c) for w, c in v.items()))
    

    which will produce (for your sample data):

    Sentence #1:
        area: 1
        drainage-basin: 1
        great: 1
        combined: 1
        areas: 1
        england: 1
        wales: 1
        wide: 1
        region: 1
        fertile: 1
    Sentence #2:
        mississippi: 1
        valley: 1
        proper: 1
        exceptionally: 1

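    Keep in mind that (English) language is more complex than this. For example, "A cat wiggles its tail when it's angry, so stay away from it" will come out very differently depending on how you treat the apostrophe. Also, a dot doesn't necessarily denote the end of a sentence. If you want to do serious linguistic analysis, you should look into NLP.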
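To make the apostrophe caveat concrete, here is a minimal sketch comparing the translation-table approach used above with a regular expression that keeps apostrophes and hyphens inside a word. The example sentence and the `WORD_RE` pattern are illustrative assumptions, not part of the original answer, and the pattern only covers plain ASCII letters.

```python
import re

text = "A cat wiggles its tail when it's angry, so stay away."

# Approach used in the answer: turn the listed characters into spaces, then split.
interpunction = ";,'\""
trans_table = str.maketrans(interpunction, " " * len(interpunction))
split_words = text.lower().translate(trans_table).split()

# Alternative (an assumption for illustration): a regex that keeps
# apostrophes and hyphens when they sit between letters.
WORD_RE = re.compile(r"[a-z]+(?:['-][a-z]+)*")
regex_words = WORD_RE.findall(text.lower())

print(split_words)  # "it's" is split into two tokens here
print(regex_words)  # "it's" survives as a single token here
```

Which behaviour is right depends on the analysis; the point is only that the choice changes the counts.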

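    Update: I don't see the point of repeating the data for every word (the counts won't change within a sentence), but if you want to print each word with all the other words nested underneath it, you can add an inner loop when printing: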

    for i, v in enumerate(sentences_counter):
        print("Sentence #{}:".format(i+1))
        for word, count in v.items():
            print("\t{} {}".format(word, count))
            print("\n".join("\t\t{}: {}".format(w, c) for w, c in v.items() if w != word))
    

    Which will give you:

    Sentence #1:
        area 1
            drainage-basin: 1
            great: 1
            combined: 1
            areas: 1
            england: 1
            wales: 1
            wide: 1
            region: 1
            fertile: 1
        drainage-basin 1
            area: 1
            great: 1
            combined: 1
            areas: 1
            england: 1
            wales: 1
            wide: 1
            region: 1
            fertile: 1
        great 1
            area: 1
            drainage-basin: 1
            combined: 1
            areas: 1
            england: 1
            wales: 1
            wide: 1
            region: 1
            fertile: 1
        combined 1
            area: 1
            drainage-basin: 1
            great: 1
            areas: 1
            england: 1
            wales: 1
            wide: 1
            region: 1
            fertile: 1
        areas 1
            area: 1
            drainage-basin: 1
            great: 1
            combined: 1
            england: 1
            wales: 1
            wide: 1
            region: 1
            fertile: 1
        england 1
            area: 1
            drainage-basin: 1
            great: 1
            combined: 1
            areas: 1
            wales: 1
            wide: 1
            region: 1
            fertile: 1
        wales 1
            area: 1
            drainage-basin: 1
            great: 1
            combined: 1
            areas: 1
            england: 1
            wide: 1
            region: 1
            fertile: 1
        wide 1
            area: 1
            drainage-basin: 1
            great: 1
            combined: 1
            areas: 1
            england: 1
            wales: 1
            region: 1
            fertile: 1
        region 1
            area: 1
            drainage-basin: 1
            great: 1
            combined: 1
            areas: 1
            england: 1
            wales: 1
            wide: 1
            fertile: 1
        fertile 1
            area: 1
            drainage-basin: 1
            great: 1
            combined: 1
            areas: 1
            england: 1
            wales: 1
            wide: 1
            region: 1
    Sentence #2:
        mississippi 1
            valley: 1
            proper: 1
            exceptionally: 1
        valley 1
            mississippi: 1
            proper: 1
            exceptionally: 1
        proper 1
            mississippi: 1
            valley: 1
            exceptionally: 1
        exceptionally 1
            mississippi: 1
            valley: 1
            proper: 1

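    Feel free to remove the sentence-number printing and reduce the indentation by one tab to get something closer to the desired output in your question. You can also build a tree-like dictionary instead of printing everything to STDOUT, if that is more to your liking.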
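A tree-like dictionary of that kind, keyed by target word and ordered by total frequency as the question asks, might be sketched as follows. This is only one possible reading of the desired output: the helper names `build_profiles` and `format_profiles` and the small `sample` list are hypothetical, and the input is assumed to be a list of per-sentence word-count dictionaries like the one in the question.

```python
from collections import Counter, defaultdict

def build_profiles(sentence_counts):
    """Return (totals, profiles) for a list of per-sentence word-count dicts."""
    totals = Counter()               # overall frequency of each word
    profiles = defaultdict(Counter)  # target word -> counts of co-occurring words
    for counts in sentence_counts:
        totals.update(counts)
        for target in counts:
            for other, n in counts.items():
                if other != target:
                    profiles[target][other] += n
    return totals, profiles

def format_profiles(totals, profiles):
    """Render target words in descending order of total frequency."""
    lines = []
    for target, total in totals.most_common():  # highest count first
        lines.append("{} {}".format(target, total))
        for other, n in profiles[target].items():
            lines.append("    {}: {}".format(other, n))
    return "\n".join(lines)

# Tiny made-up input in the same shape as the question's data.
sample = [
    {'mississippi': 1, 'worth': 1, 'reading': 1},
    {'river': 1, 'ways': 1},
    {'river': 1, 'mississippi': 1},
]
totals, profiles = build_profiles(sample)
print(format_profiles(totals, profiles))
```

Passing `sentences_counter` from the code above instead of `sample` should work the same way, since each of its entries is a plain mapping from word to count.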

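    Update 2: You don't have to use a set for common_words if you don't want to. In this case it is almost interchangeable with a list, so you can use a list comprehension instead of a set comprehension (i.e. replace the curly brackets with square ones). However, a lookup in a list is an O(n) operation while a lookup in a set is an O(1) operation, so a set is preferred here. Not to mention the side benefit of automatic de-duplication in case common.txt contains duplicate words.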
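A small sketch of the difference, assuming a made-up `lines` list that stands in for the lines of common.txt:

```python
# Hypothetical stand-in for the lines read from common.txt (note the duplicates).
lines = ["the\n", "The\n", "of\n", "and\n", "the\n"]

as_list = [l.strip().lower() for l in lines]  # list comprehension keeps duplicates
as_set = {l.strip().lower() for l in lines}   # set comprehension de-duplicates

print(len(as_list), len(as_set))

# Membership tests read the same for both; the list scans its elements
# one by one, while the set uses a hash lookup.
print("of" in as_list, "of" in as_set)
```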

    As for collections.defaultdict(), it is only there to save some coding/checking by automatically initializing a key in the dictionary whenever it is requested. Without it you would have to do that manually:

    with open("common.txt", "r") as f:  # open the `common.txt` for reading
        common_words = {l.strip().lower() for l in f}  # read each line and add it to a set
    
    interpunction = ";,'\""  # define word separating characters and create a translation table
    trans_table = str.maketrans(interpunction, " " * len(interpunction))
    
    sentences_counter = []  # a list to hold a word count for each sentence
    with open("sample.txt", "r") as f:  # open the `sample.txt` for reading
        # read the whole file to include linebreaks and split on `.` to get individual sentences
        sentences = [s for s in f.read().split(".") if s.strip()]  # ignore empty sentences
        for sentence in sentences:  # iterate over each sentence
            sentence = sentence.translate(trans_table)  # replace the interpunction with spaces
            word_counter = {}  # initialize a word counting dictionary
            for word in sentence.split():  # split the sentence and iterate over the words
                word = word.lower()  # turn the word to lowercase
                if word not in common_words:  # count only words not in the common.txt
                    word_counter[word] = word_counter.get(word, 0) + 1  # increase the last count
            sentences_counter.append(word_counter)  # add the current sentence word count

    Update 3: If you just want a raw word list across all sentences, as your last update to the question suggests, you don't even need to consider the sentences themselves. Just add a dot to the interpunction list, read the file line by line, split on whitespace and count the words as before.
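For the raw word list across all sentences described in Update 3, a minimal sketch might look like the following. It is written under stated assumptions: the function name `count_words` is hypothetical, a `collections.Counter` is swapped in for the manual dictionary, and in-memory lists stand in for the two files (with real data you would pass the open file object for sample.txt and the set built from common.txt).

```python
import collections

def count_words(lines, common_words, interpunction=";,'\"."):
    """Count non-common words over an iterable of text lines."""
    # The dot is part of the interpunction, so sentences are not split separately.
    trans_table = str.maketrans(interpunction, " " * len(interpunction))
    counter = collections.Counter()
    for line in lines:  # works for a list of strings or an open file
        for word in line.translate(trans_table).lower().split():
            if word not in common_words:  # count only words not in the common set
                counter[word] += 1
    return counter

# Made-up stand-ins for common.txt and sample.txt.
common = {"the", "is", "a"}
sample_lines = [
    "The Mississippi is a river.\n",
    "The river is remarkable; the Mississippi, too.\n",
]

word_totals = count_words(sample_lines, common)
for word, count in word_totals.most_common():  # descending by count
    print("{} {}".format(word, count))
```

`most_common()` returns the words sorted by count in descending order, which matches the ordering requested in the question.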
    

Answer 1: (score: -1)

Hopefully the code below works the way you need:

# Read both files and lowercase everything so comparisons are case-insensitive
with open('sample.txt', 'r') as f:
    data = f.read().lower()
with open('common.txt', 'r') as f:
    common_words = set(f.read().lower().split())

# Replace the separator characters with spaces before splitting into words
for char in ',;\n':
    data = data.replace(char, ' ')

counts = {}  # word -> number of appearances
for word in data.split():
    if word not in common_words:  # skip words listed in common.txt
        counts[word] = counts.get(word, 0) + 1  # count each appearance

print(counts)