So dictionaries can be used like this!
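At first my own script ran, but so slowly that it made me question my life choices. I went off for a meal, and after half an hour of grinding it had only worked its way through the alphabet to "b", so this was clearly a failed approach. What I did was the routine thing: build a dictionary from the word list, then iterate over every word in it. Each word went into a function that compared it against the other words in the dictionary. The comparison split each word (a string) into a list of characters and sorted it; if the sorted lists matched, and the word being compared was smaller than the word it was compared against, the two belonged together and the match was appended to that word's list. If the list ended up longer than 2, it was returned and printed. This does pick out the anagrams, but because every word is checked against every other word, it is very, very slow!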
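For reference, the pairwise idea can be sketched roughly as follows. This is a reconstruction of the approach, not my original script: the name `slow_anagrams`, the `seen` set, and the tiny sample list are all made up for illustration, and the sketch assumes the word list has no duplicates. The nested loop is the point: the number of comparisons grows with the square of the word-list length.

```python
def slow_anagrams(words):
    """Group anagrams by comparing every word with every other word."""
    groups = []
    seen = set()  # words already placed in some group
    for word in words:
        if word in seen:
            continue
        group = [word]
        for other in words:
            # Same letters in a different order means the sorted
            # character lists are equal.
            if other != word and sorted(other) == sorted(word):
                group.append(other)
                seen.add(other)
        if len(group) > 1:
            groups.append(group)
    return groups


sample = ['deltas', 'desalt', 'lasted', 'python', 'salted']
print(slow_anagrams(sample))
```

On five words this is instant, but on a full word list the inner loop runs once per word for each word, which matches the "half an hour and still on b" experience.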
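After reading the reference solution I jumped out of my chair. They used a single ''.join(lists), which glues the list of characters back into one string. Well, I'll be! They split each word into a list, sort it, and join it back together. The crucial part is that this unique sorted string is used as the key at the moment the dictionary is built, and every word made of the same letters is filed under that key as one of its followers. It is still a dictionary, but the keys are now normalized letter strings and the values are the words from the word list, regrouped. It never occurred to me. How could I possibly have thought of that!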
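A minimal sketch of the trick, assuming a small in-memory word list instead of a file. The names `signature` and `group_by_signature` are mine, chosen for illustration, and `dict.setdefault` is just one convenient way to write the "create the list if the key is new" step; it is not necessarily how the reference solution spells it.

```python
def signature(word):
    """Sort the letters and glue them back into one string."""
    return ''.join(sorted(word))


def group_by_signature(words):
    """Map each sorted-letter signature to the words that share it."""
    groups = {}
    for word in words:
        # One dictionary lookup per word, no word-to-word comparison.
        groups.setdefault(signature(word), []).append(word)
    return groups


sample = ['deltas', 'desalt', 'lasted', 'python', 'salted']
groups = group_by_signature(sample)
print(signature('lasted'))  # adelst
print(groups['adelst'])     # ['deltas', 'desalt', 'lasted', 'salted']
```

Each word is sorted once and looked up once, so the whole word list is handled in a single pass.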
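The exercise asks for the sets to be printed from longest to shortest, and then asks which collection of 8 letters forms the most anagrams. In practice, though, the output of the reference solution does not really answer the question: it is not in descending order, and it only lists the 8-letter anagram sets without telling you which one is the largest.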
Exercise 2: More anagrams! Write a program that reads a word list from a file (see Section 9.1) and prints all the sets of words that are anagrams. Here is an example of what the output might look like:
['deltas', 'desalt', 'lasted', 'salted', 'slated', 'staled']
['retainers', 'ternaries']
['generating', 'greatening']
['resmelts', 'smelters', 'termless']
Hint: you might want to build a dictionary that maps from a collection of letters to a list of words that can be spelled with those letters. The question is, how can you represent the collection of letters in a way that can be used as a key?
Modify the previous program so that it prints the longest list of anagrams first, followed by the second longest, and so on.
In Scrabble a "bingo" is when you play all seven tiles in your rack, along with a letter on the board, to form an eight-letter word. What collection of 8 letters forms the most possible bingos?
Solution: http://thinkpython2.com/code/anagram_sets.py.
```python
from time import time


def sorted_anagram(d):
    l = []
    for key in d:
        if len(d[key]) > 1:
            l.append((len(d[key]), d[key]))  # a (count, word list) tuple
    return sorted(l, reverse=True)  # descending order, what a hassle


def eight_letters(d, num):
    global length  # resorted to a global just to record the maximum
    new_l = []
    for key in d:
        if len(key) == num and len(d[key]) > 1:
            new_l.append((len(d[key]), d[key]))
            if len(d[key]) >= length:
                length = len(d[key])
    return sorted(new_l)


def sorted_letters(word):
    list_word = sorted(list(word))  # split the string into characters, then sort
    reword = ''.join(list_word)     # glue the characters back into a string
    return reword


def set_dict(fin):
    d = {}
    for line in fin:
        word = line.strip()
        reword = sorted_letters(word)  # the key step: sort the letters while building the dict!
        if reword not in d:
            d[reword] = [word]         # the key is no longer a word, just a sorted letter string
        else:
            d[reword].append(word)     # the values are the actual words from the word list
    return d


length = 0
count = 0
start = time()
with open('words.txt') as fin:
    d = set_dict(fin)
for item in sorted_anagram(d):
    print(item)
    count += 1
print(count)
for item in eight_letters(d, 8):
    if item[0] == length:
        print(item)
end = time()
print(end - start)

# ......
# (2, ['abacas', 'casaba'])
# (2, ['aba', 'baa'])
# (2, ['aals', 'alas'])
# (2, ['aal', 'ala'])
# (2, ['aahed', 'ahead'])
# (2, ['aah', 'aha'])
# 10157  # all anagram sets
# (7, ['angriest', 'astringe', 'ganister', 'gantries', 'granites', 'ingrates', 'rangiest'])  # the 8-letter set with the most anagrams (7 words)
# 0.6079998016357422
```
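The global variable can probably be avoided. Below is a hedged sketch of one alternative, using the built-in `max` with `key=len` to pick the largest 8-letter set and `sorted` with `key=len` to order the sets by size. The helper names `build`, `longest_first`, and `most_bingos` and the ten-word sample are invented for illustration; with the real `words.txt` you would pass in the dictionary built from the file instead.

```python
def build(words):
    """Map each sorted-letter key to the list of words sharing it."""
    d = {}
    for word in words:
        d.setdefault(''.join(sorted(word)), []).append(word)
    return d


def longest_first(d):
    """Anagram sets with at least two words, largest set first."""
    sets = [words for words in d.values() if len(words) > 1]
    return sorted(sets, key=len, reverse=True)


def most_bingos(d, num=8):
    """Largest anagram set among keys of length `num`, or None if there is none."""
    candidates = [words for key, words in d.items()
                  if len(key) == num and len(words) > 1]
    # On a tie, max() returns the first candidate it encountered.
    return max(candidates, key=len, default=None)


sample = ['resmelts', 'smelters', 'termless', 'angriest', 'gantries',
          'granites', 'ingrates', 'aah', 'aha', 'python']
d = build(sample)
for words in longest_first(d):
    print(len(words), words)
print(most_bingos(d))
```

Sorting on `key=len` compares only the set sizes, so no `(count, words)` tuples are needed, and the maximum is computed where it is used rather than stored in module-level state.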