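At first, the script I wrote myself did run, but it was painfully slow. I went off to eat, and after half an hour of churning it had only worked through the alphabet as far as the letter b, so the approach was clearly a failure. What I did was the obvious thing: build a dictionary from the word list, then iterate over every word in it. Each word was passed to a function that compared it with every other word in the dictionary. The comparison broke each word (a string) into a list of characters and sorted it; if the two sorted lists matched, and the word being compared was smaller than the word it was compared against, the two belonged together and the match was appended to that word's list. If the list ended up longer than 2, it was returned and printed. This does pick out the anagrams, but it is extremely slow.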
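To make the cost of that first attempt concrete, here is a minimal sketch of the pairwise idea. It is a reconstruction from the description above, not my original script: the function name `naive_anagram_sets` and the `seen` set are assumptions added for illustration. Every word is compared against every other word, so the work grows with the square of the word-list size.

```python
def naive_anagram_sets(words):
    """Group anagrams by comparing each word against all the others.

    Hypothetical reconstruction of the slow approach: the nested loop
    performs roughly len(words) ** 2 comparisons.
    """
    groups = []
    seen = set()  # words already attached to an earlier group
    for word in words:
        if word in seen:
            continue
        group = [word]
        for other in words:
            # Two words are anagrams when their sorted letters match.
            if other != word and sorted(other) == sorted(word):
                group.append(other)
                seen.add(other)
        if len(group) > 1:
            groups.append(group)
    return groups


sample = ['deltas', 'desalt', 'lasted', 'aah', 'aha', 'zebra']
print(naive_anagram_sets(sample))
# [['deltas', 'desalt', 'lasted'], ['aah', 'aha']]
```

On six words this is instant, but the inner loop re-sorts both words on every comparison, which is why a full word list takes so long.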
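After reading the reference solution I jumped out of my chair. They use `''.join(lists)`, which glues a list of strings back into a single string. Who knew! They break each word into a list of characters, sort it, and join it back together. The crucial part is that this unique sorted string is used as the key while the dictionary is being built, and every word made of the same letters is filed under that key as part of its value. It is still a dictionary, but the keys are now normalized letter strings and the values are the words from the word list grouped by those letters. It never occurred to me, and how could it have!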
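The idea can be tried on a handful of words. This is only a small sketch of the technique; the function name `group_by_signature` is made up for illustration and does not come from the reference solution.

```python
def group_by_signature(words):
    """Map each sorted-letter string to the words spelled with those letters."""
    groups = {}
    for word in words:
        # sorted() returns a list of characters; ''.join() glues it back
        # into a string, which is hashable and so can serve as a dict key.
        key = ''.join(sorted(word))
        groups.setdefault(key, []).append(word)
    return groups


sample = ['deltas', 'desalt', 'lasted', 'aah', 'aha', 'zebra']
groups = group_by_signature(sample)
print(groups['adelst'])  # ['deltas', 'desalt', 'lasted']
print(groups['aah'])     # ['aah', 'aha']
```

Each word is sorted once and looked up once, so the whole word list is handled in a single pass instead of a nested loop.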
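The exercise asks for the sets to be printed in reverse order (longest list first), and then for the collection of 8 letters that forms the most anagrams. In practice, though, the reference solution's output does not quite answer the question: it is not printed in reverse order, and it only lists the 8-letter anagram sets without telling you which one is the largest.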
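Both requirements can be met without much code. The sketch below assumes a dictionary that already maps sorted-letter keys to word lists; the names `anagram_sets_longest_first` and `most_bingos` and the tiny hand-written `groups` dictionary are hypothetical, chosen only for this example.

```python
def anagram_sets_longest_first(groups):
    """Return the anagram sets (two or more words), longest list first."""
    sets = [words for words in groups.values() if len(words) > 1]
    return sorted(sets, key=len, reverse=True)


def most_bingos(groups, num_letters=8):
    """Return (key, words) for the num_letters-letter key with the most words.

    Returns None when no key of that length has at least two words.
    """
    candidates = {key: words for key, words in groups.items()
                  if len(key) == num_letters and len(words) > 1}
    if not candidates:
        return None
    best_key = max(candidates, key=lambda k: len(candidates[k]))
    return best_key, candidates[best_key]


# A tiny hand-written stand-in for the real dictionary.
groups = {
    'aeginrst': ['angriest', 'astringe', 'ganister'],
    'eelmrsst': ['resmelts', 'smelters'],
    'adelst': ['deltas', 'desalt', 'lasted', 'salted'],
    'aberz': ['zebra'],
}

for words in anagram_sets_longest_first(groups):
    print(len(words), words)

print(most_bingos(groups))
```

Using `max` with a key function reports the single largest set directly, so no running maximum has to be tracked by hand. If several keys tie, `max` returns the first one it encounters.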
Exercise 2: More anagrams! Write a program that reads a word list from a file (see Section 9.1) and prints all the sets of words that are anagrams. Here is an example of what the output might look like:
['deltas', 'desalt', 'lasted', 'salted', 'slated', 'staled']
['retainers', 'ternaries']
['generating', 'greatening']
['resmelts', 'smelters', 'termless']
Hint: you might want to build a dictionary that maps from a collection of letters to a list of words that can be spelled with those letters. The question is, how can you represent the collection of letters in a way that can be used as a key?
Modify the previous program so that it prints the longest list of anagrams first, followed by the second longest, and so on.
In Scrabble a "bingo" is when you play all seven tiles in your rack, along with a letter on the board, to form an eight-letter word. What collection of 8 letters forms the most possible bingos?
Solution: http://thinkpython2.com/code/anagram_sets.py.
from time import time

def sorted_anagram(d):
    l = []
    for key in d:
        if len(d[key]) > 1:
            l.append((len(d[key]), d[key]))  # a (count, word list) tuple
    return sorted(l, reverse=True)  # reverse order: longest list first

def eight_letters(d, num):
    global length  # a global variable, just to remember the maximum
    new_l = []
    for key in d:
        if len(key) == num and len(d[key]) > 1:
            new_l.append((len(d[key]), d[key]))
            if len(d[key]) >= length:
                length = len(d[key])
    return sorted(new_l)

def sorted_letters(word):
    list_word = sorted(list(word))  # break the string into characters, then sort
    reword = ''.join(list_word)  # glue the characters back into a string
    return reword

def set_dict(fin):
    d = {}
    for line in fin:
        word = line.strip()
        reword = sorted_letters(word)  # the key step: sort the letters while building the dictionary
        if reword not in d:
            d[reword] = [word]  # the key is no longer a word, just a sorted letter string
        else:
            d[reword].append(word)  # the values are the actual words from the word list
    return d

fin = open('words.txt')
length = 0
count = 0
start = time()
d = set_dict(fin)
for item in sorted_anagram(d):
    print(item)
    count += 1
print(count)
for item in eight_letters(d, 8):
    if item[0] == length:
        print(item)
end = time()
print(end - start)

# ......
# (2, ['abacas', 'casaba'])
# (2, ['aba', 'baa'])
# (2, ['aals', 'alas'])
# (2, ['aal', 'ala'])
# (2, ['aahed', 'ahead'])
# (2, ['aah', 'aha'])
# 10157  # total number of anagram sets
# (7, ['angriest', 'astringe', 'ganister', 'gantries', 'granites', 'ingrates', 'rangiest'])
# the 8-letter collection with the most anagrams (7 words)
# 0.6079998016357422