
Commit 9daecc2

committed
really done now
1 parent 9cea0da commit 9daecc2

2 files changed

+91
-125
lines changed

book.md

Lines changed: 91 additions & 125 deletions
@@ -4168,134 +4168,100 @@ And them I modified `make test` to include `rhymer.py` in the list of files to t
41684168
Some of the words in my system dictionary don't have vowels, so some of the methods that assumed the presence of a vowel failed. Writing a test just for this one function really helped me find errors in my code.
41694169
\newpage
41704170

4171-
## Solution
4171+
## Discussion
4172+
4173+
The first thing to check is that the given word contains a vowel, which is simple enough if you use regular expressions. We'll include "y" for this purpose:
41724174

41734175
````
4174-
1 #!/usr/bin/env python3
4175-
2 """Find rhyming words using the Soundex"""
4176-
3
4177-
4 import argparse
4178-
5 import re
4179-
6 import string
4180-
7 import soundex
4181-
8
4182-
9
4183-
10 # --------------------------------------------------
4184-
11 def get_args():
4185-
12 """get command-line arguments"""
4186-
13
4187-
14 parser = argparse.ArgumentParser(
4188-
15 description='Find rhyming words using the Soundex',
4189-
16 formatter_class=argparse.ArgumentDefaultsHelpFormatter)
4190-
17
4191-
18 parser.add_argument('word', metavar='str', help='Word')
4192-
19
4193-
20 parser.add_argument('-w',
4194-
21 '--wordlist',
4195-
22 metavar='str',
4196-
23 help='Wordlist',
4197-
24 default='/usr/share/dict/words')
4198-
25
4199-
26 parser.add_argument('-s',
4200-
27 '--stem',
4201-
28 help='Stem the word (remove starting consonants',
4202-
29 action='store_true')
4203-
30
4204-
31 args = parser.parse_args()
4205-
32
4206-
33 if not any([c in 'aeiouy' for c in args.word]):
4207-
34 msg = 'word "{}" must contain at least one vowel'
4208-
35 parser.error(msg.format(args.word))
4209-
36
4210-
37 return args
4211-
38
4212-
39
4213-
40 # --------------------------------------------------
4214-
41 def stemmer(s: str, stem: bool) -> str:
4215-
42 """Use regular expressions"""
4216-
43
4217-
44 if stem:
4218-
45 match = re.search(r'^[^aeiou]+([aeiou].*)', s, re.IGNORECASE)
4219-
46 return match.group(1) if match else s
4220-
47 return s
4221-
48
4222-
49
4223-
50 # --------------------------------------------------
4224-
51 # def stemmer(s: str, stem: bool) -> str:
4225-
52 # """Manually `find` first vowel"""
4226-
53
4227-
54 # if stem:
4228-
55 # positions = list(
4229-
56 # filter(lambda p: p >= 0, [s.lower().find(v) for v in 'aeiou']))
4230-
57 # if positions:
4231-
58 # first = min(positions)
4232-
59 # return s[first:] if first else s
4233-
60 # return s
4234-
61
4235-
62 # --------------------------------------------------
4236-
63 # def stemmer(s: str, stem: bool) -> str:
4237-
64 # """Manually find first vowel with generator/next"""
4238-
65
4239-
66 # if stem:
4240-
67 # first = next(
4241-
68 # (t[0] for t in enumerate(s) if t[1].lower() in 'aeiou'), False)
4242-
69 # return s[first:] if first else s
4243-
70 # return s
4244-
71
4245-
72
4246-
73 # --------------------------------------------------
4247-
74 def test_stemmer():
4248-
75 """test stemmer"""
4249-
76
4250-
77 assert stemmer('listen', True) == 'isten'
4251-
78 assert stemmer('listen', False) == 'listen'
4252-
79 assert stemmer('chair', True) == 'air'
4253-
80 assert stemmer('chair', False) == 'chair'
4254-
81 assert stemmer('apple', True) == 'apple'
4255-
82 assert stemmer('apple', False) == 'apple'
4256-
83 assert stemmer('xxxxxx', True) == 'xxxxxx'
4257-
84 assert stemmer('xxxxxx', False) == 'xxxxxx'
4258-
85
4259-
86 assert stemmer('LISTEN', True) == 'ISTEN'
4260-
87 assert stemmer('LISTEN', False) == 'LISTEN'
4261-
88 assert stemmer('CHAIR', True) == 'AIR'
4262-
89 assert stemmer('CHAIR', False) == 'CHAIR'
4263-
90 assert stemmer('APPLE', True) == 'APPLE'
4264-
91 assert stemmer('APPLE', False) == 'APPLE'
4265-
92 assert stemmer('XXXXXX', True) == 'XXXXXX'
4266-
93 assert stemmer('XXXXXX', False) == 'XXXXXX'
4267-
94
4268-
95
4269-
96 # --------------------------------------------------
4270-
97 def main():
4271-
98 """Make a jazz noise here"""
4272-
99
4273-
100 args = get_args()
4274-
101 given = args.word
4275-
102 wordlist = args.wordlist
4276-
103 stem = args.stem
4277-
104 sndx = soundex.Soundex()
4278-
105 wanted = sndx.soundex(stemmer(given, stem))
4279-
106
4280-
107 # for word in open(wordlist).read().split():
4281-
108 # if given != word and sndx.soundex(stemmer(word, stem)) == wanted:
4282-
109 # print(word)
4283-
110
4284-
111 # print('\n'.join(
4285-
112 # filter(
4286-
113 # lambda w: given != w and sndx.soundex(stemmer(w, stem)) == wanted,
4287-
114 # open(wordlist).read().split())))
4288-
115
4289-
116 print('\n'.join([
4290-
117 word for word in open(wordlist).read().split()
4291-
118 if given != word and sndx.soundex(stemmer(word, stem)) == wanted
4292-
119 ]))
4293-
120
4294-
121
4295-
122 # --------------------------------------------------
4296-
123 if __name__ == '__main__':
4297-
124 main()
4176+
>>> re.search('[aeiouy]', 'YYZ', re.IGNORECASE) or 'Fail'
4177+
<re.Match object; span=(0, 1), match='Y'>
4178+
>>> re.search('[aeiouy]', 'bbbb', re.IGNORECASE) or 'Fail'
4179+
'Fail'
4180+
````
4181+
4182+
Another way that doesn't use a regex is a list comprehension that iterates over each character in the given word to see if it is `in` the string of vowels 'aeiouy':
4183+
4184+
````
4185+
>>> [c in 'aeiouy' for c in 'CAT'.lower()]
4186+
[False, True, False]
4187+
````
4188+
4189+
You can then ask if `any` of these tests are true:
4190+
4191+
````
4192+
>>> any([c in 'aeiouy' for c in 'CAT'.lower()])
4193+
True
4194+
>>> any([c in 'aeiouy' for c in 'BCD'.lower()])
4195+
False
4196+
````
4197+
4198+
By far the regex version is simpler, but it's always interesting to think about other ways to accomplish a task. Anyway, if the given `word` does not have a vowel, I throw a `parser.error`.
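
The two checks shown above can be wrapped in small functions and compared side by side. This is only a minimal sketch; the names `has_vowel_regex` and `has_vowel_any` are hypothetical helpers for illustration and are not part of the program:

```python
import re


def has_vowel_regex(word: str) -> bool:
    """Return True if the word contains a vowel (counting 'y'), using a regex."""
    return re.search('[aeiouy]', word, re.IGNORECASE) is not None


def has_vowel_any(word: str) -> bool:
    """Same check with any(), lowercasing the word first."""
    return any(c in 'aeiouy' for c in word.lower())


# Both helpers should agree on every word.
for word in ['YYZ', 'CAT', 'bbbb', 'BCD']:
    print(word, has_vowel_regex(word), has_vowel_any(word))
```

Passing a generator expression to `any` (no square brackets) lets it stop at the first vowel instead of building the whole list first.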
4199+
4200+
## Using Soundex
4201+
4202+
The `soundex` module has you create a `Soundex` object and then call its `soundex` method, which all seems a bit repetitive. Still, it gives us a way to get a Soundex value for a given word:
4203+
4204+
````
4205+
>>> from soundex import Soundex
4206+
>>> sndx = Soundex()
4207+
>>> sndx.soundex('paper')
4208+
'p16'
4209+
````
4210+
4211+
The problem is that sometimes we want the stemmed version of the word:
4212+
4213+
````
4214+
>>> sndx.soundex('aper')
4215+
'a16'
4216+
````
4217+
4218+
So I wrote a `stemmer` function that does (or does not) stem the word using the value of the `--stem` option, which I defined in `argparse` as a Boolean value. I tried to find a way to remove leading consonants both with and without regular expressions. The regex version builds a somewhat complicated regex. Let's start with how to match something at the start of a string that is *not* a vowel (again, because there are only 5 to list):
4219+
42984220
````
4221+
>>> import re
4222+
>>> re.search(r'^[^aeiou]+', 'chair')
4223+
<re.Match object; span=(0, 2), match='ch'>
4224+
````
4225+
4226+
We saw earlier that `[aeiou]` is the character class that matches vowels, so we can *negate* the class with `^` **inside** the character class. It's a bit confusing because there is also a `^` at the beginning of the `r''` (raw) string, and that one anchors the expression to the beginning of the string.
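
To keep the two meanings of `^` apart, it may help to try each one on its own. This is a small illustrative sketch; the variable names are only for this example:

```python
import re

# Outside the brackets, ^ anchors the match to the start of the string.
# 'chair' does not start with a vowel, so there is no match.
anchored = re.search(r'^[aeiou]', 'chair')

# Inside the brackets, ^ negates the class: any character that is NOT a vowel.
# The first non-vowel in 'air' is 'r'.
negated = re.search(r'[^aeiou]', 'air')

# Both together: one or more non-vowels at the start of the string.
both = re.search(r'^[^aeiou]+', 'chair')

print(anchored)
print(negated.group())
print(both.group())
```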
4227+
4228+
OK, so that finds the non-vowels leading the word, but we want the bit afterwards. It seems like we could just write something like this:
4229+
4230+
````
4231+
>>> re.search(r'^[^aeiou]+(.+)$', 'chr')
4232+
<re.Match object; span=(0, 3), match='chr'>
4233+
````
4234+
4235+
This seems to say "one or more non-vowels followed by one or more of anything," and it appears to work, but look closer:
4236+
4237+
````
4238+
>>> re.search(r'^[^aeiou]+(.+)$', 'chr').groups()
4239+
('r',)
4240+
````
4241+
4242+
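The capture group holds only the last `r`: the `[^aeiou]+` first consumes all of `chr`, then gives back one character so that `.+` has something to match. We need to specify that after the non-vowels there needs to be at least one vowel: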
4243+
4244+
````
4245+
>>> re.search(r'^[^aeiou]+([aeiou].*)', 'chr')
4246+
````
4247+
4248+
And now it works:
4249+
4250+
````
4251+
>>> re.search(r'^[^aeiou]+([aeiou].*)', 'chr')
4252+
>>> re.search(r'^[^aeiou]+([aeiou].*)', 'car')
4253+
<re.Match object; span=(0, 3), match='car'>
4254+
>>> re.search(r'^[^aeiou]+([aeiou].*)', 'car').groups()
4255+
('ar',)
4256+
````
4257+
4258+
So the `stemmer` works by first looking to see if we should even attempt to `stem`. If so, it attempts to match the regular expression. If that succeeds, it returns the captured group, which is everything from the first vowel onward. The `else` for everything is to return the original string `s`.
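
Put together, the regex version of `stemmer` looks like the following, taken from the solution code this commit removes from the book:

```python
import re


def stemmer(s: str, stem: bool) -> str:
    """Remove leading consonants from s when stem is True."""
    if stem:
        # One or more non-vowels at the start, then capture from the first vowel on.
        match = re.search(r'^[^aeiou]+([aeiou].*)', s, re.IGNORECASE)
        return match.group(1) if match else s
    return s


print(stemmer('chair', True))
print(stemmer('chair', False))
```

Note that a word starting with a vowel, such as `apple`, does not match the pattern at all, so it falls through to the `else` and comes back unchanged.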
4259+
4260+
The two other versions of `stemmer` rely on some things I'll discuss later.
4261+
4262+
As stated in the intro, it was most helpful to me to add the `test_stemmer` function to ensure that all my versions of the `stemmer` function actually had the same behavior.
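
As a rough sketch of that idea, a table of inputs and expected outputs can be run against any implementation. The function name `stemmer_next` and the `CASES` table are hypothetical names for this example; the function is one possible non-regex version that finds the first vowel with a generator and `next`:

```python
def stemmer_next(s: str, stem: bool) -> str:
    """Remove leading consonants without a regex (illustrative alternative)."""
    if stem:
        # Index of the first vowel, or None if the word has no vowel.
        first = next((i for i, c in enumerate(s) if c.lower() in 'aeiou'), None)
        # None (no vowel) and 0 (starts with a vowel) both mean "leave as is".
        return s[first:] if first else s
    return s


# (word, stem flag, expected result)
CASES = [
    ('listen', True, 'isten'),
    ('listen', False, 'listen'),
    ('chair', True, 'air'),
    ('apple', True, 'apple'),
    ('xxxxxx', True, 'xxxxxx'),
    ('CHAIR', True, 'AIR'),
]

for word, stem, expected in CASES:
    assert stemmer_next(word, stem) == expected, (word, stem)
print('all cases pass')
```

Running the same table against each version is a quick way to confirm they behave identically.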
4263+
4264+
Once I have the `stemmer` function, I can apply it to the given `word` and every word in the `--wordlist` and then call the ``
42994265

43004266
\newpage
43014267

playful_python.pdf

1.08 KB
Binary file not shown.
