@@ -4168,134 +4168,100 @@ And them I modified `make test` to include `rhymer.py` in the list of files to t
4168
4168
Some of the words in my system dictionary don't have vowels, so some of the methods that assumed the presence of a vowel failed. Writing a test just for this one function really helped me find errors in my code.
4169
4169
\newpage
4170
4170
4171
-
## Solution
4171
+
## Discussion
4172
+
4173
+
The first thing to check is that the given word contains a vowel, which is simple enough with a regular expression. We'll count "y" as a vowel for this purpose:
````
>>> re.search('[aeiouy]', 'YYZ', re.IGNORECASE) or 'Fail'
4177
+
<re.Match object; span=(0, 1), match='Y'>
4178
+
>>> re.search('[aeiouy]', 'bbbb', re.IGNORECASE) or 'Fail'
4179
+
'Fail'
4180
+
````
4181
+
4182
+
Another way that doesn't use a regex is a list comprehension that iterates over each character in the given word to see if it is `in` the string of vowels `'aeiouy'`:
4183
+
4184
+
````
4185
+
>>> [c in 'aeiouy' for c in 'CAT'.lower()]
4186
+
[False, True, False]
4187
+
````
4188
+
4189
+
You can then ask if `any` of these tests are true:
4190
+
4191
+
````
4192
+
>>> any([c in 'aeiouy' for c in 'CAT'.lower()])
4193
+
True
4194
+
>>> any([c in 'aeiouy' for c in 'BCD'.lower()])
4195
+
False
4196
+
````
4197
+
4198
+
The regex version is by far the simpler of the two, but it's always interesting to think about other ways to accomplish a task. Either way, if the given `word` does not have a vowel, I call `parser.error`.
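Both checks can be wrapped in small helpers and wired into argument parsing. The following is a minimal sketch, not the program's actual code: the names `has_vowel_regex`, `has_vowel_any`, and `get_args`, the positional argument `word`, and the error message are all assumptions made for illustration.

```python
import argparse
import re


def has_vowel_regex(word):
    """Return True if the word contains a vowel (counting 'y'), using a regex."""
    return re.search('[aeiouy]', word, re.IGNORECASE) is not None


def has_vowel_any(word):
    """Same check without a regex: test each lowercased character."""
    return any(c in 'aeiouy' for c in word.lower())


def get_args(argv=None):
    """Parse arguments and reject a word with no vowels (hypothetical CLI)."""
    parser = argparse.ArgumentParser(description='Find rhyming words')
    parser.add_argument('word', help='Word to rhyme')
    args = parser.parse_args(argv)

    if not has_vowel_regex(args.word):
        # parser.error prints the usage and this message, then exits with status 2
        parser.error(f'word "{args.word}" must contain a vowel')

    return args
```

Passing a generator to `any` instead of a list lets it stop at the first vowel it finds, though for single words the difference is negligible.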
4199
+
4200
+
## Using Soundex
4201
+
4202
+
The `soundex` module has you create a `Soundex` object and then call a `soundex` function, which all seems a bit repetitive. Still, it gives us a way to get a Soundex value for a given word:
4203
+
4204
+
````
4205
+
>>> from soundex import Soundex
4206
+
>>> sndx = Soundex()
4207
+
>>> sndx.soundex('paper')
4208
+
'p16'
4209
+
````
4210
+
4211
+
The problem is that sometimes we want the stemmed version of the word:
4212
+
4213
+
````
4214
+
>>> sndx.soundex('aper')
4215
+
'a16'
4216
+
````
4217
+
4218
+
So I wrote a `stemmer` function that does (or does not) stem the word depending on the value of the `--stem` option, which I defined in `argparse` as a Boolean value. I tried to find a way to remove leading consonants both with and without regular expressions. The regex version builds a somewhat complicated regex. Let's start with how to match something at the start of a string that is *not* a vowel (again, because there are only 5 to list):
4219
+
4298
4220
````
4221
+
>>> import re
4222
+
>>> re.search(r'^[^aeiou]+', 'chair')
4223
+
<re.Match object; span=(0, 2), match='ch'>
4224
+
````
4225
+
4226
+
We saw earlier that `[aeiou]` is the character class that matches vowels, so we can *negate* the class by putting a `^` **inside** the character class. It's a bit confusing because there is also a `^` at the beginning of the `r''` (raw) string that anchors the expression to the beginning of the string.
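To keep the two meanings of `^` apart, it may help to see each one on its own and then together. This is a small illustrative sketch; the variable names and the test word `'apple'` are my own choices, not part of the program.

```python
import re

# Outside the brackets, ^ anchors the match to the start of the string.
anchored_vowels = re.search(r'^[aeiou]+', 'apple')

# Inside the brackets, ^ negates the class: "any character that is not a vowel".
unanchored_consonants = re.search(r'[^aeiou]+', 'apple')

# Both together: leading non-vowels only. 'apple' starts with a vowel, so no match.
leading_consonants = re.search(r'^[^aeiou]+', 'apple')

print(anchored_vowels.group())        # a
print(unanchored_consonants.group())  # ppl
print(leading_consonants)             # None
```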
4227
+
4228
+
OK, so that finds the non-vowels leading the word, but we want the bit afterwards. It seems like we could just write something like this:
4229
+
4230
+
````
4231
+
>>> re.search(r'^[^aeiou]+(.+)$', 'chr')
4232
+
<re.Match object; span=(0, 3), match='chr'>
4233
+
````
4234
+
4235
+
This seems to say "one or more non-vowels followed by one or more of anything," and it appears to work, but look closer:
4236
+
4237
+
````
4238
+
>>> re.search(r'^[^aeiou]+(.+)$', 'chr').groups()
4239
+
('r',)
4240
+
````
4241
+
4242
+
It finds the last `r`. We need to specify that after the non-vowels there needs to be at least one vowel:
So the `stemmer` works by first looking to see if we should even attempt to `stem`. If so, it attempts to match the regular expression. If that succeeds, then it returns the match. The `else` for everything is to return the original string `s`.
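The logic just described can be sketched as follows. This is an assumption-laden illustration rather than the author's code: the pattern `^[^aeiou]+([aeiou].*)$` is one way to require a vowel after the leading non-vowels, and the signature `stemmer(s, stem=True)` is hypothetical.

```python
import re


def stemmer(s, stem=True):
    """Remove leading consonants if `stem` is true; otherwise return `s` unchanged."""
    if stem:
        # Leading non-vowels, then capture from the first vowel to the end.
        match = re.search(r'^[^aeiou]+([aeiou].*)$', s)
        if match:
            return match.group(1)
    # No stemming requested, or the pattern did not match.
    return s
```

Because the captured group must begin with a vowel, a word such as `'chr'` no longer yields a stray `'r'`; the pattern simply fails to match and the original string comes back.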
4259
+
4260
+
The two other versions of `stemmer` rely on some things I'll discuss later.
4261
+
4262
+
As stated in the intro, it was most helpful to me to add the `test_stemmer` function to ensure that all my versions of the `stemmer` function actually had the same behavior.
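A test of this kind might look like the sketch below. Both implementations are hypothetical stand-ins written for this example (the actual alternative versions are not shown here); the point is only that one table of cases is run against every version.

```python
import re


def stemmer_regex(s, stem=True):
    """Hypothetical regex version: drop leading consonants before the first vowel."""
    match = re.search(r'^[^aeiou]+([aeiou].*)$', s) if stem else None
    return match.group(1) if match else s


def stemmer_loop(s, stem=True):
    """Hypothetical non-regex version: slice from the first vowel onward."""
    if stem:
        for i, char in enumerate(s):
            if char in 'aeiou':
                return s[i:]
    return s


def test_stemmer():
    """Every version must agree on the same inputs, including vowel-less words."""
    cases = [('chair', True, 'air'), ('chair', False, 'chair'),
             ('apple', True, 'apple'), ('chr', True, 'chr'), ('', True, '')]
    for func in (stemmer_regex, stemmer_loop):
        for word, stem, expected in cases:
            assert func(word, stem) == expected
```

Including a word with no vowels at all, such as `'chr'`, guards against exactly the kind of failure the system dictionary exposed.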
4263
+
4264
+
Once I have the `stemmer` function, I can apply it to the given `word` and every word in the `--wordlist` and then call the ``