forked from csev/py4e
-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathcfbook010.html
More file actions
622 lines (615 loc) · 26.6 KB
/
cfbook010.html
File metadata and controls
622 lines (615 loc) · 26.6 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="GENERATOR" content="hevea 1.07" />
<title>
Dictionaries
</title>
</head>
<body>
<a href="cfbook009.html"><img src="previous_motif.gif" alt="Previous" /></a>
<a href="index.html"><img src="contents_motif.gif" alt="Up" /></a>
<a href="cfbook011.html"><img src="next_motif.gif" alt="Next" /></a>
<hr />
<h1><font color="black"><a name="htoc116">Chapter 9</a> Dictionaries</font></h1>
<a name="@default584"></a>
<a name="@default585"></a>
<a name="@default586"></a>
<a name="@default587"></a>
<a name="@default588"></a>
<a name="@default589"></a>
<font color="black">A <b>dictionary</b> is like a list, but more general. In a list,
the positions (a.k.a. indices) have to be integers; in a dictionary
the indices can be (almost) any type.<br />
<br />
You can think of a dictionary as a mapping between a set of indices
(which are called <b>keys</b>) and a set of values. Each key maps to a
value. The association of a key and a value is called a <b>key-value pair</b> or sometimes an <b>item</b>.<br />
<br />
As an example, we'll build a dictionary that maps from English
to Spanish words, so the keys and the values are all strings.<br />
<br />
The function <tt>dict</tt> creates a new dictionary with no items.
Because <tt>dict</tt> is the name of a built-in function, you
should avoid using it as a variable name.</font><br />
<br />
<a name="@default590"></a>
<a name="@default591"></a>
<pre><font size="4" color="blue">
>>> eng2sp = dict()
>>> print eng2sp
{}
</font></pre>
<font color="black">The squiggly-brackets, <code>{}</code>, represent an empty dictionary.
To add items to the dictionary, you can use square brackets:</font><br />
<br />
<a name="@default592"></a>
<a name="@default593"></a>
<pre><font size="4" color="blue">
>>> eng2sp['one'] = 'uno'
</font></pre><font color="black">This line creates an item that maps from the key
<tt>'one'</tt> to the value <code>'uno'</code>. If we print the
dictionary again, we see a key-value pair with a colon
between the key and value:
</font><pre><font size="4" color="blue">
>>> print eng2sp
{'one': 'uno'}
</font></pre><font color="black">This output format is also an input format. For example,
you can create a new dictionary with three items:
</font><pre><font size="4" color="blue">
>>> eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}
</font></pre><font color="black">But if you print <tt>eng2sp</tt>, you might be surprised:
</font><pre><font size="4" color="blue">
>>> print eng2sp
{'one': 'uno', 'three': 'tres', 'two': 'dos'}
</font></pre><font color="black">The order of the key-value pairs is not the same. In fact, if
you type the same example on your computer, you might get a
different result. In general, the order of items in
a dictionary is unpredictable.<br />
<br />
But that's not a problem because
the elements of a dictionary are never indexed with integer indices.
Instead, you use the keys to look up the corresponding values:
</font><pre><font size="4" color="blue">
>>> print eng2sp['two']
'dos'
</font></pre><font color="black">The key <tt>'two'</tt> always maps to the value <code>'dos'</code> so the order
of the items doesn't matter.<br />
<br />
If the key isn't in the dictionary, you get an exception:</font><br />
<br />
<a name="@default594"></a>
<a name="@default595"></a>
<pre><font size="4" color="blue">
>>> print eng2sp['four']
KeyError: 'four'
</font></pre><font color="black">The <tt>len</tt> function works on dictionaries; it returns the
number of key-value pairs:</font><br />
<br />
<a name="@default596"></a>
<a name="@default597"></a>
<pre><font size="4" color="blue">
>>> len(eng2sp)
3
</font></pre><font color="black">The <tt>in</tt> operator works on dictionaries; it tells you whether
something appears as a <em>key</em> in the dictionary (appearing
as a value is not good enough).</font><br />
<br />
<a name="@default598"></a>
<a name="@default599"></a>
<a name="@default600"></a>
<pre><font size="4" color="blue">
>>> 'one' in eng2sp
True
>>> 'uno' in eng2sp
False
</font></pre><font color="black">To see whether something appears as a value in a dictionary, you
can use the method <tt>values</tt>, which returns the values as
a list, and then use the <tt>in</tt> operator:</font><br />
<br />
<a name="@default601"></a>
<a name="@default602"></a>
<pre><font size="4" color="blue">
>>> vals = eng2sp.values()
>>> 'uno' in vals
True
</font></pre><font color="black">The <tt>in</tt> operator uses different algorithms for lists and
dictionaries. For lists, it uses a linear search algorithm.
As the list gets longer, the search time gets
longer in direct proportion to the length of the list.
For dictionaries, Python uses an
algorithm called a <b>hash table</b> that has a remarkable property; the
<tt>in</tt> operator takes about the same amount of time no matter how
many items there are in a dictionary. I won't explain
why hash functions are so magical,
but you can read more about it at
<tt>wikipedia.org/wiki/Hash_table</tt>.</font><br />
<br />
<a name="@default603"></a><br />
<div align="left"><font color="black"><b>Exercise 1</b>
<a name="wordlist2"></a><br />
<br />
<a name="@default604"></a>
<a name="@default605"></a><br />
<br />
<em>Write a program that reads the words in <tt>words.txt</tt> and
stores them as keys in a dictionary. It doesn't matter what the
values are. Then you can use the <tt>in</tt> operator
as a fast way to check whether a string is in
the dictionary.</em></font></div><br />
<a name="toc107"></a>
<h2><font color="black"><a name="htoc117">9.1</a> Dictionary as a set of counters</font></h2>
<a name="histogram"></a>
<a name="@default606"></a>
<font color="black">Suppose you are given a string and you want to count how many
times each letter appears. There are several ways you could do it:
</font><ol type="1"><li><font color="black">You could create 26 variables, one for each letter of the
alphabet. Then you could traverse the string and, for each
character, increment the corresponding counter, probably using
a chained conditional.</font><br />
<br />
</li><li><font color="black">You could create a list with 26 elements. Then you could
convert each character to a number (using the built-in function
<tt>ord</tt>), use the number as an index into the list, and increment
the appropriate counter.</font><br />
<br />
</li><li><font color="black">You could create a dictionary with characters as keys
and counters as the corresponding values. The first time you
see a character, you would add an item to the dictionary. After
that you would increment the value of an existing item.</font></li></ol>
<font color="black">Each of these options performs the same computation, but each
of them implements that computation in a different way.<br />
<br />
<a name="@default607"></a><br />
<br />
An <b>implementation</b> is a way of performing a computation;
some implementations are better than others. For example,
an advantage of the dictionary implementation is that we don't
have to know ahead of time which letters appear in the string
and we only have to make room for the letters that do appear.<br />
<br />
Here is what the code might look like:
</font><pre><font size="4" color="blue">
word = 'brontosaurus'
d = dict()
for c in word:
if c not in d:
d[c] = 1
else:
d[c] = d[c] + 1
print d
</font></pre><font color="black">We are effectively computing a <b>histogram</b>, which is a statistical
term for a set of counters (or frequencies). <br />
<br />
<a name="@default608"></a>
<a name="@default609"></a>
<a name="@default610"></a><br />
<br />
The <tt>for</tt> loop traverses
the string. Each time through the loop, if the character <tt>c</tt> is
not in the dictionary, we create a new item with key <tt>c</tt> and the
initial value 1 (since we have seen this letter once). If <tt>c</tt> is
already in the dictionary we increment <tt>d[c]</tt>.<br />
<br />
<a name="@default611"></a><br />
<br />
Here's the output of the program:
</font><pre><font size="4" color="blue">
{'a': 1, 'b': 1, 'o': 2, 'n': 1, 's': 2, 'r': 2, 'u': 2, 't': 1}
</font></pre><font color="black">The histogram indicates that the letters <tt>'a'</tt> and <code>'b'</code>
appear once; <code>'o'</code> appears twice, and so on.<br />
<br />
<a name="@default612"></a>
<a name="@default613"></a><br />
<br />
Dictionaries have a method called <tt>get</tt> that takes a key
and a default value. If the key appears in the dictionary,
<tt>get</tt> returns the corresponding value; otherwise it returns
the default value. For example:
</font><pre><font size="4" color="blue">
>>> counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
>>> print counts.get('jan', 0)
100
>>> print counts.get('tim', 0)
0
</font></pre><font color="black">We can use <tt>get</tt> to write our histogram loop more concisely.
Because the <tt>get</tt> method automatically handles the case where a key
is not in a dictionary, we can reduce four lines down to one
and eliminate the <tt>if</tt> statement.
</font><pre><font size="4" color="blue">
word = 'brontosaurus'
d = dict()
for c in word:
d[c] = d.get(c,0) + 1
print d
</font></pre><font color="black">The use of the <tt>get</tt> method to simplify this counting loop
ends up being a very commonly used "idiom" in Python and
we will use it many times the rest of the book. So you should
take a moment and compare the loop using the <tt>if</tt> statement
and <tt>in</tt> operator with the loop using the <tt>get</tt> method.
They do exactly the same thing, but one is more succinct.
</font><a name="@default614"></a><br />
<br />
<a name="toc108"></a>
<h2><font color="black"><a name="htoc118">9.2</a> Dictionaries and files</font></h2>
<font color="black">One of the common uses of a dictionary is to count the occurrence
of words in a file with some written text.
Let's start with a very simple file of
words taken from the text of <em>Romeo and Juliet</em>
thanks to
<tt>http://shakespeare.mit.edu/Tragedy/romeoandjuliet/romeo_juliet.2.2.html</tt>.<br />
<br />
For the first set of examples, we will use a shortened and simplified version
of the text with no punctuation. Later we will work with the text of the
scene with punctuation included.
</font><pre><font size="4" color="blue">
But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
</font></pre><font color="black">We will write a Python program to read through the lines of the file,
break each line into a list of words, and then loop through each
of the words in the line, and count each word using a dictionary.<br />
<br />
<a name="@default615"></a>
<a name="@default616"></a>
You will see that we have two <tt>for</tt> loops. The outer loop is reading the
lines of the file and the inner loop is iterating through each
of the words on that particular line. This is an example
of a pattern called <b>nested loops</b> because one of the loops
is the <em>outer</em> loop and the other loop is the <em>inner</em>
loop. <br />
<br />
Because the inner loop executes all of its iterations each time
the outer loop makes a single iteration, we think of the inner
loop as iterating "more quickly" and the outer loop as iterating
more slowly.<br />
<br />
<a name="@default617"></a>
The combination of the two nested loops ensures that we will count
every word on every line of the input file.
</font><pre><font size="4" color="blue">
fname = raw_input('Enter the file name: ')
try:
fhand = open(fname)
except:
print 'File cannot be opened:', fname
exit()
counts = dict()
for line in fhand:
words = line.split()
for word in words:
if word not in counts:
counts[word] = 1
else:
counts[word] += 1
print counts
</font></pre><font color="black">When we run the program, we see a raw dump of all of the counts in unsorted
hash order.
(the <tt>romeo.txt</tt> file is available at
<tt>www.py4inf.com/code/romeo.txt</tt>)
</font><pre><font size="4" color="blue">
python count1.py
Enter the file name: romeo.txt
{'and': 3, 'envious': 1, 'already': 1, 'fair': 1,
'is': 3, 'through': 1, 'pale': 1, 'yonder': 1,
'what': 1, 'sun': 2, 'Who': 1, 'But': 1, 'moon': 1,
'window': 1, 'sick': 1, 'east': 1, 'breaks': 1,
'grief': 1, 'with': 1, 'light': 1, 'It': 1, 'Arise': 1,
'kill': 1, 'the': 3, 'soft': 1, 'Juliet': 1}
</font></pre><font color="black">It is a bit inconvenient to look through the dictionary to find the
most common words and their counts, so we need to add some more
Python code to get us the output that will be more helpful.</font><br />
<br />
<a name="toc109"></a>
<h2><font color="black"><a name="htoc119">9.3</a> Looping and dictionaries</font></h2>
<a name="@default618"></a>
<a name="@default619"></a>
<a name="@default620"></a>
<font color="black">If you use a dictionary as the sequence
in a <tt>for</tt> statement, it traverses
the keys of the dictionary. This loop
prints each key and the corresponding value:
</font><pre><font size="4" color="blue">
counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
for key in counts:
print key, counts[key]
</font></pre><font color="black">Here's what the output looks like:
</font><pre><font size="4" color="blue">
jan 100
chuck 1
annie 42
</font></pre><font color="black">Again, the keys are in no particular order.<br />
<br />
<a name="@default621"></a>
We can use this pattern to implement the various loop idioms
that we have described earlier. For example if we wanted
to find all the entries in a dictionary with a value
above ten, we could write the following code:
</font><pre><font size="4" color="blue">
counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
for key in counts:
if counts[key] > 10 :
print key, counts[key]
</font></pre><font color="black">The <tt>for</tt> loop iterates through the
<em>keys</em> of the dictionary, so we must
use the index operator to retrieve the
corresponding <em>value</em>
for each key.
Here's what the output looks like:
</font><pre><font size="4" color="blue">
jan 100
annie 42
</font></pre><font color="black">We see only the entries with a value above 10.<br />
<br />
<a name="@default622"></a>
<a name="@default623"></a>
If you want to print the keys in alphabetical order, you first
make a list of the keys in the dictionary using the
<tt>keys</tt> method available in dictionary objects,
and then sort that list
and loop through the sorted list, looking up each
key printing out key/value pairs in sorted order as follows
as follows:
</font><pre><font size="4" color="blue">
counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
lst = counts.keys()
print lst
lst.sort()
for key in lst:
print key, counts[key]
</font></pre><font color="black">Here's what the output looks like:
</font><pre><font size="4" color="blue">
['jan', 'chuck', 'annie']
annie 42
chuck 1
jan 100
</font></pre><font color="black">First you see the list of keys in unsorted order that
we get from the <tt>keys</tt> method. Then we see the key/value
pairs in order from the <tt>for</tt> loop.</font><br />
<br />
<a name="toc110"></a>
<h2><font color="black"><a name="htoc120">9.4</a> Advanced text parsing</font></h2>
<a name="@default624"></a><font color="black">
In the above example using the file <tt>romeo.txt</tt>,
we made the file as simple as possible by removing
any and all punctuation by hand. The real text
has lots of punctuation as shown below:
</font><pre><font size="4" color="blue">
But, soft! what light through yonder window breaks?
It is the east, and Juliet is the sun.
Arise, fair sun, and kill the envious moon,
Who is already sick and pale with grief,
</font></pre><font color="black">Since the Python <tt>split</tt> function looks for spaces and
treats words as tokens separated by spaces, we would treat the
words "soft!" and "soft" as <em>different</em> words and create
a separate dictionary entry for each word.<br />
<br />
Also since the file has capitalization, we would treat
"who" and "Who" as different words with different
counts.<br />
<br />
We can solve both these problems by using the string
methods <tt>lower</tt>, <tt>punctuation</tt>, and <tt>translate</tt>. The
<tt>translate</tt> is the most subtle of the methods.
Here is the documentation for <tt>translate</tt>:<br />
<br />
<code>string.translate(s, table[, deletechars])</code><br />
<br />
<em>Delete all characters from s that are in deletechars (if present),
and then translate the characters using table, which must
be a 256-character string giving the translation for each
character value, indexed by its ordinal. If table is None,
then only the character deletion step is performed.</em><br />
<br />
We will not specify the <tt>table</tt> but we will use
the <tt>deletechars</tt> parameter to delete all of the punctuation.
We will even let Python tell us the list of characters
that it considers "punctuation":
</font><pre><font size="4" color="blue">
>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
</font></pre><font color="black">We make the following modifications to our program:
</font><pre><font size="4" color="blue">
import string # New Code
fname = raw_input('Enter the file name: ')
try:
fhand = open(fname)
except:
print 'File cannot be opened:', fname
exit()
counts = dict()
for line in fhand:
line = line.translate(None, string.punctuation) # New Code
line = line.lower() # New Code
words = line.split()
for word in words:
if word not in counts:
counts[word] = 1
else:
counts[word] += 1
print counts
</font></pre><font color="black">We use <tt>translate</tt> to remove all punctuation and <tt>lower</tt> to
force the line to lowercase. Otherwise the program is unchanged.
Note for Python 2.5 and earlier, <tt>translate</tt> does not
accept <tt>None</tt> as the first parameter so use this code for the translate
call:
</font><pre><font size="4" color="blue">
print a.translate(string.maketrans(' ',' '), string.punctuation
</font></pre><font color="black">Part of learning the "Art of Python" or "Thinking Pythonically"
is realizing that Python
often has built-in capabilities for many common data-analysis
problems. Over time, you will see enough example code and read
enough of the documentation to know where to look to see if someone
has already written something that makes your job much easier.<br />
<br />
The following is an abbreviated version of the output:
</font><pre><font size="4" color="blue">
Enter the file name: romeo-full.txt
{'swearst': 1, 'all': 6, 'afeard': 1, 'leave': 2, 'these': 2,
'kinsmen': 2, 'what': 11, 'thinkst': 1, 'love': 24, 'cloak': 1,
a': 24, 'orchard': 2, 'light': 5, 'lovers': 2, 'romeo': 40,
'maiden': 1, 'whiteupturned': 1, 'juliet': 32, 'gentleman': 1,
'it': 22, 'leans': 1, 'canst': 1, 'having': 1, ...}
</font></pre><font color="black">Looking through this output is still unwieldy and we can use
Python to gives us exactly what we are looking for, but to do
so, we need to learn about Python <b>tuples</b>. We
will pick up this example once we learn about tuples.</font><br />
<br />
<a name="toc111"></a>
<h2><font color="black"><a name="htoc121">9.5</a> Debugging</font></h2>
<a name="@default625"></a>
<font color="black">As you work with bigger datasets it can become unwieldy to
debug by printing and checking data by hand. Here are some
suggestions for debugging large datasets:
</font><dl compact="compact"><dt><font color="black"><b>Scale down the input:</b></font></dt><dd><font color="black"> If possible, reduce the size of the
dataset. For example if the program reads a text file, start with
just the first 10 lines, or with the smallest example you can find.
You can either edit the files themselves, or (better) modify the
program so it reads only the first <tt>n</tt> lines.<br />
<br />
If there is an error, you can reduce <tt>n</tt> to the smallest
value that manifests the error, and then increase it gradually
as you find and correct errors.</font><br />
<br />
</dd><dt><font color="black"><b>Check summaries and types:</b></font></dt><dd><font color="black"> Instead of printing and checking the
entire dataset, consider printing summaries of the data: for example,
the number of items in a dictionary or the total of a list of numbers.<br />
<br />
A common cause of runtime errors is a value that is not the right
type. For debugging this kind of error, it is often enough to print
the type of a value.</font><br />
<br />
</dd><dt><font color="black"><b>Write self-checks:</b></font></dt><dd><font color="black"> Sometimes you can write code to check
for errors automatically. For example, if you are computing the
average of a list of numbers, you could check that the result is
not greater than the largest element in the list or less than
the smallest. This is called a "sanity check" because it detects
results that are "completely illogical."<br />
<br />
<a name="@default626"></a>
<a name="@default627"></a><br />
<br />
Another kind of check compares the results of two different
computations to see if they are consistent. This is called a
"consistency check."</font><br />
<br />
</dd><dt><font color="black"><b>Pretty print the output:</b></font></dt><dd><font color="black"> Formatting debugging output
can make it easier to spot an error. </font></dd></dl>
<font color="black">Again, time you spend building scaffolding can reduce
the time you spend debugging.</font><br />
<br />
<a name="@default628"></a><br />
<br />
<a name="toc112"></a>
<h2><font color="black"><a name="htoc122">9.6</a> Glossary</font></h2>
<dl compact="compact"><dt><font color="black"><b>dictionary:</b></font></dt><dd><font color="black"> A mapping from a set of keys to their
corresponding values.
</font><a name="@default629"></a><br />
<br />
</dd><dt><font color="black"><b>hashtable:</b></font></dt><dd><font color="black"> The algorithm used to implement Python
dictionaries.
</font><a name="@default630"></a><br />
<br />
</dd><dt><font color="black"><b>hash function:</b></font></dt><dd><font color="black"> A function used by a hashtable to compute the
location for a key.
</font><a name="@default631"></a><br />
<br />
</dd><dt><font color="black"><b>histogram:</b></font></dt><dd><font color="black"> A set of counters.
</font><a name="@default632"></a><br />
<br />
</dd><dt><font color="black"><b>implementation:</b></font></dt><dd><font color="black"> A way of performing a computation.
</font><a name="@default633"></a><br />
<br />
</dd><dt><font color="black"><b>item:</b></font></dt><dd><font color="black"> Another name for a key-value pair.
</font><a name="@default634"></a><br />
<br />
</dd><dt><font color="black"><b>key:</b></font></dt><dd><font color="black"> An object that appears in a dictionary as the
first part of a key-value pair.
</font><a name="@default635"></a><br />
<br />
</dd><dt><font color="black"><b>key-value pair:</b></font></dt><dd><font color="black"> The representation of the mapping from
a key to a value.
</font><a name="@default636"></a><br />
<br />
</dd><dt><font color="black"><b>lookup:</b></font></dt><dd><font color="black"> A dictionary operation that takes a key and finds
the corresponding value.
</font><a name="@default637"></a><br />
<br />
</dd><dt><font color="black"><b>nested loops:</b></font></dt><dd><font color="black"> When there is one or more loops "inside" of
another loop. The inner loop runs to completion each time the outer
loop runs once.
</font><a name="@default638"></a>
<a name="@default639"></a><br />
<br />
</dd><dt><font color="black"><b>value:</b></font></dt><dd><font color="black"> An object that appears in a dictionary as the
second part of a key-value pair. This is more specific than
our previous use of the word "value."
</font><a name="@default640"></a></dd></dl>
<a name="toc113"></a>
<h2><font color="black"><a name="htoc123">9.7</a> Exercises</font></h2><br />
<div align="left"><font color="black"><b>Exercise 2</b> <em>
Write a program that categorizes each mail message by which
day of the week the commit was done. To do this look for
lines which start with "From", then look for the
third word and then keep a running count of each of the
days of the week. At the end of the program print out the
contents of your dictionary (order does not matter).
</em></font><pre><font size="4" color="blue"><em>
Sample Line:
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
Sample Execution:
python dow.py
Enter a file name: mbox-short.txt
{'Fri': 20, 'Thu': 6, 'Sat': 1}
</em></font></pre></div><br />
<div align="left"><font color="black"><b>Exercise 3</b> <em>
Write a program to read through a mail log, and
build a histogram using a dictionary to count how many
messages have come from each email address
and print the dictionary.
</em></font><pre><font size="4" color="blue"><em>
Enter file name: mbox-short.txt
{'gopal.ramasammycook@gmail.com': 1, 'louis@media.berkeley.edu': 3,
'cwen@iupui.edu': 5, 'antranig@caret.cam.ac.uk': 1,
'rjlowe@iupui.edu': 2, 'gsilver@umich.edu': 3,
'david.horwitz@uct.ac.za': 4, 'wagnermr@iupui.edu': 1,
'zqian@umich.edu': 4, 'stephen.marquard@uct.ac.za': 2,
'ray@media.berkeley.edu': 1}
</em></font></pre></div><br />
<div align="left"><font color="black"><b>Exercise 4</b> <em>
Add code to the above program to figure out who has the
most messages in the file.<br />
<br />
After all the data has been read and the dictionary has been
created, look through the dictionary using a
maximum loop
(see Section <a href="cfbook006.html#maximumloop">??</a>)
to find who has the most
messages and print how many messages the person has.
</em></font><pre><font size="4" color="blue"><em>
Enter a file name: mbox-short.txt
cwen@iupui.edu 5
Enter a file name: mbox.txt
zqian@umich.edu 195
</em></font></pre></div><br />
<div align="left"><font color="black"><b>Exercise 5</b> <em>
This program records the domain name (instead of the address)
where the message was sent from instead of who the mail
came from (i.e. the whole e-mail address). At the end
of the program print out the contents of your dictionary.
</em></font><pre><font size="4" color="blue"><em>
python schoolcount.py
Enter a file name: mbox-short.txt
{'media.berkeley.edu': 4, 'uct.ac.za': 6, 'umich.edu': 7,
'gmail.com': 1, 'caret.cam.ac.uk': 1, 'iupui.edu': 8}
</em></font></pre></div><br />
<hr />
<a href="cfbook009.html"><img src="previous_motif.gif" alt="Previous" /></a>
<a href="index.html"><img src="contents_motif.gif" alt="Up" /></a>
<a href="cfbook011.html"><img src="next_motif.gif" alt="Next" /></a>
</body>
</html>