{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Basic Programming Using Python: Files and Lists" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Objectives" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- FIXME" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Strings From Files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In our [previous lesson](python-3-conditionals-defensive.ipynb),\n", "we saw how to set the colors in a grid based on the characters in a string.\n", "The time has come to see how to get those strings (and other things) out of files." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a refresher,\n", "here's our coloring function:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def color_from_string(grid, data):\n", " \"Color grid cells red and green according to 'R' and 'G' in data.\"\n", " assert grid.width == len(data), \\\n", " 'Grid and string lengths do not match: {0} != {1}'.format(grid.width, len(data))\n", " for x in range(grid.width):\n", " assert data[x] in 'GR', \\\n", " 'Unknown character in data string: \"{0}\"'.format(data[x])\n", " if data[x] == 'R':\n", " grid[x, 0] = colors['Red']\n", " else:\n", " grid[x, 0] = colors['Green']" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here's how we use it:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from ipythonblocks import ImageGrid, colors\n", "\n", "row = ImageGrid(5, 1)\n", "color_from_string(row, 'RRGRR')\n", "row.show()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using a conventional text editor,\n", "we can create a text file that contains just that string:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!cat grid_rrgrr.txt" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "RRGRR\r\n" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's read it into our program:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "reader = open('grid_rrgrr.txt', 'r')\n", "line = reader.readline()\n", "reader.close()\n", "print 'line is:', line" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "line is: RRGRR\n", "\n" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Line 1 uses a built-in function called `open` to open our file.\n", "`open`'s first parameter specifies the file we want to open;\n", "the second parameter,\n", "`'r'`,\n", "signals that we want to read the file.\n", "(We can use `'w'` to write files,\n", "which we'll explore later.)\n", "`open` returns a special object that keeps track of which file we opened,\n", "and how much of its data we've read.\n", "This object is sometimes called a [file handle](glossary.html#file_handle),\n", "and we can assign it to a variable like any other value." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The second line of our program asks the file handle's `readline` method\n", "to read the first line from the file\n", "and give it back to us as a string.\n", "Once we've done that,\n", "we ask the file handle to close itself\n", "(i.e., to disconnect from the file),\n", "and then we print the string we read.\n", "The result is `'RRGRR'`,\n", "just as expected." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or is it?\n", "Let's take a look at `line`'s length:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print len(line)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "6\n" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Why does `len` tell us there are six characters instead of five?\n", "We can use another function called `repr` to take a closer look\n", "at what we actually read:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print repr(line)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "'RRGRR\\n'\n" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "`repr` stands for \"representation\".\n", "It returns whatever we'd have to type into a Python program\n", "to create the thing we've given it as a parameter.\n", "In this case,\n", "it's telling us that our string contains 'R', 'R', 'G', 'R', 'R', and '\\n'.\n", "That last thing is called an [escape sequence](glossary.html#escape_sequence),\n", "and it's how Python represents a [newline character](glossary.html#newline_character)\n", "in a string.\n", "We can use other escape sequences to represent other special characters:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print 'We\\'ll put a single quote in a single-quoted string.'\n", "print \"Or we\\\"ll put a double quote in a double-quoted string.\"\n", "print 'This\\nstring\\ncontains\\nnewlines.'\n", "print 'And\\tthis\\tone\\tcontains\\ttabs.'" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "We'll put a single quote in a single-quoted string.\n", "Or we\"ll put a double quote in a double-quoted string.\n", "This\n", "string\n", "contains\n", "newlines.\n", "And\tthis\tone\tcontains\ttabs.\n" ] } ], 
"prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "### *Carriage Return, Newline, and All That*\n", "\n", "\n", "If we create our file on Windows,\n", "it might contain 'RRGRR\\r\\n' instead of 'RRGRR\\n'.\n", "The '\\r' is a [carriage return](glossary.html#carriage_return),\n", "and it's there because Windows uses two characters to mark the ends of lines\n", "rather than just one.\n", "There's no reason to prefer one convention over the other,\n", "but problems do arise when we create files one way\n", "and try to read them with programs that expect the other.\n", "Python does its best to shield us from this\n", "by converting Windows-style '\\r\\n' end-of-line markers to '\\n'\n", "as it reads data from files.\n", "If we really want to keep the original line endings,\n", "we need to use `'rb'` (for \"read binary\") when we open the file\n", "instead of just `'r'`.\n", "For more on this and other madness,\n", "see Joel Spolsky's article\n", "[The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](http://www.joelonsoftware.com/articles/Unicode.html).\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The easiest way to get rid of our annoying newline character\n", "is to use `str.strip`,\n", "i.e.,\n", "the `strip` method of the string data type.\n", "As its interactive help says:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "help(str.strip)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Help on method_descriptor:\n", "\n", "strip(...)\n", " S.strip([chars]) -> string or unicode\n", " \n", " Return a copy of the string S with leading and trailing\n", " whitespace removed.\n", " If chars is given and not None, remove characters in chars instead.\n", " If chars is unicode, S will be converted to unicode before stripping\n", "\n" ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "`str.strip` creates a new string by removing any leading or trailing [whitespace](glossary.html#whitespace) characters\n", "from the original.\n", "Whitespace includes carriage return,\n", "newline,\n", "tab,\n", "and the familiar space character,\n", "so stripping the string also takes care of any accidental indentation\n", "or (invisible) trailing spaces:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "original = ' indented with trailing spaces '\n", "stripped = original.strip()\n", "print '|{0}|'.format(stripped)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "|indented with trailing spaces|\n" ] } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's use this to fix our string and initialize our grid.\n", "In fact,\n", "let's write a function that takes a grid and a filename as parameters\n", "and fills the grid using the color specification in that file:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def color_from_file(grid, filename):\n", " 'Color the cells in a grid using a spec 
stored in a file.'\n", " reader = open(filename, 'r')\n", " line = reader.readline()\n", " reader.close()\n", " line = line.strip()\n", " color_from_string(grid, line)\n", "\n", "another_row = ImageGrid(5, 1)\n", "color_from_file(another_row, 'grid_rrgrr.txt')\n", "another_row.show()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's progress,\n", "but we can do better.\n", "When we were creating grids *and* color strings in the same program,\n", "it was fairly easy to make sure the grid and the string were the same size.\n", "Opening a text file in an editor and\n", "counting the characters on the first line\n", "will be a lot more painful,\n", "so why don't we create the grid\n", "based on how long the string is?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def create_from_file(filename):\n", " 'Create and color a grid using a spec stored in a file.'\n", " reader = open(filename, 'r')\n", " line = reader.readline()\n", " reader.close()\n", " line = line.strip()\n", " grid = ImageGrid(len(line), 1)\n", " color_from_string(grid, line)\n", " return grid\n", "\n", "newly_made = create_from_file('grid_rrgrr.txt')\n", "newly_made.show()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is starting to look like our friend `skimage.novice.open`:\n", "given a filename,\n", "it loads the data from that file into a suitable object in memory\n", "and gives the object back to us for further use.\n", "What's more,\n", "it does that using a function that initializes objects which are already in memory,\n", "so that we can fill things several times in exactly the same way\n", "without any duplicated code." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One thing in this function that requires a bit of explanation is\n", "the method call `reader.close()`.\n", "When we open a file,\n", "the operating system creates a connection between our program and that file.\n", "For performance and security reasons,\n", "it will only let a single program have a fixed number of files open at any one time,\n", "and will only allow a single file to be opened by a fixed number of programs at once.\n", "Both limits are typically up in the thousands,\n", "and the operating system automatically closes open files\n", "when a program finishes running,\n", "so we're unlikely to run into problems most of the time." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But that's precisely what makes this problematic.\n", "Something that only goes wrong when we're doing something large\n", "is much harder to debug than something that also goes wrong in the small.\n", "It's therefore a very good idea to get into the habit of closing files\n", "as soon as they're no longer needed.\n", "In fact,\n", "it's such a good idea that Python and other languages\n", "have a way to guarantee that it happens automatically:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def create_from_file(filename):\n", " 'Create and color a grid using a spec stored in a file.'\n", " with open(filename, 'r') as reader:\n", " line = reader.readline()\n", " line = line.strip()\n", " grid = ImageGrid(len(line), 1)\n", " color_from_string(grid, line)\n", " return grid\n", "\n", "another_row = create_from_file('grid_rrgrr.txt')\n", "another_row.show()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `with...as...` statement takes whatever is created by its first part—in\n", "our case,\n", "the result of opening a file—and\n", "assigns it to the variable given in its second part.\n", "It then executes a block of code,\n", "and when that block is finished,\n", "it cleans up the stored value.\n", "\"Cleaning up\" a file means closing it;\n", "it means different things for databases and connections to hardware devices,\n", "but in every case,\n", "Python guarantees to do the right thing at the right time.\n", "We'll use `with` statements for file I/O from now on." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Lists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A single row of pixels is a lot less interesting than an actual image,\n", "but before we can read the latter,\n", "we need to learn how to use [lists](glossary.html#list).\n", "Just as a `for` loop is a way to do operations many times,\n", "a list is a way to store many values in one variable.\n", "To start our exploration of lists,\n", "try this:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "odds = [1, 3, 5]\n", "for number in odds:\n", " print number" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1\n", "3\n", "5\n" ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "`[1, 3, 5]` is a list.\n", "Its elements are written in square brackets and separated by commas,\n", "and just as a `for` loop over a string works on those characters one at a time,\n", "a `for` loop over a list processes the list's values one by one." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's do something a bit more useful with a list of numbers:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data = [1, 4, 2, 3, 3, 4, 3, 4, 1]\n", "total = 0.0\n", "for n in data:\n", " total += n\n", "mean = total / len(data)\n", "print 'mean is', mean" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "mean is 2.77777777778\n" ] } ], "prompt_number": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "By now,\n", "the logic here should be fairly easy to follow.\n", "`data` refers to our list,\n", "and `total` is initialized to 0.0.\n", "Each iteration of the loop adds the next number from the list to `total`,\n", "and when we're done,\n", "we divide the result by the list's length to get the mean.\n", "(Note that we initialize `total` to 0.0 rather than 0,\n", "so that it is always a floating-point number.\n", "If we didn't do this,\n", "its final value might be an integer,\n", "and the division could give us a truncated approximation to the actual mean.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "### *A Simpler Way*\n", "\n", "\n", "Python actually has a built-in function called `sum` that does what our loop does,\n", "so we can calculate the mean more simply using this:\n", "" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print 'mean is', float(sum(data)) / len(data)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ " mean is 2.77777777778\n" ] } ], "prompt_number": 19 }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Again,\n", "it's important to understand that `float( sum(data)/len(data) )` might not return the right answer,\n", "since it would do integer/integer division (producing a possibly-truncated result)\n", "and then convert that value to a float.\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lists are probably used more than any other data structure in programming,\n", "so let's have a closer look at them.\n", "First, lists are [mutable](glossary.html#mutable),\n", "i.e.,\n", "they can be changed after they are created:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "values = [1, 3, 5]\n", "values[0] = 'one'\n", "values[1] = 'three'\n", "values[2] = 'five'\n", "print values" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['one', 'three', 'five']\n" ] } ], "prompt_number": 20 }, { "cell_type": "markdown", "metadata": {}, "source": [ "As the diagram below shows,\n", "this works because the list doesn't actually contain any values.\n", "Instead,\n", "it stores [references](glossary.html#reference) to values.\n", "When we assign something to `values[0]`,\n", "what we're really doing is putting a different reference in that location in the list." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "FIXME: diagram" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's a program that does something a bit more useful:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data = [1, 4, 2, 3]\n", "result = []\n", "current = 0\n", "for n in data:\n", " current = current + n\n", " result.append(current)\n", "print 'running total:', result" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "running total: [1, 5, 7, 10]\n" ] } ], "prompt_number": 24 }, { "cell_type": "markdown", "metadata": {}, "source": [ "`result` starts off as an [empty list](glossary.html#empty-list),\n", "and `current` starts off as zero.\n", "Each iteration of the loop\n", "adds the next value in the list `data` to `current` to calculate the running total.\n", "It then appends this value to `result`,\n", "so that when the program finishes we have a complete list of partial sums." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What if we want to double the values in `data` in place?\n", "We could try this:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data = [1, 4, 2, 3]\n", "for n in data:\n", " n = 2 * n\n", "print 'doubled data:', data" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "doubled data: [1, 4, 2, 3]\n" ] } ], "prompt_number": 25 }, { "cell_type": "markdown", "metadata": {}, "source": [ "but as we can see,\n", "it doesn't work.\n", "When Python calculates `2*n`\n", "it creates a new value in memory.\n", "It then makes the variable `n` point at the value for a few microseconds\n", "before going around the loop again\n", "and pointing `n` at the next value from the list instead.\n", "Since nothing is pointing to the temporary value we just created any longer,\n", "Python throws it away." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The right way to solve this problem is to use indexing and the `range` function:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data = [1, 4, 2, 3]\n", "for i in range(4):\n", " data[i] = 2 * data[i]\n", "print 'doubled data:', data" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "doubled data: [2, 8, 4, 6]\n" ] } ], "prompt_number": 26 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once again we have violated the DRY Principle by using `range(4)`:\n", "if we ever change the number of values in `data`,\n", "our loop will either fail because we're trying to index beyond its end,\n", "or what's worse,\n", "appear to succeed but not actually update some values.\n", "Let's fix that:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data = [1, 4, 2, 3] # re-initialize our sample data\n", "for i in range(len(data)):\n", " data[i] *= 2\n", "print 'doubled data:', data" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "doubled data: [2, 8, 4, 6]\n" ] } ], "prompt_number": 28 }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's better:\n", "`len(data)` is always the actual length of the list,\n", "so `range(len(data))` is always the indices we need.\n", "We've also rewritten the multiplication and assignment to use an in-place operator `*=`\n", "so that we aren't repeating `data[i]`." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can do a lot of other interesting things with lists,\n", "like concatenate them:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "left = [1, 2, 3]\n", "right = [4, 5, 6]\n", "combined = left + right\n", "print combined" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[1, 2, 3, 4, 5, 6]\n" ] } ], "prompt_number": 29 }, { "cell_type": "markdown", "metadata": {}, "source": [ "count how many times a particular value appears in them:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data = ['a', 'c', 'g', 'g', 'c', 't', 'a', 'c', 'g', 'g']\n", "print data.count('g')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "4\n" ] } ], "prompt_number": 31 }, { "cell_type": "markdown", "metadata": {}, "source": [ "sort them:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data.sort()\n", "print data" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['a', 'a', 'c', 'c', 'c', 'g', 'g', 'g', 'g', 't']\n" ] } ], "prompt_number": 33 }, { "cell_type": "markdown", "metadata": {}, "source": [ "and reverse them:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data.reverse()\n", "print data" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['t', 'g', 'g', 'g', 'g', 'c', 'c', 'c', 'a', 'a']\n" ] } ], "prompt_number": 34 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "### *A Health Warning*\n", "\n", "\n", "One thing that newcomers (and even experienced programmers) often trip over is that\n", "`sort` and `reverse` mutate the list,\n", "i.e.,\n", "they rearrange values within a single list\n", "rather than creating and returning a new list.\n", "If we do this:\n", "" ] }, { "cell_type": "code", "collapsed": false, "input": [ "sorted_data = data.sort()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 37 }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "then all we have is the special value `None`,\n", "which Python uses to mean \"there's nothing here\":\n", "" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print sorted_data" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "None\n" ] } ], "prompt_number": 36 }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "At some point or another,\n", "everyone types `data = data.sort()` and then wonders where their time series has gone…\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we know how to create lists,\n", "we're ready to load two-dimensional images from files.\n", "Here's our first test file:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!cat grid_3x3.txt" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "RRG\r\n", "RGR\r\n", "GRR\r\n" ] } ], "prompt_number": 38 }, { "cell_type": "markdown", "metadata": {}, "source": [ "and here's how we read it line by line with Python:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "with open('grid_3x3.txt', 'r') as source:\n", " for line in source:\n", " print line" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "RRG\n", "\n", "RGR\n", "\n", "GRR\n", "\n" ] } ], "prompt_number": 39 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Whoops: we forgot to strip the newlines off the ends of the lines\n", "as we read them from the file.\n", "Let's fix that:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "with open('grid_3x3.txt', 'r') as source:\n", " for line in source:\n", " print line.strip()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "RRG\n", "RGR\n", "GRR\n" ] } ], "prompt_number": 40 }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's better.\n", "As this example shows,\n", "a `for` loop over a file reads the lines from the file one by one\n", "and assigns each to the loop variable in turn.\n", "If we want to get all the lines at once,\n", "we can do this instead:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "with open('grid_3x3.txt', 'r') as source:\n", " lines = source.readlines() # with an 's' on the end\n", "print lines" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['RRG\\n', 'RGR\\n', 
'GRR\\n']\n" ] } ], "prompt_number": 41 }, { "cell_type": "markdown", "metadata": {}, "source": [ "`file.readlines` (with an 's' on the end to distinguish it from `file.readline`)\n", "reads the entire file at once\n", "and returns a list of strings,\n", "one per line.\n", "The length of this list tells us how many rows we need in our grid,\n", "while the length of the first line (minus the newline character)\n", "tells us how many columns we need:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "with open('grid_3x3.txt', 'r') as source:\n", " lines = source.readlines()\n", "height = len(lines)\n", "width = len(lines[0]) - 1\n", "print '{0}x{1} grid'.format(width, height)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "3x3 grid\n" ] } ], "prompt_number": 42 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Upon reflection,\n", "that's not actually a very good test case,\n", "since we can't actually tell if we have `height` and `width` the right way around.\n", "Let's use a rectangular data file:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!cat grid_5x3.txt" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "RRRGR\r\n", "RRGRR\r\n", "RGRRR\r\n" ] } ], "prompt_number": 43 }, { "cell_type": "markdown", "metadata": {}, "source": [ "and put our code in a function:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def read_size(filename):\n", " with open(filename, 'r') as source:\n", " lines = source.readlines()\n", " return len(lines[0]) - 1, len(lines)\n", "\n", "width, height = read_size('grid_5x3.txt')\n", "print '{0}x{1} grid'.format(width, height)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "5x3 grid\n" ] } ], "prompt_number": 44 }, { "cell_type": "markdown", "metadata": {}, "source": [ "As this example shows,\n", "a function 
can return several values at once.\n", "When it does,\n", "those values are matched against the caller's variables from left to right.\n", "This can actually be done anywhere:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "red, green, blue = 255, 0, 128\n", "print 'red={0} green={1} blue={2}'.format(red, green, blue)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "red=255 green=0 blue=128\n" ] } ], "prompt_number": 46 }, { "cell_type": "markdown", "metadata": {}, "source": [ "and gives us an easy way to swap the values of two variables:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "low, high = 25, 10 # whoops\n", "low, high = high, low # exchange their values\n", "print 'low={0} high={1}'.format(low, high)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "low=10 high=25\n" ] } ], "prompt_number": 47 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Back to our function…\n", "Rather than just returning sizes,\n", "it would be more useful for us to create and fill in a grid.\n", "As we're doing this,\n", "though,\n", "we must remember to strip the newlines off the strings we have read from the file:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def read_grid(filename):\n", " with open(filename, 'r') as source:\n", " lines = source.readlines()\n", " width, height = len(lines[0]) - 1, len(lines)\n", " result = ImageGrid(width, height)\n", " for y in range(len(lines)):\n", " fill_grid_line(result, y, lines[y].strip())\n", " return result" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 48 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the most complicated function we've written so far,\n", "so let's go through it step by step:\n", "\n", "1. Define `read_grid` to take a single parameter.\n", "2. 
Open the file named by that parameter and assign the file handle to `source`.\n", "3. Read all of the lines from the file at once and assign the resulting list to `lines`.\n", "4. Having closed the file, calculate the width and height of the grid.\n", "5. Create the grid.\n", "6. Loop over the lines.\n", "7. Fill in a single line of the grid using an as-yet-unwritten function called `fill_grid_line`.\n", "8. Once the loop is done, return the resulting grid.\n", "\n", "We need a new function `fill_grid_line`\n", "because the function we've been using,\n", "`color_from_string`,\n", "always colors row 0 of whatever grid it's given.\n", "We need something that can color any row we specify:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def fill_grid_line(grid, y, data):\n", " \"Color grid cells in row y red and green according to 'R' and 'G' in data.\"\n", " assert 0 <= y < grid.height, \\\n", " 'Row index {0} not within grid height {1}'.format(y, grid.height)\n", " assert grid.width == len(data), \\\n", " 'Grid and string lengths do not match: {0} != {1}'.format(grid.width, len(data))\n", " for x in range(grid.width):\n", " assert data[x] in 'GR', \\\n", " 'Unknown character in data string: \"{0}\"'.format(data[x])\n", " if data[x] == 'R':\n", " grid[x, y] = colors['Red']\n", " else:\n", " grid[x, y] = colors['Green']" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 49 }, { "cell_type": "markdown", "metadata": {}, "source": [ "As well as adding an extra parameter `y` to this function,\n", "we've added an extra assertion to make sure it's between 0 and the grid's height.\n", "In fact,\n", "we could have said,\n", "\"*Since* we're adding an extra parameter,\n", "we've added an extra assertion,\"\n", "since it's good practice to check every input to a function before using it." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's give our functions a try:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "rectangle = read_grid('grid_5x3.txt')\n", "rectangle.show()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 50 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Perfect—or is it?\n", "Take another look at our data file:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!cat grid_5x3.txt" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "RRRGR\r\n", "RRGRR\r\n", "RGRRR\r\n" ] } ], "prompt_number": 51 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The 'G' in the top row of the data file is on the right,\n", "but the green square in the top row of the grid is on the left.\n", "The green cell in the bottom row of the grid\n", "is also in the wrong place.\n", "Somehow,\n", "our grid appears to be upside down." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The problem is that we haven't used a consistent coordinate system.\n", "`ImageGrid` uses a Cartesian grid with the origin in the lower left and Y going upward,\n", "but we're treating the file as if the origin was at the top,\n", "just as it is in a spreadsheet.\n", "The simplest way to fix this is to reverse our list of lines before using it:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def read_grid(filename):\n", " with open(filename, 'r') as source:\n", " lines = source.readlines()\n", " width, height = len(lines[0]) - 1, len(lines)\n", " result = ImageGrid(width, height)\n", " lines.reverse() # align with ImageGrid coordinate system\n", " for y in range(len(lines)):\n", " fill_grid_line(result, y, lines[y].strip())\n", " return result\n", "\n", "rectangle = read_grid('grid_5x3.txt')\n", "rectangle.show()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 54 }, { "cell_type": "markdown", "metadata": {}, "source": [ "All that's left is to make sure that all the lines are the same length\n", "so that we're warned of an error if we try to use a file like this:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!cat grid_ragged.txt" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "RRRGR\r\n", "RRGR\r\n", "RGR\r\n" ] } ], "prompt_number": 56 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since we require all the lines to be the same length,\n", "we can compare their lengths against the length of any one line.\n", "We can do this in a loop of its own:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", "for line in lines:\n", " assert len(line) == width\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or put the test in the loop that's filling the lines:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", "for y in range(len(lines)):\n", " assert len(lines[y].strip()) == width\n", " fill_grid_line(result, y, lines[y].strip())\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first does the checks before it makes any changes to the grid.\n", "Since we're creating the grid inside the function,\n", "though,\n", "this isn't a real worry:\n", "if there's an error in the file,\n", "our assertion will cause the function to fail\n", "and the partially-initialized grid will never be returned to the caller.\n", "We will therefore use the second form,\n", "but modify it slightly so that we only call `strip` once (DRY again):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def read_grid(filename):\n", " \"Initialize a grid by reading lines of 'R' and 'G' from a file.\"\n", " with open(filename, 'r') as source:\n", " lines = source.readlines()\n", " width, 
height = len(lines[0]) - 1, len(lines)\n", "    result = ImageGrid(width, height)\n", "    lines.reverse()\n", "    for y in range(len(lines)):\n", "        string = lines[y].strip()\n", "        assert len(string) == width, \\\n", "            'Line {0} is {1} long, not {2}'.format(y, len(string), width)\n", "        fill_grid_line(result, y, string)\n", "    return result" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 72 }, { "cell_type": "markdown", "metadata": {}, "source": [ "As always,\n", "we're not done until we test our change:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "read_grid('grid_ragged.txt')" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "AssertionError", "evalue": "Line 0 is 3 long, not 5", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mAssertionError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mread_grid\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'grid_ragged.txt'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m\u001b[0m in \u001b[0;36mread_grid\u001b[0;34m(filename)\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0my\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlines\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0mstring\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlines\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0my\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstrip\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 10\u001b[0;31m \u001b[0;32massert\u001b[0m 
\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstring\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mwidth\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Line {0} is {1} long, not {2}'\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0my\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstring\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mwidth\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 11\u001b[0m \u001b[0mfill_grid_line\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mresult\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstring\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 12\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mAssertionError\u001b[0m: Line 0 is 3 long, not 5" ] } ], "prompt_number": 74 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And of course we should make sure that it still works for a valid file:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "once_more = read_grid('grid_5x3.txt')\n", "once_more.show()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 75 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Thumbnails Revisited" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have all the concepts we need to create thumbnails for a set of images,\n", "and almost all the tools.\n", "The one remaining piece of the puzzle is the unpleasantly-named `glob`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import glob\n", "print 'text files:', glob.glob('*.txt')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "text files: ['grid_3x3.txt', 'grid_5x3.txt', 'grid_ragged.txt', 'grid_rrgrr.txt']\n" ] } ], "prompt_number": 80 }, { "cell_type": "code", "collapsed": false, "input": [ "print 'IPython Notebooks:', glob.glob('*.ipynb')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "IPython Notebooks: ['python-0-resize-image.ipynb', 'python-1-functions.ipynb', 'python-2-loops-indexing.ipynb', 'python-3-conditionals-defensive.ipynb', 'python-4-files-lists.ipynb']\n" ] } ], "prompt_number": 81 }, { "cell_type": "markdown", "metadata": {}, "source": [ "\"glob\" was originally short for \"global command\",\n", "but it has long since become a verb in its own right.\n", "It takes a single string as a parameter\n", "and uses it to do [wildcard](glossary.html#wildcard) matching on filenames,\n", "returning a list of matches as a result.\n", "Once we have this list,\n", "we can loop over it and create thumbnails one by one:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from skimage import novice\n", "from glob import glob\n", "\n", "DEFAULT_WIDTH = 100\n", "\n", "def make_all_thumbnails(pattern, width=DEFAULT_WIDTH):\n", " \"Create thumbnails for all image files matching the given pattern.\"\n", " for filename in glob(pattern):\n", " make_thumbnail(filename, 
width)\n", "\n", "def make_thumbnail(original_filename, width=DEFAULT_WIDTH):\n", "    \"Create a thumbnail for a single image file.\"\n", "    picture = novice.open(original_filename)\n", "    new_height = int(picture.height * float(width) / picture.width)\n", "    picture.size = (width, new_height)\n", "    thumbnail_filename = 'thumbnail-' + original_filename\n", "    picture.save(thumbnail_filename)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 83 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The only thing that's really new here is the way we specify the default value for thumbnail widths.\n", "Since people might call both `make_all_thumbnails` and `make_thumbnail` directly,\n", "we want to be able to set the width for either.\n", "However,\n", "we also want their default values to be the same,\n", "so we define that value once near the top of the program\n", "and use it in both function definitions.\n", "By convention,\n", "\"constant\" values like `DEFAULT_WIDTH` are spelled in UPPER CASE\n", "to indicate that they shouldn't be changed." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Key Points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- FIXME" ] } ], "metadata": {} } ] }