pylearn: doc/v2_planning/API_coding

comparison doc/v2_planning/API_coding_style.txt @ 1331:0541e7d6e916

merge

author	gdesjardins
date	Thu, 14 Oct 2010 23:55:55 -0400
parents	4efa2630f430
children	90116fb3636b

comparison

equal deleted inserted replaced

-:3efd0effb2a7
+:0541e7d6e916
 .. code-block:: python
 """Module docstring as the first line, as usual."""
-__authors__ = "Olivier Delalleau, Frederic Bastien, David Warde-Farley"
+__authors__   = "Olivier Delalleau, Frederic Bastien, David Warde-Farley"
 __copyright__ = "(c) 2010, Universite de Montreal"
-__license__ = "3-clause BSD License"
+__license__   = "3-clause BSD License"
-__contact__ = "Name Of Current Guardian of this file <email@address>"
+__contact__   = "Name Of Current Guardian of this file <email@address>"
 * Use ``//`` for integer division and ``/ float(...)`` if you want the
 floating point operation (for readability and compatibility across all
 versions of Python).
 if efficiency is typically not an issue here, the main goal being code
 consistency). Also, always use ``numpy.isinf`` / ``numpy.isnan`` to
 test infinite / NaN values. This is important because ``numpy.nan !=
 float('nan')``.
+* Whenever possible, mimic the numpy / scipy interfaces when writing code
+similar to what can be found in these packages.
 * Avoid backslashes whenever possible. They make it more
 difficult to edit code, and they are ugly (as well as potentially
 dangerous if there are trailing white spaces).
 .. code-block:: python
 * When indenting multi-line statements like lists or function arguments,
 keep elements of the same level aligned with each other.
 The position of the first
 element (on the same line or a new line) should be chosen depending on
 what is easiest to read (sometimes both can be ok).
+Other formattings may be ok depending on the specific situation, use
+common sense and pick whichever looks best.
 .. code-block:: python
 # Good.
 for my_very_long_variable_name in [my_foo, my_bar, my_love,
 Code Sample
 ===========
 The following code sample illustrates some of the coding guidelines one should
-follow in Pylearn. This is still a work-in-progress.
+follow in Pylearn. This is still a work-in-progress. Feel free to improve it and
+add more!
 .. code-block:: python
 #! /usr/env/bin python
-"""Sample code. There may still be mistakes / missing elements."""
+"""Sample code. Edit it as you like!"""
-__authors__ = "Olivier Delalleau"
+__authors__   = "Olivier Delalleau"
 __copyright__ = "(c) 2010, Universite de Montreal"
-__license__ = "3-clause BSD License"
+__license__   = "3-clause BSD License"
-__contact__ = "Olivier Delalleau <delallea@iro>"
+__contact__   = "Olivier Delalleau <delallea@iro>"
 # Standard library imports are on a single line.
 import os, sys, time
 # Third-party imports come after standard library imports, and there is
 # only one import per line. Imports are sorted lexicographically.
 import numpy
 import scipy
 import theano
-# Put 'from' imports below.
+# Individual 'from' imports come after packages.
 from numpy import argmax
 from theano import tensor
 # Application-specific imports come last.
-from pylearn import dataset
+# The absolute path should always be used.
-from pylearn.optimization import minimize
+from pylearn import datasets, learner
+from pylearn.formulas import noise
-def print_files_in(directory):
-"""Print the first line of each file in given directory."""
-# TODO To be continued...
+# All exceptions inherit from Exception.
+class PylearnError(Exception):
+# TODO Write doc.
+pass
+# All top-level classes inherit from object.
+class StorageExample(object):
+# TODO Write doc.
+pass
+# Two blank lines between definitions of top-level classes and functions.
+class AwesomeLearner(learner.Learner):
+# TODO Write doc.
+def __init__(self, print_fields=None):
+# TODO Write doc.
+# print_fields is a list of strings whose counts found in the
+# training set should be printed at the end of training. If None,
+# then nothing is printed.
+# Do not forget to call the parent class constructor.
+super(AwesomeLearner, self).__init__()
+# Use None instead of an empty list as default argument to
+# print_fields to avoid issues with mutable default arguments.
+self.print_fields = if_none(print_fields, [])
+# One blank line between method definitions.
+def add_field(self, field):
+# TODO Write doc.
+# Test if something belongs to a container with `in`, not
+# container-specific methods like `index`.
+if field in self.print_fields:
+# TODO Print a warning and do nothing.
+pass
+else:
+# This is why using [] as default to print_fields in the
+# constructor would have been a bad idea.
+self.print_fields.append(field)
+def train(self, dataset):
+# TODO Write doc (store the mean of each field in the training
+# set).
+self.mean_fields = {}
+count = {}
+for sample_dict in dataset:
+# Whenever it is enough for what you need, use iterative
+# instead of list versions of dictionary methods.
+for field, value in sample_dict.iteritems():
+# Keep line length to max 80 characters, using parentheses
+# instead of \ to continue long lines.
+self.mean_fields[field] = (self.mean_fields.get(field, 0) +
+value)
+count[field] = count.get(field, 0) + 1
+for field in self.mean_fields:
+self.mean_fields[field] /= float(count[field])
+for field in self.print_fields:
+# Test is done with `in`, not `has_key`.
+if field in self.sum_fields:
+# TODO Use log module instead.
+print '%s: %s' % (field, self.sum_fields[field])
+else:
+# TODO Print warning.
+pass
+def test_error(self, dataset):
+# TODO Write doc.
+if not hasattr(self, 'sum_fields'):
+# Exceptions should be raised as follows (in particular, no
+# string exceptions!).
+raise PylearnError('Cannot test a learner that was not '
+'trained.')
+error = 0
+count = 0
+for sample_dict in dataset:
+for field, value in sample_dict.iteritems():
+try:
+# Minimize code into a try statement.
+mean = self.mean_fields[field]
+# Always specicy which kind of exception you are
+# intercepting with except.
+except KeyError:
+raise PylearnError(
+"Found in a test sample a field ('%s') that had "
+"never been seen in the training set." % field)
+error += (value - self.mean_fields[field])**2
+count += 1
+# Remember to divide by a floating point number unless you
+# explicitly want an integer division (in which case you should
+# use //).
+mse = error / float(count)
+# TODO Use log module instead.
+print 'MSE: %s' % mse
+return mse
+def if_none(val_if_not_none, val_if_none):
+# TODO Write doc.
+if val_if_not_none is not None:
+return val_if_not_none
+else:
+return val_if_none
+def print_subdirs_in(directory):
+# TODO Write doc.
+# Using list comprehension rather than filter.
+sub_dirs = sorted([d for d in os.listdir(directory)
+if os.path.isdir(os.path.join(directory, d))])
+print '%s: %s' % (directory, ' '.join(sub_dirs))
+# A `for` loop is often easier to read than a call to `map`.
+for d in sub_dirs:
+print_subdirs_in(os.path.join(directory, d))
 def main():
 if len(sys.argv) != 2:
 # Note: conventions on how to display script documentation and
-# parse arguments are still to-be-determined.
+# parse arguments are still to-be-determined. This is just one
+# way to do it.
 print("""\
 Usage: %s <directory>
-Print first line of each file in given directory (in alphabetic order)."""
+For the given directory and all sub-directories found inside it, print
+the list of the directories they contain."""
 % os.path.basename(sys.argv[0]))
 return 1
-print_files_in(sys.argv[1])
+print_subdirs_in(sys.argv[1])
 return 0
 # Top-level executable code should be minimal.
 if __name__ == '__main__':
 sys.exit(main())
 committed to Pylearn complies to above specifications. This work is not
 finalized yet, but David started a `Wiki page`_ with helpful configuration
 tips for Vim.
 .. _Wiki page: http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Divers/VimPythonRecommendations
+Commit message
+==============
+* A one line summary. Try to keep it short, and provide the information
+that seems most useful to other developers: in particular the goal of
+a change is more useful than its description (which is always
+available through the changeset patch log). E.g. say "Improved stability
+of cost computation" rather than "Replaced log(exp(a) + exp(b)) by
+a * log(1 + exp(b -a)) in cost computation".
+* If needed a blank line followed by a more detailed summary
+* Make a commit for each logical modification
+* This makes reviews easier to do
+* This makes debugging easier as we can more easily pinpoint errors in
+	  commits with hg bisect
+* NEVER commit reformatting with functionality changes
+* Review your change before commiting
+* "hg diff <files>..." to see the diff you have done
+* "hg record" allows you to select which changes to a file should be
+committed. To enable it, put into the file ~/.hgrc:
+.. code-block:: bash
+[extensions]
+hgext.record=
+* hg record / diff force you to review your code, never commit without
+running one of these two commands first
+* Write detailed commit messages in the past tense, not present tense.
+* Good: "Fixed Unicode bug in RSS API."
+* Bad: "Fixes Unicode bug in RSS API."
+* Bad: "Fixing Unicode bug in RSS API."
+* Separate bug fixes from feature changes.
+* When fixing a ticket, start the message with "Fixed #abc"
+* Can make a system to change the ticket?
+* When referencing a ticket, start the message with "Refs #abc"
+* Can make a system to put a comment to the ticket?
 TODO
 ====
 Things still missing from this document, being discussed in coding_style.txt:

Mercurial > pylearn

comparison doc/v2_planning/API_coding_style.txt @ 1331:0541e7d6e916