view doc/v2_planning/coding_style.txt @ 1077:5c14d2ffcbb3

dataset: Looked into a few more existing ML libraries
author Olivier Delalleau <delallea@iro>
date Fri, 10 Sep 2010 12:48:32 -0400
parents d422f726c156
children 56c5f0990869

Discussion of Coding-Style
==========================

Participants
------------
- Dumitru
- Fred
- David
- Olivier D [leader]



Fred: This is a reworked version of James' email about what we should put in
the messages that we send to the user:
1) Hint at where in the code the message comes from.
2) Hint at how to hide the message (or should we put this in the documentation?).
3) State explicitly whether the user can ignore it, and what the consequences are.
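
As an illustration of the three points above, a warning could be phrased as in
the sketch below. The names `LearningRateWarning` and `check_learning_rate` are
hypothetical and only serve the example; they are not existing Pylearn code.

```python
import warnings


class LearningRateWarning(UserWarning):
    """Hypothetical warning category, so that users can filter it by class."""


def check_learning_rate(learning_rate):
    """Hypothetical check that warns about a suspicious learning rate."""
    if learning_rate > 1.0:
        warnings.warn(
            # 1) Say where the message comes from.
            "check_learning_rate: learning_rate=%s is larger than 1.0. "
            # 3) Say whether it can be ignored, and the consequence.
            "This can be ignored if it is intentional, but training may "
            "diverge. "
            # 2) Say how to hide this message.
            "To hide this message, call "
            "warnings.simplefilter('ignore', LearningRateWarning)."
            % learning_rate,
            LearningRateWarning,
            # Report the caller's file and line rather than this function's.
            stacklevel=2)
    return learning_rate
```

Using a dedicated warning class is one possible design choice: it lets the
user silence this family of messages without hiding unrelated warnings.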

Existing Python coding style specifications and guidelines
----------------------------------------------------------

    * http://www.python.org/dev/peps/pep-0008/ Style Guide for Python Code
    * http://www.python.org/dev/peps/pep-0257/ Docstring Conventions 
    * http://google-styleguide.googlecode.com/svn/trunk/pyguide.html Google Python Style Guide
    * http://www.voidspace.org.uk/python/articles/python_style_guide.shtml
    * http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html
    * http://www.cs.caltech.edu/courses/cs11/material/python/misc/python_style_guide.html
    * http://barry.warsaw.us/software/STYLEGUIDE.txt
    * http://self.maluke.com/style
    * http://chandlerproject.org/Projects/ChandlerCodingStyleGuidelines
    * http://lists.osafoundation.org/pipermail/dev/2003-March/000479.html
    * http://learnpython.pbworks.com/PythonTricks
    * http://eikke.com/how-not-to-write-python-code/
    * http://jaynes.colorado.edu/PythonGuidelines.html
    * http://docs.djangoproject.com/en/dev/internals/contributing/#coding-style
    * http://projects.scipy.org/numpy/wiki/CodingStyleGuidelines 

We will probably want to take PEP 8 as a starting point, and read what other
people think about it / how other coding guidelines differ from it.

Dumi: we should also try to find tools that automate these
processes: pylint, pyflakes, pychecker, pythontidy

OD: Things about PEP 8 I don't like (but it may be just me):

   * If necessary, you can add an extra pair of parentheses around an
     expression, but sometimes using a backslash looks better.
    --> I rarely find that a backslash looks better. In most situations you
        can get rid of it. Typically I prefer:
            if (cond_1 and
                cond_2 and
                cond_3):
        to
            if cond_1 and \
               cond_2 and \
               cond_3:

   * You should use two spaces after a sentence-ending period.
    --> Looks weird to me.
    (DWF: This is an old convention from the typewriter era. It has more
    or less been wiped out by HTML's convention of ignoring extra 
    whitespace: see http://en.wikipedia.org/wiki/Sentence_spacing for
    more detail. I think it's okay to drop this convention in source code.)

   * Imports should usually be on separate lines
    --> Can be a lot of lines wasted for no obvious benefit. I think this is
        mostly useful when you import different modules from different places,
        but I would say that for instance for standard modules it would be
        better to import them all on a single line (doing multiple lines only
        if there are too many of them), e.g. prefer:
            import os, sys, time
        to
            import os
            import sys
            import time
        However, I agree about separating imports between standard lib / 3rd
        party, e.g. prefer:
            import os, sys, time
            import numpy, scipy
        to
            import numpy, os, scipy, sys, time
        (Personal note: preferably sort imports alphabetically. This makes
         it easier to quickly see whether a specific module is already
         imported, and avoids duplicate imports.)

    * Missing in PEP 8:
        - How to indent multi-line statements? E.g. do we want
            x = my_func(a, b, c,
                d, e, f)
          or
            x = my_func(a, b, c,
                        d, e, f)
          or
            x = my_func(
                a, b, c, d, e, f)
          --> Probably depends on the specific situation, but we could have a
              few typical examples (the same applies to multi-line lists).
          (Fred: I would do 2 or 3, but not 1. I find it more readable when
           the indent is broken after a parenthesis than at any other point.
           OD: After thinking about it, I agree as well. My recommendation
           would be to go with 2 when it can fit on two lines, and 3
           otherwise. Same with lists.)

    * From PEP 257: The BDFL [3] recommends inserting a blank line between the
      last paragraph in a multi-line docstring and its closing quotes, placing
      the closing quotes on a line by themselves. This way, Emacs'
      fill-paragraph command can be used on it.
     --> I have nothing against Emacs, but this is ugly!

Documentation
-------------

How do we write doc?

Compatibility with various Python versions
------------------------------------------

    * Which Python 2.x version do we want to support?

    * Is it reasonable to have coding guidelines that would make the code as
      compatible as possible with Python 3?

C coding style
--------------

We also need a coding style for C code.

Meeting 2010/09/09
------------------

   * Coding guidelines
PEP 8 & Google should be a good basis to start with.
Task: Highlight the most important points in them (OD).

   * Documentation
Use RST with Sphinx.
Task: Provide specific examples on how to document a class, method, and some
specific classes like Op (DE). Modify the theano documentation to include that.

   * Python versions to be supported
Support 2.4 (because some of the clusters are still running 2.4) and write
code that can be converted to 3.x with 2to3 in a straightforward way.
Task: Write to-do's and to-not-do's to avoid compatibility issues. (OD)
(DWF: Pauli Virtanen and others have put together extensive
documentation in the process of porting NumPy to Py3K, see his notes at
http://projects.scipy.org/numpy/browser/trunk/doc/Py3K.txt -- this is
the most complete resource for complicated combinations of Python and C).
 

   * C coding style
How to write C code (in particular for Numpy / Cuda), and how to mix C and
Python.
Task: See if there would be a sensible C code style to follow (maybe look how
Numpy does it), and how projects that mix C and Python deal with it (e.g. use
separate files, or be able to have mixed syntax highlighting?) (FB)

   * Program output
Use the warning and logging modules. Avoid print as much as possible.
Task: Look into these modules to define general guidelines e.g. to decide when
to use warning instead of logging. (DWF)

   * Automated code verification
Use pychecker & friends to make sure everything is fine.
Task: Look into the various options available (DE)

   * Tests
Force people to write tests. Automatic email reminder of code lines not
covered by tests (see if we can get this from nosetests). Decorator to mark
some classes / methods as not being tested yet, so as to be able to
automatically warn users when they are using untested code (and to remind
ourselves we should add a test).
Task: See feasibility. (OD)

   * VIM / Emacs plugins / config files
To enforce good coding style automatically.
Task: Look for existing options. (FB)
(DWF: I have put some time into this for vim, I will send around my files)
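
The decorator mentioned in the "Tests" item above could look like the
following sketch. `untested`, `UntestedWarning` and `normalize` are
hypothetical names chosen for the example, and the feasibility task may well
lead to a different design.

```python
import warnings


class UntestedWarning(UserWarning):
    """Hypothetical category for code not yet covered by tests."""


def untested(func):
    """Mark `func` as untested: warn on every call, then run it."""
    def wrapper(*args, **kwargs):
        warnings.warn("%s is not covered by tests yet" % func.__name__,
                      UntestedWarning, stacklevel=2)
        return func(*args, **kwargs)
    # Copy metadata by hand: functools.wraps only exists from Python 2.5.
    wrapper.__name__ = func.__name__
    wrapper.__doc__ = func.__doc__
    return wrapper


@untested
def normalize(values):
    """Scale `values` so that they sum to 1."""
    total = float(sum(values))
    return [v / total for v in values]
```

Calling `normalize([1, 1, 2])` still returns the normalized list, and a
search for `@untested` in the code base lists what remains to be tested.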

Suggestion by PV
----------------

Have a sample code that showcases everything one should comply to.

Some coding guidelines (work-in-progress from OD)
-------------------------------------------------

   * Avoid using lists if all you care about is iterating on something. Using
     lists:
        - uses more memory (and possibly more CPU if the code may break out of
          the iteration)
        - can lead to ugly code when converted to Python 3 with 2to3
        - can have a different behavior if evaluating elements in the list has
          side effects (if you want these side effects, make it explicit by
          assigning the list to some variable before iterating on it)
    
    Iterative version       List version
    -----------------       ------------
    my_dict.iterkeys()      my_dict.keys()
    my_dict.itervalues()    my_dict.values()
    my_dict.iteritems()     my_dict.items()
    itertools.imap          map
    itertools.ifilter       filter
    itertools.izip          zip
    xrange                  range
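
The CPU and side-effect points above can be seen in a small sketch. It uses a
generator expression rather than the functions of the table so that it runs
unchanged on Python 2.4 and Python 3; `square` and `first_above` are
hypothetical helpers.

```python
calls = []


def square(x):
    """Record each call so that the amount of work done is visible."""
    calls.append(x)
    return x * x


def first_above(values, threshold):
    """Return the first element of `values` above `threshold`, or None."""
    for value in values:
        if value > threshold:
            return value
    return None


# List version: all ten squares are computed before the search starts.
from_list = first_above([square(x) for x in range(10)], 10)
n_calls_list = len(calls)

del calls[:]

# Iterative version: the generator stops as soon as the search returns.
from_iter = first_above((square(x) for x in range(10)), 10)
n_calls_iter = len(calls)
```

Both versions return the same value, but the list version calls `square` ten
times while the iterative one stops after five calls.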
    
    * Use `in` on container objects instead of using class-specific methods.
      It is easier to read and may allow you to use your code with different
      container types.

    Yes                         No
    ---                         --
    key in my_dict              my_dict.has_key(key)
    sub_string in my_string     my_string.find(sub_string) >= 0
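
A minimal sketch of the second benefit: a function written with `in` accepts
any container type. `count_known` is a hypothetical helper.

```python
def count_known(words, vocabulary):
    """Count the words that belong to `vocabulary` (any container)."""
    return len([word for word in words if word in vocabulary])


words = ['a', 'b', 'z']

# The same function works with a dict, a set, a list or a string.
as_dict = count_known(words, {'a': 1, 'b': 2})
as_set = count_known(words, set(['a', 'b']))
as_list = count_known(words, ['a', 'b'])
as_string = count_known(words, 'ab')
```

With `has_key` the function would only have accepted dictionaries.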

    * (Point to debate) Avoid contractions in code comments (particularly in
      documentation): "We do not add blue to red because it does not look
      good" rather than "We don't add blue to red because it doesn't look
      good". I mostly find it to be cleaner (I am also used to it from writing
      scientific articles).

   * (Point to debate) Imperative vs. third-person comments. I am used to the
     imperative form and like it better only because it typically saves one
     letter (the 's'): "Return the sum of elements in x" rather than
     "Returns the sum of elements in x".

    * (Point to debate) I like always doing the following when subclassing
      a class A:
        class B(A):
            def __init__(self, b_arg_1, b_arg_2, **kw):
                super(B, self).__init__(**kw)
                ...
      The point here is that the constructor always allows for extra keyword
      arguments (except for the class at the very top of the hierarchy), which
      are automatically passed to the parent class.
      Pros:
        - You do not need to repeat the parent class arguments whenever you
          write a new subclass.
        - Whenever you add an argument to the parent class, all child classes
          can benefit from it without modifying their code.
      Cons:
        - One needs to look at the parent classes to see what these arguments
          are.
        - You cannot use a **kw argument in your constructor for your own
          selfish purpose.
        - I have no clue whether one could do this with multiple inheritance.
        - More?
      Question: Should we encourage this in Pylearn?
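
To make the discussion concrete, here is a small single-inheritance sketch of
the pattern. The argument names are hypothetical, and multiple inheritance is
deliberately left out.

```python
class A(object):
    def __init__(self, a_arg=0):
        self.a_arg = a_arg


class B(A):
    def __init__(self, b_arg_1, b_arg_2=None, **kw):
        # Forward everything B does not know about to A.
        super(B, self).__init__(**kw)
        self.b_arg_1 = b_arg_1
        self.b_arg_2 = b_arg_2


# B accepts A's argument without naming it in its own signature.
b = B(1, b_arg_2=2, a_arg=3)

# A misspelled keyword is still rejected, but only once it reaches A.
try:
    B(1, a_agr=3)
    typo_detected = False
except TypeError:
    typo_detected = True
```

This also illustrates the first con: the error for a misspelled argument is
raised by `A.__init__`, so the reader has to look at the parent class to
understand it.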

   * Generally prefer list comprehensions to map / filter, as the former are
     easier to read.
    Yes:
        non_comments = [line.strip() for line in my_file.readlines()
                                     if not line.startswith('#')]
    No:
        non_comments = map(str.strip,
                           filter(lambda line: not line.startswith('#'),
                                  my_file.readlines()))
    
    * Use the `key` argument instead of `cmp` when sorting (for Python 3
      compatibility).
    Yes:
        my_list.sort(key=abs)
    No:
        my_list.sort(cmp=lambda x, y: cmp(abs(x), abs(y)))

    * Use // for integer division (for readability and Python 3 compatibility).
    Yes:
        n_samples_per_split = n_samples // n_splits
    No:
        n_samples_per_split = n_samples / n_splits
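
A short sketch of why this matters: in Python 3, `/` on two integers returns
a float (10 / 4 gives 2.5), whereas `//` behaves the same in both versions.
The variable names are only illustrative.

```python
n_samples = 10
n_splits = 4

# Floor division gives the same integer result on Python 2 and Python 3.
n_samples_per_split = n_samples // n_splits
n_leftover = n_samples % n_splits

# Note that // rounds towards minus infinity, not towards zero.
negative_result = -7 // 2
```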

    * Only use ASCII characters in code files.

    * Code must be indented with four spaces (not with tabs).

    * Whenever you read / write binary files, specify it in the mode ('rb' for
      reading, 'wb' for writing). This is important for cross-platform and
      Python 3 compatibility (e.g. when pickling / unpickling objects).
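
A minimal sketch of pickling with explicit binary modes. The fallback import
is an assumption made only so that the example also runs on Python 3, where
`cPickle` does not exist. The `try` / `finally` blocks stand in for the `with`
statement, which Python 2.4 does not have.

```python
import os
import tempfile

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3 (assumption: only to run this sketch there)

data = {'weights': [0.1, 0.2], 'n_epochs': 3}

# Create a temporary file name to write to.
fd, path = tempfile.mkstemp(suffix='.pkl')
os.close(fd)
try:
    # 'wb': write in binary mode.
    out_file = open(path, 'wb')
    try:
        pickle.dump(data, out_file, pickle.HIGHEST_PROTOCOL)
    finally:
        out_file.close()
    # 'rb': read in binary mode.
    in_file = open(path, 'rb')
    try:
        loaded = pickle.load(in_file)
    finally:
        in_file.close()
finally:
    os.remove(path)
```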

    * Avoid tuple parameter unpacking to avoid very ugly code when converting
      to Python 3.
    Yes:
        def f(x, y_z):
            y, z = y_z
    No:
        def f(x, (y, z)):

    * Only use cPickle, not pickle.

    * Always raise exceptions with
        raise MyException(args)
      where MyException inherits from Exception.
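
A small sketch of this rule, with hypothetical names. The older
`raise MyException, args` form is a syntax error in Python 3, which is one
reason to prefer the call form.

```python
import sys


class MyException(Exception):
    """Hypothetical exception class deriving from Exception."""


def check_positive(x):
    """Hypothetical helper raising with the recommended syntax."""
    if x <= 0:
        raise MyException("expected a positive value, got %s" % x)
    return x


try:
    check_positive(-1)
    message = None
except MyException:
    # sys.exc_info() avoids the `except X, e` / `except X as e` syntax
    # difference between Python 2.4 and Python 3.
    message = str(sys.exc_info()[1])
```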

Mercurial commits
-----------------

   * How to write good commit messages?
   * Standardize the merge commit text (what is the message from fetch?)