comparison doc/v2_planning/API_coding_style.txt @ 1331:0541e7d6e916
merge
author | gdesjardins |
---|---|
date | Thu, 14 Oct 2010 23:55:55 -0400 |
parents | 4efa2630f430 |
children | 90116fb3636b |
1330:3efd0effb2a7 | 1331:0541e7d6e916 |
---|---|
257 | 257 |
258 .. code-block:: python | 258 .. code-block:: python |
259 | 259 |
260 """Module docstring as the first line, as usual.""" | 260 """Module docstring as the first line, as usual.""" |
261 | 261 |
262 __authors__ = "Olivier Delalleau, Frederic Bastien, David Warde-Farley" | 262 __authors__ = "Olivier Delalleau, Frederic Bastien, David Warde-Farley" |
263 __copyright__ = "(c) 2010, Universite de Montreal" | 263 __copyright__ = "(c) 2010, Universite de Montreal" |
264 __license__ = "3-clause BSD License" | 264 __license__ = "3-clause BSD License" |
265 __contact__ = "Name Of Current Guardian of this file <email@address>" | 265 __contact__ = "Name Of Current Guardian of this file <email@address>" |
266 | 266 |
267 * Use ``//`` for integer division and ``/ float(...)`` if you want the | 267 * Use ``//`` for integer division and ``/ float(...)`` if you want the |
268 floating point operation (for readability and compatibility across all | 268 floating point operation (for readability and compatibility across all |
269 versions of Python). | 269 versions of Python). |
270 | 270 |
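As a minimal sketch of the division guideline above, the two forms can be contrasted as follows. The helper names `mean_chunk_size` and `full_chunks` are hypothetical, chosen only for illustration; the code is meant to behave the same way under Python 2 and Python 3.

```python
def mean_chunk_size(n_items, n_chunks):
    """Return the average chunk size as a floating point number."""
    # Explicit float conversion: the result is a float on every Python version.
    return n_items / float(n_chunks)


def full_chunks(n_items, chunk_size):
    """Return how many complete chunks of `chunk_size` fit in `n_items`."""
    # `//` states the intent (integer division) unambiguously.
    return n_items // chunk_size


mean_size = mean_chunk_size(7, 2)
n_full = full_chunks(7, 2)
```

With plain `/`, the first function would return a different value depending on the Python version when both arguments are integers, which is the readability and compatibility problem the guideline is meant to avoid.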
316 if efficiency is typically not an issue here, the main goal being code | 316 if efficiency is typically not an issue here, the main goal being code |
317 consistency). Also, always use ``numpy.isinf`` / ``numpy.isnan`` to | 317 consistency). Also, always use ``numpy.isinf`` / ``numpy.isnan`` to |
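318 test infinite / NaN values. This is important because NaN never compares | 318 test infinite / NaN values. This is important because NaN never compares |
319 equal to anything, not even itself (e.g. ``numpy.nan != float('nan')``). | 319 equal to anything, not even itself (e.g. ``numpy.nan != float('nan')``). |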
320 | 320 |
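The NaN / infinity guideline above can be illustrated with a short sketch. The array contents are made up for the example; only `numpy.isnan` and `numpy.isinf` come from the guideline itself.

```python
import numpy

values = numpy.array([1.0, float('nan'), float('inf'), -2.5])

# Comparing against NaN with `==` never matches, so this mask is all False.
equal_to_nan = (values == numpy.nan)

# The dedicated numpy functions detect the special values reliably.
nan_mask = numpy.isnan(values)
inf_mask = numpy.isinf(values)

# Keep only the ordinary finite entries.
finite_values = values[~(nan_mask | inf_mask)]
```

The equality test silently finds nothing, which is why relying on it to filter NaN values tends to hide bugs rather than raise errors.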
321 * Whenever possible, mimic the numpy / scipy interfaces when writing code | |
322 similar to what can be found in these packages. | |
323 | |
321 * Avoid backslashes whenever possible. They make it more | 324 * Avoid backslashes whenever possible. They make it more |
322 difficult to edit code, and they are ugly (as well as potentially | 325 difficult to edit code, and they are ugly (as well as potentially |
323 dangerous if there are trailing white spaces). | 326 dangerous if there are trailing white spaces). |
324 | 327 |
325 .. code-block:: python | 328 .. code-block:: python |
343 * When indenting multi-line statements like lists or function arguments, | 346 * When indenting multi-line statements like lists or function arguments, |
344 keep elements of the same level aligned with each other. | 347 keep elements of the same level aligned with each other. |
345 The position of the first | 348 The position of the first |
346 element (on the same line or a new line) should be chosen depending on | 349 element (on the same line or a new line) should be chosen depending on |
347 what is easiest to read (sometimes both can be ok). | 350 what is easiest to read (sometimes both can be ok). |
351 Other formatting may be ok depending on the specific situation; use | |
352 common sense and pick whichever looks best. | |
348 | 353 |
349 .. code-block:: python | 354 .. code-block:: python |
350 | 355 |
351 # Good. | 356 # Good. |
352 for my_very_long_variable_name in [my_foo, my_bar, my_love, | 357 for my_very_long_variable_name in [my_foo, my_bar, my_love, |
472 | 477 |
473 Code Sample | 478 Code Sample |
474 =========== | 479 =========== |
475 | 480 |
476 The following code sample illustrates some of the coding guidelines one should | 481 The following code sample illustrates some of the coding guidelines one should |
477 follow in Pylearn. This is still a work-in-progress. | 482 follow in Pylearn. This is still a work-in-progress. Feel free to improve it and |
483 add more! | |
478 | 484 |
479 .. code-block:: python | 485 .. code-block:: python |
480 | 486 |
481 #!/usr/bin/env python | 487 #!/usr/bin/env python |
482 | 488 |
483 """Sample code. There may still be mistakes / missing elements.""" | 489 """Sample code. Edit it as you like!""" |
484 | 490 |
485 __authors__ = "Olivier Delalleau" | 491 __authors__ = "Olivier Delalleau" |
486 __copyright__ = "(c) 2010, Universite de Montreal" | 492 __copyright__ = "(c) 2010, Universite de Montreal" |
487 __license__ = "3-clause BSD License" | 493 __license__ = "3-clause BSD License" |
488 __contact__ = "Olivier Delalleau <delallea@iro>" | 494 __contact__ = "Olivier Delalleau <delallea@iro>" |
489 | 495 |
490 # Standard library imports are on a single line. | 496 # Standard library imports are on a single line. |
491 import os, sys, time | 497 import os, sys, time |
492 | 498 |
493 # Third-party imports come after standard library imports, and there is | 499 # Third-party imports come after standard library imports, and there is |
494 # only one import per line. Imports are sorted lexicographically. | 500 # only one import per line. Imports are sorted lexicographically. |
495 import numpy | 501 import numpy |
496 import scipy | 502 import scipy |
497 import theano | 503 import theano |
498 # Put 'from' imports below. | 504 # Individual 'from' imports come after packages. |
499 from numpy import argmax | 505 from numpy import argmax |
500 from theano import tensor | 506 from theano import tensor |
501 | 507 |
502 # Application-specific imports come last. | 508 # Application-specific imports come last. |
503 from pylearn import dataset | 509 # The absolute path should always be used. |
504 from pylearn.optimization import minimize | 510 from pylearn import datasets, learner |
505 | 511 from pylearn.formulas import noise |
506 def print_files_in(directory): | 512 |
507 """Print the first line of each file in given directory.""" | 513 |
508 # TODO To be continued... | 514 # All exceptions inherit from Exception. |
515 class PylearnError(Exception): | |
516 # TODO Write doc. | |
517 pass | |
518 | |
519 # All top-level classes inherit from object. | |
520 class StorageExample(object): | |
521 # TODO Write doc. | |
522 pass | |
523 | |
524 | |
525 # Two blank lines between definitions of top-level classes and functions. | |
526 class AwesomeLearner(learner.Learner): | |
527 # TODO Write doc. | |
528 | |
529 def __init__(self, print_fields=None): | |
530 # TODO Write doc. | |
531 # print_fields is a list of strings whose counts found in the | |
532 # training set should be printed at the end of training. If None, | |
533 # then nothing is printed. | |
534 # Do not forget to call the parent class constructor. | |
535 super(AwesomeLearner, self).__init__() | |
536 # Use None instead of an empty list as default argument to | |
537 # print_fields to avoid issues with mutable default arguments. | |
538 self.print_fields = if_none(print_fields, []) | |
539 | |
540 # One blank line between method definitions. | |
541 def add_field(self, field): | |
542 # TODO Write doc. | |
543 # Test if something belongs to a container with `in`, not | |
544 # container-specific methods like `index`. | |
545 if field in self.print_fields: | |
546 # TODO Print a warning and do nothing. | |
547 pass | |
548 else: | |
549 # This is why using [] as default to print_fields in the | |
550 # constructor would have been a bad idea. | |
551 self.print_fields.append(field) | |
552 | |
553 def train(self, dataset): | |
554 # TODO Write doc (store the mean of each field in the training | |
555 # set). | |
556 self.mean_fields = {} | |
557 count = {} | |
558 for sample_dict in dataset: | |
559 # Whenever it is enough for what you need, use iterative | |
560 # instead of list versions of dictionary methods. | |
561 for field, value in sample_dict.iteritems(): | |
562 # Keep line length to max 80 characters, using parentheses | |
563 # instead of \ to continue long lines. | |
564 self.mean_fields[field] = (self.mean_fields.get(field, 0) + | |
565 value) | |
566 count[field] = count.get(field, 0) + 1 | |
567 for field in self.mean_fields: | |
568 self.mean_fields[field] /= float(count[field]) | |
569 for field in self.print_fields: | |
570 # Test is done with `in`, not `has_key`. | |
571 if field in self.mean_fields: | |
572 # TODO Use log module instead. | |
573 print '%s: %s' % (field, self.mean_fields[field]) | |
574 else: | |
575 # TODO Print warning. | |
576 pass | |
577 | |
578 def test_error(self, dataset): | |
579 # TODO Write doc. | |
580 if not hasattr(self, 'mean_fields'): | |
581 # Exceptions should be raised as follows (in particular, no | |
582 # string exceptions!). | |
583 raise PylearnError('Cannot test a learner that was not ' | |
584 'trained.') | |
585 error = 0 | |
586 count = 0 | |
587 for sample_dict in dataset: | |
588 for field, value in sample_dict.iteritems(): | |
589 try: | |
590 # Minimize code into a try statement. | |
591 mean = self.mean_fields[field] | |
592 # Always specify which kind of exception you are | |
593 # intercepting with except. | |
594 except KeyError: | |
595 raise PylearnError( | |
596 "Found in a test sample a field ('%s') that had " | |
597 "never been seen in the training set." % field) | |
598 error += (value - mean)**2 | |
599 count += 1 | |
600 # Remember to divide by a floating point number unless you | |
601 # explicitly want an integer division (in which case you should | |
602 # use //). | |
603 mse = error / float(count) | |
604 # TODO Use log module instead. | |
605 print 'MSE: %s' % mse | |
606 return mse | |
607 | |
608 | |
609 def if_none(val_if_not_none, val_if_none): | |
610 # TODO Write doc. | |
611 if val_if_not_none is not None: | |
612 return val_if_not_none | |
613 else: | |
614 return val_if_none | |
615 | |
616 | |
617 def print_subdirs_in(directory): | |
618 # TODO Write doc. | |
619 # Using list comprehension rather than filter. | |
620 sub_dirs = sorted([d for d in os.listdir(directory) | |
621 if os.path.isdir(os.path.join(directory, d))]) | |
622 print '%s: %s' % (directory, ' '.join(sub_dirs)) | |
623 # A `for` loop is often easier to read than a call to `map`. | |
624 for d in sub_dirs: | |
625 print_subdirs_in(os.path.join(directory, d)) | |
626 | |
509 | 627 |
510 def main(): | 628 def main(): |
511 if len(sys.argv) != 2: | 629 if len(sys.argv) != 2: |
512 # Note: conventions on how to display script documentation and | 630 # Note: conventions on how to display script documentation and |
513 # parse arguments are still to-be-determined. | 631 # parse arguments are still to-be-determined. This is just one |
632 # way to do it. | |
514 print("""\ | 633 print("""\ |
515 Usage: %s <directory> | 634 Usage: %s <directory> |
516 Print first line of each file in given directory (in alphabetic order).""" | 635 For the given directory and all sub-directories found inside it, print |
636 the list of the directories they contain.""" | |
517 % os.path.basename(sys.argv[0])) | 637 % os.path.basename(sys.argv[0])) |
518 return 1 | 638 return 1 |
519 print_files_in(sys.argv[1]) | 639 print_subdirs_in(sys.argv[1]) |
520 return 0 | 640 return 0 |
641 | |
521 | 642 |
522 # Top-level executable code should be minimal. | 643 # Top-level executable code should be minimal. |
523 if __name__ == '__main__': | 644 if __name__ == '__main__': |
524 sys.exit(main()) | 645 sys.exit(main()) |
525 | 646 |
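The reason the sample above passes `None` rather than `[]` as the default for `print_fields` may be easier to see in isolation. The sketch below assumes two hypothetical classes, `SharedDefault` and `FreshDefault`, that are not part of Pylearn; `if_none` is copied from the sample.

```python
def if_none(val_if_not_none, val_if_none):
    """Return the first argument unless it is None, else the second."""
    if val_if_not_none is not None:
        return val_if_not_none
    else:
        return val_if_none


class SharedDefault(object):
    """Hypothetical class using a mutable default argument (discouraged)."""

    def __init__(self, print_fields=[]):
        # The same list object is reused by every call that omits the argument.
        self.print_fields = print_fields


class FreshDefault(object):
    """Hypothetical class using None as default, as in the sample."""

    def __init__(self, print_fields=None):
        # A new empty list is created on each call that omits the argument.
        self.print_fields = if_none(print_fields, [])


first_shared = SharedDefault()
second_shared = SharedDefault()
first_shared.print_fields.append('x')
leaked = second_shared.print_fields

first_fresh = FreshDefault()
second_fresh = FreshDefault()
first_fresh.print_fields.append('x')
untouched = second_fresh.print_fields
```

Here `leaked` ends up containing `'x'` even though nothing was ever added through `second_shared`, while `untouched` stays empty.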
531 committed to Pylearn complies with the above specifications. This work is not | 652 committed to Pylearn complies with the above specifications. This work is not |
532 finalized yet, but David started a `Wiki page`_ with helpful configuration | 653 finalized yet, but David started a `Wiki page`_ with helpful configuration |
533 tips for Vim. | 654 tips for Vim. |
534 | 655 |
535 .. _Wiki page: http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Divers/VimPythonRecommendations | 656 .. _Wiki page: http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Divers/VimPythonRecommendations |
657 | |
658 Commit message | |
659 ============== | |
660 | |
661 * A one line summary. Try to keep it short, and provide the information | |
662 that seems most useful to other developers: in particular the goal of | |
663 a change is more useful than its description (which is always | |
664 available through the changeset patch log). E.g. say "Improved stability | |
665 of cost computation" rather than "Replaced log(exp(a) + exp(b)) by | |
666 a + log(1 + exp(b - a)) in cost computation". | |
667 * If needed, a blank line followed by a more detailed summary. | |
668 * Make a commit for each logical modification | |
669 * This makes reviews easier to do | |
670 * This makes debugging easier as we can more easily pinpoint errors in | |
671 commits with hg bisect | |
672 * NEVER commit reformatting with functionality changes | |
673 * Review your change before committing | |
674 * "hg diff <files>..." to see the diff you have done | |
675 * "hg record" allows you to select which changes to a file should be | |
676 committed. To enable it, put into the file ~/.hgrc: | |
677 | |
678 .. code-block:: bash | |
679 | |
680 [extensions] | |
681 hgext.record= | |
682 | |
683 * hg record / diff force you to review your code; never commit without | |
684 running one of these two commands first. | |
685 * Write detailed commit messages in the past tense, not present tense. | |
686 * Good: "Fixed Unicode bug in RSS API." | |
687 * Bad: "Fixes Unicode bug in RSS API." | |
688 * Bad: "Fixing Unicode bug in RSS API." | |
689 * Separate bug fixes from feature changes. | |
690 * When fixing a ticket, start the message with "Fixed #abc" | |
691 * Could a system be set up to update the ticket automatically? | |
692 * When referencing a ticket, start the message with "Refs #abc" | |
693 * Could a system be set up to post a comment on the ticket automatically? | |
694 | |
536 | 695 |
537 TODO | 696 TODO |
538 ==== | 697 ==== |
539 | 698 |
540 Things still missing from this document, being discussed in coding_style.txt: | 699 Things still missing from this document, being discussed in coding_style.txt: |