view doc/v2_planning/arch_src/plugin_JB_comments_IG.txt @ 1419:cff305ad9f60

TensorFnDataset - added x_ attribute that caches the dataset function return value, but does not get pickled.
author James Bergstra <bergstrj@iro.umontreal.ca>
date Fri, 04 Feb 2011 16:05:22 -0500
parents 16919775479c

-Does everything have to be all caps? I know I will get annoyed with that.
 - JB replies: I chose caps because:

   a) I wanted to be able to use statements like IF, WHILE, etc. that are
   reserved words in Python in lower case... but this turned out not to be a
   large overlap

   b) I wanted to make up for the lack of syntax highlighting of control flow
   statements in VIM by making the words bigger.

   c) I thought it looked kinda retro-cool.

   None of these reasons is really strong. If you or others have strong
   feelings against caps then no problem.

-Regarding overall program structure:
 Do you think there might be an easier to read/type way of specifying programs than building them out of constructors? This seems like it's going to lead to an unwieldy proliferation of parentheses, like in LISP, but since it's an imperative language it's more likely that we'll have lots of different scopes visible at the same time, and it will be hard to tell which section is nested inside which other section if they're all just a bunch of constructor calls fed to each other.
 Right now it seems to take only a few layers of SWITCH and SEQ to end up with an unreadable mess.
I know I'm not matching the syntax of your proposal exactly, but just to illustrate what I'm saying, we could have a program that looks like this:
program = SEQ( A, B, SWITCH(var1, val_1_1, C, val_1_2, SWITCH(var2, val_2_1, D, val_2_2, E) ) , F, SWITCH(var3, val_3_1, G, val_3_2, H) )
This seems like it could quickly turn into a nightmare, trying to count parentheses everywhere. An alternative to make it more parseable is:

switch1 = SWITCH(var2, val_2_1, D, val_2_2, E)
switch2 = SWITCH(var1, val_1_1, C, val_1_2,  switch1)
switch3 = SWITCH(var3, val_3_1, G, val_3_2, H)
program = SEQ( A, B, switch2  , F,  switch3)

This is a lot more manageable but now the parts are out of order, so the cognitive load required to debug and understand it doesn't scale well with program size.

It would be much nicer if, since it is a programming language, we could write:

A
B
SWITCH var1
  val_1_1, C
  val_1_2, SWITCH var2
     val_2_1, D
     val_2_2, E
F
SWITCH var3
  val_3_1, G
  val_3_2, H

I can see a few different ways of accomplishing this, but of course welcome more proposals:

1) Make a scripting language, so we pass a file into our library. We could base it on XML, maybe, if we didn't want to spend too much time making our own parser:

switch.xml contains:
<PyLearn>
<A />
<B />
<Switch var="var_1">
  <Branch val="val_1_1">
     <C />
  </Branch>
  ...
</Switch>
</PyLearn>

python pylearn.py switch.xml

2) We could make a global program compiler or have program objects that have an idea of the current scope, that you just add things to:

p = pylearn.program()
p(A)
p(B)
p(SWITCH(var1))
p(  Branch(val_1_1, C))  #one annoying thing is python wouldn't let us indent things as we please
...
p(END_SWITCH)
...
p.compile()
p.run()

If we design our language to be LL(1) (fairly easy to do) then it's pretty easy to make p check that the calls to it are syntactically correct as they happen.
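
A minimal sketch of option 2, assuming a hypothetical Program class that remembers whether a SWITCH is currently open. The names Program, Branch and END_SWITCH mirror the calls above, but their behaviour here is an assumption for illustration, not an existing pylearn API. Nested SWITCHes and p.run() are left out to keep it short.

```python
class SWITCH:
    """Hypothetical SWITCH statement; branches are filled in by later calls."""
    def __init__(self, var):
        self.var = var
        self.branches = []  # list of (value, statement) pairs


class Branch:
    """Hypothetical branch: run `stmt` when the switch variable equals `val`."""
    def __init__(self, val, stmt):
        self.val = val
        self.stmt = stmt


END_SWITCH = object()  # sentinel that closes the currently open SWITCH


class Program:
    """Accepts one statement per call and rejects ill-formed sequences at once."""
    def __init__(self):
        self.body = []       # top-level statements, in order
        self._switch = None  # the SWITCH still waiting for END_SWITCH, if any

    def __call__(self, item):
        if isinstance(item, Branch):
            if self._switch is None:
                raise SyntaxError("Branch outside of SWITCH")
            self._switch.branches.append((item.val, item.stmt))
        elif item is END_SWITCH:
            if self._switch is None:
                raise SyntaxError("END_SWITCH without matching SWITCH")
            self._switch = None
        else:
            if self._switch is not None:
                raise SyntaxError("expected Branch or END_SWITCH inside SWITCH")
            self.body.append(item)
            if isinstance(item, SWITCH):
                self._switch = item
        return self

    def compile(self):
        # Final check: nothing may be left open at the end of the program.
        if self._switch is not None:
            raise SyntaxError("unclosed SWITCH")
        return self.body


# Usage: statements are plain strings here, standing in for A, B, C, ...
p = Program()
p("A")
p("B")
p(SWITCH("var1"))
p(Branch("val_1_1", "C"))
p(Branch("val_1_2", "D"))
p(END_SWITCH)
p("F")
body = p.compile()
```

Because each call only needs to know what is currently open, the error is reported on the offending call rather than at compile time.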


JB replies: 

  What I've proposed so far is a few classes for adding new program-flow
  constructs to Python, which is I think less ambitious and more desirable than
  defining a separate language.  For example, the bodies of the CALL objects are
  all Python methods (not implemented in my - I hesitate to call it a -
  "language") and the program itself is constructed using *python* control flow
  and *python* methods.  I don't want to have another set of syntax rules, or
  have to create a macro system, or a pre-processor.  
  
  I agree it would be nicer to have a more elegant syntax, but I'd much rather
  live with a few extra parentheses than require someone to go to the trouble of
  implementing that luxury.  (And we can tweak the control-flow constructors to
  minimize the number of brackets & parentheses too).

  Besides, we can always implement that language & compiler later. For now we
  can just type the extra brackets.

  Perhaps I don't understand your first example - where do the definitions of A
  and B come from?  Must they be in the same file higher up or something? In
  what language will they be defined?


IG replies:

  Doesn't your proposal refer to what is being created as an "imperative
  language"?
  A, B, etc. are just placeholders for whatever kind of statement you want to
  fill in: CALL, FILTER, etc. What I wrote wasn't meant to be a real program,
  just an example of how tree-structured programs get mapped into text.
  The main reason to bring up the issue of a scripting language for assembling
  these constructors is that we need to make sure that the set of optional
  arguments to each constructor is such that the scripting language built on
  top of them is LL(1). Fortunately, that is not very hard. When we start
  converging on the final interface I can do the check myself.
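
The mapping from a tree of constructor calls to indented text can be sketched in a few lines of Python. This is only an illustration under assumptions: SEQ and SWITCH here are hypothetical stand-ins that build plain tuples, statements are plain strings, and the SWITCH argument order (var, value, statement, value, statement, ...) is copied from the example program earlier in this file rather than from the actual proposal.

```python
def SEQ(*stmts):
    """Hypothetical SEQ node: a non-empty ordered list of statements."""
    if not stmts:
        raise ValueError("SEQ needs at least one statement")
    return ("SEQ", list(stmts))


def SWITCH(var, *pairs):
    """Hypothetical SWITCH node; pairs alternate value, statement."""
    if len(pairs) % 2:
        raise ValueError("SWITCH needs value/statement pairs")
    return ("SWITCH", var, list(zip(pairs[0::2], pairs[1::2])))


def render(node, indent=0):
    """Return the program tree as a list of indented text lines."""
    pad = " " * indent
    if isinstance(node, str):  # leaf statement such as "A"
        return [pad + node]
    if node[0] == "SEQ":
        lines = []
        for stmt in node[1]:
            lines.extend(render(stmt, indent))
        return lines
    # Otherwise the node is a SWITCH.
    _, var, branches = node
    lines = [pad + "SWITCH " + var]
    for val, stmt in branches:
        sub = render(stmt, indent + 4)
        # The branch value shares a line with the first line of its statement.
        lines.append(pad + "  " + val + ", " + sub[0].strip())
        lines.extend(sub[1:])
    return lines


program = SEQ("A", "B",
              SWITCH("var1",
                     "val_1_1", "C",
                     "val_1_2", SWITCH("var2", "val_2_1", "D", "val_2_2", "E")),
              "F",
              SWITCH("var3", "val_3_1", "G", "val_3_2", "H"))

text = "\n".join(render(program))
print(text)
```

The point of the sketch is that the nested-constructor form and the indented form carry the same information, so the textual layout can be generated from the constructors rather than requiring a separate parser.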

JB replies: 

  I don't know if this is standard - but I think of a language as being ...
  maybe... "the set of all syntactic elements with which you make a program",
  and by that criterion I am not proposing a complete language.  I attempt a
  definition because these control-flow programs are expressed in ELEMENTs that
  eventually bottom out at CALL(X, ...) where X is *not* defined in the
  control-flow language. X is a Python function that typically (and maybe
  necessarily - I'm not sure) knows nothing about the control-flow language that
  is calling it.  So with this view in mind, I can't understand why or how it
  would make sense to define the control flow in a new language, or in XML or
  something.  After all, how are you going to tell it what to CALL?
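
To make the last point concrete, here is a minimal sketch of control-flow elements that bottom out at ordinary Python functions. The class bodies, the run() method and the shared `state` dict are assumptions made for illustration; the real signatures of CALL, SEQ and SWITCH in the proposal are not given in this file.

```python
class CALL:
    """Leaf element: wraps a plain Python function that knows nothing of SEQ/SWITCH."""
    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args

    def run(self, state):
        return self.fn(state, *self.args)


class SEQ:
    """Runs its elements in order."""
    def __init__(self, *elements):
        self.elements = elements

    def run(self, state):
        for element in self.elements:
            element.run(state)


class SWITCH:
    """Runs the element whose key matches the value computed by key_fn."""
    def __init__(self, key_fn, cases):
        self.key_fn = key_fn  # ordinary Python callable taking the state
        self.cases = cases    # dict mapping key values to elements

    def run(self, state):
        self.cases[self.key_fn(state)].run(state)


# The functions being called are ordinary Python, defined in the same file.
def load(state):
    state["x"] = 3

def double(state):
    state["x"] *= 2

def negate(state):
    state["x"] = -state["x"]

def is_positive(state):
    return state["x"] > 0


program = SEQ(
    CALL(load),
    SWITCH(is_positive, {True: CALL(double), False: CALL(negate)}),
)

state = {}
program.run(state)
```

The program is assembled with normal Python expressions, so the objects passed to CALL are simply names in scope. An external XML or script file would need some additional mechanism to refer to them.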