Monday, September 5, 2011

Python is NOT Java

I thought this post by PJ Eby was great; it sums up many of the important differences between Java and Python.

Python Is Not Java
I was recently looking at the source of a wxPython-based GUI application, about 45.5KLOC in size, not counting the libraries used (e.g. Twisted). The code was written by Java developers who are relatively new to Python, and it suffers from some performance issues (like a 30-second startup time). In examining the code, I found that they had done lots of things that make sense in Java, but which suck terribly in Python. Not because "Python is slower than Java", but because there are easier ways to accomplish the same goals in Python, that wouldn't even be possible in Java.

So, the sad thing is that these poor folks worked much, much harder than they needed to, in order to produce much more code than they needed to write, that then performs much more slowly than the equivalent idiomatic Python would. Some examples:
A static method in Java does not translate to a Python classmethod. Oh sure, it results in more or less the same effect, but the goal of a classmethod is actually to do something that's usually not even possible in Java (like inheriting a non-default constructor). The idiomatic translation of a Java static method is usually a module-level function, not a classmethod or staticmethod. (And static final fields should translate to module-level constants.)

This isn't much of a performance issue, but a Python programmer who has to work with Java-idiom code like this will be rather irritated by typing Foo.Foo.someMethod when it should just be Foo.someFunction. But do note that calling a classmethod involves an additional memory allocation that calling a staticmethod or function does not.
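A minimal sketch of the difference, using hypothetical names (MathUtils, scale, Point, from_pair). The first class is the Java-style translation. The module-level constant and function below it are the idiomatic Python one. Point.from_pair shows what classmethod is actually for: an alternate constructor that subclasses inherit and that builds instances of whichever subclass it is called on.

```python
# Java habit: a "utility class" holding static members (hypothetical example).
class MathUtils:
    SCALE = 10  # the equivalent of a static final field

    @staticmethod
    def scale(x):
        return x * MathUtils.SCALE


# Idiomatic Python: a module-level constant and a module-level function.
SCALE = 10


def scale(x):
    return x * SCALE


# What classmethod is really for: an alternate constructor that subclasses inherit.
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def from_pair(cls, pair):
        # cls is the class the method was called on, so a subclass gets its own type back
        return cls(pair[0], pair[1])


class LabeledPoint(Point):
    label = "unnamed"
```

Callers then write scale(3) instead of MathUtils.scale(3), and LabeledPoint.from_pair((1, 2)) returns a LabeledPoint without LabeledPoint redefining the constructor.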

Oh, and all those Foo.Bar.Baz attribute chains don't come for free, either. In Java, those dotted names are looked up by the compiler, so at runtime it really doesn't matter how many of them you have. In Python, the lookups occur at runtime, so each dot counts. (Remember that in Python, "Flat is better than nested", although it's more related to "Readability counts" and "Simple is better than complex," than to being about performance.)
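A small illustration, assuming CPython: the standard dis module shows that each dot after a name compiles to its own LOAD_ATTR instruction, executed every time the code runs. The usual remedy in a hot loop is to bind the final attribute to a local name once. The function names are hypothetical, and opcode names are a CPython implementation detail that can change between versions.

```python
import dis
import os.path


def chained():
    # "os" is a global lookup; ".path" and ".join" are two runtime attribute lookups
    return os.path.join


def count_attr_lookups(func):
    """Count LOAD_ATTR instructions in func's bytecode (CPython-specific)."""
    return sum(1 for ins in dis.get_instructions(func) if ins.opname == "LOAD_ATTR")


def join_all(parts_list):
    join = os.path.join  # resolve the dotted name once, outside the loop
    return [join(*parts) for parts in parts_list]
```

count_attr_lookups(chained) reports two attribute lookups for os.path.join; join_all pays that cost once instead of once per item.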

Got a switch statement? The Python translation is a hash table, not a bunch of if-then statements. Got a bunch of if-then's that wouldn't be a switch statement in Java because strings are involved? It's still a hash table. The CPython dictionary implementation uses one of the most highly-tuned hashtable implementations in the known universe. No code that you write yourself is going to work better, unless you're the genetically-enhanced love child of Guido, Tim Peters, and Raymond Hettinger.
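A minimal sketch of the dictionary-as-switch idiom, with hypothetical command names and handlers: the dict maps each case to a function, and dict.get supplies the default branch.

```python
def handle_start(arg):
    return f"starting {arg}"


def handle_stop(arg):
    return f"stopping {arg}"


def handle_unknown(arg):
    return f"unknown command for {arg}"


# The dict plays the role of a Java switch over strings.
HANDLERS = {
    "start": handle_start,
    "stop": handle_stop,
}


def dispatch(command, arg):
    # One hash lookup instead of a chain of if/elif comparisons;
    # .get's second argument acts as the "default:" case.
    return HANDLERS.get(command, handle_unknown)(arg)
```

Adding a case means adding one dictionary entry; the dispatch function itself never changes.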

XML is not the answer. It is not even the question. To paraphrase Jamie Zawinski on regular expressions, "Some people, when confronted with a problem, think 'I know, I'll use XML.' Now they have two problems."

This is a different situation than in Java, because compared to Java code, XML is agile and flexible. Compared to Python code, XML is a boat anchor, a ball and chain. In Python, XML is something you use for interoperability, not your core functionality, because you simply don't need it for that. In Java, XML can be your savior because it lets you implement domain-specific languages and increase the flexibility of your application "without coding". In Java, avoiding coding is an advantage because coding means recompiling. But in Python, more often than not, code is easier to write than XML. And Python can process code much, much faster than your code can process XML. (Not only that, but you have to write the XML processing code, whereas Python itself is already written for you.)
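A minimal sketch of that point, using a hypothetical plugin configuration. The XML version needs parsing code (here xml.etree.ElementTree from the standard library) plus a string-to-type conversion for every attribute; the equivalent Python literal is already the data structure the program wants.

```python
import xml.etree.ElementTree as ET

# Hypothetical plugin configuration written as XML...
XML_CONFIG = """
<plugins>
  <plugin name="spellcheck" enabled="true" priority="10"/>
  <plugin name="autosave" enabled="false" priority="5"/>
</plugins>
"""


def load_xml_config(text):
    # ...which needs parsing code, and every attribute arrives as a string.
    root = ET.fromstring(text.strip())
    return [
        {
            "name": el.get("name"),
            "enabled": el.get("enabled") == "true",
            "priority": int(el.get("priority")),
        }
        for el in root.iter("plugin")
    ]


# The same configuration as plain Python: the interpreter is the parser,
# and the values already have the right types.
PLUGINS = [
    {"name": "spellcheck", "enabled": True, "priority": 10},
    {"name": "autosave", "enabled": False, "priority": 5},
]
```

Both produce the same list, but only one of them required writing and maintaining a loader.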

If you are a Java programmer, do not trust your instincts regarding whether you should use XML as part of your core application in Python. If you're not implementing an existing XML standard for interoperability reasons, creating some kind of import/export format, or creating some kind of XML editor or processing tool, then Just Don't Do It. At all. Ever. Not even just this once. Don't even think about it. Drop that schema and put your hands in the air, now! If your application or platform will be used by Python developers, they will only thank you for not adding the burden of using XML to their workload.

(The only exception to this is if your target audience really really needs XML for some strange reason. Like, they refuse to learn Python and will only pay you if you use XML, or if you plan to give them a nice GUI for editing the XML, and the GUI in question is something that somebody else wrote for editing XML and you get to use it for free. There are also other, very rare, architectural reasons to need XML. Trust me, they don't apply to your app. If in doubt, explain your use case for XML to an experienced Python developer. Or, if you have a thick skin and don't mind being laughed at, try explaining to a Lisp programmer why your application needs XML!)

Getters and setters are evil. Evil, evil, I say! Python objects are not Java beans. Do not write getters and setters. This is what the 'property' built-in is for. And do not take that to mean that you should write getters and setters, and then wrap them in 'property'. What it means is that until you prove that you need anything more than a simple attribute access, you shouldn't write getters and setters. They are a waste of CPU time, but more important, they are a waste of programmer time. Not just for the people writing the code and tests, but for the people who have to read and understand them as well.

In Java, you have to use getters and setters because using public fields gives you no opportunity to go back and change your mind later to using getters and setters. So in Java, you might as well get the chore out of the way up front. In Python, this is silly, because you can start with a normal attribute and change your mind at any time, without affecting any clients of the class. So, don't write getters and setters.
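A minimal sketch of "start with a normal attribute, change your mind later", using a hypothetical account class. The two versions are given different names only so both can be shown side by side; in real code the second would simply replace the first, and client_code would not change.

```python
class Account:
    # Version 1: a plain public attribute. No getter, no setter.
    def __init__(self, balance):
        self.balance = balance


class CheckedAccount:
    # Version 2, written only once validation is actually needed.
    def __init__(self, balance):
        self.balance = balance  # assignment goes through the property setter below

    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, value):
        if value < 0:
            raise ValueError("balance cannot be negative")
        self._balance = value


def client_code(account):
    # Plain attribute syntax works identically against either version.
    account.balance = account.balance + 50
    return account.balance
```

The property is added after the fact, and every caller keeps writing account.balance.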

Code duplication is quite often a necessary evil in Java, where you must often write the same method over and over with minor variations (usually because of static typing constraints). It is not necessary or desirable to do this in Python (except in certain rare cases of inlining a few performance-critical functions). If you find yourself writing the same function over and over again with minor variations, it's time to learn about closures. They're really not that scary.

Here's what you do. You write a function that contains a function. The inner function is a template for the functions that you're writing over and over again, but with variables in it for all the things that vary from one case of the function to the next. The outer function takes parameters that have the same names as those variables, and returns the inner function. Then, every place where you'd otherwise be writing yet another function, simply call the outer function, and assign the return value to the name you want the "duplicated" function to appear. Now, if you need to change how the pattern works, you only have to change it in one place: the template.
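The recipe above, sketched with hypothetical length validators: make_length_validator is the outer function, validate is the inner template, and each former copy-pasted function becomes a single assignment.

```python
# Instead of writing validate_username, validate_email, validate_zip, ...
# by hand with only the field name and limit changed, write the template once.
def make_length_validator(field, max_len):
    def validate(value):
        # field and max_len are captured from the enclosing call (a closure)
        if len(value) > max_len:
            raise ValueError(f"{field} is too long")
        return value
    return validate


# Each "duplicated" function is now one line.
validate_username = make_length_validator("username", 20)
validate_email = make_length_validator("email", 254)
validate_zip = make_length_validator("zip code", 10)
```

Changing the error message or the check itself now means editing only the body of validate.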

In the application/platform I looked at, just one highly trivial application of this technique could have cut out hundreds of lines of deadweight code. Actually, since the particular boilerplate has to be used by developers developing plugins for the platform, it will save many, many more hundreds of lines of third-party developer code, while simplifying what those developers have to learn.

This is only the tip of the iceberg for Java->Python mindset migration, and about all I can get into right now without delving into an application's specifics. Essentially, if you've been using Java for a while and are new to Python, do not trust your instincts. Your instincts are tuned to Java, not Python. Take a step back, and above all, stop writing so much code.

To do this, become more demanding of Python. Pretend that Python is a magic wand that will miraculously do whatever you want without you needing to lift a finger. Ask, "how does Python already solve my problem?" and "What Python language feature most resembles my problem?" You will be absolutely astonished at how often the thing you need is already there in some form. In fact, this phenomenon is so common, even among experienced Python programmers, that the Python community has a name for it. We call it "Guido's time machine", because sometimes it seems as though that's the only way he could've known what we needed, before we knew it ourselves.
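One everyday illustration of asking "how does Python already solve my problem?" (an example chosen for this post, not taken from the original essay): counting occurrences by hand, then with collections.Counter from the standard library.

```python
from collections import Counter

words = "the cat and the hat and the bat".split()

# Java-instinct version: build the tally by hand.
counts = {}
for w in words:
    if w in counts:
        counts[w] += 1
    else:
        counts[w] = 1

# The time-machine version: the standard library already has a counter.
tally = Counter(words)
print(tally.most_common(2))  # [('the', 3), ('and', 2)]
```

Counter is a dict subclass, so it can be used anywhere the hand-built dictionary was, and it adds ranking and arithmetic on counts for free.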

So, if you don't feel like you're at least ten times more productive with Python than Java, chances are good that you've been forgetting to use the time machine! (And if you miss your Java IDE, consider the possibility that it's because your Python program is much more complex than it needs to be.)

Problem Solving Through Search

Agents

The simplest agents are reflex agents, which base their actions on a direct mapping from states to actions.

Reflex agents cannot operate well in environments where the state->action mapping is too large and learning those relationships would take too long.
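A minimal sketch of a reflex agent as a literal lookup table, using the two-square vacuum world (locations A and B) that AIMA uses as a toy example. The table and function names are hypothetical.

```python
# A percept is (location, status); the action comes straight from the table.
RULES = {
    ("A", "Dirty"): "Suck",
    ("B", "Dirty"): "Suck",
    ("A", "Clean"): "Right",
    ("B", "Clean"): "Left",
}


def reflex_vacuum_agent(percept):
    # No memory and no look-ahead: state -> action, nothing more.
    return RULES[percept]
```

The table needs one row per possible percept, so an agent with n binary sensors would need 2**n rows. That growth is why pure table-driven reflex agents break down in larger environments.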

Goal-based agents consider future actions and how desirable the outcomes of those actions are.

Problem-solving agents use atomic representations: the states of the world are treated as indivisible wholes, black boxes with no internal structure visible to the problem-solving algorithms.

To discuss problem solving, problems and their solutions must first be defined precisely. There are several general-purpose search algorithms that can be used to solve problems.
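A minimal sketch of a precisely defined problem in AIMA's terms (initial state, actions, transition model, goal test, path cost), for a hypothetical road map. The class and method names are assumptions for illustration. States are atomic: plain city names with no internal structure. A solution is a sequence of states or actions leading from the initial state to a goal, and an optimal solution is one with the lowest path cost.

```python
from dataclasses import dataclass, field


@dataclass
class RouteProblem:
    """A search problem over a hypothetical road map (city -> {neighbor: distance})."""
    initial: str
    goal: str
    roads: dict = field(default_factory=dict)

    def actions(self, state):
        # Actions available in a state: drive to any adjacent city.
        return list(self.roads.get(state, {}))

    def result(self, state, action):
        # Transition model: driving to a city leaves you in that city.
        return action

    def is_goal(self, state):
        # Goal test: compares whole states, since states are atomic.
        return state == self.goal

    def step_cost(self, state, action):
        return self.roads[state][action]

    def path_cost(self, path):
        # Path cost: sum of step costs along a sequence of states.
        return sum(self.step_cost(a, b) for a, b in zip(path, path[1:]))


ROADS = {"A": {"B": 1, "C": 4}, "B": {"C": 1, "D": 5}, "C": {"D": 1}, "D": {}}
problem = RouteProblem("A", "D", ROADS)
```

Any general-purpose search algorithm can then work against this interface without knowing anything about roads or cities.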

Informed and Uninformed Search Algorithms

Uninformed search algorithms are those that receive no information about the problem beyond its definition. Some of them can solve any problem that is solvable, but none of them can do so efficiently: they work by brute force. Depth-first search (DFS), breadth-first search (BFS), and uniform-cost search (UCS) are all uninformed search algorithms. An uninformed algorithm can only distinguish goal states from non-goal states. It generates successor states without being able to tell which of them is more promising for reaching the final solution.
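A minimal sketch of two of these uninformed strategies on a hypothetical weighted graph. Neither uses any knowledge beyond the graph and the goal test; they differ only in how the frontier is ordered. BFS uses a FIFO queue and finds the path with the fewest edges; UCS uses a priority queue keyed on path cost g(n) and finds the cheapest path.

```python
import heapq
from collections import deque

# Hypothetical graph: node -> {neighbor: step cost}.
GRAPH = {
    "S": {"A": 1, "G": 10},
    "A": {"B": 1},
    "B": {"G": 1},
    "G": {},
}


def breadth_first_search(graph, start, goal):
    """Return the path with the fewest edges, ignoring step costs."""
    frontier = deque([[start]])  # FIFO queue of paths
    reached = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:  # the goal test is all it knows about a state
            return path
        for nxt in graph[node]:
            if nxt not in reached:
                reached.add(nxt)
                frontier.append(path + [nxt])
    return None


def uniform_cost_search(graph, start, goal):
    """Return (cost, path) for the cheapest path, expanding the lowest g(n) first."""
    frontier = [(0, [start])]  # priority queue ordered by path cost
    best = {start: 0}
    while frontier:
        cost, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:  # testing on expansion makes the first goal popped optimal
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, step in graph[node].items():
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, path + [nxt]))
    return None
```

On this graph BFS returns S -> G (one edge, cost 10) while UCS returns S -> A -> B -> G (three edges, cost 3), which shows that "fewest steps" and "cheapest" are different goals.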

Informed search algorithms receive some guidance about where to look for solutions, which in many cases reduces computation time. This guidance usually takes the form of heuristics derived from careful observation of the problem, and they tend to be specific to it. The A* ("A star") algorithm is an informed path-finding search algorithm; on grid maps that allow only horizontal and vertical moves, it is commonly implemented with the Manhattan distance heuristic. An informed algorithm generates successors according to a criterion that determines which states are more promising for reaching the final solution.
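A minimal A* sketch on a hypothetical 4-connected grid with unit move cost, where Manhattan distance is the natural heuristic. Structurally it is uniform-cost search with the priority changed from g(n) to f(n) = g(n) + h(n). The grid, function names, and unit cost are assumptions for illustration.

```python
import heapq

# Hypothetical map: '.' is free, '#' is a wall. Moves: up/down/left/right, cost 1 each.
GRID = [
    "....",
    ".##.",
    "....",
]


def manhattan(a, b):
    # Never overestimates the true distance for 4-way unit-cost moves (admissible).
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def neighbors(grid, cell):
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
            yield (nr, nc)


def a_star(grid, start, goal):
    """Return the length of a shortest path, expanding cells in order of f = g + h."""
    frontier = [(manhattan(start, goal), 0, start)]  # (f, g, cell)
    best_g = {start: 0}
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        if g > best_g[cell]:
            continue  # stale queue entry
        for nxt in neighbors(grid, cell):
            new_g = g + 1
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + manhattan(nxt, goal), new_g, nxt))
    return None
```

From (1, 0) to (1, 3) the heuristic estimates 3, but the wall forces a detour and A* returns 5: the heuristic guides the order of expansion without changing the answer, because it never overestimates.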

Source: AIMA (Russell and Norvig, Artificial Intelligence: A Modern Approach), Chapter 3.