Coding style guide
Code reviews are a great tool for improving code quality and stability. They work like this:
The reviewer is responsible for maintaining the overall correctness and code health of the entire system. In addition (and without contradiction), the reviewer is responsible for helping the author merge their code as quickly as possible, allowing future iterations to happen in distinct, concentrated efforts. Unmerged code is a form of technical debt and should be eradicated as quickly as possible.
Most of the time these goals do not clash. But there are some gray areas.
A good rule of thumb is that code health should never decrease, not even locally. If a commit constitutes a regression, it is the reviewer's responsibility to comment and require a fix before merge, or to require adding a TODO if time is of the essence and the author wishes to push the code and return to the problem offline (but immediately after). On the other hand, it is perfectly fine to push code that changes code health neither for better nor for worse.
Reviewers should be wary of overextending their duties. For example, reviewers should avoid expanding the scope of the change beyond what the author intended, and should always look for the minimal stopping point that keeps correctness, increases usefulness, and increases (or at least does not decrease) code health.
Avoid comments that are only a matter of personal preference. It is fine to leave such comments, but stress that they are optional (e.g. “consider …”).
Inconsistencies are a distraction while reading code. In reviews, fixing them wastes both the reviewer's and the author's time and effort, and at times people argue about which convention is best.
To sidestep this issue entirely, autoformat everything (including configurations and markdown files).
Prefer opinionated autoformatters that rewrite the code from its abstract syntax tree. Automatic consistency means removing redundant degrees of freedom.
Motivations for a design are largely subjective; perhaps the only objective metric by which a design can be examined is code length. So: write less code, and delete code often.
Some forms of duplication are easy to spot, simply by the noisy feeling you get reading them.
No:
f = gamla.remove(
    gamla.anyjuxt(
        gamla.equals("hello"),
        gamla.equals("hi"),
        gamla.equals("what's up"),
        gamla.equals("good afternoon"),
        gamla.equals("morning"),
        gamla.equals("good evening"),
    )
)
Yes:
f = gamla.remove(
    gamla.contains(
        {
            "hello",
            "hi",
            "what's up",
            "good afternoon",
            "morning",
            "good evening",
        }
    )
)
Humans are not as good at counting brackets as some linguists might believe. Help them avoid it by flattening nested structures.
No:
result = tuple(map(increment, [1, 2, 3]))
Yes:
result = gamla.pipe(
    [1, 2, 3],
    gamla.map(increment),
    tuple,
)
Try to minimize transitions between the multiple-values space and the single-value space.
No:
result = compose_left(map(f1), map(f2), map(f3))
Yes:
result = map(compose_left(f1, f2, f3))
This will simplify the code and reduce cognitive overhead when writing it, as it's easier to hold one object in mind than a plurality.
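For instance, here is a concrete sketch of the single composed step (the `increment` and `double` helpers and the `transform` name are made up for illustration; `gamla.map` and `gamla.compose_left` appear elsewhere in this guide):
import gamla
def increment(n):
    return n + 1
def double(n):
    return n * 2
# One `map` with a single composed step, instead of three separate `map` stages:
transform = gamla.compose_left(gamla.map(gamla.compose_left(increment, double)), tuple)
assert transform([1, 2, 3]) == (4, 6, 8)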
Point free (aka tacit) programming is a way to avoid naming-noise and signature duplication.
Some names don’t really add value or readability, but we have to write them because we decided to work in the value level, instead of the function level. When this happens, it is a sign of working in the wrong level of abstraction.
No:
(x for x in people if my_filter(x))
`x` is a meaningless name.
No:
(person for person in people if my_filter(person))
Even if we had written `person`, we would just be creating more visual noise by repeating this information three times (twice for `person` and once more in `people`).
Yes:
filter(my_filter, people)
Here we express the idea cleanly, with considerably less noise.
One technique to move from working in the value level to the function level is to remove variables which appear on the ends of the implementation. This is known as “eta reduction”.
For example, when you have a pipeline which needs an input from the top, you don’t need a function.
No:
def my_pipeline(x):
    return gamla.pipe(x, f, g)
Yes:
my_pipeline = gamla.compose_left(f, g)
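As a concrete, runnable sketch of the same reduction (the `strip_spaces` and `to_lower` helpers and the `normalize` name are hypothetical):
import gamla
def strip_spaces(s):
    return s.strip()
def to_lower(s):
    return s.lower()
# Instead of: def normalize(text): return gamla.pipe(text, strip_spaces, to_lower)
normalize = gamla.compose_left(strip_spaces, to_lower)
assert normalize("  Hello ") == "hello"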
Another example is when some of the input is used at the top of a pipeline and the rest at the bottom. In these cases you can simplify the signature by returning a function: avoid redundant currying and arguments by turning pointful curried functions into point-free uncurried higher-order functions.
No:
@gamla.curry  # Redundant currying.
def my_func(x, y, z):  # Redundant argument, duplicated with the signature of `run`.
    return gamla.pipe(z, run(x), run_also(y))  # Redundant pipe stage.

gamla.pipe(
    ...,
    my_func(x, y),  # Getting `z` through the pipe.
)
Yes:
def my_func(x, y):  # Simplified signature, removed currying.
    return gamla.compose_left(run(x), run_also(y))  # Removed one stage.

gamla.pipe(
    ...,
    my_func(x, y),  # Applying the resulting function on the `pipe` input without naming.
)
One example of working at the wrong level is overusing functional conditionals with constant values.
When you see `when`/`unless`/`ternary` used together with `always`/`just`, it means we're constructing the pipeline in the middle of running it, which is a kind of level mixing. If you have a constant value which determines how the pipeline should work, it should usually be read and used prior to the running time of the pipeline. This is similar to factoring a value out of a loop when it never changes.
No:
my_function = gamla.ternary(gamla.just(my_var), f, g)
Yes:
my_function = f if my_var else g
Rich Hickey explains it well in his “The Value of Values” short talk: https://www.youtube.com/watch?v=-6BsiVyC1kM.
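Here is a sketch of reading the constant before the pipeline runs (the `USE_STRICT` flag and the two cleaning helpers are made up for illustration):
import gamla
USE_STRICT = True  # A constant, known before the pipeline is built.
def strict_clean(s):
    return s.strip().lower()
def lax_clean(s):
    return s.strip()
# The constant decides the pipeline's shape up front, instead of a `ternary` running per element:
clean = strict_clean if USE_STRICT else lax_clean
normalize_all = gamla.compose_left(gamla.map(clean), tuple)
assert normalize_all(["  A ", " b"]) == ("a", "b")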
Mutable objects are hard to reason about, but when they’re global it’s even worse (what code changes them? when?).
Most developers know that. But most developers don't think of cache decoration as a global mutable thing, while it definitely is. Global caches make the code “sometimes” faster, and who knows when and why. They make optimizing code much harder than it needs to be, and require developers to reload the code often to get a “clean slate”. They make writing latency tests especially hard and require mocking. They are also never garbage collected, which can lead to a nontrivial memory footprint that is hard to track.
For these reasons, avoid global caching altogether. Instead use local caching patterns, e.g. dictionaries or injected computation inside garbage collected scopes (functions).
No:
import functools
import gamla

@functools.lru_cache  # I am a global waiting to cause harm!
def something_slow(x):
    ...

@gamla.curry
def something_that_is_called_a_lot(x, element):
    return something_slow(x) + element

def loop(x, elements):
    return tuple(map(something_that_is_called_a_lot(x), elements))
We don’t want to combine the functions, but we also don’t want to call `something_slow` more than we need to. We can factor it out to have our cake and eat it too.
Yes:
import gamla

def something_slow(x):
    ...

@gamla.curry
def something_that_is_called_a_lot(slow_result, element):
    return slow_result + element

def loop(x, elements):
    slow_result = something_slow(x)
    return tuple(map(something_that_is_called_a_lot(slow_result), elements))
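The other local pattern mentioned above, a dictionary inside a garbage-collected scope, can look like the following sketch (the `pairs` argument shape is hypothetical):
def something_slow(x):
    ...
def loop(pairs):
    pairs = tuple(pairs)
    # A locally scoped cache: each distinct `x` is computed once, and the dict is
    # garbage collected together with this call - nothing global to invalidate.
    slow_results = {x: something_slow(x) for x in {x for x, _ in pairs}}
    return tuple(slow_results[x] + element for x, element in pairs)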
Programming languages often introduce features to allow for multiple ways to use a function.
The usual suspects here are optionality/nullability and default arguments.
No:
@gamla.curry
def do_something(x, y=None):
    if y is not None:  # Hmm, apparently `y` can be `None`. Let's resort to defensive programming.
        ...
This function can be called in many different ways:
# Someone writing this may expect two different outcomes:
do_something(x) # I mean use `y` as `None`...
do_something(x) # I mean return a partially applied function that can get `y` as well...
# These are all the same:
do_something(y=y, x=x)
do_something(x=x, y=y)
do_something(x, y)
So suddenly default arguments and currying together can have surprising results.
These language features, while making code easier to write, make it harder to reason about and refactor.
Default arguments are especially dangerous as they “help” developers create signatures with many arguments.
No:
def pizza_spec(size, topping, part="entire-pizza", crust=None, beverage=None):
    ...
Defaults make it hard to reason about the code (when is this argument given? when is it not? do I have to give the `topping` argument alongside `part`?).
This variation and these implicit requirements make refactoring difficult, because not all call sites look the same.
When refactoring, it helps if it is very clear how a function is used; otherwise one has to consider all possible cases, and their number grows exponentially with the variability of the arguments, forcing the coder to go over every case and handle each group separately.
Consequently - never use default arguments. They are not conducive to healthy code.
To solve the common case where you have many callers needing some default behaviour and one caller that needs to configure something in a different way, make two functions and use currying to avoid signature duplication.
No:
def calculate(x, addition, multiplication=1):
    return (x + addition) * multiplication
Better:
@gamla.curry
def calculate(multiplication, addition, x):  # Note the new argument ordering.
    ...

addition = calculate(1)  # Users not giving `multiplication` will use this function.
This way we know which users do not give `multiplication`, and when refactoring we can address them as a separate group.
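To make the curried variant concrete, here is a runnable sketch with the elided body filled in using the arithmetic from the default-argument version above (the `add_two` name is made up):
import gamla
@gamla.curry
def calculate(multiplication, addition, x):
    return (x + addition) * multiplication
add_two = calculate(1, 2)  # `multiplication` and `addition` fixed; only `x` is left.
assert add_two(3) == 5  # (3 + 2) * 1
assert calculate(2, 3, 4) == 14  # (4 + 3) * 2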
But really the best way is to just split the function apart and get smaller signatures.
Best:
def multiplication(x, y):
    return x * y

def addition(x, y):
    return x + y
In pizza world, you can split the large function into smaller ones:
def topping_request(topping, part):
    ...

def size_request(size):
    ...

combine_requirements = gamla.compose_left(gamla.mapcat(dict.items), dict)
order = combine_requirements([topping_request("mushrooms", "all"), size_request("medium")])
If no crust information is needed, then it's just not there. It's also now obvious that `part` refers to the topping.
For reuse and readability, a function's name or description should only refer to the code it contains, not to the context in which the function is used.
No:
def make_price_slightly_higher(price: int) -> int:
    return price + 1
Yes:
def increment(n: int) -> int:
    return n + 1
Same goes for filenames. If you have a directory dealing with something, and internal files with specifics for that something, refrain from repeating the directory name in the internal filenames.
No:
fruit/fruit_apple.py
fruit/fruit_tomato.py
Yes:
fruit/apple.py
fruit/tomato.py
Large and sophisticated blocks of code should be under scrutiny. When a block becomes large or deeply nested, it's time to factor out and use composition instead of nesting.
Here are some concrete examples.
If a function starts with logic that decides whether the core logic should run at all, factor that logic out.
No:
def f(a, b, c):
    if check(a):
        return
    ...  # Rest of logic

f(a, b, c)
`f` is aware of `check`.
Yes:
def f(a, b, c):
    ...  # Rest of logic

if not check(a):
    f(a, b, c)
`f` and `check` are oblivious to each other, becoming simpler.
Remove all code which is not connected to the application. This includes “preparing for the future” or “this might be useful again” patterns (the odds of correctly predicting what will be needed are slim).
Similarly, debugging functions belong in external libraries.
Factor out and name functions when one of these conditions applies:
Avoid:
Naming functions where the name is of similar length to the implementation.
No:
filter_larger_than_1 = gamla.filter(lambda x: x > 1)
Repeating the filename when only one target is exported.
No:
const f = () => true;
export default f;
(In `f.js`)
Yes:
export default () => true;
(In `f.js`)
Non-unary functions are commonly used in compositions using currying. To make this experience as nice as possible, it is important that the argument order matches the expected calling sequence.
When unsure about usage, a good rule of thumb is: the more fundamental an argument is to the function's operation, the earlier it should appear in the signature. This is why higher-order functions like `map` and `filter` always take the function as their first parameter.
No:
@gamla.curry
def foo(x, y):
    ...

gamla.compose_left(foo(y=2), bar)
Yes:
@gamla.curry
def foo(y, x):  # Note the argument order has changed.
    ...

gamla.compose_left(foo(2), bar)
“Utils” is a lazy naming convention that becomes meaningless really fast and creates nontrivial dependencies across the project. If something is really a util, it belongs in `gamla` or some other open-source library, not in a regular repo.
Avoid single-step `pipe`s.
No:
gamla.pipe(x, f)
Yes:
f(x)
However, if `f` is an expression with brackets, then a `pipe` can make it nicer:
No:
expression_with_brackets(and_nesting)(x)
Yes:
gamla.pipe(
    x,
    expression_with_brackets(and_nesting),
)
Python:
- Use the `black` formatter without any configurations.
- Prefix private names with an underscore, `_like_so`.
- Avoid mutable datatypes such as `set`, `list`, etc. Prefer their immutable alternatives - `frozenset`, `tuple`. In the case of `dict` there is no good alternative, so there is no other choice. This does not apply to unassigned values such as anonymous literals, e.g. it's fine to do `print([1,2,3])`, or to return a mutable datatype within a pipeline that does not expose it.
- Avoid `defaultdict` - it encourages mutation patterns instead of creating a new `dict`.
- When you need a constant set, use a `frozenset`.
- Avoid `Optional` and `Union` as they increase ambiguity.
- An `Optional` typing for a function argument does not imply it is a default argument. This is a common mistake. `Optional[x]` is equivalent to `Union[None, x]` (see the sketch below).
- A typing for a tuple of elements of type `X` is written like so: `Tuple[X, ...]` (note the `...`, this is different from other containers), while the typing for a tuple with one element would be `Tuple[X]`.
- Avoid importing specific names, e.g. `from my_module import MyClass`; instead do `import my_module`. There is one exception to this rule, which is the `typing` module.
- Prefer `asyncio` over threads or processes, because of the GIL and other overheads. If you need multiprocessing, factor it out to a service.
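A minimal sketch of the `Optional` point above, with a hypothetical `greet` function; note that the caller must still pass the argument explicitly:
from typing import Optional
def greet(name: Optional[str]) -> str:
    # `name` may be `None`, but passing it is not optional.
    return "hello" if name is None else "hello " + name
assert greet(None) == "hello"
assert greet("Dana") == "hello Dana"
# greet()  # TypeError: missing a required positional argument.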
JavaScript:
- Use the `prettier` formatter without any configurations.
- Use `const` to declare new variables. Use `let` if you must. Never use `var`.
- In `React`, do not use the `Component` class; instead use functional components.
- Instead of `const f = x => {return x+1};` do `const f = x => x+1;` to get a better signal to noise ratio, especially when reading functional components.
- Use `export default` to avoid causing a duplication between a filename and its contents.
- Prefer `const f = ({arg1, arg2, arg3}) => {...}` over `const f = (arg1, arg2, arg3) => {...}`.