"The ratio of time spent on reading vs writing code is well over 10:1. We are constantly reading old code as part of the effort to write new code. Making code easier to read makes it easier to write" [ref bob martin]
The boy scout rule:
- if we all checked in code a little cleaner than when we checked it out, the code can simply not rot.
- Can you imagine working on a project where the code got better as time went on? :mindblown:
Inspired from clean-code-javascript
Targets Python3.7+
- Use intention revealing names and don't be afraid to change them if you find better names
- Why does it exist? How is it used?
- If a name requires a comment it does not reveal it's intention
Bad:
d: int# elapsed time in daysGood:
elapsed_time_in_days: intBe explicit with your variable names
defget_them(input_list): output= [] forxinintput_list: ifx==4output.append(x) returnoutputWhat sorts of things are implied by this code?
- What kinds of things are in
input_list? - What is the significance of the value 4?
- How would I use the list being returned?
better:
FLAGGED=4defget_flagged_cell(cells): flagged_cells= [] forcellincells: ifcell==FLAGGED: flagged_cells.append(x) returnflagged_cellsbest: (sidenote about list comprehensions)
defget_flagged_cell(cells): return [cellforcellincellsifcell==FLAGGED]Notice how much more EXPLICIT the code is in the improved examples
Bad:
importdatetimeymdstr=datetime.date.today().strftime("%y-%m-%d")How do I pronounce this? "yem dee string"? Fun or not, we're tolerating poor naming here. New developers will have to have this variable explained to them. Use proper english terms even if the term is a little longer. Let autocomplete save the keystrokes for you.
Good:
importdatetimecurrent_date: str=datetime.date.today().strftime("%y-%m-%d")Bad: Here we use three different names for the same underlying entity:
defget_user_info(): passdefget_client_data(): passdefget_customer_record(): passGood: If the entity is the same, you should be consistent in referring to it in your functions:
defget_user_info(): passdefget_user_data(): passdefget_user_record(): passEven better Python is (also) an object oriented programming language. If it makes sense, package the functions together with the concrete implementation of the entity in your code, as instance attributes, property methods, or methods:
fromtypingimportUnion, Dict, TextclassRecord: passclassUser: info : str@propertydefdata(self) ->Dict[Text, Text]: return{} defget_record(self) ->Union[Record, None]: returnRecord() You want to avoid adding noise to your variable names. Noise words add "meaningless" distinction. In the above example info and data are indistinct noise words. How do programmers know which function to call? Distinguish names so the reader knows the difference
We will read more code than we will ever write. It"s important that the code we do write is readable and searchable. By not naming variables that end up being meaningful for understanding our program, we hurt our readers. Make your names searchable.
Bad:
importtime# What is the number 86400 for again?time.sleep(86400)Good:
importtime# Declare them in the global namespace for the module.SECONDS_IN_A_DAY=60*60*24time.sleep(SECONDS_IN_A_DAY)Single letter variable names should almost always be avoided. Personally, I like the length of the name to correspond to the size of its scope. If a variable or constant is used in multiple places/bodies of code it is imperative that you give it a search friendly name.
In modern python we have a rich type annotation system and we can use type checkers to enforce these types, if you want to add them. What you want to avoid is adding type information to the variable name. Given that python is also a dynamic language this may also lead to disinformation.
Bad
car_dict={"make": "tesla"}Better:
car={"make": "tesla"}Bad:
importreaddress="One Infinite Loop, Cupertino 95014"city_zip_code_regex=r"^[^,\\]+[,\\\s]+(.+?)\s*(\d{5})?$"matches=re.match(city_zip_code_regex, address) ifmatches: print(f"{matches[1]}: {matches[2]}")Not bad:
It"s better, but we are still heavily dependent on regex.
importreaddress="One Infinite Loop, Cupertino 95014"city_zip_code_regex=r"^[^,\\]+[,\\\s]+(.+?)\s*(\d{5})?$"matches=re.match(city_zip_code_regex, address) ifmatches: city, zip_code=matches.groups() print(f"{city}: {zip_code}")Good:
Decrease dependence on regex by naming subpatterns.
importreaddress="One Infinite Loop, Cupertino 95014"city_zip_code_regex=r"^[^,\\]+[,\\\s]+(?P<city>.+?)\s*(?P<zip_code>\d{5})?$"matches=re.match(city_zip_code_regex, address) ifmatches: print(f"{matches['city']}, {matches['zip_code']}")Don’t force the reader of your code to translate what the variable means. Explicit is better than implicit.
Bad:
seq= ("Austin", "New York", "San Francisco") foriteminseq: #do_stuff()#do_some_other_stuff()# Wait, what's `item` again?print(item)Good:
locations= ("Austin", "New York", "San Francisco") forlocationinlocations: #do_stuff()#do_some_other_stuff()# ...print(location)This avoids adding disinformation. Do not refer to a a grouping of accounts as an accounts_list unless your stakeholders say it's a list. Try to use vocabulary from the business domain if your class/object name tells you something, don't repeat that in your variable name.
Bad:
classCar: car_make: strcar_model: strcar_color: strGood:
classCar: make: strmodel: strcolor: strPick one word for an abstract concept and stick with it. For example, it's confusing to have fetch, retrieve, get as equivalent methods on different classes. How do you remember which name goes on each class?
"A consistent lexicon is a great boon to the programmers who must use your code"
Tricky
Why write:
importhashlibdefcreate_micro_brewery(name): name="Hipster Brew Co."ifnameisNoneelsenameslug=hashlib.sha1(name.encode()).hexdigest() # etc.... when you can specify a default argument instead? This also makes it clear that you are expecting a string as the argument.
Good:
fromtypingimportTextimporthashlibdefcreate_micro_brewery(name: Text="Hipster Brew Co."): slug=hashlib.sha1(name.encode()).hexdigest() # etc.Limiting the amount of function parameters is incredibly important because it makes testing your function easier. Having more than three leads to a combinatorial explosion where you have to test tons of different cases with each separate argument.
Zero arguments is the ideal case. One or two arguments is ok, and three should be avoided. Anything more than that should be consolidated. Usually, if you have more than two arguments then your function is trying to do too much. In cases where it"s not, most of the time a higher-level object will suffice as an argument.
Bad:
defcreate_menu(title, body, button_text, cancellable): passJava-esque:
classMenu: def__init__(self, config: dict): self.title=config["title"] self.body=config["body"] # ...menu=Menu({"title": "My Menu", "body": "Something about my menu", "button_text": "OK", "cancellable": False } )Also good
fromtypingimportTextclassMenuConfig: """A configuration for the Menu. Attributes: title: The title of the Menu. body: The body of the Menu. button_text: The text for the button label. cancellable: Can it be cancelled? """title: Textbody: Textbutton_text: Textcancellable: bool=Falsedefcreate_menu(config: MenuConfig) ->None: title=config.titlebody=config.body# ...config=MenuConfig() config.title="My delicious menu"config.body="A description of the various items on the menu"config.button_text="Order now!"# The instance attribute overrides the default class attribute.config.cancellable=Truecreate_menu(config)Fancy
fromtypingimportNamedTupleclassMenuConfig(NamedTuple): """A configuration for the Menu. Attributes: title: The title of the Menu. body: The body of the Menu. button_text: The text for the button label. cancellable: Can it be cancelled? """title: strbody: strbutton_text: strcancellable: bool=Falsedefcreate_menu(config: MenuConfig): title, body, button_text, cancellable=config# ...create_menu( MenuConfig( title="My delicious menu", body="A description of the various items on the menu", button_text="Order now!" ) )Even fancier
fromtypingimportTextfromdataclassesimportastuple, dataclass@dataclassclassMenuConfig: """A configuration for the Menu. Attributes: title: The title of the Menu. body: The body of the Menu. button_text: The text for the button label. cancellable: Can it be cancelled? """title: Textbody: Textbutton_text: Textcancellable: bool=Falsedefcreate_menu(config: MenuConfig): title, body, button_text, cancellable=astuple(config) # ...create_menu( MenuConfig( title="My delicious menu", body="A description of the various items on the menu", button_text="Order now!" ) )Even fancier, Python3.8+ only
fromtypingimportTypedDict, TextclassMenuConfig(TypedDict): """A configuration for the Menu. Attributes: title: The title of the Menu. body: The body of the Menu. button_text: The text for the button label. cancellable: Can it be cancelled? """title: Textbody: Textbutton_text: Textcancellable: booldefcreate_menu(config: MenuConfig): title=config["title"] # ...create_menu( # You need to supply all the parametersMenuConfig( title="My delicious menu", body="A description of the various items on the menu", button_text="Order now!", cancellable=True ) )This is by far the most important rule in software engineering. When functions do more than one thing, they are harder to compose, test, and reason about. When you can isolate a function to just one action, they can be refactored easily and your code will read much cleaner. If you take nothing else away from this guide other than this, you"ll be ahead of many developers.
Bad:
fromtypingimportListclassClient: active: booldefemail(client: Client) ->None: passdefemail_clients(clients: List[Client]) ->None: """Filter active clients and send them an email. """forclientinclients: ifclient.active: email(client)Good:
fromtypingimportListclassClient: active: booldefemail(client: Client) ->None: passdefget_active_clients(clients: List[Client]) ->List[Client]: """Filter active clients. """return [clientforclientinclientsifclient.active] defemail_clients(clients: List[Client]) ->None: """Send an email to a given list of clients. """forclientinget_active_clients(clients): email(client)Do you see an opportunity for using generators now?
Even better
fromtypingimportGenerator, IteratorclassClient: active: booldefemail(client: Client): passdefactive_clients(clients: Iterator[Client]) ->Generator[Client, None, None]: """Only active clients"""return (clientforclientinclientsifclient.active) defemail_client(clients: Iterator[Client]) ->None: """Send an email to a given list of clients. """forclientinactive_clients(clients): email(client)Bad:
classEmail: defhandle(self) ->None: passmessage=Email() # What is this supposed to do again?message.handle()Good:
classEmail: defsend(self) ->None: """Send this message"""message=Email() message.send()When you have more than one level of abstraction, your function is usually doing too much. Splitting up functions leads to reusability and easier testing.
Bad:
# type: ignoredefparse_better_js_alternative(code: str) ->None: regexes= [ # ... ] statements=code.split('\n') tokens= [] forregexinregexes: forstatementinstatements: passast= [] fortokenintokens: passfornodeinast: passGood:
fromtypingimportTuple, List, Text, DictREGEXES: Tuple= ( # ... ) defparse_better_js_alternative(code: Text) ->None: tokens: List=tokenize(code) syntax_tree: List=parse(tokens) fornodeinsyntax_tree: passdeftokenize(code: Text) ->List: statements=code.split() tokens: List[Dict] = [] forregexinREGEXES: forstatementinstatements: passreturntokensdefparse(tokens: List) ->List: syntax_tree: List[Dict] = [] fortokenintokens: passreturnsyntax_treeFlags tell your user that this function does more than one thing. Functions should do one thing. Split your functions if they are following different code paths based on a boolean.
Bad:
fromtypingimportTextfromtempfileimportgettempdirfrompathlibimportPathdefcreate_file(name: Text, temp: bool) ->None: iftemp: (Path(gettempdir()) /name).touch() else: Path(name).touch()Good:
fromtypingimportTextfromtempfileimportgettempdirfrompathlibimportPathdefcreate_file(name: Text) ->None: Path(name).touch() defcreate_temp_file(name: Text) ->None: (Path(gettempdir()) /name).touch()A function produces a side effect if it does anything other than take a value in and return another value or values. For example, a side effect could be writing to a file, modifying some global variable, or accidentally wiring all your money to a stranger.
Now, you do need to have side effects in a program on occasion - for example, like in the previous example, you might need to write to a file. In these cases, you should centralize and indicate where you are incorporating side effects. Don"t have several functions and classes that write to a particular file - rather, have one (and only one) service that does it.
The main point is to avoid common pitfalls like sharing state between objects without any structure, using mutable data types that can be written to by anything, or using an instance of a class, and not centralizing where your side effects occur. If you can do this, you will be happier than the vast majority of other programmers.
Bad:
# type: ignore# This is a module-level name.# It"s good practice to define these as immutable values, such as a string.# However...fullname="Ryan McDermott"defsplit_into_first_and_last_name() ->None: # The use of the global keyword here is changing the meaning of the# the following line. This function is now mutating the module-level# state and introducing a side-effect!globalfullnamefullname=fullname.split() split_into_first_and_last_name() # MyPy will spot the problem, complaining about 'Incompatible types in # assignment: (expression has type "List[str]", variable has type "str")'print(fullname) # ["Ryan", "McDermott"]# OK. It worked the first time, but what will happen if we call the# function again?Good:
fromtypingimportList, AnyStrdefsplit_into_first_and_last_name(name: AnyStr) ->List[AnyStr]: returnname.split() fullname="Ryan McDermott"name, surname=split_into_first_and_last_name(fullname) print(name, surname) # => Ryan McDermottAlso good
fromtypingimportTextfromdataclassesimportdataclass@dataclassclassPerson: name: Text@propertydefname_as_first_and_last(self) ->list: returnself.name.split() # The reason why we create instances of classes is to manage state!person=Person("Ryan McDermott") print(person.name) # => "Ryan McDermott"print(person.name_as_first_and_last) # => ["Ryan", "McDermott"]In Python, everything is an object and everything is passed by value, but these values are references to objects. In the case of objects and lists, if your method makes a change in a shopping cart array, for example, by adding an item to purchase, then any other method that uses that cart array will be affected by this addition. That may be great, however it can be bad too. Let's imagine a bad situation:
The user clicks the "Purchase", button which calls a purchase method that spawns a network request and sends the cart array to the server. Because of a bad network connection, the purchase method has to keep retrying the request. Now, what if in the meantime the user accidentally clicks "Add to Cart" button on an item they don't actually want before the network request begins? If that happens and the network request begins, then that purchase method will send the accidentally added item because it has a reference to a shopping cart array that the add_item_to_cart method modified by adding an unwanted item.
A great solution would be for the add_item_to_cart to always clone the cart, edit it, and return the clone. This ensures that no other methods that are holding onto a reference of the shopping cart will be affected by any changes.
Two caveats to mention to this approach:
There might be cases where you actually want to modify the input object, but when you adopt this programming practice you will find that those cases are pretty rare. Most things can be refactored to have no side effects!
Python comes built in the with a copy module which allows for copying and deep copying objects.
Cloning big objects can be expensive in terms of performance. So if you find that you're doing it often, you may want to consider implementing the
__copy__and__deepcopy__methods for improved performance. More info here
Bad:
fromdatetimeimportdatetimedefadd_item_to_cart(cart: list, item): cart.append({"item": item, "time": datetime.now()})Good:
fromdatetimeimportdatetimedefadd_item_to_cart(cart, item): cart_copy=cart[:] cart_copy.append({"item": item, "time": datetime.now()}) returncart_copyComing soon
Coming soon
Coming soon