3 Perspectives on How To Comment Your Code
A Data Engineer - BowTiedCelt, A Software Engineer - BowTiedCrocodile, and a Machine Learning Engineer
This post got started by tcottz. At first I thought the dude was taking crazy pills and meant “good code doesn’t need comments” (which is silly). But after further clarification, his comment actually sparked an interesting conversation.
Since most of us use Python on a daily basis for different purposes, I thought it would be interesting to see how 3 different professions see the phrase “comment your code”. Here are the professions & contributors:
Data Engineer - BowTiedCelt
Software Engineer - BowTiedCrocodile
Machine Learning Engineer - Me
Note: You can read Celt’s raw, unedited article here. And you can view Crocodile’s article here.
Click here if you need a refresher on what comments are.
The 3 main topics we wanted to discuss are:
Table of Contents:
The Purpose of Comments
Good Comments vs Bad Comments
Creating Good Comments
1 - The Purpose of Comments
1.1 Data Engineer
As a Data Engineer, I find code comments to be an indispensable tool in my day-to-day work. They offer important context and explanations to the code I implement. This enhances the readability of the code, and fosters collaboration. The ability to:
Clarify The Purpose
Explain Complex Logic
Provide Guidance Within The Code
contributes to long-term code maintenance. More than a tool for myself, comments help my colleagues understand what's going on. This simplifies the code review process, helps mark future tasks, and saves me from having to explain my code.
A common misconception I often encounter is that clean, well-written code eliminates the need for comments. While well-structured code may show "how" I've implemented something, the "why" often necessitates extra context. This context can involve:
Domain Knowledge
Complex Logic
Specific Constraints That Need Illumination
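To make the "how" vs "why" split concrete, here is a minimal Python sketch. The function, the field names, and the upstream CRM export it mentions are all hypothetical, made up for illustration. The code alone shows how duplicates are dropped; only the comment says why they exist in the first place.

```python
from datetime import date


def latest_rows_per_customer(rows):
    """Return one row per customer_id, keeping the most recent load_date."""
    latest = {}
    for row in rows:
        key = row["customer_id"]
        # WHY: the (hypothetical) upstream CRM export re-sends a customer's
        # full history whenever any field changes, so duplicates are expected
        # and only the newest row reflects the current state.
        if key not in latest or row["load_date"] > latest[key]["load_date"]:
            latest[key] = row
    return list(latest.values())


rows = [
    {"customer_id": 1, "load_date": date(2023, 1, 1), "tier": "free"},
    {"customer_id": 1, "load_date": date(2023, 3, 1), "tier": "paid"},
    {"customer_id": 2, "load_date": date(2023, 2, 1), "tier": "free"},
]
result = latest_rows_per_customer(rows)
```

Without the comment, a future maintainer could reasonably "fix" the duplicates upstream or drop the check entirely.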
Of course, maintaining the relevance and accuracy of comments is a challenge.
I make it a point to keep my comments up-to-date with any code changes, and automate the process where possible.
I do recognize that striking a balance in the usage of comments is key. Too many comments can clutter the code. This balance is often informed by my own experience, my team's preferences, and the specific needs of the project. While AI tools like Google Bard can assist with commenting, they can't replace the nuanced understanding and decision-making that I as a developer bring. Used well, comments help with understanding code, saving time, and aiding overall software development processes.
1.2 Software Engineer
The role of comments in programming often sparks quite the debate. Some dismiss them, while others rely on them as the main form of documentation. I believe in a more balanced approach, advocating for high-quality, thoughtful commenting within the source code. If we strive to improve in this area, we can:
Increase Readability
Better Understand Context
Document More Effectively
Reduce Future Maintenance Headaches
I've found comments to be a powerful tool for annotating my thoughts within the source code to provide more context. These annotations change depending on the programming language used. But, they always aim to enhance code readability.
From their early days of assisting programmers in navigating specific parts of the code, comments have matured into tools that outline different sections of the code and give context to its functionality. This mirrors the current approach to documenting public APIs, where documentation comments (such as Java's Javadoc) supply indispensable context.
That said, we can't overlook the importance of comment quality when it comes to code readability. I spend a significant chunk of my time reading and understanding existing code, particularly when it involves older or complex applications. High-quality comments are essential in these situations to help understanding and prevent bugs. But, we must be cautious, as relying too much on comments can lead to:
Misinterpretations
Productivity Loss
New Bugs
My aim is to write code that's so legible it minimizes the need for comments.
In my view, comments should not be used as a crutch for confusing code, but as a useful tool for comprehension.
1.3 Machine Learning Engineer
MLEs are responsible for using data to create predictions. These predictions can be used to generate some $$$ for the business. In other words, we are not paid for amazing coding skills. Rather, coding is just a tool. Comments shine in the realm of:
Data Retrieval
Mapping Functions
APIs
Data Pipelines
When interacting with a SQL server, the retrieval process can be complex. I'll place comments to ensure a clear understanding of how queries are constructed. More specifically, what each query does, and why certain data is being retrieved. This will lead to more efficient debugging and easier enhancements in the future.
Comments within data mapping functions add a layer of comprehension that the code alone doesn't provide. These areas involve:
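Here is a rough sketch of what that can look like, using Python's built-in sqlite3 module as a stand-in for a real SQL server. The table, columns, and the churn model mentioned in the comments are assumptions made up for this example.

```python
import sqlite3

# Assumed schema for illustration only; a real warehouse table will differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders "
    "(customer_id INTEGER, amount REAL, status TEXT, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 50.0, "complete", "2023-01-15"),
        (1, 20.0, "refunded", "2023-02-01"),
        (2, 75.0, "complete", "2023-02-10"),
    ],
)

# WHAT: total completed spend per customer.
# WHY: this feeds the "lifetime spend" feature of a (hypothetical) churn
# model. Refunded orders are excluded because they are not real revenue
# and would inflate the feature.
SPEND_PER_CUSTOMER = """
    SELECT customer_id, SUM(amount) AS total_spend
    FROM orders
    WHERE status = 'complete'   -- refunds excluded, see note above
    GROUP BY customer_id
    ORDER BY customer_id
"""

spend = conn.execute(SPEND_PER_CUSTOMER).fetchall()
```

The "what" comment saves a reader from parsing the SQL, and the "why" comment tells them what breaks downstream if they change the filter.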
Complex Logic
Intricate Transformations
Data Manipulations
and expecting someone to just "get it" will not work. Consider this: YCharts, the Bank of Canada, and FRED all provide USD/CAD conversion rates. In the comments, we can state why we went with one specific data provider over the others. If this data pipeline is passed off to someone else, they can read the comments and alter it according to their needs.
Detailed comments make the ML pipeline's flow more understandable, and explain the logic in each stage. This is crucial for data pipelines due to their complexity, and the high impact they often have on the results of Machine Learning models.
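A small sketch of the provider example in Python. The rates are made-up numbers and the reason given for picking one provider is an assumption for illustration; the point is only where the "why" lives.

```python
# Hypothetical rates for one day; real values would come from each provider.
RATES_USD_CAD = {
    "bank_of_canada": 1.3500,
    "fred": 1.3498,
    "ycharts": 1.3502,
}

# WHY bank_of_canada: assumption for this sketch, the finance team
# reconciles against its daily rate, so model inputs match reported numbers.
# To switch providers, change this constant only.
FX_PROVIDER = "bank_of_canada"


def usd_to_cad(amount_usd, provider=FX_PROVIDER):
    """Convert USD to CAD using the chosen provider's rate."""
    return round(amount_usd * RATES_USD_CAD[provider], 2)


converted = usd_to_cad(100)
```

Whoever inherits the pipeline can see the decision, the reason, and the one place to change it.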
2 - Good Comments vs Bad Comments
2.1 Data Engineer
At one point, I found myself scrambling to remember the context of a complex function I had written, a problem that could have been circumvented with the right comments in the code. This experience underscores the four key roles of code comments:
Clarify Purpose
Explain Complicated Logic
Describe Variables & Functions
Document Alterations
These kinds of comments boost readability, and also promote collaboration among developers. This is achieved by helping everyone understand the code, streamlining code reviews, and aiding in the mapping out of future enhancements.
Comments are amazing for onboarding new team members, and ensuring long-term ownership. Comments serve as important reminders of why certain implementations were chosen. They can also speed things up during product outages. A well-documented and well-commented codebase can simplify troubleshooting, cutting down on time spent asking for clarifications. Good comments can shape senior engineers' perceptions of your work, and impact performance reviews.
Despite the widespread notion that "clean code doesn't need comments," I believe that comments are *important* for:
Passing On Domain Knowledge
Clarifying Intricate Logic
Outlining Constraints
While I do think you can go overboard and clutter your code with comments, I think you should be going for a balance, a skill that comes with experience and a thorough understanding of your team.
2.2 Software Engineer
I see comments as a double-edged sword. They can either aid in code readability & maintenance, or cause potential pitfalls. A common situation is when a developer annotates complex logic, but fails to update these annotations as the code evolves. This leads to outdated or misleading comments, which cause confusion and misdirection for future developers. It shows how comments can go stale as a system evolves.
We have to make sure the comments and the code stay in sync. This means that any modifications we make to the code must be reflected within the comments as well. This ends up doubling our maintenance effort. This problem is exacerbated when managing code and comments written by other developers. While modern Integrated Development Environments (IDEs) can support mass code updates, they don't offer the same help for comments, creating more manual work.
Keep in mind that insightful and well-positioned comments can improve code comprehension. The best comments are those that become obsolete after careful code refactoring. If a comment is necessary, it should provide valuable context for the decision-making process, and it should be something that isn't obvious from reading the code itself. Comments are useful for explaining complex or obscure aspects, and design decisions that affect the code. But the utility of comments hinges on keeping them up-to-date and aligned with the evolving code.
2.3 Machine Learning Engineer
Let's run some examples on what good comments vs bad comments look like. When retrieving data from SQL servers, a good comment is one that provides context to my queries. It explains why certain data is being pulled, and how it will be used in the downstream processes. Bad comments will generally be ambiguous, outdated, or unnecessary explanations. This will result in future MLEs being confused when they use the script to pull data.
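One way to picture a comment becoming obsolete through refactoring is the Python sketch below. The function names and the user fields are hypothetical. In the first version the comment has to translate the condition; in the second, the names do that job, and the only comment left records a decision the code can't express.

```python
# Before: the comment has to explain what the condition means.
def can_checkout_v1(user):
    # user is an adult with a verified email
    return user["age"] >= 18 and user["email_verified"]


# After: the names carry the meaning, so the old comment is not needed.
# The remaining comment explains a decision, not the mechanics.
# 18 is an assumption for this sketch; the real threshold would depend
# on the jurisdiction the product ships in.
LEGAL_ADULT_AGE = 18


def is_adult(user):
    return user["age"] >= LEGAL_ADULT_AGE


def can_checkout(user):
    return is_adult(user) and user["email_verified"]
```

Both versions behave the same, but the second has less text that can drift out of date when the rule changes.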
Now let's look at some good vs bad comments in the context of data mapping. Good comments shine by explaining the reasoning behind specific transformations or mappings. For example, they'll explain why certain features are scaled or encoded in a specific way. Bad comments could be verbose, redundant, or even completely absent in these areas. This could leave other MLEs and future me puzzled over whether the data was properly mapped or not...
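As a quick illustration in Python, here is the same transformation with a bad comment and a good one side by side. The feature (income) and the reasoning are assumptions for this sketch, not a rule for every dataset.

```python
import math


def prepare_income(incomes):
    # BAD (redundant): "take the log of income" only restates the code.
    # GOOD (rationale): income is heavily right-skewed, so a log transform
    # keeps a few very high earners from dominating distance-based models.
    # log1p is used instead of log so that an income of 0 maps to 0
    # rather than raising a math domain error.
    return [math.log1p(x) for x in incomes]


scaled = prepare_income([0, 50_000, 5_000_000])
```

The good version answers the two questions a reviewer would actually ask: why a log, and why this particular log.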
When it comes to APIs, comments should explain the function of each endpoint. They should also explain the expected inputs, and outputs, and a few useful params (to save time).
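Here is one possible shape for that, sketched in Python as a plain function so it runs without a web framework. The endpoint, its parameters, and the 30-day cap are all hypothetical.

```python
def get_forecast(params):
    """GET /forecast (hypothetical endpoint, shown as a plain function).

    Purpose:
        Return the model's demand forecast for one store.

    Inputs (query params):
        store_id (int, required): store to forecast.
        horizon_days (int, optional): days ahead, default 7, max 30.

    Output:
        dict with "store_id", "horizon_days", and "forecast", a list
        with one float per day.
    """
    store_id = int(params["store_id"])
    # Cap the horizon: in this sketch we assume accuracy degrades past
    # 30 days, so longer requests are clamped rather than rejected.
    horizon = min(int(params.get("horizon_days", 7)), 30)
    # Placeholder values; a real handler would call the trained model here.
    forecast = [0.0] * horizon
    return {"store_id": store_id, "horizon_days": horizon, "forecast": forecast}


response = get_forecast({"store_id": "12", "horizon_days": "45"})
```

The docstring covers purpose, inputs, and outputs, while the inline comment flags the one behavior (clamping) a caller would not guess.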
Data pipelines are another area where the distinction between good and bad comments is important. Good comments will:
Outline The Flow of Data
Explain Each Stage In The Pipeline
Provide Rationale For The Order of Operations
Bad comments would be misleading, or cryptic, not explaining enough about the stages in the ML pipeline.
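A toy pipeline in Python that tries to hit all three points: flow, stages, and the rationale for ordering. The stage names and data are made up for illustration, and the scaler is a bare min-max scaler rather than any particular library's implementation.

```python
def drop_missing(rows):
    """Stage 1: remove rows with no target value."""
    return [r for r in rows if r["y"] is not None]


def split(rows, train_fraction=0.75):
    """Stage 2: chronological train/test split (rows assumed time-ordered)."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_scaler(train):
    """Stage 3: learn the scaling range from the training rows only."""
    xs = [r["x"] for r in train]
    return min(xs), max(xs)


def scale(rows, scaler):
    """Stage 4: apply min-max scaling with an already fitted range."""
    lo, hi = scaler
    return [{**r, "x": (r["x"] - lo) / (hi - lo)} for r in rows]


def run_pipeline(rows):
    # Flow: raw rows -> cleaned -> train/test -> scaled train/test.
    cleaned = drop_missing(rows)
    # ORDER MATTERS: split before fitting the scaler. Fitting on all rows
    # would leak test-set statistics into training and inflate metrics.
    train, test = split(cleaned)
    scaler = fit_scaler(train)
    return scale(train, scaler), scale(test, scaler)


raw = [{"x": float(i), "y": i % 2} for i in range(8)] + [{"x": 99.0, "y": None}]
train, test = run_pipeline(raw)
```

The "ORDER MATTERS" comment is the kind of note that prevents a well-meaning refactor from quietly introducing data leakage.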
3 - Creating Good Comments
3.1 Data Engineer
Code comments are an essential tool in my work. They offer context that assists me, and also my colleagues, in understanding the logic within the code. In data engineering, where complex transformations and algorithms are prevalent, comments prove especially critical, as they can:
Help Explain The Purpose of a Code Section
Describe Complex Logic/Variables
Guide Through Different Code Sections
Keep Track of Changes
Yet, it's crucial to maintain a balance to prevent the code from being overwhelmed by comments.
A function in data engineering may behave in a strange way, due to a specific business rule or data source characteristics. By annotating this 'why', we provide invaluable context for future developers who may need to maintain or change the code. Concerns about comments becoming outdated can be mitigated by updating comments as code changes, keeping comments succinct, and leveraging automation for documentation where workable.
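One form of automation that fits here is Python's built-in doctest module, which runs the examples written inside a docstring and reports a failure when they no longer match the code. The sketch below is only an illustration: the function and the data quirk it describes are hypothetical, and it runs the docstring through doctest's parser and runner directly so it works in any context. In a normal module, calling `doctest.testmod()` or running `python -m doctest yourfile.py` is the more common route.

```python
import doctest


def normalize_country(code):
    """Return an upper-case two-letter country code.

    The example below is executed by doctest, so if the behavior changes
    and the docstring is not updated, the check fails.

    >>> normalize_country(" ca ")
    'CA'
    """
    # WHY strip(): in this sketch we assume the source file pads codes
    # with spaces, a data source quirk the code alone would not explain.
    return code.strip().upper()


# Parse the docstring examples and run them against the real function.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    normalize_country.__doc__,
    {"normalize_country": normalize_country},
    "normalize_country",
    None,
    0,
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
```

This only covers the examples in the docstring, not free-form "why" comments, so it reduces the upkeep rather than removing it.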
While AI tools such as Google Bard have made strides in the field, human involvement remains 100% required. These tools may assist in producing comprehensive comments. But, they could also result in excessive commenting and clutter. Hence, it falls on us, the Data Engineers, to moderate and fine-tune the comments. Our responsibility is to ensure a balance between comprehensibility and readability.
In conclusion, code comments are integral to being a good Data Engineer, and are not going anywhere.
Also, the less time we spend explaining our code to others, the more time we have to focus on what truly matters - 'escaping Shawshank.'
3.2 Software Engineer
Writing effective comments is influenced by:
individual preferences
language conventions
team or organizational guidelines
unique requirements of the domain
My single-line comments should be concise, free from redundancy, and placed next to the code they describe. They should avoid being overly decorative or explaining things that are self-evident or trivial. Multiline comments need to stay compact, and I should avoid adding empty ones for future use. I also comment things like APIs, and I split complicated code into smaller, understandable portions.
Instead of commenting out unused code, I delete it. I ensure that my comments don't demand upkeep from external software and that they align with the code they explain. Documenting public APIs is vital too. Microsoft's C# guidelines suggest documenting all publicly visible types and their members. Though optional, private members can also be documented using XML comments. At the very least, types and their members should have a <summary> tag.
I always prioritize code readability when writing comments, making the code more approachable for everyone. It's important to remember that comments are a tool designed to improve code understanding and should be used judiciously.
3.3 Machine Learning Engineer
When writing data retrievals, it's important to clarify the intent of each query with a comment. I'd recommend documenting:
Why certain data is selected
How it will contribute to the overall pipeline
A tip is to make sure the comments stay up-to-date with each modification of the SQL statements.
When writing data mapping functions, you should leave clear comments explaining why specific data transformations are done. If a particular mapping is chosen, explain the rationale behind it. If a certain outlier detection method is used, describe the reasoning and its implications. Make sure these comments are co-located with the associated code to aid understanding. A strategy here is to remember that good comments should be informative but succinct.
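For the outlier case, a sketch in Python might look like the following. The choice of the IQR rule and the reasons given for it are assumptions for this example; what matters is that the reasoning and its implication sit right next to the code they describe.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # WHY IQR instead of z-scores: assumption for this sketch, the feature
    # is skewed, and a few extreme values would distort the mean and
    # standard deviation that z-scores depend on.
    # IMPLICATION: dropped rows never reach training, so the model will
    # not have seen extreme values it may still meet in production.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


kept = remove_outliers_iqr([10, 12, 11, 13, 12, 11, 500])
```

Two short comments cover both the rationale and the consequence, which keeps the block informative without burying the four lines of logic.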
Good Comments Should Guide, Not Disrupt.
For APIs, well-crafted comments can enhance understanding and ease of use. For each API endpoint, you should make it a practice to describe its purpose, and expected inputs and outputs. Another thing to do is to make sprint tickets in Jira for updating the documentation. Good comments need consistency, and some level of maintenance to ensure their relevancy.