Data Science & Machine Learning 101


3 Perspectives on How To Comment Your Code

A Data Engineer - BowTiedCelt, A Software Engineer - BowTiedCrocodile, and a Machine Learning Engineer

BowTied_Raptor
May 22, 2023

This post got started by tcottz. I thought the dude was taking crazy pills and meant “good code doesn’t need comments” (this is silly), but after further clarification, his comment actually sparked an interesting conversation.

Since most of us use Python on a daily basis for different purposes, I thought it would be interesting to see how 3 different professions see the phrase “comment your code”. Here are the professions & contributors:

  • Data Engineer - BowTiedCelt

  • Software Engineer - BowTiedCrocodile

  • Machine Learning Engineer - Me

Note: You can read Celt’s raw, unedited article here, and you can view Crocodile’s article here.

Click here if you need a refresher on what comments are.

The 3 main topics we wanted to discuss are:

Table of Contents:

  1. The Purpose of Comments

  2. Good Comments vs Bad Comments

  3. Creating Good Comments


1 - The Purpose of Comments

1.1 Data Engineer

Written By BowTiedCelt

As a Data Engineer, I find code comments to be an indispensable tool in my day-to-day work. They offer important context and explanations to the code I implement. This enhances the readability of the code, and fosters collaboration. The ability to:

  • Clarify The Purpose

  • Explain Complex Logic

  • Provide Guidance Within The Code

contributes to long-term code maintenance. More than a tool for myself, comments help my colleagues understand what's going on. This simplifies the code review process and marks future tasks, saving me from having to explain my code.

A common misconception I often encounter is that clean, well-written code eliminates the need for comments. While well-structured code may show "how" I've implemented something, the "why" often necessitates extra context. This context can involve:

  • Domain Knowledge

  • Complex Logic

  • Specific Constraints That Need Illumination
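To make the "how" versus "why" distinction concrete, here is a minimal sketch. The fiscal-year rule, the constant, and the function name are all hypothetical and invented for this example; they are not from any real project.

```python
from datetime import date

# Hypothetical business rule, invented for this example.
FISCAL_YEAR_START_MONTH = 4


def fiscal_year(d: date) -> int:
    # Why: finance reports on an April-to-March fiscal year, so January through
    # March still belong to the fiscal year that started the previous April.
    # The code shows how the year is shifted; only the comment says why.
    return d.year if d.month >= FISCAL_YEAR_START_MONTH else d.year - 1
```

Without the comment, a reader can see that the year is sometimes decremented, but has no way to tell whether that is a bug or a domain rule.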

Of course, maintaining the relevance and accuracy of comments is a challenge.

I make it a point to keep my comments up-to-date with any code changes, and automate the process where possible.

I do recognize that striking a balance in the usage of comments is key. Too many comments can clutter the code. This balance is often informed by my own experience, my team's preferences, and the specific needs of the project. While AI tools like Google Bard can assist with commenting, they can't replace the nuanced understanding and decision-making that I as a developer bring.

1.2 Software Engineer

Written By BowTiedCrocodile

The role of comments in programming often sparks quite the debate. Some dismiss them, while others rely on them as the main form of documentation. I believe in a more balanced approach, advocating for high-quality, thoughtful commenting within the source code. If we strive to improve in this area, we can:

  • Increase Readability

  • Better Understand Context

  • Document More Effectively

  • Reduce Future Maintenance Headaches

I've found comments to be a powerful tool for annotating my thoughts within the source code to provide more context. These annotations change depending on the programming language used. But, they always aim to enhance code readability.

From their early days of helping programmers navigate specific parts of the code, comments have matured into tools that outline different sections of the code and give context to its functionality. This mirrors the current approach to documenting public APIs, where comments (such as Java's Javadoc) supply indispensable context.

That said, we can't overlook the importance of comment quality when it comes to code readability. I spend a significant chunk of my time reading and understanding existing code, particularly when it involves older or complex applications. High-quality comments are essential in these situations to help understanding and prevent bugs. But, we must be cautious, as relying too much on comments can lead to:

  • Misinterpretations

  • Productivity Loss

  • New Bugs

My aim is to write code that's so legible it minimizes the need for comments.

In my view, comments should not be used as a crutch for confusing code, but as a useful tool for comprehension.

1.3 Machine Learning Engineer

Written By BowTiedRaptor

MLEs are responsible for using data to create predictions. These predictions can be used to generate some $$$ for the business. In other words, we are not paid for amazing coding skills; rather, coding is merely a tool. Comments shine in the realm of:

  • Data Retrieval

  • Mapping Functions

  • APIs

  • Data Pipelines

When interacting with a SQL server, the retrieval process can be complex. I'll place comments to ensure a clear understanding of how queries are constructed. More specifically, what each query does, and why certain data is being retrieved. This will lead to more efficient debugging and easier enhancements in the future.
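As a rough sketch of what this can look like, the example below keeps the "what" and "why" next to the query itself. The table, columns, and churn-model use case are hypothetical, and an in-memory SQLite database stands in for a real SQL server so the snippet runs on its own.

```python
import sqlite3

# Hypothetical table and columns, used only for illustration.
CHURN_FEATURES_QUERY = """
-- Purpose: pull one row per customer for the churn model's training set.
-- Why order_total: total spend is a feature in the downstream model.
-- Why status = 'complete': refunded or cancelled orders would inflate spend.
SELECT customer_id,
       COUNT(*)         AS n_orders,
       SUM(order_total) AS total_spend
FROM orders
WHERE status = 'complete'
GROUP BY customer_id
ORDER BY customer_id
"""


def fetch_churn_features(conn: sqlite3.Connection) -> list:
    """Return (customer_id, n_orders, total_spend) rows for model training."""
    return conn.execute(CHURN_FEATURES_QUERY).fetchall()


# Tiny in-memory database so the sketch runs end to end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_total REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 20.0, "complete"), (1, 30.0, "complete"), (2, 15.0, "refunded"), (2, 5.0, "complete")],
)
rows = fetch_churn_features(conn)
```

Keeping the comments inside the SQL string means they travel with the query if someone copies it into a database console.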

Comments within data mapping functions add a layer of comprehension that the code alone doesn't provide. These areas involve:

  • Complex Logic

  • Intricate Transformations

  • Data Manipulations

and expecting someone to just "get it" will not work. Consider this: YCharts, the Bank of Canada, and FRED all provide USD/CAD conversion rates. In the comments, we can state why we went with one specific data provider over the others. If this data pipeline is passed off to someone else, they can read the comments and alter it according to their needs.
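A minimal sketch of such a comment is shown here. The rate value, the lookup table, and the stated reason for choosing the provider are all assumptions made up for the example, not real quotes or a real team decision.

```python
# Hypothetical rate table; the value is made up for the example and is not a
# real quote from any provider.
USD_TO_CAD_BY_DATE = {"2023-05-19": 1.35}


def usd_to_cad(amount_usd: float, as_of: str) -> float:
    # Why this source: (assumed reason, for illustration) the finance team
    # reconciles against the Bank of Canada daily rate, so using it here keeps
    # model inputs consistent with reported figures. If you need intraday
    # rates, swap USD_TO_CAD_BY_DATE for a different provider's table.
    rate = USD_TO_CAD_BY_DATE[as_of]
    return round(amount_usd * rate, 2)
```

The comment tells the next owner both the reason for the choice and the condition under which they should change it.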

Detailed comments make the ML pipeline's flow more understandable, and explain the logic in each stage. This is crucial for data pipelines due to their complexity, and the high impact they often have on the results of Machine Learning models.

2 - Good Comments vs Bad Comments

2.1 Data Engineer

At one point, I found myself scrambling to remember the context of a complex function I had written, a problem that could have been circumvented with the right comments in the code. This experience underscores the four key roles of code comments:

  • To Clarify Purpose

  • Explain Complicated Logic

  • Describe Variables & Functions

  • Document Alterations

These kinds of comments boost readability, and also promote collaboration among developers. This is achieved by helping everyone understand the code, streamlining code reviews, and aiding in the mapping out of future enhancements.

[Figure: Example of nice comments in Python]
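As a minimal sketch of the four roles listed above (purpose, complicated logic, variables and functions, alterations), consider the hypothetical function below. The record layout and the change note are invented for illustration.

```python
def dedupe_latest(records):
    """Keep only the most recent record per user_id.

    Purpose: the upstream export can resend a user's row, so downstream joins
    would double count without this step.
    """
    latest = {}  # maps user_id -> newest record seen so far
    for rec in records:
        current = latest.get(rec["user_id"])
        # ISO 8601 timestamps in the same format sort correctly as strings,
        # so a plain string comparison is enough here.
        if current is None or rec["updated_at"] > current["updated_at"]:
            latest[rec["user_id"]] = rec
    # Changed: return a list sorted by user_id so output order is stable
    # (hypothetical change note, shown to illustrate documenting alterations).
    return [latest[uid] for uid in sorted(latest)]
```

Each comment answers a question the code cannot answer by itself: why the step exists, why the comparison is safe, what the variable holds, and what changed.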

Comments are amazing for onboarding new team members and ensuring long-term ownership. Comments serve as important reminders of why certain implementations were chosen. They can also speed things up during product outages. A well-documented and well-commented codebase can simplify troubleshooting, cutting down on time spent asking for clarifications. Good comments can shape senior engineers' perceptions of your work, and impact performance reviews.

Despite the widespread notion that "clean code doesn't need comments," I believe that comments are *important* for:

  • Passing On Domain Knowledge

  • Clarifying Intricate Logic

  • Outlining Constraints

While I do think you can go overboard and clutter your code with comments, you should aim for a balance, a skill that comes with experience and a thorough understanding of your team.

2.2 Software Engineer

I see comments as a double-edged sword. They can aid code readability and maintenance, or they can create pitfalls. A common situation is when a developer annotates complex logic but fails to update these annotations as the code evolves. The result is outdated or misleading comments that confuse and misdirect future developers as the system evolves.
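The stale-comment problem can be sketched in a few lines. The retry limit and function names here are hypothetical; the first function deliberately keeps an outdated comment to show the failure mode.

```python
MAX_RETRIES = 5


def should_retry(attempt: int) -> bool:
    # Stale comment (kept on purpose to show the problem): "Retry up to 3 times."
    # The constant was later raised to 5, but the sentence above was not updated.
    return attempt < MAX_RETRIES


def should_retry_fixed(attempt: int) -> bool:
    # Retry while under MAX_RETRIES; the limit lives in one place, so this
    # comment stays true when the number changes.
    return attempt < MAX_RETRIES
```

Both functions behave identically. The difference is that the second comment refers to the constant by name instead of repeating its value, so it cannot drift out of date as easily.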

We have to make sure the comments and the code stay in sync. This means that any modification we make to the code must be reflected in the comments as well, which can double our maintenance effort. This problem is exacerbated when managing code and comments written by other developers. While modern Integrated Development Environments (IDEs) can support mass code updates, they don't offer the same help for comments, creating more manual work.

[Figure: An Integrated Development Environment (IDE), such as PyCharm]

Keep in mind that insightful and well-positioned comments can improve code comprehension. The best comments are those that become obsolete after careful code refactoring. If a comment is necessary, it should provide valuable context for the decision-making process, and it should be something that isn't obvious from reading the code itself. Comments are useful for explaining complex or obscure aspects, and design decisions that affect the code. But the utility of comments hinges on keeping them up-to-date and aligned with the evolving code.
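One way to picture a comment becoming obsolete through refactoring is the before-and-after sketch below. The discount rule, names, and numbers are hypothetical.

```python
# Before: the comment carries the meaning the code does not.
def price_before(order):
    # apply 10% discount for orders with 10 or more items
    if order["quantity"] >= 10:
        return order["unit_price"] * order["quantity"] * 0.9
    return order["unit_price"] * order["quantity"]


# After: names carry the meaning, so the comment is no longer needed.
BULK_QUANTITY_THRESHOLD = 10
BULK_DISCOUNT_RATE = 0.10


def qualifies_for_bulk_discount(order) -> bool:
    return order["quantity"] >= BULK_QUANTITY_THRESHOLD


def price_after(order):
    subtotal = order["unit_price"] * order["quantity"]
    if qualifies_for_bulk_discount(order):
        return subtotal * (1 - BULK_DISCOUNT_RATE)
    return subtotal
```

A comment would still earn its place in the refactored version if it explained why the threshold is 10, since no amount of renaming can express that.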

2.3 Machine Learning Engineer

Let's run through some examples of what good comments vs bad comments look like. When retrieving data from SQL servers, a good comment is one that provides context to my queries. It explains why certain data is being pulled, and how it will be used in the downstream processes. Bad comments will generally be ambiguous, outdated, or unnecessary. This will result in future MLEs being confused when they use the script to pull data.

Now let's look at some good vs bad comments in the context of data mapping. Good comments shine by explaining the reasoning behind specific transformations or mappings. For example, they'll explain why certain features are scaled or encoded in a specific way. Bad comments could be verbose, redundant, or even completely absent in these areas. This could leave other MLEs and future me puzzled over whether the data was properly mapped or not...
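Here is a small sketch contrasting the two for a scaling step. The feature, the function names, and the stated rationale are assumptions made for the example.

```python
import math


def log_scale_income_bad(income: float) -> float:
    # scale income  (bad: restates the function name and explains nothing)
    return math.log1p(income)


def log_scale_income(income: float) -> float:
    # Why log1p: (assumed for illustration) income is heavily right-skewed, and
    # a few very large values would otherwise dominate a distance-based model.
    # log1p is used instead of log so that an income of 0 maps to 0, not -inf.
    return math.log1p(income)
```

The code is identical in both functions; only the second tells a future MLE whether the transformation was a deliberate choice.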

When it comes to APIs, comments should explain the function of each endpoint. They should also explain the expected inputs, and outputs, and a few useful params (to save time).
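A minimal sketch of that kind of endpoint documentation follows, written as a plain function docstring with no web framework. The endpoint path, field names, and the fixed score are hypothetical placeholders.

```python
def get_prediction(payload: dict) -> dict:
    """Hypothetical handler for POST /predict (endpoint name is assumed).

    Purpose: score a single customer with the churn model.

    Expected input (JSON body):
        customer_id (int): required.
        threshold (float): optional, default 0.5; scores at or above it are
            labelled "churn".

    Output:
        {"customer_id": int, "score": float, "label": "churn" | "stay"}
    """
    threshold = payload.get("threshold", 0.5)
    # Placeholder score so the sketch runs without a trained model.
    score = 0.7
    label = "churn" if score >= threshold else "stay"
    return {"customer_id": payload["customer_id"], "score": score, "label": label}
```

Calling `get_prediction({"customer_id": 1, "threshold": 0.9})` shows the optional parameter in action, which is the kind of time-saving detail worth writing down.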

Data pipelines are another area where the distinction between good and bad comments is important. Good comments will:

  • Outline The Flow of Data

  • Explain Each Stage In The Pipeline

  • Provide Rationale For The Order of Operations

Bad comments would be misleading, or cryptic, not explaining enough about the stages in the ML pipeline.
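The three qualities of good pipeline comments can be sketched with a toy pipeline. The stages, the `amount` field, and the cap value are hypothetical and chosen only to keep the example short.

```python
def drop_missing(rows):
    """Stage 1: remove rows with a missing 'amount'."""
    return [r for r in rows if r.get("amount") is not None]


def clip_outliers(rows, cap=1000.0):
    """Stage 2: cap 'amount' at a fixed ceiling (cap value is hypothetical)."""
    return [{**r, "amount": min(r["amount"], cap)} for r in rows]


def scale_amount(rows):
    """Stage 3: rescale 'amount' to the 0..1 range by dividing by the max."""
    largest = max(r["amount"] for r in rows)
    return [{**r, "amount": r["amount"] / largest} for r in rows]


def run_pipeline(rows):
    # Flow: raw rows -> drop_missing -> clip_outliers -> scale_amount.
    # Order matters:
    #   - drop_missing runs first because min() and max() fail on None.
    #   - clip_outliers runs before scale_amount so one extreme value does not
    #     squash every other row toward 0 after scaling.
    rows = drop_missing(rows)
    rows = clip_outliers(rows)
    return scale_amount(rows)
```

The comment in `run_pipeline` is the one most likely to prevent a bug, because it warns the next person against reordering the stages.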

3 - Creating Good Comments

3.1 Data Engineer

Code comments are an essential tool in my work. They offer context that assists me and my colleagues in understanding the logic within the code. In data engineering, where complex transformations and algorithms are prevalent, comments prove especially critical, as they can:

  • Help Explain The Purpose of a Code Section

  • Describe Complex Logic/Variables

  • Guide Through Different Code Sections

  • Keep Track of Changes

Yet, it's crucial to maintain a balance to prevent the code from being overwhelmed by comments.

A function in data engineering may behave in a strange way due to a specific business rule or data source characteristic. By annotating this 'why', we provide invaluable context for future developers who may need to maintain or change the code. Concerns about comments becoming outdated can be mitigated by updating comments as the code changes, keeping comments succinct, and leveraging automation for documentation where workable.

While AI tools such as Google Bard have made strides in the field, human involvement remains 100% required. These tools may assist in producing comprehensive comments. But, they could also result in excessive commenting and clutter. Hence, it falls on us, the Data Engineers, to moderate and fine-tune the comments. Our responsibility is to ensure a balance between comprehensibility and readability.

In conclusion, code comments are integral to being a good Data Engineer, and are not going anywhere.

Also, the less time we spend explaining our code to others, the more time we have to focus on what truly matters - 'escaping Shawshank.'

3.2 Software Engineer

Writing effective comments is influenced by:

  • individual preferences

  • language conventions

  • team or organizational guidelines

  • unique requirements of the domain

My single-line comments should be concise, free from redundancy, and placed next to the corresponding code. They should avoid being overly decorative or explaining things that are self-evident or trivial. Multiline comments need to stay compact, and I avoid adding empty ones for future use. I also comment things like APIs, and I split complicated code into smaller, understandable portions.

Instead of commenting out unused code, I delete it. I ensure that my comments don't demand upkeep from external software and that they align with the code they explain. Documenting public APIs is vital too. Microsoft's guidelines suggest documenting all publicly visible types and their members. Though optional, private members can also be documented using XML comments. At the very least, types and their members should have a <summary> tag.
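The Microsoft guideline above concerns XML comments. As a rough Python analog (an assumption of this example, not something those guidelines cover), a public class and its public methods can each carry at least a one-line docstring summary. The class itself is hypothetical.

```python
class RateLimiter:
    """Track how many calls remain in the current window."""

    def __init__(self, max_calls: int) -> None:
        """Create a limiter that allows max_calls calls."""
        self.remaining = max_calls

    def try_acquire(self) -> bool:
        """Consume one call if any remain; return whether it was allowed."""
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True
```

Docstrings are available at runtime through `help(RateLimiter)` and the `__doc__` attribute, which makes them a natural home for the summary of a public API.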

I always prioritize code readability when writing comments, making the code more approachable for everyone. It's important to remember that comments are a tool designed to improve code understanding and should be used judiciously.

3.3 Machine Learning Engineer

When retrieving data, it's important to clarify the intent of each query with a comment. I'd recommend documenting:

  • Why certain data is selected

  • How it will contribute to the overall pipeline

A tip is to make sure the comments stay up-to-date with each modification of the SQL statements.

When writing data mapping functions, you should leave clear comments explaining why specific data transformations are done. If a particular mapping is chosen, explain the rationale behind it. If a certain outlier detection method is used, describe the reasoning and its implications. Make sure these comments are co-located with the associated code to aid understanding. A strategy here is to remember that good comments should be informative but succinct.
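For the outlier detection case, a sketch might look like the following. The choice of the IQR rule and the rationale given in the comment are assumptions for illustration, not a recommendation from the article.

```python
import statistics


def remove_outliers_iqr(values):
    # Why IQR instead of z-scores: (assumed rationale for this sketch) the
    # mean and standard deviation are themselves pulled by extreme values,
    # while quartiles are not, so the fences stay stable on skewed data.
    # Implication: points beyond 1.5 * IQR from the quartiles are dropped,
    # which shrinks the training set and can hide genuine rare events.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if low <= v <= high]
```

The comment sits directly above the lines it explains and covers both the reasoning and the consequence, while staying short enough to read in one glance.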

Good Comments Should Guide, Not Disrupt.

For APIs, well-crafted comments can enhance understanding and ease of use. For each API endpoint, you should make it a practice to describe its purpose, and expected inputs and outputs. Another thing to do is to create sprint tickets in Jira for updating the documentation. Good comments need consistency, and some level of maintenance to ensure their relevancy.

