Maximizing Your Google Skills as a Data Engineer: Tips and Tricks
Chapter 1: The Importance of Google for Data Engineers
In today's tech landscape, mastering the art of using search engines is crucial for data engineers. This article explores how new data engineers can utilize Google effectively to find precise solutions, streamline their development process, and troubleshoot issues more efficiently.
When asked what one needs to know to become a data engineer, my cheeky response is simply: Google. And no, I'm not referring to Google Cloud Platform (GCP) but rather the search engine itself.
No matter how proficient you are in your chosen programming language, there will always be peculiarities or syntax you might not have encountered. Occasionally, even seasoned professionals can draw a blank on fundamental commands.
For instance, I may have had to search for whether to use ignore_index when employing the append method in Pandas (a method that has since been removed in pandas 2.0 in favor of pd.concat).
Regardless of your search motivation, unless you are a dedicated Microsoft user or a data privacy advocate using DuckDuckGo, you likely turn to Google Search.
You might be thinking, "Is there really a right way to Google?" The answer is yes, and there is also a wrong way.
One negative habit that Google has fostered in me and countless developers is the tendency to immediately search for answers without pausing to analyze the issue. Instead of taking a moment to diagnose, we often rush to find out if someone else has faced the same problem and, ideally, has a solution.
However, this copy-paste method can be quite deceptive. Copying error messages verbatim can lead to irrelevant results, particularly if they include environment-specific details such as file paths or table names. Conversely, an overly broad search might land you in a beginner's Python course rather than at the specific answer you need.
When experienced developers say they'll "Google that" or "look into it," they are subconsciously following a logical sequence of steps.
Note: I use "Google" as a general term for search engines and do not endorse Google Search or Google Cloud Platform.
First, we must analyze the error message to identify the type of error we are dealing with. For errors other than syntax errors (which Google cannot decipher), we typically use a search formula that includes the exception type plus a brief description of the error.
For example, a common error when loading data into BigQuery might be searched as:
"JSONDecodeError Line 1 Column 1."
The incorrect approach would be to search something lengthy and convoluted like:
"JSONDecode Error (my column) in my_uploaded_data.csv columns: [name, title, amount, dt_updated] 400 error when uploading to project.dataset.table."
Longer searches risk running past the limits Google places on query length, and since error messages can be lengthy, being concise is essential. SEO practitioners commonly recommend keeping page titles to roughly 60 characters; treating that figure as a ceiling for search queries is a reasonable rule of thumb.
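The "exception type plus brief description" formula can be sketched in Python. This is a minimal illustration under stated assumptions, not a definitive tool: build_search_query is a hypothetical helper, its regular expressions are rough heuristics for stripping environment-specific details, and the 60-character default simply mirrors the rule of thumb mentioned above.

```python
import json
import re


def build_search_query(exc: BaseException, max_chars: int = 60) -> str:
    """Return '<ExceptionType> <short message>' with local details stripped.

    Hypothetical helper: the patterns below are heuristics, not a complete
    list of everything that can make a query environment-specific.
    """
    message = str(exc)
    message = re.sub(r"'[^']*'", "", message)        # single-quoted values
    message = re.sub(r'"[^"]*"', "", message)        # double-quoted values
    message = re.sub(r"\S*[/\\]\S+", "", message)    # file-system style paths
    message = re.sub(r"\(char \d+\)", "", message)   # character offsets
    message = re.sub(r"\s+", " ", message).strip(" :,")
    query = f"{type(exc).__name__} {message}".strip()
    # Keep the query short; max_chars is an assumed limit, not a Google rule.
    return query[:max_chars].rstrip()


try:
    json.loads("")  # an empty payload triggers the error discussed above
except json.JSONDecodeError as exc:
    print(build_search_query(exc))
    # prints: JSONDecodeError Expecting value: line 1 column 1
```

The result is close to the short query shown earlier: the exception type, the generic part of the message, and nothing that only exists on your machine.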
When working with numerous dependencies, I often check PyPI (the Python Package Index) or GitHub for any apparent, documented changes before diving into a Google search.
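Before comparing against a changelog on PyPI or GitHub, it helps to know which versions are actually installed. The sketch below uses the standard-library importlib.metadata module; installed_versions is a hypothetical helper, and the package names in the usage line are only examples.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment.
            versions[name] = None
    return versions


# Example names only; substitute the packages that appear in your traceback.
print(installed_versions(["pandas", "google-cloud-bigquery"]))
```

With the version numbers in hand, you can read the release notes for exactly the range between the last working version and the current one.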
Once we have a list of results, we must quickly assess their credibility. I usually filter out:
- Anything labeled as a "sponsored post"
- Solutions that promote products
- Overly broad tutorials
- Irrelevant documentation
What remains is a selection of credible sources: high-ranking domains like Stack Overflow, GitHub, Medium, and occasionally Reddit technical threads, as well as documentation that offers context even if it doesn't directly answer the query.
For specific products like Looker, it may be beneficial to search within community sites, as niche use cases are less likely to be addressed on platforms focused on general programming queries.
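One way to search within a single site is Google's site: operator. The sketch below builds such a search URL with the standard library; site_search_url is a hypothetical helper, and community.example.com is a placeholder rather than a real community domain.

```python
from urllib.parse import urlencode


def site_search_url(query: str, site: str) -> str:
    """Build a Google search URL restricted to one domain via the site: operator."""
    restricted_query = f"{query} site:{site}"
    return "https://www.google.com/search?" + urlencode({"q": restricted_query})


# General programming question: restrict to a general Q&A site.
print(site_search_url("JSONDecodeError line 1 column 1", "stackoverflow.com"))

# Product-specific question: restrict to the product's community site.
# The domain below is a placeholder, not a real address.
print(site_search_url("derived table not refreshing", "community.example.com"))
```

The same operator can of course be typed straight into the search box; the helper only shows how the pieces fit together.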
When evaluating results that include code, I focus on three important aspects:
- The age of the post
- Code versioning (Python 2 code is useless for a Python 3 developer)
- The author's credibility
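The first two checks can be partly automated. The sketch below is only an illustration: both helpers are hypothetical, and a snippet that parses is not necessarily correct or current. It merely flags code, such as Python 2 print statements, that cannot run on your interpreter at all.

```python
import ast
from datetime import date


def parses_on_this_python(snippet: str) -> bool:
    """Return True if the snippet is valid syntax for the running interpreter.

    ast.parse only parses the text; it never executes the snippet.
    """
    try:
        ast.parse(snippet)
    except SyntaxError:
        return False
    return True


def post_age_years(posted: date, today: date) -> float:
    """Approximate age of a post in years."""
    return (today - posted).days / 365.25


print(parses_on_this_python("print 'hello'"))   # False on Python 3
print(parses_on_this_python("print('hello')"))  # True on Python 3
print(round(post_age_years(date(2014, 1, 1), date(2024, 1, 1)), 1))
```

Author credibility is harder to script and still calls for human judgment.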
During this assessment process, it's crucial to avoid accepting any solution without testing it first. The most effective way to utilize Google as a novice developer is to use it as a starting point for research rather than a transactional tool.
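Testing a borrowed solution does not need to be elaborate. As a minimal sketch, assume chunked is a helper copied from a search result (it is hypothetical here); a few assertions on normal, empty, and bad input are enough to learn how it really behaves before it enters your codebase.

```python
def chunked(items, size):
    """Hypothetical borrowed helper: split a list into consecutive chunks."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def check_chunked():
    # Typical input, including a final partial chunk.
    assert chunked([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]
    # Edge cases the original answer may not have mentioned.
    assert chunked([], 3) == []
    assert chunked([1, 2], 5) == [[1, 2]]
    # Bad input: range() rejects a zero step with ValueError.
    try:
        chunked([1, 2, 3], 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for size=0")


check_chunked()
print("borrowed snippet behaves as expected")
```

Writing the checks also forces you to state what you expect the code to do, which is most of the research step.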
Creating a dedicated folder for specific errors or use cases can also be beneficial. For more complex questions, combining Google with pen and paper to take notes on findings, attempts, and failures can be invaluable, particularly when escalating issues to more experienced colleagues.
The worst way to use Google is to take it too literally. I've seen many peers and students copy entire error strings without pausing to comprehend their meanings or implications. This leads to comments like, "This is all I found," or returning with overly specific results from a sentence-length query.
Such searches often reveal a shallow understanding of the error. When a colleague has to respond with, "Where did you find that?" it usually means too little thought went into the search.
Before committing to verifying the information returned, consider these questions:
- Does this result address the core of my issue, or just a part of it? If it's the latter, is there enough detail to warrant a follow-up search?
- Which approaches align best with my tech stack? For instance, developing a function might be more efficient than a long series of if/elif statements.
- What is the fundamental reasoning behind this approach? If you copy code from a Google search, you should be ready to justify its usage to your reviewer.
- How might this answer shape my future development? The goal is to learn from your queries, not just to apply solutions mindlessly.
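To illustrate the second question, one common way to replace a long if/elif chain is a dictionary that maps each case to a small function. The sketch below is an assumption-laden example rather than a recommendation for every codebase: the loader functions, the LOADERS table, and the file types are all hypothetical.

```python
import csv
import io
import json
from pathlib import PurePath


def load_csv(text):
    return list(csv.DictReader(io.StringIO(text)))


def load_json(text):
    return json.loads(text)


def load_jsonl(text):
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# One entry per supported file type instead of one elif branch per type.
LOADERS = {".csv": load_csv, ".json": load_json, ".jsonl": load_jsonl}


def load_records(filename, text):
    """Pick a loader from the file extension and apply it to the text."""
    suffix = PurePath(filename).suffix.lower()
    try:
        loader = LOADERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix!r}") from None
    return loader(text)


print(load_records("people.csv", "name,amount\nAda,3\n"))
print(load_records("events.jsonl", '{"a": 1}\n{"a": 2}\n'))
```

Supporting a new format then means adding one function and one dictionary entry, which is also easier to justify to a reviewer than another branch in a growing chain.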
Despite the criticism search engines receive for prioritizing optimized content over genuine solutions, I've generally found exploring Google to be quite productive. During my graduate studies, I often joked that my favorite professors were Google and YouTube, as they provided clarity when I struggled to grasp certain concepts.
However, like anyone, I've also faced the downsides of learning from search results, such as getting lost in tutorial hell. Misusing online resources can stem from inexperience or, worse, a lack of effort.
Like any tool, Google (or any search engine) should be employed wisely—it's a resource, not a crutch.