Here’s Why You’re Struggling to Hire a Data Engineer
Reason #1: Your compensation is below market value
This was by far the most common rebuttal to the supposed shortage of data engineering talent. “If you are really serious about a shortage, you should be really serious about making offers that can be competitive,” wrote jnordwick, “but I keep seeing the same $150k offers. That isn’t a ‘shortage’ kind of offer.” This experience was echoed by several others, including whenwillitstop: “[I’m] pinged by companies obsessively for my big data skills, all trying to pay me less than I am currently making.”
This feedback aligns with what we found in Indeed’s salary ranges: most data engineer openings don’t exceed $130k. The delightfully named SmellTheGlove, currently working as a Director of Data Engineering, added, “I build teams and make the data move and land it clean so your PhDs can do the smaaht stuff with it. I can stack BI and Analytics on top…” He said it would take $200k+ to get him to leave his current role, more for a job in San Francisco. Most companies are not currently willing to pay that rate.
Reason #2: You don’t understand the value data engineers will deliver to your organization
Go one level deeper, and you find that the reason companies are unwilling to pay the market rate for data engineering talent is pretty simple: they don’t understand the value data engineers deliver.
In both the Hacker News thread and our conversations with people working in this space, examples of companies willing to pay very well for this talent came up repeatedly. Netflix and Facebook (the 2nd largest employer of data engineers) are anecdotally known to pay data engineers north of $500k.
But outside of Netflix, Google, Facebook, and Wall Street, data engineers report that companies are highly sensitive to any salary north of $200k. Hacker News fell into two camps about why this is:
- Companies that think they need top-tier data engineering talent…but they don’t. The average tech company doesn’t have, as __derek__ put it, “finance/Google/Facebook level needs for data engineers,” and as a result, “They can’t reasonably claim to need top-level skills and then beggar out on the cost.” In other words, the companies that pay big money for data engineering talent do so because it delivers a ton of value to their business. If you’re unwilling to pay a big salary, it might be an indication that you don’t actually need that level of talent. You might just be viewing a top-tier data engineer as “ornamentation” for your engineering team. achompas, a data scientist, said it like this: “Data engineers often develop ETL pipelines or data warehouses, both of which are very useful if your company has a data team and very useless if it does not.” So before you go chasing the hottest big data talent out there, have a plan for how you will use it.
- Companies underestimating the value of just getting the basics right. In my original post (and the report itself), we used janitors and plumbers as analogs for data engineers. This was not meant to disparage data engineers, janitors, or plumbers, but after reading through the feedback, I see that this is a sensitive point. As kafkaesq pointed out: “It pretty much takes a SV alpha-nerd (or aspiring CEO seeking to cater to them) to come up with language like that.” Point taken. In retrospect, I can see why this labeling is so important to those doing the work of data engineering — most companies still view data engineering as grunt work, and salary levels at many companies reinforce that idea. Data engineering often ends up being forgotten, under-appreciated work that no one else wants to do. And software developers of all stripes have encountered this. Here’s one story from mrharrison (emphasis is mine):
I have been thrown these projects at work before, where I’m the front end engineer and I need to make some cool D3 visualization, but low and behold the data is shit, and I have to help the backend team make the data useable. It’s a mind-numbing job, that nobody wants, because it sounds like a one month task to get a good REST API up and working, but it usually takes three months, because you have to go back and forth making sure the data is right, and there is always 10 tricky edge cases that you have to work some magic on. Not only that buy you need to have smart people cleaning the data, so that you don’t make some big mistake down the line or your REST API is super slow, and you have to add another couple weeks or month to rework the data again. So that one month becomes three months, and most likely a year, because somebody will say that looks great but can we also add this, and it does on and on. It’s literally a mind-numbing job that most nobody wants. Data cleaning is a super golden problem to solve.
Here’s another really painful story from SmellTheGlove (emphasis is mine):
Once upon a time I managed (and, frankly, also wrote a lot of the code for) a project integrating half a dozen sources each managing a block of our business (billing, coverage, claims). The data was awful coming in and we managed to get a bunch of business processes changed in addition to some pretty heavy cleansing steps that we wrote. In any case, this big fragmented mess of monthly and weekly stacked data became my integrated, clean warehouse. For the first time ever at this organization, I had coverage and claims records tying up at a rate of 100% without any manual intervention. We did this so that we could implement a modern finance ops process on top (being intentionally vague) that would allow us to manage this block more efficiently, save time, and even let us better invest — it was a 2 year project including my data work. A handful of actuaries and analysts got promoted out of this as it was a BFD to the company. Yet, at the end of the year, when I got my review I got our equivalent of the average rating, 3 of 5, etc, and like a 3% raise, and a shitty budget for my people too. From then on, I spent almost as much time out there promoting our team’s work as we did doing the work. We did considerably better the next year, and that’s been the way I’ve operated ever since. I market the work.
Even for people who genuinely enjoy working on the challenges of data management, a lack of understanding about the importance of the fundamentals can sap the joy from it. In defense of data management work, dizzystar wrote:
Some people (me) really enjoy working with data, from cleaning, munging, creating, sorting, pipelining, etc, and find front-end visualization production excessively boring and mind-numbing…I enjoy writing a script that finds a bad piece of data, or a script that fixes up everything, or writing something that was once unable to run at all get converted to something that runs in 500ms.
The response from mrharrison:
I also think data is fun and don’t meant to be little [sic] the job, but in real world scenario it’s often detail intensive, under appreciated, tons of edge cases and extremely complex if you plan to make it scalable and fast…Customers will often complain at how long it takes and want more. It starts to wear away at one’s drive and passion for data. Its not the data aspect its the job/deadline aspect.
And back to the language people use, from rch:
I’ve heard more than one CTO/SR. Engineer refer to people in these roles as ‘data grunts’ or something similarly dismissive. Then they’re mystified as to why solid engineers are so quick to move up or out, year after year.
So, let me make this very clear: data engineering work is first and foremost engineering work. If you want to get on the data superhighway, these are the people building your roads and bridges. How’s that analogy? 🙂 There’s clearly a huge gap right now in executive understanding about this work: they want all the fun of “doing big data” with no real understanding of the importance of infrastructure. And as a result, many are unwilling to pay for what is often perceived as boring maintenance work.
Reason #3: You’re screwing up the hiring process
For anyone in the process of recruiting data engineers, this feedback is invaluable, though it ranges from the painful to the downright frustrating. Here is a story from ef5a0b0628 that should strike a chord with anyone who has interviewed for a role in this field:
Every time something comes up on HN about a talent shortage in a field related to software engineering, it hurts. I have been unsuccessfully looking for a full time position since my last startup folded six months ago…It seems people in this industry refuse to understand that some people are not perfect…I never graduated college because I hated it with the very fiber of my being, so I am not particularly great at white boarding answers to algorithm questions off the top of my head in a high pressure environment. If I need them during my job, I look up answers and learn from people who are much smarter than I am.
This person’s experience mirrored that of another commenter, ultramagas, who left their job when their spouse received an offer in Europe. They expected challenges in looking for remote work, but the reality turned out to be much harsher:
But it was a summer of shitty timed hackerrank-style tests (virtual whiteboard hazing). I would tell my co-workers about them and they’d laugh in bewilderment at the questions that were asked in what should be a technical screener, and these are extremely smart and productive software guys that have started companies, written books, give conference talks…There’s definitely not a shortage of talent. It’s that every company thinks they need ‘A-players’, when the vast, vast majority are doing a damn basic CRUD app.
protomyth said that this seems to be a frustration that holds true across all technical roles: “I’m starting to think that the message is if HR is going to do checklists then developers should really make sure they work mostly with contracts that use popular checklist items.” As pyb put it, “The system is optimized for the needs of HR people.”
A lack of understanding about what indicates a good match for a data engineering role often leaves candidates feeling like companies are “unicorn searching.” SmellTheGlove, who does most of their own hiring, weighed in again with this advice:
- Look for challenges faced and problems solved
- Pay less attention to tech used
- Learning a specific tech stack is easy
- Process and problem solving should be primary
So wait…is there a shortage?
Despite the comments referenced above, I do still think the data shows a talent shortage, and there are plenty of commenters who agree (I’ll cover their feedback in my next post). Regardless, the advice from software developers is spot on. If your company is struggling to hire data talent, a shortage of data engineers might not be the root of your problem.
Data engineering work is hard, complicated, and can be incredibly frustrating for anyone lacking a natural affinity for it. Take this advice to heart, and use it to inform how you go about adding data engineering talent to the team.