Nowadays, anyone with internet access can easily learn programming skills. Despite that ease of access, software engineers' salaries are still comparatively higher than those of many other jobs on the market. Of course, one of the main reasons is high market demand: no matter what kind of business you're trying to run, there's a high chance you need someone with IT skills. The other reason is that there's a huge difference between knowing how to code and writing code for a job.
It's pondering time! If I were asked to screen a candidate for a software engineering position and the candidate were myself from five years ago, it probably wouldn't go well. Most likely, my younger self would oversimplify or overestimate the actual problem, but this is perfectly normal behavior for beginners.
For example, let's say we have a 10 GB CSV dataset and we want to count the number of occurrences of each type of data. Let's take a look at several ways of thinking about it:
Solution 1: Myself from 7 years ago would simply read the entire file at once, load it into memory, and process all the lines with a simple "for" loop. The code would probably crash on the first run, since there isn't enough memory to hold 10 GB of CSV data. This solution is very simple and fast to write, but it doesn't necessarily work.
Solution 2: Myself from 5 years ago had started learning fancy stuff and would probably implement a distributed map-and-reduce mechanism (e.g., Hadoop) to process the data. This solution is scalable, so you can throw bigger files at it, but it is much harder to write. If the first solution can be written in 10 minutes, this one will probably take hours to prepare.
Solution 3: With more knowledge, I know that we can process 10 GB of CSV data line by line without dumping the whole thing into RAM. But before doing that, I would research whether we are likely to handle TB-sized CSV files in the near future (e.g., within 3 years). If the answer is "no", the problem can be solved in 10 minutes: we simply read the lines one by one without loading the entire file into memory. If the answer is "yes", then it's worth implementing Solution 2 for better performance (faster speed, etc.). If we will only face TB-sized CSV files more than 3 years from now, let's invest the time in something else, since maintaining a Hadoop cluster takes effort and money.
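To make the "10 minutes" version of Solution 3 concrete, here is a minimal sketch in Python. It assumes the CSV has a header row and that the "type" we want to count lives in a single column. The function name, the file name, and the column name are all made up for illustration.

```python
import csv
from collections import Counter


def count_values(path, column):
    """Count how often each value appears in one column of a CSV file."""
    counts = Counter()
    # newline="" is what the csv module recommends for files it reads.
    with open(path, newline="", encoding="utf-8") as f:
        # DictReader pulls rows from the file one at a time, so memory use
        # grows with the number of distinct values, not with the file size.
        for row in csv.DictReader(f):
            counts[row[column]] += 1
    return counts
```

Calling something like count_values("data.csv", "type") would walk through the file once. This only stays cheap as long as the number of distinct values is small enough to fit in memory, which is usually the case when counting categories.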
This is probably the hardest part of being a software engineer: we pursue the ideal world as we know it at the time, but we often unconsciously neglect the actual goal. When I was inexperienced, my ideal world was Solution 1, because I didn't realize there were better solutions out there. The problem came afterwards: once I learned that Solution 2 existed, I tried to solve every problem with it, since it obviously offers scalability. Here is the real deal: most of the time, more flexibility means a longer time to implement and a higher maintenance cost. Managing this balance becomes even harder if you are working for a business where funds are limited and the project deadline is imminent. "Finished, not perfect" is the key here, since the nature of software engineering is continuous improvement. One of the challenges in software engineering is finding a balance between many aspects; simply knowing how to code will not solve the problem.
The trade-off between simplicity and flexibility shows up on multiple levels. For example, if a user has multiple emails, should we store users and their emails in separate SQL tables? Or when should we choose one data structure over another?
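As a rough illustration of the email question, here is what the flexible option could look like, sketched with Python's built-in sqlite3 module and an in-memory database. The table names, column names, and sample data are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One row per address, so a user can have any number of emails.
    CREATE TABLE user_emails (
        user_id INTEGER NOT NULL REFERENCES users(id),
        email   TEXT NOT NULL UNIQUE
    );
""")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO user_emails (user_id, email) VALUES (?, ?)",
    [(1, "ada@example.com"), (1, "ada@work.example")],
)
# Fetching the emails now takes a second query (or a join).
emails = [row[0] for row in conn.execute(
    "SELECT email FROM user_emails WHERE user_id = ? ORDER BY email", (1,)
)]
```

The second table buys flexibility at the cost of an extra query or join. If users will only ever have one address, a single email column on the users table is simpler, and that is exactly the kind of call we have to make.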
At my workplace, we always have a lot of extremely skillful interns from all over the world. Undoubtedly, they have no problem designing an architecture with multiple interconnected AWS Lambda functions, or authentication and authorization with SAML, and so on. Everything looks nice and shiny, except that most of the time the solution is too complex and hard to maintain, since no one can reproduce the same environment in the first place (no documentation). Or, at other times, it has a sparkling SAML wall but the cookie is stored in plaintext without any signature, so anyone can alter the data easily.
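For the cookie problem, the fix does not have to be complicated. Here is a minimal sketch of signing a cookie value with an HMAC using only Python's standard library. The key, the function names, and the "value.signature" layout are assumptions made for this example; most web frameworks ship their own signed-cookie helper, and that is usually the better choice.

```python
import hashlib
import hmac

# Assumption: in a real service the key comes from a secret store, not source code.
SECRET_KEY = b"replace-with-a-random-secret"


def _mac(value: str) -> str:
    """Hex HMAC-SHA256 of the value under SECRET_KEY."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def sign(value: str) -> str:
    """Return 'value.signature' so that tampering can be detected."""
    return f"{value}.{_mac(value)}"


def verify(cookie: str):
    """Return the original value if the signature matches, else None."""
    value, sep, mac = cookie.rpartition(".")
    if not sep:
        return None  # no signature part at all
    # compare_digest avoids leaking information through timing differences.
    if hmac.compare_digest(mac.encode("utf-8"), _mac(value).encode("utf-8")):
        return value
    return None
```

With this, verify(sign("user_id=42")) gives back "user_id=42", while a cookie whose value was edited by hand comes back as None. Note that signing only detects changes; the value is still readable by the user, so secrets should not go in there.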
A proper balance between simplicity and flexibility, without losing focus on everything else, is not something developers can learn overnight. Remember that an application can be flexible while still maintaining a certain level of simplicity. Finding that balance is a continuous, never-ending process that grows with one's experience.
Thanks for reading and see you next time!