LLMs and Employment?

Aug 21, 2025

Welcome to the age of LLMs. Half the population thinks LLMs think just because someone implemented a <Loader/> after every prompt that says “thinking...”. I’m not going to lie: it does give the appearance of thinking.

Thinking - ideas or opinions about something.

AI Development = Speed + Thoughtful decisions? In my experience, using AI in my development life cycle drastically improved at least one of the two, but the two often worked against each other: I found myself being fast with it or being thoughtful with it, never both. The balance was missing.

Have you ever had a very junior intern working under you who, no matter how much you explained, just would not understand? It’s not because you cannot explain it well enough, nor because they are too dumb to understand. It’s just that their brain has not yet been exposed to the problem at hand, and it is way above their current understanding of software. That is the state of LLMs these days. To make them understand, people started using RAG, fine-tuning, reinforcement learning, and so on, to improve responses. But even with all of this implemented and discussed, at the end of the day the model is predicting, not thinking. The reason LLMs appear to be thinking is that they are trained on high-level English grammar and have enough context to appear to have emotions. Below you can see how it predicts tokens for this controversial paragraph.

[Image: when I asked it to be controversial]

When I asked ChatGPT to fix the above paragraph while keeping it controversial, it started with "Got it 😏"!
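
To make “predicting” concrete, here is a toy sketch of how a next token gets chosen. The logits are made-up numbers, not from any real model; real vocabularies have tens of thousands of entries.

```python
# Toy next-token prediction: score candidates, softmax into probabilities,
# pick the most likely one. Hypothetical logits, not from a real model.
import math

logits = {"thinking": 2.1, "predicting": 3.0, "feeling": 0.4}

# Softmax turns raw scores into a probability distribution.
z = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / z for tok, v in logits.items()}

# Greedy decoding simply emits the highest-probability token.
print(max(probs, key=probs.get))   # -> "predicting"
print(probs)
```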

With all that being said, how is the employment scenario looking? OpenAI recently launched its ₹399/month plan in India. Why is that such a big deal? Let’s run a hypothetical calculation; it might be wrong in places, but it gives an idea of the scale.

Definitions:

  • Weights: GPU memory holding model parameters.
  • KV cache: per-token attention state stored during decoding.
  • FLOPs: total floating-point operations (2 × params × tokens used here).
  • Latency: estimated request processing time on the cluster.
  • Energy: power × time (reported in Wh and kJ).

Selected configuration:

  • Model: 4.0 × 10¹² params (4T)
  • Weights precision: INT8 (1 byte/weight)
  • KV cache precision: FP16 (2 bytes/element)
  • Tokens (prompt/gen): 4,000 in / 500 out → 4,500 total
  • Hardware: H100 SXM 80 GB (assume 80 GB usable)
  • Throughput assumption: 400 TFLOPS/GPU sustained
  • Power/GPU: 600 W
  • Sharding: assume ideal linear sharding across GPUs

Calculations:

Weights:

4,000,000,000,000 weights × 1 byte = 4,000 GB (≈4.0 TB)

KV Cache:

Assuming ~80 layers, 128 heads, 128 head_dim for a 4T model

KV = 2 × batch × seq_len × layers × heads × head_dim × precision

KV = 2 × 1 × 4,500 × 80 × 128 × 128 × 2 bytes ≈ 23.6 GB

Overhead ≈ 15% of (weights + KV) ≈ 604 GB

Total memory ≈ 4,628 GB (≈4.63 TB)

Min H100s (80 GB each) = ceil(4,628 / 80) = 58 GPUs

Total FLOPs = 2 × params × tokens = 2 × 4×10¹² × 4,500 = 36,000 TFLOPs

Cluster throughput = 400 × 58 = 23,200 TFLOPS

Latency ≈ 36,000 / 23,200 ≈ 1.55 s

Energy = Power × Time = (600 × 58) W × 1.55 s ≈ 54,000 J ≈ 15 Wh

Cost @ $0.12/kWh in USA ≈ $0.0018 / request
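
If you want to sanity-check these numbers, here is a small Python sketch that reproduces the back-of-envelope estimate above. Every constant is one of the assumptions from the configuration list, not a measured value.

```python
# Back-of-envelope per-request estimate, using the assumptions above.
import math

PARAMS = 4.0e12                          # 4T parameters
BYTES_PER_WEIGHT = 1                     # INT8 weights
LAYERS, HEADS, HEAD_DIM = 80, 128, 128   # assumed shape for a 4T model
TOKENS = 4_000 + 500                     # prompt + generated
KV_BYTES = 2                             # FP16 KV cache
GPU_MEM_GB = 80                          # H100 SXM
GPU_TFLOPS = 400                         # sustained throughput assumption
GPU_WATTS = 600

weights_gb = PARAMS * BYTES_PER_WEIGHT / 1e9
# K and V for every token, layer, and head (batch size 1)
kv_gb = 2 * 1 * TOKENS * LAYERS * HEADS * HEAD_DIM * KV_BYTES / 1e9
total_gb = 1.15 * (weights_gb + kv_gb)          # +15% overhead

gpus = math.ceil(total_gb / GPU_MEM_GB)
tflops_needed = 2 * PARAMS * TOKENS / 1e12      # 2 × params × tokens
latency_s = tflops_needed / (GPU_TFLOPS * gpus)
energy_wh = GPU_WATTS * gpus * latency_s / 3600
cost_usd = energy_wh / 1000 * 0.12              # $0.12 per kWh

print(f"{total_gb:,.0f} GB -> {gpus} GPUs, {latency_s:.2f} s, "
      f"{energy_wh:.1f} Wh, ${cost_usd:.4f}/request")
```

Running it prints roughly 4,627 GB → 58 GPUs, 1.55 s, 15.0 Wh, and $0.0018 per request, matching the numbers above.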

With all these calculations in mind: according to reports, India produces about 1.5 million engineers every year. If 50% of them subscribed to the mentioned OpenAI plan, that would be 750,000 × ₹399 ≈ ₹29.9 crore per month. But why does this matter?

Scale calculations:

Daily Volume:
750,000 users × 100 requests/day = 75,000,000 requests/day
Daily Energy Consumption:
75,000,000 requests × 0.015 kWh = 1,125,000 kWh = 1.125 GWh/day
Daily Electricity Cost:
75,000,000 requests × $0.0018 = $135,000/day
Processing Time Requirements:
75,000,000 × 1.5 seconds = 112.5 million seconds = 31,250 hours of cluster time (each request occupies the full 58-GPU cluster for ≈1.5 s)
With 60 such clusters: 31,250 ÷ 60 ≈ 520.8 hours of continuous operation needed
That's 21.7 days of continuous operation to process one day's requests!
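
The same sketch extended to the hypothetical scale. Again, the subscriber count and 100 requests/day are assumptions, not data.

```python
# Scaling the per-request numbers to the hypothetical Indian user base.
USERS = 750_000            # assumed 50% of 1.5M engineers subscribe
REQS_PER_USER = 100        # assumed requests per user per day
KWH_PER_REQ = 0.015        # ≈15 Wh per request, from above
COST_PER_REQ = 0.0018      # USD per request, from above
SECONDS_PER_REQ = 1.5      # each request occupies the 58-GPU cluster
CLUSTERS = 60              # assumed number of such clusters

reqs = USERS * REQS_PER_USER
print(f"{reqs:,} requests/day")                    # 75,000,000
print(f"{reqs * KWH_PER_REQ / 1e6:.3f} GWh/day")   # 1.125
print(f"${reqs * COST_PER_REQ:,.0f}/day")          # $135,000
cluster_hours = reqs * SECONDS_PER_REQ / 3600
print(f"{cluster_hours:,.0f} cluster-hours -> "
      f"{cluster_hours / CLUSTERS / 24:.1f} days on {CLUSTERS} clusters")
```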

On average, a US citizen consumes about 10,791 kWh annually (2022). My instinct is that AI will create an artificial demand for energy that pushes up the natural baseline of consumption. In the short run, this could drive per-unit energy prices higher, since production needs to catch up, much like the early days of cloud computing. But over time, while total consumption will keep rising, the cost per task will fall as efficiency improves and economies of scale kick in. Just as cloud created entirely new roles, AI will also open up fresh opportunities, even as it pressures existing ones. The real challenge is to stay in sync with this progress — making sure AI remains a friend, not a foe, supporting us without disrupting the very jobs it was meant to empower.

NOTE: The above calculations are purely demonstrative and can be wrong in many ways. They are only meant to show that re-employment of currently laid-off engineers is inevitable, provided their skills are up to the mark.

Art - From a different perspective