~/dsa/group-anagrams

Group Anagrams

Medium·
  • Arrays
  • Hash Table
  • Strings
NeetCode 150

the trick

Two words are anagrams exactly when their letter counts match, so a 26-length count tuple is a canonical key. Group by that key in one pass.

The idea

I'm given a list of words and I want to bucket together the ones that are rearrangements of each other. The whole problem is finding a key that's identical for every anagram and different for everything else. Once I have that key, a hash map does the grouping for free.

Brute force, then the better key

The obvious key is the sorted word: "eat" and "tea" both sort to "aet". That works, but sorting each word costs O(k log k), so the total is O(n * k log k).

I can do better. Two words are anagrams exactly when they have the same count of each letter. With lowercase input that's a fixed 26-slot tally, and a tuple of those 26 counts is a perfectly good dict key. Building one tally is O(k), so the whole thing drops to O(n * k) and I never sort anything.

from collections import defaultdict
 
 
def group_anagrams(strs: list[str]) -> list[list[str]]:
    groups: dict[tuple[int, ...], list[str]] = defaultdict(list)
 
    for word in strs:
        counts = [0] * 26
        for char in word:
            counts[ord(char) - ord("a")] += 1
        groups[tuple(counts)].append(word)
 
    return list(groups.values())

Why it's correct

The key is a function of letter counts alone, so word order inside a word is ignored. That's exactly the anagram relation. Equal counts collide into the same bucket; any difference in a single letter changes the tuple and splits them apart. The empty string maps to the all-zero tuple and groups with other empty strings, which is right.

Each word costs one O(k) pass to tally, giving O(n * k) time. Storage holds every word once plus a fixed-size key per group, so O(n * k) space.