Similar Names
Here's a CSV file with 1,000 distinct U.S. baby names (all lowercase).
babynames_1000.csv
1: aaden
2: aaliyah
3: abby
4: abel
5: abigail
---
996: zander
997: zane
998: zara
999: zion
1000: zoe
How many distinct (A, B) pairs of names have Levenshtein distance ≤ 3?
Distinct entries
Treat each pair as unordered: if your result includes (aaden, allen), make sure it doesn't also include (allen, aaden).
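As a rough illustration of what is being counted, here is a minimal sketch in plain Python. The helper names `levenshtein` and `count_similar_pairs` and the three-name sample list are hypothetical, chosen only for this example. The sketch uses the standard dynamic-programming formulation of Levenshtein distance (the minimum number of single-character insertions, deletions, and substitutions) and relies on `itertools.combinations` to visit each unordered pair exactly once.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    # Keep the shorter string as `b` so each row of the table stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the empty prefix of `a` and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def count_similar_pairs(names, max_distance=3):
    """Count unordered pairs of names within `max_distance` of each other.

    combinations(names, 2) yields (A, B) but never (B, A), so no pair
    is counted twice. Assumes `names` contains no duplicates.
    """
    return sum(
        1
        for a, b in combinations(names, 2)
        if levenshtein(a, b) <= max_distance
    )


# Illustrative sample only, not the real dataset.
sample = ["aaden", "allen", "zoe"]
print(levenshtein("aaden", "allen"))
print(count_similar_pairs(sample))
```

For 1,000 names this brute-force approach compares 499,500 pairs, which is feasible in pure Python but slow; a compiled edit-distance library would likely be faster, though that is a design choice outside this sketch.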
Loading the data
You can load the data directly from GitHub.
Python:
import pandas as pd
names = pd.read_csv("https://raw.githubusercontent.com/practiceprobs/datasets/main/babynames/babynames_1000.csv")

R:
library(data.table)
names <- fread("https://raw.githubusercontent.com/practiceprobs/datasets/main/babynames/babynames_1000.csv")