I’ve had the chance to run some statistics on a list of Italian first names and surnames. I plan to feed this list to a machine learning algorithm and see what I can find out, but before that, I was curious to look at a few metrics about Italian names.
Italian names make up 78% of the names of residents, and I have restricted my analysis to people with one of the 1200 most common Italian names. With those two constraints, I have filtered out about 98% of all unique names. That is a huge share of the names, but a comparatively small share of the people: just 8% were left out.
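To make the difference between "share of unique names" and "share of people" concrete, here is a minimal sketch of how the two percentages can be computed. It is not the code used for this post: the `coverage` function and the toy `sample` list are made up for illustration, and the real list is assumed to be a plain sequence with one first name per person.

```python
from collections import Counter


def coverage(first_names, top_n):
    """Keep only the top_n most common names and report what was dropped.

    Returns (share of unique names dropped, share of people dropped).
    """
    counts = Counter(first_names)
    kept = counts.most_common(top_n)          # [(name, occurrences), ...]
    kept_people = sum(n for _, n in kept)
    unique_dropped = 1 - len(kept) / len(counts)
    people_dropped = 1 - kept_people / len(first_names)
    return unique_dropped, people_dropped


# Toy data, invented for illustration only (not the real dataset).
sample = ["Maria"] * 5 + ["Giuseppe"] * 3 + ["Ubaldo", "Zeno"]
unique_dropped, people_dropped = coverage(sample, top_n=2)
print(f"unique names dropped: {unique_dropped:.0%}")
print(f"people dropped: {people_dropped:.0%}")
```

Even in this tiny example, dropping half of the distinct names removes a much smaller share of the people, because the rare names have few bearers.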
As you can see, there is a rather strong skew towards the most popular names:
21% of Italians also have a middle name, and the popularity of middle names is even more dramatically skewed.
So what are the most common names in Italy? Apparently, Maria and Giuseppe are the most popular by a long way. It probably isn’t a coincidence that both names are important in Christianity.
And here are some of the most popular last names:
Here are some of the most common combinations. As you can see, some less common last names, such as Caruso or Marino, appear very frequently in combination with certain first names.
Now a look at surname length, in number of characters. There are some interesting outliers here. The graph is cut off at 22 characters, so I’m not able to tell how far it could go. The longest surnames are combinations of multiple surnames joined with hyphens. Among the non-joined ones, the longest are Silettiformantello and Pasquadibisceglie.
First names can be pretty long too, such as Francescantonio, Mariantonietta, Giovanbattista, or Domenicantonio.
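For anyone wanting to reproduce the length figures, this is a rough sketch of one way to do it. The function names and the short `surnames` list are hypothetical, and the only assumption about the data is that joined surnames contain a hyphen, as described above.

```python
from collections import Counter


def length_histogram(names):
    """Map each length (in characters) to the number of names of that length."""
    return Counter(len(name) for name in names)


def longest_unhyphenated(names, k=2):
    """Return the k longest distinct names that are not hyphen-joined."""
    singles = {name for name in names if "-" not in name}
    # Sort by decreasing length, then alphabetically to break ties.
    return sorted(singles, key=lambda name: (-len(name), name))[:k]


# Toy list for illustration only; the real data is far larger.
surnames = ["Rossi", "Russo", "Pasquadibisceglie",
            "Silettiformantello", "Rossi-Bianchi"]

histogram = length_histogram(surnames)
for length in sorted(histogram):
    print(length, histogram[length])
print(longest_unhyphenated(surnames))
```

The same two functions work unchanged on a list of first names.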
There does not seem to be any correlation between the length of the first name and the length of the last name: the line fitted to the scatter plot is practically flat (slope m = -0.014).
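The slope m of such a line can be obtained with an ordinary least-squares fit. The sketch below assumes that this is how the line was fitted, which the post does not confirm; `fit_line` and the four name pairs are invented for illustration, and the result on this toy data has nothing to do with the -0.014 measured on the real list.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m * x + q; returns (m, q)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    q = mean_y - m * mean_x
    return m, q


# Hypothetical (first name, last name) pairs, for illustration only.
pairs = [("Maria", "Rossi"), ("Giuseppe", "Russo"),
         ("Anna", "Esposito"), ("Francescantonio", "Bianchi")]

first_lengths = [len(first) for first, _ in pairs]
last_lengths = [len(last) for _, last in pairs]
m, q = fit_line(first_lengths, last_lengths)
print(f"slope m = {m:.3f}, intercept q = {q:.3f}")
```

A slope close to zero means that knowing the length of the first name tells you almost nothing about the length of the last name.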
A peek at the less common names (that still made their way into the 1200 most popular):
I’ll soon try to find out if I can use a neural network to deduce the phonetic rules that bind first and last names. First names are presumably chosen with care, and it makes sense to suppose that there is a “rule” that decides whether a first-last name pairing “feels” right or not. But this is stuff for another post!
Disclaimer: the database I used for this article contains around 3.5 million names, mainly from bigger cities (Roma, Milano, Torino) and other (mostly northern) towns.