Simone

Generating maps with GenAI image tools

2026-03-25 Simone

Did some testing of generating RPG maps with an AI. It’s surprisingly difficult. Thought I would share the outcome.

The prompt:

Generate a map for a tactical RPG with top-down ortographic projection – planimetric view, with high detail, of this place:
An underground soviet-era bunker, industrial horror style.
The planimetry includes:
1. A massive central workshop with piling rubble, heavy machinery, and a production of bolts and other materials.
2. A NKVD high-security conference room, with a single desk and monitor, near a small jail.
3. A small dormitory with bunk beds connected to a restroom which has an exposed eletrical panel.

My results, roughly listed from cheap to expensive model.

Qwen-image. Not orthogonal, and very very few of the requested details.

Qwen/Qwen-Image-2.0. Improved and offered good detail, but still doors are missing and it’s not orthogonal. Also, we have some really huge bolts here.

Juggernaut-Lightning-Flux. Not much detail, some of which look weird, some doors missing, and gravity looks off.

stabilityai/stable-diffusion-xl-base-1.0. Overall seems just confusing.

HiDream-ai/HiDream-I1-Fast. Interesting style, but not orthogonal.

stabilityai/stable-diffusion-3-medium. Every detail looks just confused; I can’t tell what is what.

RunDiffusion/Juggernaut-pro-flux. Gravity is off and there isn’t much detail.

HiDream-ai/HiDream-I1-Dev. Usable, but overall didn’t follow the prompt very closely.

HiDream-ai/HiDream-I1-Full. Usable but again it didn’t follow the prompt very closely.

Lykon/DreamShaper. Doesn’t look like it followed the prompt very much.

black-forest-labs/FLUX.2-dev. Looks actually pretty good and usable. The prompt was mostly followed. The amount of detail is ok. It could improve, but you can tell what most of the things are.

ByteDance-Seed/Seedream-3.0. Not bad, but not orthogonal and lacking doors.

ByteDance-Seed/Seedream-4.0. Still not orthogonal and lacking doors.

google/flash-image-2.5. Usable, although with some confused details.

google/imagen-4.0-fast. Somewhat usable but doors are missing and gravity is off.

black-forest-labs/FLUX.2-flex. Pretty good. Some doors missing. Not so orthogonal but usable.

black-forest-labs/FLUX.2-pro. Actually slightly worse than the previous. Rooms are a bit confused. Text is accurate, but it wasn’t asked for. Weird doors. Not so orthogonal. Still usable.

openai/gpt-image-1.5. High amount of detail. More rooms than asked. Some doors missing and some weird details, but overall good.

Wan-AI/Wan2.6-image. Somewhat usable but gravity is off and some details are confused.

Not tested here: more recent or advanced models such as Nano Banana. They are better, but they are also more expensive.

As such my winner for this experiment is FLUX.2-dev: it followed the requirement accurately enough, and it’s still a very cheap model.

Flipping coins securely over the Internet

2025-01-10 Simone

I built Live Coin Flips after spending some time thinking about how two people in remote locations can flip a coin and agree on the result. It’s more challenging than it looks at first: flipping a coin on a live video call is probably the easiest way, but it is not always feasible, and any sleight-of-hand trick is much easier to pull off remotely.

See for example these guys on YouTube predicting the coin flip outcome every time:

So I came up with the idea of using a third-party, independent, unpredictable, universally agreed upon, continuous source of randomness. After considering a few candidates I singled out one: the Bitcoin price.

To make everything simpler, I’ve built a web application that calculates the outcome every minute. By deciding on a date in the future (it doesn’t need to be in the far future, just far enough for the Bitcoin price to not be published yet), you also commit to the coin flip outcome that will happen on that date.
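As a rough sketch of the idea (not the actual Live Coin Flips implementation, which isn't described here), one way to turn a published price into a flip is to hash the agreed minute together with the price and look at one bit of the digest. The function name `coin_flip`, the input format, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def coin_flip(minute_utc: str, price: str) -> str:
    """Derive a coin flip from an agreed minute and the price published for it.

    Both parties agree on `minute_utc` in advance; `price` is only known once
    that minute has passed. Hashing spreads any price value evenly over
    heads/tails, so the raw last digit of the price doesn't matter.
    (Hypothetical scheme: the hash and the input format are assumptions.)
    """
    digest = hashlib.sha256(f"{minute_utc}|{price}".encode("utf-8")).digest()
    # Use the lowest bit of the last byte: 0 -> heads, 1 -> tails.
    return "heads" if digest[-1] & 1 == 0 else "tails"


if __name__ == "__main__":
    # Example inputs are made up; a real run would use the published price.
    print(coin_flip("2025-01-10T12:00Z", "94321.57"))
```

Anyone who knows the agreed minute and the published price can recompute the same result independently, which is what makes the outcome verifiable by both sides.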

You can see it in action here: Live Coin Flips

Let’s go back to static websites

2025-01-10 Simone

In the era of continuously generated AI content, we can hardly describe the web as static. Yet Amazon’s “static website” tool may be good enough for most.

I’ve collected a couple of tools to create static websites on AWS S3 and manage the basic tasks (updating SSL certificates, uploading files and the like). Want to know why? Read on.

First, even by AWS price standards, static websites on S3 are dirt cheap. Costs scale with the number of users, so for a small “experimental” website it might be pennies. Compare this with the cost of a VPS, which is around 5$/month at the cheapest. If the traffic stays small, I have estimated that at around 100kB of data per page, the final cost for a million page loads would be around 10$. Egress fees are about the only cost here.
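The estimate above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the egress rate of 0.09 USD per GB is an assumption (it varies by region and tier and may have changed), and request fees are ignored.

```python
# Assumed figures: page size and page loads come from the text above,
# the egress price per GB is an assumption and should be checked
# against the current AWS price list.
PAGE_SIZE_KB = 100
PAGE_LOADS = 1_000_000
EGRESS_USD_PER_GB = 0.09


def egress_cost_usd(page_size_kb: float, page_loads: int, usd_per_gb: float) -> float:
    """Return the data-transfer cost for serving `page_loads` pages."""
    gigabytes = page_size_kb * page_loads / 1_000_000  # kB -> GB (decimal units)
    return gigabytes * usd_per_gb


if __name__ == "__main__":
    cost = egress_cost_usd(PAGE_SIZE_KB, PAGE_LOADS, EGRESS_USD_PER_GB)
    print(f"{PAGE_LOADS} page loads: about {cost:.2f} USD of egress")
```

With these assumed numbers, a million page loads move 100 GB, which lands in the same ballpark as the figure quoted above.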

The other big advantage is scale. Serving a million page loads with a 5$/month VPS can be challenging. The few times my blog had thousands of visitors/hour, WordPress would struggle to keep up. For an S3 bucket, however, that’s hardly a problem.

Third, why not; most of my blog viewers would perceive this website as “static” anyway (at least those that don’t leave comments… would you believe this was a thing? Leaving a comment in a website?). I have a feeling that anything that doesn’t need synchronous server-side logic can magically become “static”. Also platform maintenance is greatly simplified.

I’m probably not saying anything new here, but I always think about how much “the cloud” can scale up, without spending a minute thinking how much it can scale down. This is one of those occasions where it does. And actually, I wish it would happen more often.

Realistic, continuous hi-hat control module for Hydrogen (for edrums with CC pedal, with bonus cymbal choke)

2018-12-30 Simone

I wrote this article back in 2012, to work around some limitations in the Hydrogen drum machine. It was impossible to control the hi-hat from closed to open (and vice versa) seamlessly, so I wrote a MIDI router to fix it.

Here you can see it in action:

I wrote this script mostly to improve the Hydrogen hi-hat feel, but I accidentally added a couple of other features too.

It can seamlessly switch from closed to open hi-hat (and vice versa), supports cymbal choke (on the ride, on the crash, and via the hi-hat pedal), and slightly improves the dynamic range.

To use it, you will need:

an edrum (I used a Roland TD9)
several (at least 3) hi-hat samples from closed to open; I used these http://www.freesound.org/people/TicTacShutUp/packs/17/ plus some closed hi-hats from other sets
mididings (on Debian or Ubuntu, sudo apt-get install mididings)
this script: https://github.com/simonebaracchi/continuous-hh (if you have fewer or more than 7 hi-hat samples like I did, you’d better modify “hihats_pedal_range” in this script to make it match your number of samples)
in Hydrogen, set up your kit like this:
- first sample from the top: footclick (will be note 36)
- sample #2: completely closed hihat (will be note 37)
- sample #3: slightly less closed hihat (will be note 38)
- … and so on…
- then, all the other instruments (it is crucial that all the hihats are consecutive, and my script expects to find the first on note #37)
in Hydrogen, disable all mute groups (they don’t sound great for this purpose), disable “ignore note-off”, and make it listen to channel 9 if it doesn’t already (channel 10 does not seem to work well for note-offs; maybe it’s faulty in mididings, I don’t know)
in your edrum: use channel 9, set footclick to note #36, hihat to note #37 (both open and closed if you have both), and hihat pedal to CC#04 (it’s called “foot(4)” on mine)

Launch the script (“mididings -f <script>”), route your MIDI data into mididings, then from mididings to Hydrogen (or another sequencer, maybe for recording, if you like), and start banging on your edrum. You should be good to go.
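The core idea behind the setup above (pick one of several consecutive hi-hat samples depending on how far the pedal is pressed) can be sketched in plain Python, without mididings. This is not the code from the continuous-hh script; the function name `hihat_note`, the even split of the pedal range, and the assumption that a high CC#04 value means "closed" are all illustrative guesses.

```python
FIRST_HIHAT_NOTE = 37  # note of the fully closed hi-hat, as in the kit layout above
HIHAT_SAMPLES = 7      # number of consecutive hi-hat samples in the kit


def hihat_note(pedal_cc: int,
               samples: int = HIHAT_SAMPLES,
               first_note: int = FIRST_HIHAT_NOTE,
               closed_is_high: bool = True) -> int:
    """Map a pedal controller value (0..127) to one of the hi-hat notes.

    The pedal range is split into `samples` equal bands. Whether the pedal
    sends high values when closed depends on the module, so it is a flag here
    (assumption: high means closed).
    """
    if not 0 <= pedal_cc <= 127:
        raise ValueError("MIDI controller values must be in 0..127")
    band = int(pedal_cc / 128 * samples)  # 0 .. samples-1
    if closed_is_high:
        band = samples - 1 - band         # high CC -> first (most closed) sample
    return first_note + band


if __name__ == "__main__":
    for cc in (0, 32, 64, 96, 127):
        print(cc, "->", hihat_note(cc))
```

A router built this way would remember the last pedal value and rewrite every incoming hi-hat hit (note 37) to the note returned by `hihat_note`.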

There used to be a discussion about it on the Hydrogen forums which are now unavailable, but you can still read it thanks to the Internet Wayback Machine.

Can you drill a hole bigger than 1/4″ with a Dremel? Short answer: NO

2018-12-26 Simone

At least, not in one single go.

I’ve been searching the web up and down for a 1/4″ chuck or 1/4″ drill bits that could fit the Dremel, and apparently there are a couple of models advertised as working with it, such as this one:

This 1/4″ chuck is advertised to be compatible with Dremel. The brand name was covered since I don’t want to publicly criticize this specific product/vendor, but you can find plenty of them on online shops.

I was a little skeptical, as nobody seems to be using those.
I did the only logical(?) thing and bought one to test it out.

I hope I don’t have to buy into every scam to prove it’s a scam.

And, well, guess what: it doesn’t fit the Dremel, because the threading is too big.
But at least I could write a blog post about it so you don’t have to buy it and test it yourself.
As far as my research goes, the biggest drill bit you can use with a Dremel is the “Brad point drill bit” (you can find more info on the official Dremel page) which goes up to 1/4″, but that’s it. In this case I was going to drill a hole for a 1/4″ audio jack, which needs to be slightly bigger than the jack itself (about 3/8″) so I guess I’ll have to use another drill.

Boogie Board Sync review

2018-12-13 (updated 2021-04-12) Simone

I was looking for a tool to replace all the paper clutter on my desk.

I’ve been looking high and low, and as of December 2018, to my knowledge these are the options if you want to hand-write your notes on an electronic device:

an iPad Pro or a Surface tablet, with the Apple Pencil or the Surface Pen. They are Not Cheap (around 1000€). I’ve only tried the Surface, and the input lag is noticeable.
a Samsung Android tablet with the S Pen, which, again, is Not Cheap (~700€).
another, cheaper Android tablet with a regular capacitive pen, but the input lag is huge, and precision is poor.
an eInk-based tablet such as the Remarkable, which is Not Cheap (~500-600€) but seems to have very low lag (at least in the ads), doubles as an eReader, and is capable of partial erasing; it is a very interesting concept, but the price tag led me to keep searching.
a Moleskine Smart Writing System, which uses actual pen and paper, except the paper is proprietary, and while Somewhat Cheap (about 1€ for an A4) it is the only system that requires money for each page you use (wisely or less wisely).
tons of app-and-camera-based systems, some of which include microwave-erasable notebooks, such as the Rocket Book. These are cheaper (20-50€), but I didn’t like the idea of having to take a picture of the notebook, feeling that in the end I’d just use it as, you know, a regular notebook.
tons of eWriter systems, which are Very Cheap (as low as 5€), with imperceptible input lag, but no “smart” features that allow exporting your sketches.

That’s where the Boogie Board Sync stood out. It is the only eWriter system with Bluetooth capability, and it also has an affordable price tag (~60€). Due to the lack of reviews on the web, I’m writing my own.

Update 1: The Boogie Board Sync has been discontinued by the manufacturer, so I don’t recommend getting one.
Update 2: A couple more smart writing systems have surfaced, such as the Boogie Board Carbon Copy. I will talk again about those, along with other systems such as the Neo Smartpen, in another post.

The selling feature for me was the integration with Evernote. You can sketch something on the board, hit the save button, fire up the app on your mobile phone, let it sync, and have your sketch in Evernote moments later. The sketches all collect in an Evernote notebook of your choice for later reference. Unfortunately, Evernote is not able to recognize the hand-written text inside the sketches, but you can add some keywords (such as a timestamp and title) to make a sketch easier to find later.

The Boogie Board Sync app could definitely use some improvements. At first I tried to set it up on Windows to have it permanently available, but for some reason I couldn’t get the Evernote syncing to work on my laptop (tested on two Windows 10 laptops). I tried to contact Boogie Board support but they never replied (boo!). This is really unprofessional on their part and brought me to the brink of returning the product, but in the end I figured out I could just install the app on my phone instead, which apparently works better. Also, I once had a problem where the app crashed and, upon restarting, imported every single sketch from the board again, so the duplicates had to be cleaned out manually; even the phone app is not perfect.

The build quality for now seems ok. The board surface might look scratched from time to time, but it looks to be caused by the pen leaving some kind of small trail; if you clean up the screen with a cloth or a finger, you’ll have a perfectly smooth surface again.

The screen is still perfectly smooth after months of use.

On the other hand, it has proven to be vulnerable to hits. I’ve hit the screen, probably by dropping my keychain on it, and now I have a couple of spots on the screen where it is much more sensitive (for example, they light up if I softly push there with my finger).

A few hits to the screen led to some sensitive points.

The pen stroke is somewhat thick, similar to that of a soft felt-tip pen, so you’ll have to adapt to writing with larger letters. Also, the screen sensitivity seems to depend on heat (such as heat from sunlight): it increases as the device gets hotter. Overall the vector version of your sketches is good enough to read later, but it is not failure-proof, as some strokes (sometimes whole letters) will be missing from the end result. If your work involves symbols and numbers rather than words (maths, for example), I guess this could be a bigger issue.

How a note looks on the screen (left), and exported as PDF (right). You can notice some letters are completely missing.

A note on the software: when you delete a sketch, it is actually still accessible by using a USB cable and mounting the board as a USB thumb drive. The sketches are actually stored as vector PDFs on the board. The internal memory is probably enough for ~40k sketches.

The board can also be used as a digital drawing pad: it can left-click the screen by tapping the pen on the board, and right-click by using the button on the pen. It does not have pressure sensitivity. This was not a relevant use case for me, but it could still be useful to someone.

Overall I’m rather happy with my Boogie Board Sync. I’ve averaged one note per day since I got it 3 months ago, and now I couldn’t go back. It definitely could use some improvements, but the concept is very interesting, and as such I hope some competitors make their move with some new models.

Tablets comparison

2018-02-08 Simone

As I dug through the available 8-10″ tablets priced from 200€ and above, I compiled this list of the main features I was interested in. There don’t seem to be a lot of models.

For some reason, very few of them, if any, can do phone calls (which is bad, because I really hoped to use it as a backup in case my phone goes haywire).
The LTE or 4G option costs about 100€ more.
Prices are in euros.
Also, see my other post about why a tablet isn’t a computer.

Model	Price (€)	Screen size (in)	Screen type	Resolution	RAM (GB)	Notes
Amazon Fire HD 8	110	8	IPS	1280 x 800	1.5
Apple iPad Air 2 Wi-Fi + Cellular	519	9.7	Retina	1536 x 2048	2	Not available on Amazon
Apple iPad Mini 4	409	7.9	Retina	2048 x 1536	2	Not available on Amazon
Asus Z300M-6B050A ZenPad	168	10	IPS	1280 x 800	2
Asus Zenpad 3S	317	9.7	IPS	1536 x 2048	4
Asus ZenPad 3S 10 LTE	419	9.7	IPS	2048 x 1536	4	No phone
Asus ZenPad 3S Z500M-1J006A	425	9.7	IPS	2048 x 1536	4	No phone
Asus Zenpad S 8.0 Z580C	289	8	IPS	2048 x 1536	2	No phone
Huawei MediaPad M3	324	8.4	IPS	2560 x 1600	4	Frontal speakers
Lenovo TAB 2 A10-70L	259	10.1	IPS	1920 x 1200	2
Lenovo TB2-X30L	175	10.1	IPS	1280 x 800	2	No phone
Lenovo Yoga Tab 3 Plus ZA1R0020DE	407	10.1	IPS	2560 x 1600	3
Lenovo ZA090058SE Yoga Tab 3	224	8		1280 x 800	2
Nexus 7	165	7		1920 x 1200	2	No phone
Nvidia SHIELD 16GB Black	333	8		1920 x 1200	2
NVIDIA Shield K1	200	8		1920 x 1200	2	Discontinued – no phone
Samsung Galaxy S3	632	9.7	Super AMOLED	2048 x 1536	4
Samsung Galaxy Tab A T585N	267	10.1	TFT	1920 x 1200	2
Samsung Galaxy Tab A6	194	10.1	IPS	1920 x 1200	2	No phone
Samsung Galaxy Tab E SM-T561N	202	9.7	TFT	1280 x 800	1.5	No phone
Samsung Galaxy Tab S2 T713N	329	8	Super AMOLED	2048 x 1536	3
Samsung Galaxy Tab S2 9.7″ SM-T810NZWEDBT	343	9.7	Super AMOLED	2048 x 1536	3	4:3 display
Samsung SM-T585 Galaxy Tab A	254	10.1	TFT	1920 x 1200	2
Sony Xperia Z	380	10.1	TFT	1920 x 1200	2	Waterproof
Sony Xperia Z4 Tablet SGP712	407	10.1	IPS	2560 x 1600	3	Waterproof
DELL Venue 8 Pro	205	8	IPS	1280 x 800	1	Windows

Some statistics about Italian names and surnames

2017-12-31 (updated 2018-01-28) Simone

I’ve had the chance to run some statistics on a list of Italian names and surnames. I plan to feed this list to a machine learning algorithm and see what I can find out, but prior to that, I was curious to look at a few metrics about Italian names.

Italian names make up 78% of the names of residents, and I have restricted my searches to people with one of the first 1200 most common Italian names. With those two constraints, I have filtered out about 98% of all unique names, which is a huge percentage of names, but a comparatively small number of people: just 8% were left out.

As you can see there is a rather big push towards the most popular names:

21% of Italians also have a middle name, and the popularity of middle names is even more dramatically skewed.

So what are the most common names in Italy? Apparently, Maria and Giuseppe are the most popular by a long stretch. It probably isn’t a coincidence that those two names are important in Christianity.

MARIA	1,91%
GIUSEPPE	1,88%
ANTONIO	1,37%
FRANCESCO	1,22%
GIOVANNI	1,18%
MARCO	1,05%
ROBERTO	1,04%
ANNA	1,01%
MARIO	0,92%
LUIGI	0,90%
PAOLO	0,85%
ALESSANDRO	0,82%
ANDREA	0,80%
FRANCESCA	0,70%
STEFANO	0,70%
PAOLA	0,70%
VINCENZO	0,67%
LAURA	0,66%

And here are some of the most popular last names:

BIANCHI	0,24%
ROSSI	0,22%
FERRARI	0,18%
COLOMBO	0,15%
BRUNO	0,15%
GIORDANO	0,15%
ESPOSITO	0,15%
GALLO	0,14%
RUSSO	0,14%
PROIETTI	0,14%

These are some of the most common combinations. As you can see, some less common last names, such as Caruso or Marino, appear very frequently in combination with certain first names.

RUSSO GIUSEPPE

BRUNO GIUSEPPE

RUSSO ANTONIO

MARINO GIUSEPPE

ROSSI MARIA

ESPOSITO ANTONIO

GIORDANO GIUSEPPE

ESPOSITO GIUSEPPE

GALLO GIUSEPPE

GIORDANO MARIA

FERRARI MARIA

CARUSO GIUSEPPE

BIANCHI MARIA

ROMANO GIUSEPPE

BIANCHI ROBERTO

MARINO MARIA

ESPOSITO MARIA

GIORDANO FRANCESCO

BIANCHI MARCO

ROSSI GIUSEPPE

Next, a look at the surname length, in number of characters. There are some interesting outliers here. The graph is cut at 22 characters, and I’m not able to tell how far it could go. The longest ones are combinations of multiple surnames, joined with hyphens. If we look at the non-joined ones, the longest are Silettiformantello and Pasquadibisceglie.

First names can be pretty long too, such as Francescantonio, Mariantonietta, Giovanbattista, or Domenicantonio.

There does not seem to be a correlation or relationship between length of first and last name; the line fitting the scatter plot is practically flat (m=-0.014).
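For reference, the slope of such a fitting line can be computed with an ordinary least-squares fit. The sketch below is a minimal illustration, not the analysis actually used for the figures in this post; the helper name `fit_line` is made up, and the four name pairs are just a handful of the combinations listed earlier, far too few to reproduce the quoted slope.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m * x + q; returns (m, q)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    q = mean_y - m * mean_x
    return m, q


if __name__ == "__main__":
    # Tiny illustrative sample (first name, last name).
    people = [("GIUSEPPE", "RUSSO"), ("MARIA", "ROSSI"),
              ("ANTONIO", "ESPOSITO"), ("MARCO", "BIANCHI")]
    first_lengths = [len(first) for first, _ in people]
    last_lengths = [len(last) for _, last in people]
    m, q = fit_line(first_lengths, last_lengths)
    print(f"slope m = {m:.3f}, intercept q = {q:.3f}")
```

A slope close to zero means that knowing the length of the first name tells you next to nothing about the length of the last name.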

A peek at the less common names (that still made their way into the 1200 most popular):

LOLA

EGISTO

FILIPPINA

CLARISSA

LIDIO

ALFIA

SALVINA

GILDO

DESOLINA

RINALDA

MASSIMILIANA

DIOMIRA

DENNIS

ARTEMIO

NARCISA

LEONTINA

LUCE

OLINDA

CLEONICE

BENVENUTA

MASSIMINA

EULALIA

RODRIGO

CRESCENZA

MARLENE

VITTO

CRESCENZIA

I’ll soon try to find out if I can use a neural network to deduce the phonetic rules that bind first and last names. First names are usually chosen carefully, so it makes sense to suppose that there is a “rule” that decides whether a first-last name pairing “feels” right or not. But this is stuff for another post!

Disclaimer: the database I used for this article is around 3.5 million names, mainly from bigger cities (Roma, Milano, Torino) and other (mostly northern) towns.

In 2018, a tablet still isn’t a computer

2017-12-18 (updated 2018-12-30) Simone

What is a computer? That’s the title of one of the latest commercials about the iPad Pro.

In a nutshell, the designers show that the tablet has nothing to lose when compared to a computer.

I beg to differ. I was recently searching for a replacement for a laptop, and the tablet was an interesting candidate; but after research and first-hand testing, I’m confident in saying a tablet is not a computer. They are two different products for two different use cases. Sure, a 1000€ Microsoft Surface can get close enough to a laptop, and the iPad has some niceties that make it above average in many areas, but things get worse if we look at the much more widespread Android tablet, which is the worst offender here, even for devices with a 600€ price tag.

There are many things an Android tablet cannot do well or at all.

You cannot have two apps running at the same time reliably, as Android likes to kill them randomly. This is especially problematic if you’re using tools such as IRC or SSH which happen to use connection-oriented protocols.
Physical keyboards aren’t remappable without rooting. For example the Escape key closes all applications and there is no way to disable it. (90% of keyboard shortcuts do work though.)
Support for external peripherals is almost non-existent. While all Android devices have a USB port, it often can’t be used for USB devices such as USB-to-Ethernet or USB-to-HDMI adapters. So, no wired networks and no external monitors. If you need those, you’ll have to do some research, as support is extremely spotty (Samsung and Apple are among the best here).
Drawing is impossible as the latency is huge, especially on bigger screens. The only possibility here is to get a tablet that has support for an active pen (that currently is available only on the top tier ones). Also the pen itself is, again, expensive.
Filesystem support is cumbersome. Try to download a file in a new directory and then move it somewhere else. Or try to forward an email with an attachment on the iPhone.
Gaming on a tablet is a sub-par experience, as mobile games are not at the same level as mediocre PC indie games.

On the other hand, I don’t mean tablets are useless; they are fit for different purposes. Even many laptops have disadvantages with respect to a tablet:

Few laptops support SIM cards.
Laptop cameras aren’t as advanced as mobile ones, and a laptop can’t easily take a picture of what’s behind it anyway.
Few laptops are touch-enabled.
“Flip” or “Transformer” designs are rare. The keyboard is a burden if you just want to watch YouTube. This basically makes a laptop much less portable.
Laptops just aren’t socially fit sometimes; you’ll look weird if you bring one to the beach.
Many mobile apps are designed around portability: maps, tourist information, public transport, and so on. Unfortunately many of them aren’t available for all platforms; if you’re used to a certain app on your phone, you might need something different on your laptop.
Mobile apps often offer background services which are less resource-intensive than keeping tens of Chrome tabs open.

So, there are pros and cons on both sides, as they’re basically two different products with a slight overlap in functionality. Tablets still excel at entertainment, but laptops are still the most versatile and productive.
You should understand your use case before deciding on one or the other. I hope this list helps!

Spaghetti Distance Index: measuring set similarity for sentences with a bag-of-words model (too literally)

2017-06-20 (updated 2017-07-13) Simone

Today we talk about how you shouldn’t measure sentence similarity.

“Bag of words” is a simplified model of handling sentences in natural language processing, which disregards grammar, punctuation, and word order and manages sentences as sets of words.

“Set similarity” is a measure of how close two sets (two collections of arbitrary items) are to each other. There are many approaches in the literature; I recently came across the Jaccard Index, while other alternatives are the Sørensen Similarity Index and the Hamming distance.
The Jaccard Index (or its complement, the Jaccard Distance) seems to be commonly used in text summarization algorithms, which use it to measure the similarity between two sentences (treating a sentence as a set of words).

However, the Jaccard Index has some sub-optimal quirks. Its score is based on the number of items shared between the sets. When applied to sentence comparison, some words are very common (e.g. stopwords) and, especially in modern social network environments (e.g. Instagram hashtags), are used across many different contexts (e.g. #picoftheday). I quickly realized that not all words should be treated equally when measuring sentence similarity, and that it is not trivial to evaluate whether a word is important or not.

Its score is also weighted on the set cardinality (the total number of words in the sentence); however this favours short sentences, for example it is way easier to have two sentences share all of their words when there is only one word in them. Big sets that have a large percentage of their items in common should get higher matching scores than smaller sets that have a similar percentage of matching items, since this has a smaller probability to happen for large sets.
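To make the two quirks above concrete, here is a minimal sketch of the Jaccard Index on sentences treated as sets of words. The function name and the example sentences are made up for illustration; returning 1.0 for two empty sets is a convention chosen here, not part of the definition.

```python
def jaccard_index(x, y):
    """Jaccard Index: size of the intersection divided by size of the union."""
    x, y = set(x), set(y)
    if not x and not y:
        return 1.0  # convention for two empty sets (assumption)
    return len(x & y) / len(x | y)


if __name__ == "__main__":
    # Two one-word "sentences" that happen to share a hashtag get a perfect score...
    short_a = "#picoftheday".split()
    short_b = "#picoftheday".split()
    # ...while two longer sentences sharing most of their words score lower.
    long_a = "fresh spaghetti with tomato and basil for dinner tonight".split()
    long_b = "fresh spaghetti with tomato and basil for lunch today".split()
    print(jaccard_index(short_a, short_b))
    print(jaccard_index(long_a, long_b))
```

Every word counts the same here, so a shared stopword or a ubiquitous hashtag weighs exactly as much as a rare, meaningful word.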

This is where I tried to formulate a different, more context-aware, non-normalized, language-agnostic measure of similarity between sentences. I called this algorithm Spaghetti Distance Index. Its idea is, given two sentences, to measure the log-likelihood of ending up with a certain number of shared words. In formulas:

$SDI(X,Y) = - \displaystyle\sum_{w \in (X \cap Y)} \log \left( |X| \ |Y| \ p(w)^2 \right)$

$X$ and $Y$ are the two sets, and $|X|$ and $|Y|$ are their sizes. $p(w)$ is the probability of item $w$ (simply put, the frequency of word $w$ among all sentences). $|X| p(w)$ is (approximately) the probability of $w$ appearing in set $X$, and similarly for $Y$. $|X| \ |Y| \ p(w)^2$ is the probability of an item $w$ being in both sets $X$ and $Y$. The less frequent the item, the greater the probability of a relationship between the sets.
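The formula translates almost directly into code. The following is a minimal sketch written from the formula above, not the code published on GitHub; the helper names are made up, and estimating $p(w)$ as the share of $w$ among all word occurrences in a corpus of sentences is an assumption.

```python
import math
from collections import Counter


def word_probabilities(sentences):
    """Estimate p(w) as the share of w among all words of all sentences.

    Each sentence is treated as a set, so a word counts once per sentence.
    (Assumption: other estimates of p(w) are possible.)
    """
    counts = Counter(w for sentence in sentences for w in set(sentence))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def sdi(x, y, p):
    """Spaghetti Distance Index of two word sets, given word probabilities p."""
    x, y = set(x), set(y)
    return -sum(math.log(len(x) * len(y) * p[w] ** 2) for w in x & y)


if __name__ == "__main__":
    # Hand-picked probabilities, for illustration only.
    p = {"spaghetti": 0.01, "the": 0.2}
    print(sdi({"spaghetti", "tomato"}, {"spaghetti", "basil"}, p))  # shared rare word
    print(sdi({"the", "tomato"}, {"the", "basil"}, p))              # shared common word
```

Sharing a rare word yields a larger score than sharing a common one, and two sets with no words in common score zero because the sum is empty.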

This algorithm has some advantages over Jaccard Index:

the similarity score is unbounded, so longer sentences can have higher scores;
it favours high percentages of matching words, non-shared words have a negative impact on score;
uncommon words contribute more to the score;
it maintains language agnosticity, and can be used to compare any set of items.

This algorithm assumes each word is chosen independently from the others; this is literally what a “bag of words” would get you if words were drawn blindly and at random. This assumption is certainly not true: words in a sentence are correlated with each other. This is especially true for sentences belonging to a certain topic, e.g. if sentences are about pasta recipes, and one word is spaghetti, then there is a somewhat high probability of another word being tomato.
Factoring in the correlation between words would bring us to the Mutual Information algorithm, which relies exactly on the probability of a certain word given all the other words.

But if the problem is comparing sentence similarity, then possibly there are other domain-specific similarity measures such as tf-idf (which accounts for both term frequency in sentences, and for general term frequency among all words), BM25, or sentence clustering algorithms such as Latent Dirichlet allocation.

You can see the source code for Spaghetti Distance Index on GitHub.