Tablets comparison

As I dug through the available 8-10″ tablets priced from around 200€ upwards, I compiled this list of the main features I was interested in. There don't seem to be many models.

For some reason, very few of them, if any, can make phone calls (which is a pity, because I really hoped to use one as a backup in case my phone goes haywire).
An LTE (4G) option costs about 100€ extra.
Prices are in euros.
Also, see my other post about why a tablet isn’t a computer.

Model | Price (€) | Screen size (in) | Screen type | Resolution | RAM (GB) | Notes
Amazon Fire HD 8 | 110 | 8 | IPS | 1280 x 800 | 1.5 |
Apple iPad Air 2 Wi-Fi + Cellular | 519 | 9.7 | Retina | 2048 x 1536 | 2 | Not available on Amazon
Apple iPad Mini 4 | 409 | 7.9 | Retina | 2048 x 1536 | 2 | Not available on Amazon
Asus Z300M-6B050A ZenPad | 168 | 10 | IPS | 1280 x 800 | 2 |
Asus ZenPad 3S | 317 | 9.7 | IPS | 2048 x 1536 | 4 |
Asus ZenPad 3S 10 LTE | 419 | 9.7 | IPS | 2048 x 1536 | 4 | No phone
Asus ZenPad 3S Z500M-1J006A | 425 | 9.7 | IPS | 2048 x 1536 | 4 | No phone
Asus ZenPad S 8.0 Z580C | 289 | 8 | IPS | 2048 x 1536 | 2 | No phone
Huawei MediaPad M3 | 324 | 8.4 | IPS | 2560 x 1600 | 4 | Front-facing speakers
Lenovo TAB 2 A10-70L | 259 | 10.1 | IPS | 1920 x 1200 | 2 |
Lenovo TB2-X30L | 175 | 10.1 | IPS | 1280 x 800 | 2 | No phone
Lenovo Yoga Tab 3 Plus ZA1R0020DE | 407 | 10.1 | IPS | 2560 x 1600 | 3 |
Lenovo ZA090058SE Yoga Tab 3 | 224 | 8 | | 1280 x 800 | 2 |
Nexus 7 | 165 | 7 | | 1920 x 1200 | 2 | No phone
Nvidia SHIELD 16GB Black | 333 | 8 | | 1920 x 1200 | 2 |
NVIDIA Shield K1 | 200 | 8 | | 1920 x 1200 | 2 | Discontinued; no phone
Samsung Galaxy Tab S3 | 632 | 9.7 | Super AMOLED | 2048 x 1536 | 4 |
Samsung Galaxy Tab A T585N | 267 | 10.1 | TFT | 1920 x 1200 | 2 |
Samsung Galaxy Tab A6 | 194 | 10.1 | IPS | 1920 x 1200 | 2 | No phone
Samsung Galaxy Tab E SM-T561N | 202 | 9.7 | TFT | 1280 x 800 | 1.5 | No phone
Samsung Galaxy Tab S2 T713N | 329 | 8 | Super AMOLED | 2048 x 1536 | 3 |
Samsung Galaxy Tab S2 9.7″ SM-T810NZWEDBT | 343 | 9.7 | Super AMOLED | 2048 x 1536 | 3 | 4:3 display
Samsung SM-T585 Galaxy Tab A | 254 | 10.1 | TFT | 1920 x 1200 | 2 |
Sony Xperia Tablet Z | 380 | 10.1 | TFT | 1920 x 1200 | 2 | Waterproof
Sony Xperia Z4 Tablet SGP712 | 407 | 10.1 | IPS | 2560 x 1600 | 3 | Waterproof
DELL Venue 8 Pro | 205 | 8 | IPS | 1280 x 800 | 1 | Windows


Some statistics about Italian names and surnames

I’ve had the chance to run some statistics on a list of Italian names and surnames. I plan to feed this list to a machine learning algorithm and see what I can find out, but before that, I was curious to compute a few metrics about Italian names.

Italian names make up 78% of the names of residents, and I restricted my searches to people with one of the 1200 most common Italian names. With those two constraints, I filtered out about 98% of all unique names, which is a huge percentage of names but a comparatively small number of people: just 8% were left out.
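To make the two percentages above concrete, here is a minimal sketch of the filtering step. It assumes the data is just a list of first names with one entry per person. `coverage` is a hypothetical helper, not the code actually used for this post, and the toy data is made up.

```python
from collections import Counter


def coverage(first_names, k):
    """Keep only the k most common names and return a pair:
    (share of distinct names kept, share of people kept)."""
    counts = Counter(first_names)
    kept = counts.most_common(k)
    people_kept = sum(n for _, n in kept)
    return len(kept) / len(counts), people_kept / len(first_names)


# Toy data: two very popular names and a long tail of rare ones.
people = ["MARIA"] * 50 + ["GIUSEPPE"] * 40 + ["DESOLINA", "EGISTO", "LOLA", "GILDO"]
names_share, people_share = coverage(people, 2)
print(f"{names_share:.0%} of distinct names kept, {people_share:.0%} of people kept")
```

On this toy list, keeping 2 of the 6 distinct names (33%) still covers 90 of the 94 people (about 96%). It is the same long-tail effect, on a much smaller scale.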

As you can see, the distribution is heavily skewed towards the most popular names:


21% of Italians also have a middle name, and the popularity of middle names is even more dramatically skewed.

So what are the most common names in Italy? Apparently, Maria and Giuseppe are the most popular by a wide margin. It probably isn’t a coincidence that both names are important in Christianity.

MARIA 1,91%
GIUSEPPE 1,88%
ANTONIO 1,37%
FRANCESCO 1,22%
GIOVANNI 1,18%
MARCO 1,05%
ROBERTO 1,04%
ANNA 1,01%
MARIO 0,92%
LUIGI 0,90%
PAOLO 0,85%
ALESSANDRO 0,82%
ANDREA 0,80%
FRANCESCA 0,70%
STEFANO 0,70%
PAOLA 0,70%
VINCENZO 0,67%
LAURA 0,66%

And here are some of the most popular last names:

BIANCHI 0,24%
ROSSI 0,22%
FERRARI 0,18%
COLOMBO 0,15%
BRUNO 0,15%
GIORDANO 0,15%
ESPOSITO 0,15%
GALLO 0,14%
RUSSO 0,14%
PROIETTI 0,14%

Here are some of the most common combinations. As you can see, some comparatively less common last names, such as Caruso or Marino, appear very frequently in combination with certain first names.

RUSSO GIUSEPPE
BRUNO GIUSEPPE
RUSSO ANTONIO
MARINO GIUSEPPE
ROSSI MARIA
ESPOSITO ANTONIO
GIORDANO GIUSEPPE
ESPOSITO GIUSEPPE
GALLO GIUSEPPE
GIORDANO MARIA
FERRARI MARIA
CARUSO GIUSEPPE
BIANCHI MARIA
ROMANO GIUSEPPE
BIANCHI ROBERTO
MARINO MARIA
ESPOSITO MARIA
GIORDANO FRANCESCO
BIANCHI MARCO
ROSSI GIUSEPPE

Next, I looked at surname length, in number of characters. There are some interesting outliers here. The graph is cut at 22 characters, so I can’t tell how far the tail goes. The longest surnames are combinations of multiple surnames joined with hyphens. Among the non-joined ones, the longest are Silettiformantello and Pasquadibisceglie.

First names have some pretty long ones too, such as Francescantonio, Mariantonietta, Giovanbattista, or Domenicantonio.

There does not seem to be any correlation between the lengths of first and last names: the line fitted to the scatter plot is practically flat (slope m = -0.014).
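The slope m is the ordinary least-squares fit of surname length against first-name length. Here is a minimal sketch, assuming the names are available as (first name, surname) pairs. `fit_slope` is a hypothetical helper, and the sample pairs are only illustrative; they are not taken from the dataset.

```python
def fit_slope(xs, ys):
    """Ordinary least-squares slope m of the line y = m * x + q."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Illustrative (first name, surname) pairs, not from the dataset.
pairs = [("MARIA", "ROSSI"), ("GIUSEPPE", "BRUNO"), ("FRANCESCANTONIO", "GALLO"),
         ("ANNA", "PASQUADIBISCEGLIE"), ("LUIGI", "ESPOSITO")]
first_lengths = [len(first) for first, _ in pairs]
last_lengths = [len(last) for _, last in pairs]
print(f"m = {fit_slope(first_lengths, last_lengths):.3f}")
```

On Python 3.10+, `statistics.linear_regression(xs, ys).slope` computes the same value. A slope close to zero, like the -0.014 above, means longer first names don't come with longer or shorter surnames.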

A peek at the less common names (that still made it into the 1200 most popular):

LOLA
EGISTO
FILIPPINA
CLARISSA
LIDIO
ALFIA
SALVINA
GILDO
DESOLINA
RINALDA
MASSIMILIANA
DIOMIRA
DENNIS
ARTEMIO
NARCISA
LEONTINA
LUCE
OLINDA
CLEONICE
BENVENUTA
MASSIMINA
EULALIA
RODRIGO
CRESCENZA
MARLENE
VITTO
CRESCENZIA

I’ll soon try to find out if I can use a neural network to deduce the phonetic rules that bind first and last names. First names should be carefully chosen, and it makes sense to suppose that there is some “rule” deciding whether a first-last name pairing “feels” right or not. But this is stuff for another post!

Disclaimer: the database I used for this article contains around 3.5 million names, mainly from bigger cities (Roma, Milano, Torino) and other (mostly northern) towns.

In 2018, a tablet still isn’t a computer

What is a computer? That’s the title of one of the latest commercials for the iPad Pro.

In a nutshell, the ad argues that the tablet has nothing to lose when compared to a computer.

I beg to differ. I was recently searching for a replacement for a laptop, and a tablet was an interesting candidate. After research and first-hand testing, though, I’m confident in saying that a tablet is not a computer. They are two different products for two different use cases. Sure, a 1000€ Microsoft Surface can get close enough to a laptop, and the iPad has some niceties that put it above average in many areas. Things get worse if we look at the much more widespread Android tablets, which are the worst offenders here, even for devices with a 600€ price tag.

There are many things an Android tablet cannot do well, or at all.

  • You cannot reliably have two apps running at the same time, as Android tends to kill background apps unpredictably. This is especially problematic if you’re using tools such as IRC or SSH, which rely on long-lived connections.
  • Physical keyboards aren’t remappable without rooting. For example, the Escape key closes all applications and there is no way to disable it. (90% of keyboard shortcuts do work, though.)
  • Support for external peripherals is almost nonexistent. All Android devices have a USB port, but it often can’t be used for devices such as USB-to-Ethernet or USB-to-HDMI adapters. So: no wired networks and no external monitors. If you need those, you’ll have to do some research, as support is extremely spotty (Samsung and Apple are among the best here).
  • Drawing is practically impossible, as the latency is huge, especially on bigger screens. The only option here is a tablet that supports an active pen (currently available only on top-tier models), and the pen itself is, again, expensive.
  • Filesystem support is cumbersome. Try downloading a file into a new directory and then moving it somewhere else. Or try forwarding an email with an attachment on the iPhone.
  • Gaming on a tablet is a sub-par experience, as mobile games don’t reach the level of even mediocre PC indie games.

On the other hand, I don’t mean tablets are useless; they are fit for different purposes. Laptops, too, have several disadvantages compared to a tablet:

  • Few laptops support SIM cards.
  • Laptop cameras aren’t as advanced as mobile ones, and a laptop can’t easily take a picture of what’s behind it anyway.
  • Few laptops are touch-enabled.
  • “Flip” or “Transformer” designs are rare. The keyboard is a burden if you just want to watch YouTube. This basically makes a laptop much less portable.
  • Laptops just aren’t socially appropriate sometimes: you’ll look weird if you bring one to the beach.
  • Many mobile apps are designed around portability: maps, tourist information, public transport, and so on. Unfortunately, many of them aren’t available on all platforms; if you’re used to a certain app on your phone, you might need something different on your laptop.
  • Mobile apps often offer background services which are less resource-intensive than keeping tens of Chrome tabs open.

So, there are pros and cons on both sides, as they’re basically two different products with a slight overlap in functionality. Tablets still excel at entertainment, but laptops are still the most versatile and productive.
You should understand your use case before choosing one or the other. I hope this list helps!


Spaghetti Distance Index: measuring set similarity for sentences with a bag-of-words model (too literally)

Today we talk about how you shouldn’t measure sentence similarity.

“Bag of words” is a simplified model of handling sentences in natural language processing, which disregards grammar, punctuation, and word order, and treats sentences as sets of words.

“Set similarity” measures how alike two sets (two collections of arbitrary items) are. There are many approaches in the literature; I recently came across the Jaccard index, while other alternatives include the Sørensen similarity index and the Hamming distance.
The Jaccard index (whose complement, one minus the index, is called the Jaccard distance) seems to be commonly used in text summarization algorithms, which use it to measure the similarity between two sentences (treating a sentence as a set of words).
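For reference, here is a minimal sketch of the Jaccard index on two bag-of-words sentences. It makes two simplifying assumptions: words are split on whitespace, and two empty sets score 1.0, which is just a convention chosen here.

```python
def jaccard_index(x, y):
    """|X ∩ Y| / |X ∪ Y| for two collections of words (1.0 for two empty ones)."""
    x, y = set(x), set(y)
    union = x | y
    if not union:
        return 1.0
    return len(x & y) / len(union)


a = "i cooked spaghetti with tomato".split()
b = "spaghetti with tomato tonight".split()
print(jaccard_index(a, b))      # 3 shared words out of 6 distinct ones
print(1 - jaccard_index(a, b))  # the corresponding Jaccard distance
```

Note that `jaccard_index(["ciao"], ["ciao"])` also returns 1.0: a single shared word is a perfect match. This is the short-sentence bias discussed below.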

However, the Jaccard index has some sub-optimal quirks. Its score is based on the number of items the two sets share, but some words are very common (e.g. stopwords). In modern social network environments, some terms (e.g. Instagram hashtags such as #picoftheday) are also used across many unrelated contexts. I quickly realized that not all words should be treated equally when measuring sentence similarity, and that it is not trivial to tell whether a word is important or not.

Its score is also normalized by the size of the union of the two sets (the number of distinct words across both sentences), and this favours short sentences. For example, it is far easier for two sentences to share all of their words when each contains a single word. Big sets that have a large percentage of their items in common should get higher matching scores than smaller sets with a similar percentage of matching items, since this is less likely to happen by chance for large sets.

This is where I tried to formulate a different, more context-aware, non-normalized, language-agnostic measure of similarity between sentences. I called this algorithm Spaghetti Distance Index. The idea is, given two sentences, to measure the (negative) log-likelihood of ending up with exactly the words they share. In formulas:

SDI(X,Y) = - \displaystyle\sum_{w \in X \cap Y} \log \left( |X| \, |Y| \, p(w)^2 \right)

X and Y are the two sets, and |X| and |Y| are their sizes. p(w) is the probability of item w (simply put, the frequency of word w among all sentences). |X| p(w) is (approximately, for small p(w)) the probability of w appearing in set X, and similarly for Y, so |X| |Y| p(w)^2 is the probability of item w appearing in both X and Y. The less frequent the item, the less likely it is to be shared by chance, and the more strongly sharing it suggests a relationship between the sets.
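Here is a minimal sketch of the formula above. It assumes p(w) is estimated as the count of w divided by the total number of words in a small corpus. This is an illustration written for this explanation, not the implementation published on GitHub, and `word_probabilities` / `spaghetti_distance_index` are hypothetical names.

```python
import math
from collections import Counter


def word_probabilities(sentences):
    """Estimate p(w) as the frequency of w among all words of all sentences."""
    counts = Counter(w for sentence in sentences for w in sentence)
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}


def spaghetti_distance_index(x, y, p):
    """SDI(X, Y) = -sum over shared words w of log(|X| |Y| p(w)^2).
    Assumes p has an entry for every word of X and Y."""
    x, y = set(x), set(y)
    return -sum(math.log(len(x) * len(y) * p[w] ** 2) for w in x & y)


corpus = [s.split() for s in (
    "spaghetti with tomato",
    "spaghetti with clams",
    "pizza with tomato",
    "risotto with mushrooms",
)]
p = word_probabilities(corpus)
print(spaghetti_distance_index(corpus[0], corpus[1], p))  # shares "spaghetti", "with"
print(spaghetti_distance_index(corpus[1], corpus[3], p))  # shares only "with"
```

In this toy corpus, “with” appears in every sentence, so |X| |Y| p(w)^2 = 9 · (1/3)^2 = 1 and it contributes log 1 = 0. Sharing the rarer “spaghetti” instead adds -log(9 · (1/6)^2) = log 4 ≈ 1.39.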

This algorithm has some advantages over Jaccard Index:

  • the similarity score is unbounded, so longer sentences can have higher scores;
  • it favours high percentages of matching words, since non-shared words have a negative impact on the score;
  • uncommon words contribute more to the score;
  • it is language-agnostic, and can be used to compare sets of any kind of item.

This algorithm assumes each word is chosen independently of the others; this is literally what a “bag of words” would give you if words were chosen blindly and at random. Of course, this assumption is not true: words in a sentence are correlated. This is especially true for sentences on a specific topic. For example, if the sentences are about pasta recipes and one word is spaghetti, there is a fairly high probability of another word being tomato.
Factoring in the correlation between words would lead to measures based on mutual information, which compare how often words actually occur together with how often they would co-occur if they were independent.

But if the problem is comparing sentence similarity, there are other domain-specific measures, such as tf-idf (which weighs how often a term appears in a sentence against how common it is across the whole corpus), BM25, or topic models such as Latent Dirichlet allocation.

You can see the source code for Spaghetti Distance Index on GitHub.

POSIX Signals are not just bad, they’re awful

It all started with a problem that seems simple and has been solved many times in the real world:

I just need a simple script to restart a process if it crashes.

Simple enough, right? So simple that I wanted to use a short bash script to do it. It turns out it’s not really that simple, especially if your thought process continues with

And I will use signals to control it.

Okay, now it’s a really bad idea, which can fail for a lot of reasons:

 1. Checking if a process has crashed is not idiot-proof

The idiot being the one who writes a C program that exits main() with “return(-1);” or any other negative value. Only the low 8 bits of the exit status reach the parent, so a negative value wraps around to a value above 127. As seen from the shell’s $?, it then becomes indistinguishable from a process killed by a signal, which bash reports as 128 plus the signal number.

A program exiting with “return(-119);” has the same exit status as one killed with kill -9 (SIGKILL). The fun!
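The wrap-around is easy to reproduce. Here is a minimal sketch, assuming a POSIX system (the numbers differ on Windows). It starts one child that exits with -119 and another that is killed with SIGKILL. Python’s `subprocess` is used here because, unlike the shell’s $?, it reports death by a signal as a negative return code.

```python
import signal
import subprocess
import sys

# A child whose main() "returns -119": only the low 8 bits of the value survive.
exited = subprocess.run([sys.executable, "-c", "import sys; sys.exit(-119)"])

# A child killed with SIGKILL (Popen.kill() sends SIGKILL on POSIX).
killed = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
killed.kill()
killed.wait()

print(exited.returncode, 128 + signal.SIGKILL)  # what bash's $? would show for each
print(killed.returncode)  # negative: subprocess knows this one died from a signal
```

A shell would report 137 for both children. Only the full wait status, which `subprocess` decodes for you, tells the two apart.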

 2. You can’t always control signals in bash

According to the POSIX specification, a non-interactive shell cannot trap or reset signals that were already ignored when it started. So this script would be unusable if you happened to launch it from a process that has the relevant signals ignored. That’s where I got the idea to use a one-line python wrapper around my bash script to re-enable all signals before handing control to bash.
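The post doesn’t show the one-liner itself, so here is a hedged sketch of what such a wrapper might look like, assuming Python 3.8+ for `signal.valid_signals()`. It resets every signal that was inherited as ignored back to its default disposition, then runs the bash script as a child process. `reset_ignored_signals` and `run_script` are hypothetical names.

```python
import signal
import subprocess


def reset_ignored_signals():
    """Restore the default disposition of every signal currently ignored
    (for example because the parent ignored it), so that a shell started
    afterwards is allowed to trap it. Note this also resets SIGPIPE, which
    Python itself ignores at startup."""
    restored = []
    for sig in signal.valid_signals():
        try:
            if signal.getsignal(sig) == signal.SIG_IGN:
                signal.signal(sig, signal.SIG_DFL)
                restored.append(sig)
        except (OSError, ValueError, RuntimeError):
            pass  # skip signals this platform won't let us inspect or change
    return restored


def run_script(cmd):
    """Reset ignored signals, then run cmd (e.g. a bash script) as a child,
    which inherits the now-default dispositions."""
    reset_ignored_signals()
    return subprocess.call(cmd)
```

For example, you might call `run_script(["bash", "controller.sh"])`, where `controller.sh` stands in for your script. Because the script runs as a child, this is exactly the second-process situation described in the next point.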

 3. Using a signal wrapper spawns a second process

It quickly turned out that this wasn’t a great idea. Using a python wrapper around a bash script obviously creates a second process: one for bash and one for python.
So in just one move I spawned three more problems:

– the python wrapper has the process name you would expect (the wrapper name) and has every signal enabled, i.e. any signal would kill it
– the bash script has a different process name, which makes it unclear which process signals should be sent to
– the bash script starts after the python wrapper, so if two wrappers start simultaneously, we have a funny race condition to deal with.

So, I just decided to drop bash and rewrite everything in python instead.

 4. There is no standard signal to ask a process to restart

Usually it’s SIGHUP, but that’s not universal. If your controlled process can be restarted with a signal, that signal is most likely SIGHUP, but there are no guarantees.

 5. Signals are not set up immediately at startup

When your control script starts, all signals have their default disposition until the script installs its own handlers. So if, for example, you launch your control script and then immediately try to restart the controlled process (with a SIGHUP signal), the controller itself may get killed instead, leaving the child process with no control.

 6. SIGKILL can’t be handled

If your control script is killed with SIGKILL, the child process is left running with no control.

 7. The child process exiting leads to interesting race conditions

Say, for example, that the restart behaviour is to send SIGHUP to the child process, wait for it to exit, and launch the child process again.
If your control script is asked to restart the child process several times in a row, it might send several signals to a PID that, after the first SIGHUP, might not correspond to any running process (not so bad) or might correspond to a different, recently spawned process (definitely not good).

In certain *NIX flavours you can set up a signal handler that fires when the child process exits, but based on my research this is not true for every OS out there.

 8. There is no alternative to signals

I wish I could end this list with better news, but unfortunately the only way to ask a process to terminate is, yes, signals.

The DOs:

 Use a lockfile to ensure a single instance of your controller is running.

 Use process groups.

 Handle SIGCHLD if your OS uses it to signal child processes exiting.

 Use systemd if your OS has it. It might take a while to configure properly (especially for a dynamic list of processes), though.
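Putting a few of these DOs together, here is a minimal sketch, in Python since that is where the post ended up, assuming a POSIX system. It combines three pieces:

- a lockfile taken with `fcntl.flock`;
- a child started in its own process group;
- a restart loop that only signals the child while it has not been reaped yet, so its PID cannot have been recycled.

It deliberately polls with `wait(timeout=...)` instead of handling SIGCHLD. `acquire_lock` and `supervise` are hypothetical names, and this is a sketch, not a drop-in replacement for systemd.

```python
import fcntl
import os
import signal
import subprocess


def acquire_lock(path):
    """Take an exclusive, non-blocking lock on path. Returns the open file
    (keep it open while you run) or None if another instance holds it."""
    f = open(path, "a")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


def supervise(cmd, max_restarts=3):
    """Run cmd, restarting it while it exits with a non-zero status.
    SIGHUP to the supervisor terminates the child, which then gets restarted.
    Returns the exit codes seen (negative means killed by that signal)."""
    codes = []
    restart_requested = False

    def on_hup(signum, frame):
        nonlocal restart_requested
        restart_requested = True  # only record it; act in the main loop

    # Install the handler before spawning anything (see point 5).
    previous = signal.signal(signal.SIGHUP, on_hup)
    try:
        while True:
            # New session => new process group, so killpg() also reaches
            # anything the child spawns.
            child = subprocess.Popen(cmd, start_new_session=True)
            while True:
                try:
                    code = child.wait(timeout=0.2)
                    break
                except subprocess.TimeoutExpired:
                    pass
                if restart_requested:
                    restart_requested = False
                    # Not reaped yet, so this PID / process group is still ours.
                    try:
                        os.killpg(child.pid, signal.SIGTERM)
                    except ProcessLookupError:
                        pass
            codes.append(code)
            if code == 0 or len(codes) > max_restarts:
                return codes
    finally:
        if previous is not None:
            signal.signal(signal.SIGHUP, previous)
```

A caller would typically take the lock first, for example with `acquire_lock("/tmp/mysupervisor.lock")`, and exit if it returns None. It would then call something like `supervise(["my-daemon"])`. Both the lock path and `my-daemon` are placeholders.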

PHP not parsing POST data?

If your PHP (or any other server side code) is receiving a single $_POST field of raw data, looking more or less like this, instead of receiving your form fields nicely:

------WebKitFormBoundaryVmWPyJJIBR3n18Wk
Content-Disposition: form-data; name="form-file-1"; filename="my document.doc"
Content-Type: application/octet-stream
------WebKitFormBoundaryVmWPyJJIBR3n18Wk
Content-Disposition: form-data; name="form-field-1"

form-value-1
------WebKitFormBoundaryVmWPyJJIBR3n18Wk--

(that WebKitFormBoundary prefix is probably a sign that you are using Chrome or another WebKit-based browser)

Then chances are your JavaScript is overriding the Content-Type header. Check the request headers, and if you find anything like this:

Content-Type:application/x-www-form-urlencoded;

or really anything else which is not this:

Content-Type:multipart/form-data; boundary=----WebKitFormBoundaryVmWPyJJIBR3n18Wk

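then you need to fix the Content-Type header. It might be simpler than you think: try setting it to `false` (in jQuery / Prototype / …) or stop JavaScript from setting it at all. With jQuery and FormData you will also want `processData: false`. The browser will take care of the header better than your JS can, since it adds the boundary parameter that matches the request body.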
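To see why the header matters, here is a hedged sketch of the server side of the problem. It uses Python’s standard `email` parser as a stand-in for PHP’s multipart parsing; that substitution is an assumption for illustration, and PHP’s own parser is not shown. The same body is split into fields only when the Content-Type header announces `multipart/form-data` with the matching boundary. Otherwise it stays one opaque blob.

```python
from email.parser import BytesParser
from email.policy import HTTP

BOUNDARY = "----WebKitFormBoundaryVmWPyJJIBR3n18Wk"
# A request body like the one above, with a single text field.
BODY = (
    f"--{BOUNDARY}\r\n"
    'Content-Disposition: form-data; name="form-field-1"\r\n'
    "\r\n"
    "form-value-1\r\n"
    f"--{BOUNDARY}--\r\n"
).encode()


def parse_form(content_type, body):
    """Return {field name: value}, or None if the body is not seen as multipart."""
    raw = f"Content-Type: {content_type}\r\n\r\n".encode() + body
    message = BytesParser(policy=HTTP).parsebytes(raw)
    if not message.is_multipart():
        return None  # the whole body is one blob, like the single $_POST field
    fields = {}
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        fields[name] = part.get_payload(decode=True).decode()
    return fields


print(parse_form(f"multipart/form-data; boundary={BOUNDARY}", BODY))
print(parse_form("application/x-www-form-urlencoded", BODY))
```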

In my case, I was getting this problem because I was using the JS FormData object to collect the form data and then sending it the same way I used to send “regular” form data. It turns out the Content-Type header has to be handled differently in that case. Hopefully I saved you a couple of hours of searching.

How to not drive a motor with an Arduino

If you are like me, when you were an absolute beginner, you tried to wire a motor up to some batteries, saw the motor was spinning, and thought out loud: “It’s working! How hard could it be to turn it on and off with an Arduino?”

Actually, a little harder than it seemed at first. I encountered several problems and slowly fixed them, and in this post I’ll try to sum up the kind of iterative reasoning that led to a decent circuit design.

[Figure: the motor wired directly to a battery]

So, the first thing you try is to attach the battery poles to the motor, keeping the wires in place with your fingers. Everything seems to work fine! So what’s next? What’s the most obvious way to control it?

[Figure: the motor and battery with a push switch]

The second step would be to add a switch: push the switch and the motor runs; release it and it stops. Perfect: now you just have to use an Arduino to open and close that switch.
[Figure: the motor powered directly from an Arduino pin]
This might be the most obvious change for a beginner, but it’s also a very bad idea. All the current passes through the Arduino pin, which is limited to 40mA at best, while even a very small motor will probably need way more than that. This might work for very, very small motors, but overall you risk damaging the pins.

[Figure: the motor driven by a 2N2222 in emitter-follower configuration]

So a “better” (but still bad) idea is to use a transistor instead! The configuration above is called emitter follower. After all, transistors are like switches, you have lots of 2N2222s lying around, and they are a buck a kilogram, so why not make use of them? This circuit might even sort of work, but it has at least two issues you should consider now:

  • What is the current required by the motor? Check the datasheet, and remember it probably needs a much higher current when starting from a standstill. The 2N2222 has a maximum rating of 500mA, which might not be enough even if your motor is rated to consume only, say, 100mA. A cheap option would be to use a Darlington pair (basically, more transistors). A better choice would be a different transistor that can handle a bigger current, like a MOSFET (e.g. an IRF510, still cheap and easily available), or possibly a relay. (As you probably know, there are several models of BJTs, MOSFETs and relays, and I’m pretty sure someone skilled enough could write a book about choosing this part properly.)
  • the other issue is that a part of the current is still sourced through the Arduino, so before burning something for good, a nice idea would be to add a resistor for protection, so that the maximum current flowing out of the Arduino is limited. (This is probably not an issue if using a MOSFET).
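The same kind of resistor is needed at the base of any BJT switch. Here is a back-of-the-envelope sizing example for a transistor used as a saturated switch (like the common-emitter layout further down). It follows a common rule of thumb: drive the base with about a tenth of the load current, a “forced beta” of 10. The sketch assumes a 5V Arduino pin and a base-emitter drop of about 0.7V. The motor current should really be its start-up (stall) current from the datasheet; the 100mA used here is just the example figure from above.

```python
def base_resistor(v_pin, v_be, i_load, forced_beta=10):
    """Base resistor (ohms) for a BJT switching i_load amps in saturation,
    assuming a forced beta (I_C / I_B) of forced_beta."""
    i_base = i_load / forced_beta
    return (v_pin - v_be) / i_base


ARDUINO_PIN_MAX = 0.040  # amps: the 40mA pin limit mentioned above

# Assumed values: 5V logic, ~0.7V base-emitter drop, 100mA motor current
# (use the motor's start-up/stall current from its datasheet instead).
r_base = base_resistor(v_pin=5.0, v_be=0.7, i_load=0.100)
i_pin = (5.0 - 0.7) / r_base
print(f"R_base = {r_base:.0f} ohm, pin current = {i_pin * 1000:.1f} mA")
print("within the pin limit:", i_pin < ARDUINO_PIN_MAX)
```

With these numbers the result is about 430 Ω, with roughly 10 mA drawn from the pin, comfortably below the 40mA limit. 430 Ω is itself an E24 value. A 470 Ω resistor from the more common E12 series would also work, giving slightly less base current (about 9 mA).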

[Figure: the emitter follower with a protection resistor on the Arduino pin]

This will add some protection to the Arduino, but really, why take the risk? We can change the circuit so that none of the current flowing from the Arduino actually goes to the motor.

[Figure: the motor driven by a transistor in common-emitter configuration]

This configuration, called common emitter, is getting better, but there’s still something to be done. The motor is a typical inductive load. When the transistor opens the circuit, the motor’s inductance tries to keep the current flowing and produces a voltage spike. That spike forces current through the transistor (which is bad and will likely damage it) and/or the Arduino (which will cause random resets and damage the pins). The next improvement is to add a flyback diode across the motor, so that the current has somewhere to go when the circuit is open.

[Figure: the common-emitter circuit with a flyback diode across the motor]

There. It probably isn’t finished, but it’s getting to the safe side. Next you might want to drive the motor in reverse or vary its speed, but that will probably be discussed in another post.

(Credits: graphics in this page were drawn with Fritzing.)

My experience with Soylent alternatives in Europe so far

Unfortunately, there are no Soylent resellers in Europe. There are, however, quite a few lesser-known alternatives. I’ve been asked about them a couple of times, so I felt they were worth a blog entry.

I’ve been trying Soylent alternatives since November 2015, replacing about 30-40% of my meals. I had the chance to try Jimmi Joy (formerly Joylent), Huel, Mana, Nano, Jake and Futricio (formerly SoylentLife) and here are some of the differences between them.

Jimmi Joy: it is the one I’ve been recommending to everyone wanting to try Soylent alternatives, as it probably has the lowest barrier to entry. It is one of the cheapest, it comes with a handy bottle, it leaves a good feeling of fullness, and it offers a good variety of what I find to be the best flavors. Cons: it tastes much better if you either blend it or leave it to rest for 8 hours or more, and variety is limited to its five or six flavors, as there is no “unflavoured” option. Jimmi Joy also offers a “sports” version, and their bars are legit.

Mana: I’m more cautious in recommending it to beginners, since I feel it takes a little more experience to appreciate. Pros: very good ingredients (almost completely natural, and all of them lab-tested), a good feeling of fullness, and it’s almost unflavoured, meaning you can add whatever flavor you want; in a way, this gives Mana the best flavor variety. Cons: it’s one of the most expensive, their bottle is not up to par with the others, and preparing it takes more effort.

Huel: I find this an excellent product, and the mix of different flours (oats, pea, and flaxseed) is very interesting. Pros: free shirt, good bottle, 30-30-40 macronutrient ratio, cheap, comes in an unflavoured variety, best “fullness” sensation yet, can be used for baking (not tried). Cons: it does not dissolve easily (a spoon helps a lot), and it is REALLY flavourless, so it needs a lot of flavouring added (which they luckily sell alongside it). It is a great product if you want to stick to the unflavoured variety (and add some flavors yourself, which again takes effort); otherwise the vanilla variety gets old after a while.

Jake: I have been trying their Sports variety because of the added protein content, but in the end I didn’t feel I really needed it. Pros: good bottle included, single-portion bags are very easy to prepare, and good vanilla flavor. Cons: only vanilla is available, and it didn’t make me feel really “full”. After a while I was craving something different. Overall I could recommend it for casual use, but not for every day.

Nano: it is very similar to Jake (they share vanilla, pea flour, and crushed flaxseed as ingredients). Pros: less flavoured than Jake, you can actually add other flavours for variety, better “full” sensation than Jake, single portion bags. Cons: the included bottle is hard to clean, and it never makes me feel really “full”, just like Jake.

Nano Veggie: it is a unique product, as it’s meant to be consumed hot. It has quite an interesting spicy tomato soup consistency. Pros: unique taste, single-portion bags. Cons: only one flavour, and the same downsides as regular Nano.

Futricio: I was not satisfied with this. Pros: bottle included, single portion bags, 5 flavors available. Cons: flavors are not as good as Joylent ones, and it didn’t really make me feel “full”.

There are a couple of others I did not try out: Queal, Bertrand, and so on. Other reviews have not encouraged me to do so, but if somebody wants to trade bags, I’m open.

I also compiled a list of ingredients easily found in Italy and attempted to make a recipe out of them (with a simple optimization tool I wrote), but I still haven’t had a chance to try it out.