“My language is better because it has a strong type system!” screams Dave, your colleague developer, trying to push the programming language Cobol for the next micro-service of your company.
Among developers, discussions about programming languages and their type systems can quickly get emotional. During these discussions, we often hear the words “type system”, “data type”, “type inference”, “static typing”, “weak typing”, “coercion”, and more.
The goal of this article is to see the meaning of all these words with examples, for you to have good foundations and understand the type system of your favorite programming language. More precisely, we’ll see:
- What are types and why we need them.
- When type checking occurs.
- What are primitive types, composite types, and Abstract Data Types (ADT).
- How type declarations and type changes can be implicit or explicit.
- Types and functions.
- What is type strength.
- What is type safety.
I’ll use two different languages to illustrate the ideas, Golang and PHP. If you don’t know them, don’t worry! The examples are straightforward and easy to understand.
I encourage you to use some PHP interpreter online and the Go playground while reading, to play and experiment by yourself. This will help you understand the different concepts.
I won’t go into the gory details here. As you’ll see, it’s difficult to generalize something which is specific to a programming language. Yet, this article will give you a good overview of the usual properties of most type systems.
We’ll go progressively from clear waters to the muddy ideas, so take your rubber boots, get ready for the swamp, and let’s go!
What’s a Type in Software Development?
A type system is made of types. I know, it’s mind-blowing. These types are also called data types. What are they?
Syntax and Semantics
First, let’s clarify the difference between the syntax and the semantics of a programming language.
For example, the syntax of your mother tongue is the set of rules dictating the structure of its sentences. In many spoken languages, you need a subject and a verb for your sentence to be correct. Programming languages have different constructs instead, like expressions, control structures, or statements.
For example, for an if statement to be syntactically correct in PHP, you’ll need:
- A condition.
- Some parentheses.
- Some curly brackets.
Something like that: if (1 == 1) { echo "I knew it!"; }
The semantics, on the other hand, is the meaning behind the constructs. For example, the semantics of an if statement can be explained as follows:
- The condition is evaluated.
- If the condition is true, the body of the statement is executed.
- If the condition is false, the body of the statement is not executed.
- The execution continues.
Why do we need syntax and semantics? It’s meant to communicate with two different kinds of entities:
- Your fellow developers need to understand what the heck you did.
- The computer needs to “understand” the instructions to execute them.
Definition of a Type
To come back to our subject, a type can attach semantics to data. For example, in PHP:
<?php
$integer;
echo gettype(65) . "\n";
// => integer
$integer = 65;
echo gettype($integer) . "\n";
// => integer
In Mathematics, 65 is part of the set of integers, so PHP considers the value 65 to be of type integer. Your variable $integer has the value 65, so it’s an integer, too.
When you assign 65 to the variable $integer, you give it semantics it didn’t have when we declared it, on the second line. These semantics let you know what you can do with the variable.
We can conclude that:
- A type gives semantics to a piece of data.
- A type is a set of values. For the type integer, it will be a range of whole numbers. For the type string, it will be the set of possible strings.
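To make the “set of values” idea concrete, here’s a small Go sketch. It prints the bounds of the type int8 and checks whether a few numbers belong to that set. The helper fitsInInt8 is a name made up for this illustration, not something from the standard library.

```go
package main

import (
	"fmt"
	"math"
)

// fitsInInt8 reports whether n belongs to the set of values of the type int8.
// Hypothetical helper, written for this illustration only.
func fitsInInt8(n int) bool {
	return n >= math.MinInt8 && n <= math.MaxInt8
}

func main() {
	// The smallest and the biggest values of the type int8.
	fmt.Println(math.MinInt8, math.MaxInt8)
	// => -128 127

	fmt.Println(fitsInInt8(65), fitsInInt8(300))
	// => true false
}
```

The value 300 is a perfectly fine integer in Mathematics, but it’s not part of the set of values the type int8 describes.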
Why Do We Need Types?
Representation and Semantics
When you declare a variable and assign it a value, the memory holds this value in binary. Our counting system is decimal, which means that the numbers we know and use every day look very different in binary:
<?php
$integer = 65; // Decimal notation
printf("Binary notation of %d: %b \n", $integer, $integer);
// => Binary notation of 65: 1000001
Binary is just another way to represent numbers, and only numbers.
Dave, your colleague developer, is full of questions while reading these lines. “How is a character, such as ‘A’, saved in memory?”, he wonders. “It’s not a number! It can’t be represented in binary!”.
Dave is right. The character first has to be converted into a number, following an encoding standard such as ASCII. Then, this ASCII code is saved in memory.
Now, let’s try this:
<?php
$integer = 65;
printf("Binary representation: %b \n", $integer);
// => Binary representation: 1000001
$character = 'A';
$ascii = ord('A'); // Ascii code of 'A'.
printf("Ascii code: %d \n", $ascii);
// => Ascii code: 65
printf("Binary representation: %b \n", $ascii);
// => Binary representation: 1000001
The integer 65 and the string ‘A’ have the same binary representation in memory!
Dave is confused. In despair, he asks the sky: “How the hell does our program know that $integer is equal to 65, and $character is equal to 'A'? How?”. Indeed, in memory, the two values of $character and $integer are exactly the same: 1000001. When we use these two variables in our code, the type system of our language will interpret the two values in memory, and it will decide what is a character and what is an integer.
This is important to understand, since this interpretation is not always accurate for some types, like floating point numbers.
A type will determine as well how you store a value in memory. For a character and an integer, we saw that they are stored the same way. The way to store floating point numbers, for example, is quite different.
The memory storage is nicely abstracted by the type system for us, developers, not to think about these confusing 0 and 1. You can then focus on more important problems, at least when the abstraction doesn’t leak. If you’re not sure what’s an abstraction, I wrote a detailed article about it.
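You can watch this interpretation happen in Golang, too. The sketch below prints the same byte as a number, as a character, and in binary, then shows how differently the floating point number 65.0 is stored. The helper floatBits is a hypothetical name, and the example assumes the usual 32-bit IEEE 754 layout behind Go’s float32.

```go
package main

import (
	"fmt"
	"math"
)

// floatBits returns the 32 bits used to store f, as a string of 0 and 1.
// Hypothetical helper, written for this illustration only.
func floatBits(f float32) string {
	return fmt.Sprintf("%032b", math.Float32bits(f))
}

func main() {
	var b byte = 65

	// The same bits, read as a number, as a character, and in binary.
	fmt.Printf("%d %c %b\n", b, b, b)
	// => 65 A 1000001

	// The floating point number 65.0 is stored in a very different way.
	fmt.Println(floatBits(65.0))
	// => 01000010100000100000000000000000
}
```

The bits don’t say what they mean by themselves. The type does.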
A Set of Rules
A type system is also a set of rules, more or less strict. You can’t do everything you want with some types.
Let’s take another example:
<?php
printf((3 + "Hello World!") . "\n");
// => PHP Warning: A non-numeric value encountered in /home/myusername/phpgoodies/test.php on line 3
// => 3
printf("The execution continues!");
// => The execution continues!
This code makes little sense. I try to reinvent Mathematics by adding the integer 3 to a string. PHP 7 throws a warning, but it still gives a result, 3. When you violate the rules of a type system, the outcome can range between these two extremes:
- The interpreter or compiler will silently try to fix the problem and continue.
- The interpreter or compiler will throw an error and stop.
In the case of our example, PHP 7 throws a warning, but the execution still continues. You can even get rid of the warning in the infamous php.ini. Since PHP 8, the same addition throws a TypeError instead.
To compare with another language, let’s take the exact same thing in Golang:
package main
import (
	"fmt"
)
func main() {
	fmt.Println(3 + "Hello World")
	// => invalid operation: 3 + "Hello World" (mismatched types untyped int and untyped string)
	fmt.Println("The execution continues!")
}
The compiler will greet you with an error and your program won’t even be compiled.
Built-in Types vs Our Own Abstractions
Programming languages, more often than not, have a whole set of types you can use out of the box. These types are called primitive types. For example: integer, boolean, float, and more.
You can often use composite types as well: types containing multiple values, possibly of multiple primitive types. They’re more commonly called data structures.
For example, an array is a composite type:
<?php
$integerArray = [1, 2, 3];
$multipleTypeArray = [1, "two", 5.4];
The rules attached to these data types are imposed by the compiler (or the interpreter) of your language of choice.
Since the rise of Abstract Data Types (ADT), you have the power, in high level programming languages, to create your own types. For example, in PHP:
<?php
class Shipment
{
    public function send()
    {
        echo "Send powerful shipment!";
    }
}
$shipment = new Shipment();
$shipment->send();
When you write $shipment = new Shipment(), you create an instance of the class Shipment. The object $shipment can be considered as well to be of type Shipment.
When you think about it, a type can be seen as a set of possible values, a category, or a group. A class has the same definition (see entry 3).
We can say as well that the object $shipment is an abstraction of its class, and its interface (the way you interact with the abstraction) is the method send().
We created some rules for our new type:
- The only interface available is the method send().
- The method send() outputs a string, not an integer.
In Golang, you don’t have classes, but you can create custom types, too:
package main
import (
	"fmt"
)
type minute int
func (m minute) second() int {
	return int(m) * 60
}
func main() {
	var min minute
	min = 2
	fmt.Printf("%d minutes are %d seconds", min, min.second())
	// => 2 minutes are 120 seconds
}
In that case, the new type minute gets the same set of rules as the type int. Then, you can attach methods to this new type minute, like the method second().
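Even if a custom type borrows the rules of int, Golang still treats it as a type of its own. Here’s a minimal sketch of what that gives you. The types minute and second, and the method toSeconds, are names chosen for this illustration only.

```go
package main

import "fmt"

// Two custom types built on int. The compiler treats them as different types.
type minute int
type second int

// toSeconds converts explicitly from one custom type to the other.
func (m minute) toSeconds() second {
	return second(m) * 60
}

func main() {
	var m minute = 2
	var s second = 30

	// total := m + s
	// Would not compile: mismatched types minute and second.

	total := m.toSeconds() + s
	fmt.Println(total)
	// => 150
}
```

Adding minutes to seconds by mistake becomes a compile error instead of a wrong number in production.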
Type Checking
If types push us to respect some rules, a programming language needs an algorithm to check if we respect them. This is called type checking.
Even if type systems can be very similar or very different, depending on the programming language, we are humans, so we need to group these disparate things in categories to understand them.
There are two important categories of type checking: static type checking and dynamic type checking. They are mostly about when the type checking algorithms verify your code.
Dynamic Type Systems
For a dynamic type system, type checking occurs at runtime (when the computer executes your code). If your programming language doesn’t have any static type checking, it’s usually called a “dynamically typed language”.
If you don’t respect the rules of your data types, the code needs to be executed for the type checker to detect the mistakes and act accordingly (throwing a warning, stopping the execution, and whatnot).
For example, PHP is a dynamically typed language.
Static Type Systems
For a static type system, type checking occurs before runtime, during compilation. At that point, if the rules imposed by the compiler on your types are not respected, your program won’t compile.
It implies that the compiler needs to know the exact data type of each piece of data used in your program, before even running it. This can be a problem when the data type can only be determined at runtime. That’s why, more often than not, a statically typed language has some dynamic type checking as well!
Still, if types are mostly checked at compile time, we can say of a programming language that it’s “statically typed”.
Since most of the types don’t have to be checked at runtime, the performance of your program will often be better. The compiler can optimize your code, knowing that the rules of the type system are respected.
For example, Golang is statically typed.
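Golang itself shows how a statically typed language keeps some dynamic type checking around. In the sketch below, the function describe (a made-up name) receives an empty interface, so the compiler can’t know the concrete type of the value. The check has to happen at runtime, with a type switch or a type assertion.

```go
package main

import "fmt"

// describe receives a value whose concrete type is only known at runtime.
// Hypothetical function, written for this illustration only.
func describe(v interface{}) string {
	switch x := v.(type) {
	case int:
		return fmt.Sprintf("an int: %d", x)
	case string:
		return fmt.Sprintf("a string: %q", x)
	default:
		return fmt.Sprintf("something else: %T", x)
	}
}

func main() {
	fmt.Println(describe(65))
	// => an int: 65
	fmt.Println(describe("A"))
	// => a string: "A"
	fmt.Println(describe(6.5))
	// => something else: float64

	// A type assertion is checked at runtime, too.
	var v interface{} = "A"
	n, ok := v.(int)
	fmt.Println(n, ok)
	// => 0 false
}
```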
Changing Types
For now, everything looks nice and ordered in typing land. However, the world is not static, and neither are our codebases. Things change, and often our types need to change as well.
Implicit Type Change
Let’s take another trivial example. You want to display some price on your fantastic e-commerce website as follows:
<?php
echo "<p>Hey! Buy my fish! Now! Here's the price: " . 65 . " euros</p>";
You’ll notice that you mix two types here: integer (65) and string (everything in double quotes). The type checker of your PHP interpreter understands that you want your integer 65 to be changed to a string, so it does it automatically, without asking you.
This is called coercion. The interpreter (or the compiler) doesn’t tell you explicitly that the change of type happens, but it happens anyway.
Let’s try the same in Golang:
package main
import (
	"fmt"
)
func main() {
	fmt.Println("<p>Hey! Buy my fish! Now! Here's the price: " + 65 + " euros</p>")
	// => invalid operation: "<p>Hey! Buy my fish! Now! Here's the price: " + 65 (mismatched types untyped string and untyped int)
}
This time, the type checker will throw you an error at compile time. Your program will never compile, your e-commerce won’t sell any fish, say goodbye to fortune and glory.
Programming languages won’t always implicitly change types for you, and when they do, you need to be aware of the consequences. Of course, our examples are trivial (or you wouldn’t read this article anymore), but real codebases are more complex.
Explicit Type Change
On the other hand, you can change types explicitly, which means that the type change will be clearly indicated in the code. It’s commonly called type casting.
To take back our fishy example from above, in PHP:
<?php
echo "<p>Hey! Buy my fish! Now! Here's the price: " . (string)65 . " euros</p>";
Here, we explicitly cast 65 from integer to string, using (string).
Can we do the same in Golang?
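package main
import (
	"fmt"
	"strconv"
)
func main() {
	// strconv.Itoa converts the integer 65 to the string "65".
	fmt.Println("<p>Hey! Buy my fish! Now! Here's the price: " + strconv.Itoa(65) + " euros</p>")
	// => <p>Hey! Buy my fish! Now! Here's the price: 65 euros</p>
}
This time, it works! Golang is happy with explicit conversions, but it won’t coerce an integer into a string for you. Be careful: the cast-like form string(65) compiles too, but it reads 65 as a character code and produces "A", not "65".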
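Explicit type changes in Golang are not limited to strings. Here’s a small sketch of a few common ones: between numeric types with the form T(value), and from text to numbers with the strconv package. The functions total and parsePrice are hypothetical names for this illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// total multiplies a unit price by a quantity.
// Writing price * quantity would not compile: mismatched types float64 and int.
func total(price float64, quantity int) float64 {
	return price * float64(quantity)
}

// parsePrice converts some text to an integer, and reports invalid input.
func parsePrice(input string) (int, error) {
	return strconv.Atoi(input)
}

func main() {
	fmt.Println(total(1.5, 3))
	// => 4.5

	// Converting a float to an int drops the fractional part.
	price := 6.9
	fmt.Println(int(price))
	// => 6

	fmt.Println(parsePrice("65"))
	// => 65 <nil>

	// A conversion can fail, and the failure is explicit, too.
	_, err := parsePrice("fish")
	fmt.Println(err != nil)
	// => true
}
```

Every type change is visible in the code, including the ones which lose information or can fail.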
Type Declaration
As for type changes, type declarations can be implicit or explicit.
Implicit Type Declaration
In some languages like PHP, you can directly define the type of a variable by assigning a value:
<?php
$beautifulInt = 65;
$myString = "Hello";
$isItReallyTrue = true;
$aCompositeType = [];
echo gettype($beautifulInt) . "\n";
// => integer
echo gettype($myString) . "\n";
// => string
echo gettype($isItReallyTrue) . "\n";
// => boolean
echo gettype($aCompositeType) . "\n";
// => array
You don’t need to define the type of every variable when declaring them. The PHP interpreter will do that for you at runtime, depending on the value of the variables. This is called type inference.
You can do the same in Golang for primitive types:
package main
import (
	"fmt"
	"reflect"
)
func main() {
	beautifulInt := 65
	myString := "hello"
	isItReallyTrue := true
	fmt.Println(reflect.TypeOf(beautifulInt))
	// => int
	fmt.Println(reflect.TypeOf(myString))
	// => string
	fmt.Println(reflect.TypeOf(isItReallyTrue))
	// => bool
}
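There’s one difference worth trying by yourself: in Golang, the compiler picks the type once and for all when the variable is declared. The sketch below shows that it also works for composite types, and that the variable can’t change its type afterward. The helper typeName is a made-up name for this illustration.

```go
package main

import "fmt"

// typeName returns the name of the type of v, as text.
// Hypothetical helper, written for this illustration only.
func typeName(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	price := 6.5
	prices := []float64{6.5, 12}
	stock := map[string]int{"fish": 3}

	fmt.Println(typeName(price), typeName(prices), typeName(stock))
	// => float64 []float64 map[string]int

	// price = "expensive"
	// Would not compile: price was inferred as float64 and stays float64.
}
```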
Explicit Type Declaration
In some languages, you’ll need to explicitly type your variables when you declare them. In others, like PHP, you can’t (at least when I wrote these lines). Some programming languages, like Golang, let you do both:
package main
import (
	"fmt"
	"reflect"
)
func main() {
	beautifulInt := 65
	var explicitBeautifulInt int = 65
	fmt.Println(reflect.TypeOf(beautifulInt))
	// => int
	fmt.Println(reflect.TypeOf(explicitBeautifulInt))
	// => int
}
For the variable explicitBeautifulInt, a type int is explicitly declared. The type becomes part of the syntax of the programming language itself.
Types and Functions
You can declare types for function arguments in some languages, as well as for function outputs.
How do you declare a function in PHP?
<?php
function coolFunction($coolThing) {
    return "I think a " . $coolThing . " is a cool thing!";
}
echo coolFunction("Daffodil");
Again, no explicit type declaration for the inputs and output of the function. However, this time, you can add it if you want:
<?php
function coolFunction(string $coolThing): string {
    return "I think a " . $coolThing . " is a cool thing!";
}
echo coolFunction("Daffodil");
Here, we explicitly declared $coolThing and the return value to be strings. When declaring the type of function arguments is optional, we often call it type hinting.
What happens if you try to pass an integer to coolFunction?
<?php
function coolFunction(string $coolThing): string {
    return "I think a " . $coolThing . " is a cool thing!";
}
echo coolFunction(65);
// => I think a 65 is a cool thing!
The PHP interpreter will coerce your value from integer to string. If the function returned 65 instead of a string, it would be coerced, too.
In Golang, things are different:
- You have to declare the types of your inputs and output(s), or an error will be thrown.
- If the types are not respected, an error will be thrown.
No coercion whatsoever in Golang’s world.
package main
import (
	"fmt"
)
func coolFunction(coolThing string) string {
	return "I think a " + coolThing + " is a cool thing!"
}
func main() {
	fmt.Println(coolFunction("Daffodil"))
	// => I think a Daffodil is a cool thing!
}
If you give something else instead of a string to coolFunction, you’ll get an error at compile time. If coolFunction returns anything but a string, you’ll get an error, too.
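If you really want to pass the integer 65 to this function, you have to write the conversion yourself. A minimal sketch, reusing the function from above and the standard strconv package:

```go
package main

import (
	"fmt"
	"strconv"
)

func coolFunction(coolThing string) string {
	return "I think a " + coolThing + " is a cool thing!"
}

func main() {
	// fmt.Println(coolFunction(65))
	// Would not compile: an int can't be used where a string is expected.

	// The conversion from int to string has to be written explicitly.
	fmt.Println(coolFunction(strconv.Itoa(65)))
	// => I think a 65 is a cool thing!
}
```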
Type Strength and Type Safety
Type Strength
Some developers will categorize languages as “strong” or “weak”, based on different properties of a type system; whether the type system coerces your values, for example. But it seems that nobody has the same definition of what should be “strong” or “weak”, which makes the communication difficult.
Many developers think that a mapping similar to the one below exists. You can read the arrows -> as “implies”:
- static language -> strong
- static language -> no coercion
- static language -> always explicit typing
- dynamic language -> weak
- dynamic language -> does a lot of weird coercion
- dynamic language -> always implicit typing
I thought the same for too long. As we saw, Golang is a statically typed language but you don’t have to explicitly declare the types of your variables. Python is dynamically typed but doesn’t do much coercion.
In short, the mapping above is wrong.
I think healthy debates around programming languages are good. Deciding what’s the best tool for the problem at hand is essential, and we often teach and learn from these discussions. Developers are often passionate about their craft, and that’s definitely a good point!
However, speaking about “weak” or “strong” languages is confusing, because there are no clear definitions for these terms. Instead, we should specifically say what we like in a type system, and why it’s better than another, using the vocabulary we saw above. It would add precision and clear meanings to the conversation.
Type Safety
When people speak about “weakly typed” and “strongly typed” languages, they often refer to the type safety of a language. Commonly speaking, it’s “how much” your compiler (or your interpreter) will check for type errors, and how often it will throw them to your sorry face.
This “how much” is not really well defined. As always, it depends on the programming language, or even on the version of the programming language you work with. With recent versions of PHP (7.0 and later), for example, you can do this:
<?php
declare(strict_types = 1);
echo "My name is " . 65 . "\n"; // This will NOT throw an error
// => My name is 65
function coolFunction(string $coolThing): string {
    return "I think a " . $coolThing . " is a cool thing!";
}
echo coolFunction(65);
// => PHP Fatal error: Uncaught TypeError: Argument 1 passed to coolFunction() must be of the type string, int given
We reduced the possibilities of type coercion by declaring strict_types, but we didn’t get rid of them totally. Type safety still improved: we can be more confident about the types of our variables.
A more precise definition of a safely typed language is a language which prevents you from accessing memory you shouldn’t access, or from performing “impossible” operations, like division by 0. Most high level languages nowadays respect this definition of type safety, so saying that language X is safer than language Y is, by this definition, often nonsense.
To make matters even more complicated, some consider type safety as a property of a program, and not of a programming language. This is, I think, a better definition, because you can always create type errors in your code, even with strict type systems.
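Here’s a sketch of what this means in Golang. The function divide (a made-up name) respects every rule of the type system, so the compiler is happy. Yet dividing by 0 still fails, at runtime only, and the program has to deal with it by itself.

```go
package main

import "fmt"

// divide is correct for the type checker, yet it can still fail at runtime.
// Hypothetical function, written for this illustration only.
func divide(a, b int) (result int, err error) {
	// Turn the runtime panic into an ordinary error.
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return a / b, nil
}

func main() {
	fmt.Println(divide(10, 2))
	// => 5 <nil>
	fmt.Println(divide(10, 0))
	// => 0 recovered: runtime error: integer divide by zero
}
```

The type system can’t see that b is 0. Only the program, and the developer writing it, can guard against that.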
What Type System Should We Choose?
Type systems can be loosely categorized, as we saw above. Yet, programming languages, depending on their history and the direction they take, are rarely on one or on the other side of the fence. Even if many studies have tried to settle whether a static type system with explicit typing is better than a dynamic type system with implicit typing, there’s no definitive, absolute truth.
When you choose a language, it’s important to check if a type system is consistent, that is, if the language doesn’t surprise you too much while doing operations on types.
That said, the type system is only one (important) variable in the equation. Other things should be considered when choosing a language to use: the libraries available, the tooling, or the way of thinking the language tries to push you into (the paradigm).
Programming languages are tools to express your thinking, to bring life to solutions of specific problems. They’re a means to an end, not the end itself. Choosing a language will largely depend on the business problems your software needs to solve.
Learn How The Type System Of Your Language Behaves
Bugs can appear because of the misconceptions we have about the type systems we use. Some of these systems are so inconsistent that you’re likely to end up with logical contradictions and unforeseen values. This can be a real pain to debug.
On the other hand, strict type systems can create some rigidity in your design, which will make the code more difficult to change when the messy real world, the context, evolves. Still, they will help avoid a whole class of errors.
That’s why you should learn how your language of choice handles types. Look at it in the official documentation. Experiment with it. Try to find the possible inconsistencies and pitfalls. If you’re interested, I wrote an article about that for PHP 7.
This advice holds even if you have worked with the same programming language for years. A language, like any software, is meant to evolve. Stay aware of its changes.
What’s Your Type of Type System?
Looking at different type systems from different programming languages can be an interesting exercise. It can give you different approaches to solve problems. Type systems convey semantics, that is, a certain philosophy about a programming language. Don’t get too attached to one programming language, and try to broaden your horizons.
What did we learn in this article?
- Types and data types are two names for the same idea.
- A type represents a set of values and a set of rules, and gives semantics (meaning) to your data.
- To enforce the rules of a type system, type checking often occurs at compile time (static type checking) or at runtime (dynamic type checking).
- Changing a type can be implicit (type coercion), or explicit (type casting).
- Declaring a type can be implicit (type inference), or explicit.
- Types of a function signature can be implicit or explicit, too.
- Type strength (“strong” or “weak”) has no formal definition, and people speak about it with many different meanings in mind.
- A type system only makes sense in the context of a programming language.
- You should learn how the type system of the language you’re using behaves, to avoid bad surprises.
I hope this overview answered some of your questions. If you see mistakes or if you have more questions, the comment section is ready to handle your prose.