A Linguistic Look At Programming Languages

Programming languages can follow a similar pattern as spoken languages as they both can be analysed on the basis of structure, grammar and semantics. This article will look at a brief overview of the linguistic aspect of programming languages and this analysis strongly ties into Programming Language Theory.

Syntax

Syntax is the structure of statements in a programming language. It specifies how statements are formed. Syntax is made up of syntactical and lexical rules. Syntactical rules define how words are combined to make up a sentence. Lexical rules define the sets of characters that make up words (alphabet) and how these characters are combined together to make valid words.

The Extended Backus–Naur form is a family of notations used to describe formal languages and each member of the family can express a context-free grammar.

This grammar has:

  • Terminals: Symbols that cannot be changed. These can be keywords, symbols or characters.
  • Non-Terminals: Consist of terminals and non-terminals (Symbols which can be changed) and are enclosed in “< >”.
  • Sequences: These are composed of terminals and non-terminals.
  • Choice: Allows you to choose A OR B. It is denoted as “|”.
  • Repetition: Denoted by “+” or “*”. So, B* means zero or more occurrences of B. And C+ means one or more occurrences of C.
  • Recursion: This means a rule is defined in terms of itself.

Semantics

Semantics is concerned with the meaning of words or statements. Programs can be syntactical correct but semantically incorrect (or in other words meaningless or senseless.)

When running a program, the compiler will verify the semantic rule of the program at compile time. The rules that it checks for here are called static semantics. Dynamic Semantics defines how different constructs in a language are executed in a given program. (A construct is just a command, element or statement in a program.) So the main difference between static and dynamic semantics is that static semantics can be verified to be correct before a program is executed but the dynamic semantics can only be checked during execution. A program can only be executed if it is syntactically correct but also the static semantics must be correct.

A language can also be defined in terms of mathematical concepts. This is known as formal semantics. Formal semantics provides a clear and unambiguous meaning to each element in the language. We can formally express the semantics of a language through axiomatic and denotational semantics.

Axiomatic Semantics

This is based on mathematical logic and can prove the correctness of a program. In axiomatic logic, there is no distinction between a phrase’s meaning and the logical formulas that apply to it. It defines the execution of a program in regards to a state machine. A state machine is a concept in which a machine can have many states but can only be in one state at a time. It might be useful to think of it with the analogy of a traffic light. A traffic light has three states: red, orange and green. Although, the traffic light can only be in one of these states at a given time. The state of a program is defined by a set of first order predicates (These are predicates that take constants or variables as their arguments. Eg. ∀x. (cat(x) -> animal(x)). Here, x is the variable and the predicate means; for all x, if x is a cat, then x is an animal). These predicates define the values of a program’s variables in a given state. Axiomatic Semantics can define the meaning of a program by comparing the state of the program before a construct executes to after the execution of a construct.

Post and Pre-Conditions

A post-condition is a predicate that is required to hold after the execution of a particular construct. A pre-condition is a predicate that holds before the execution of a construct. It guarantees that the construct will terminate and that the value of the post condition will hold after the termination. There can be many pre-conditions for a given post-condition. The pre-condition with the fewest constraints and restrictions is called the weakest condition.

For example, if we have the statement:

x = sqrt(y)

Where sqrt is the square root. A precondition will be that y >= 0. The post condition will be that x >= 0.

There is a function which for each construct in a language, calculates all the preconditions including the weakest precondition for that construct. This function is called a predicate transformer.

Denotation Semantics

In denotational semantics, each construct is specified in terms of a mathematical object. It will confirm the meaning of a program by checking the state of a program before execution to after execution. The state of a program is defined from the variable identifiers and their values. In denotational semantics, the semantics should be compositional. This means that each denotation of a construct into denotations of its sub-constructs.

Conclusion

This has been a very brief summary of syntax and semantics within programming languages. There are some very good resources online if you wish to delve into this topic further and there is an online pdf here on the topic of denotational semantics that has been written by David A. Schmidt that you may find useful.

Leave a comment

Blog at WordPress.com.

Up ↑

Design a site like this with WordPress.com
Get started