While reading the Python documentation, you may have found fragments of BNF notation (Backus–Naur form) that look something like the following:
```
name      ::= lc_letter (lc_letter | "_")*
lc_letter ::= "a"..."z"
```
What’s the meaning of all this strange code? How can this help you in understanding Python concepts? How can you read and interpret this notation?
In this tutorial, you’ll get to know the basics of Python’s BNF notation and learn how to take advantage of it to get a deep understanding of the language’s syntax and grammar.
In this tutorial, you’ll:
To get the most out of this tutorial, you should be familiar with Python syntax, including keywords, operators, and some common constructs like expressions, conditional statements, and loops.
The Backus–Naur form or Backus normal form (BNF) is a metasyntax notation for context-free grammars. Computer scientists often use this notation to describe the syntax of programming languages because it allows them to write a detailed description of a language’s grammar.
The BNF notation consists of three core pieces:
Component | Description | Examples |
---|---|---|
Terminals | Strings that must exactly match specific items in the input. | "def", "return", ":" |
Nonterminals | Symbols that will be replaced by a concrete value. They’re also simply called syntactic variables. | `<letter>`, `<digit>` |
Rules | Combinations of terminals and nonterminals that define how these elements relate. | `<letter> ::= "a"` |
By combining terminals and nonterminals, you can create BNF rules, which can get as detailed as you need. Nonterminals must have their own defining rules. In a piece of grammar, you’ll have a root rule and potentially many secondary rules that define the required nonterminals. This way, you may end up with a hierarchy of rules.
BNF rules are the core components of a BNF grammar. In other words, a grammar is a set of BNF rules, which are also called production rules.
In practice, you can build a set of BNF rules to specify the grammar of a language. Here, language refers to a set of strings that are valid according to the rules defined in the corresponding grammar. BNF is mainly used for programming languages.
For example, Python’s syntax is defined by a formal grammar, and the parser checks any piece of Python code against that grammar’s rules. If the code doesn’t fulfill the rules, then you’ll get a SyntaxError.
You’ll find many variations of the original BNF notation out there. Some of the most relevant include the extended Backus–Naur form (EBNF) and augmented Backus–Naur form (ABNF).
In the following sections, you’ll learn the basics of creating BNF rules. Note that you’ll use a variation of BNF that matches the requirements of the BNF Playground site, which you’ll use for testing your rules.
As you already learned, by combining terminals and nonterminals, you can create BNF rules. These rules typically follow the syntax below:
```
<symbol> ::= expression
```
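In the BNF rule syntax, you have the following parts: `<symbol>` is a nonterminal variable, which is often enclosed in angle brackets (<>); ::= indicates that the left-hand part will be replaced with the right-hand part; and expression consists of a series of terminals, nonterminals, and other symbols that define a specific piece of grammar.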
When building BNF rules, you can use a variety of symbols with specific meanings. For example, if you’re going to use the BNF Playground site to compile and test your rules, then you’ll find yourself using some of the following symbols:
Symbol | Meaning |
---|---|
"" | Encloses a terminal symbol |
<> | Indicates a nonterminal symbol |
() | Indicates a group of valid options |
+ | Specifies one or more of the previous element |
* | Specifies zero or more of the previous element |
? | Specifies zero or one occurrence of the previous element |
`\|` | Indicates that you can select one of the options |
[x-z] | Indicates letter or digit intervals |
Once you know how to write a BNF rule and what symbols to use, you can start creating your own rules. Note that the BNF Playground has several additional symbols and syntactical constructs that you can use in your rules. For a complete reference, click the Grammar Help section at the top of the page.
Now, it’s time to start playing with a couple of custom BNF rules. To kick things off, you’ll start with a generic example.
Say that you need to create a context-free grammar to define how a user should input a person’s full name. In this case, the full name will have three components: a first name, a middle name, and a family name.
Between each pair of components, you need to place exactly one space. You should also treat the middle name as optional. Here’s how you can define this rule:
```
<full_name> ::= <first_name> " " (<middle_name> " ")? <family_name>
```
The left-hand part of your BNF rule is a nonterminal variable, `<full_name>`, that identifies the person’s full name. The ::= symbol denotes that `<full_name>` will be replaced with the right-hand part of the rule.
The right-hand part of the rule has several components. First, you have the first name, which you define using the `<first_name>` nonterminal. Next, you need a space to separate the first name from the following component. To define this space, you use a terminal, which consists of a space character between quotes.
After the first name, you can accept a middle name, and after that, you need another space. So, you open parentheses to group these two elements. Then you add `<middle_name>` and the " " terminal. Both are optional, so you use a question mark (?) after the closing parenthesis to denote that condition.
Finally, you need the family name. To define this component, you use another nonterminal, `<family_name>`. That’s it! You’ve built your first BNF rule. However, you still don’t have a working grammar. You only have a root rule.
To complete the grammar, you need to define rules for `<first_name>`, `<middle_name>`, and `<family_name>`. These rules need to meet two requirements: each name can only contain letters, and it must start with an uppercase letter followed by lowercase letters.
In this case, you can start by defining two rules, one for uppercase letters and one for lowercase letters:
```
<full_name> ::= <first_name> " " (<middle_name> " ")? <family_name>
<uppercase_letter> ::= [A-Z]
<lowercase_letter> ::= [a-z]
```
In the last two lines of this grammar snippet, you create two pretty similar rules. The first rule accepts all the ASCII letters from uppercase A to Z. The second rule accepts all the lowercase letters from a to z. In this example, you don’t support accents or other non-ASCII letters.
With these rules in place, you can build the rest of your rules. To kick things off, go ahead and add the `<first_name>` rule:
```
<full_name> ::= <first_name> " " (<middle_name> " ")? <family_name>
<uppercase_letter> ::= [A-Z]
<lowercase_letter> ::= [a-z]
<first_name> ::= <uppercase_letter> <lowercase_letter>*
```
To define the `<first_name>` rule, you start with the `<uppercase_letter>` nonterminal to express that the first letter of the name must be an uppercase letter. Then, you continue with the `<lowercase_letter>` nonterminal followed by an asterisk (*). This asterisk means that the first name will accept zero or more lowercase letters after the initial uppercase letter.
You can follow this same pattern to build the `<middle_name>` and `<family_name>` rules. Would you like to give it a try? Once you’re done, compare your grammar with the complete grammar below:
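```
<full_name> ::= <first_name> " " (<middle_name> " ")? <family_name>
<uppercase_letter> ::= [A-Z]
<lowercase_letter> ::= [a-z]
<first_name> ::= <uppercase_letter> <lowercase_letter>*
<middle_name> ::= <uppercase_letter> <lowercase_letter>*
<family_name> ::= <uppercase_letter> <lowercase_letter>*
```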
You can check if your full name grammar works using the BNF Playground site.
Once you navigate to the BNF Playground site, you can paste your grammar rules in the text input area. Then press the COMPILE BNF button. If everything is okay with your BNF rules, then you can enter a full name in the Test a string here! input field. Once you’ve entered a person’s full name, the field will turn green if the input string fulfills the rules.
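If you’d also like to exercise the grammar from Python, the sketch below hand-translates each rule into a small function. The function names are hypothetical and chosen for this example. The code isn’t a feature of the BNF Playground, and like the grammar, it only accepts ASCII letters.

```python
def is_uppercase_letter(char: str) -> bool:
    # <uppercase_letter> ::= [A-Z]
    return "A" <= char <= "Z"


def is_lowercase_letter(char: str) -> bool:
    # <lowercase_letter> ::= [a-z]
    return "a" <= char <= "z"


def is_name(text: str) -> bool:
    # <first_name>, <middle_name>, and <family_name> share the same shape:
    # <uppercase_letter> <lowercase_letter>*
    return (
        len(text) >= 1
        and is_uppercase_letter(text[0])
        and all(is_lowercase_letter(char) for char in text[1:])
    )


def is_full_name(text: str) -> bool:
    # <full_name> ::= <first_name> " " (<middle_name> " ")? <family_name>
    # Splitting on a single space keeps empty parts for doubled or
    # leading/trailing spaces, and is_name() rejects those empty parts.
    parts = text.split(" ")
    if len(parts) not in (2, 3):
        return False
    return all(is_name(part) for part in parts)


print(is_full_name("John Doe"))         # True
print(is_full_name("John Ronald Doe"))  # True
print(is_full_name("john doe"))         # False
print(is_full_name("John  Doe"))        # False: two spaces
```

Each function mirrors one rule, which is a common way to turn a small grammar into a recognizer by hand.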
In the previous section, you learned how to create a BNF grammar that defines how your users must provide a person’s name. This is a generic example that may or may not relate to programming. In this section, you’ll get more technical by writing a short set of BNF rules to validate an identifier in a hypothetical programming language.
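An identifier can be the name of a variable, function, class, or object. In your example, you’ll write a set of rules to check whether a given string meets the following requirements: the first character must be an uppercase or lowercase ASCII letter or an underscore, and the remaining characters, if any, can be letters, digits, or underscores.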
Here’s the root rule for your identifier:
```
<identifier> ::= <char> (<char> | <digit>)*
```
In this rule, you have the `<identifier>` nonterminal variable, which defines the root. On the right-hand side, you first have the `<char>` nonterminal. The rest of the identifier is grouped inside parentheses. The asterisk after the group says that elements from the group can appear zero or more times. Each such element is either a character or a digit.
Now, you need to define the `<char>` and `<digit>` nonterminals with their own dedicated rules. They’ll look like the following:
```
<identifier> ::= <char> (<char> | <digit>)*
<char> ::= [A-Z] | [a-z] | "_"
<digit> ::= [0-9]
```
The `<char>` rule accepts one ASCII letter in either lowercase or uppercase. Alternatively, it can accept an underscore. Finally, the `<digit>` rule accepts a digit from 0 to 9. Now, your set of rules is complete. Go ahead and give it a try on the BNF Playground site.
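If you want to experiment outside the BNF Playground, here’s a sketch that expresses the same three rules as a Python regular expression. The is_identifier() name is hypothetical, and the regex is one possible translation of the grammar, not something the playground generates.

```python
import re

# <identifier> ::= <char> (<char> | <digit>)*
# <char>       ::= [A-Z] | [a-z] | "_"
# <digit>      ::= [0-9]
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def is_identifier(text: str) -> bool:
    # fullmatch() requires the whole string to match, like a grammar's root rule
    return IDENTIFIER.fullmatch(text) is not None


print(is_identifier("spam"))      # True
print(is_identifier("_private"))  # True
print(is_identifier("var2"))      # True
print(is_identifier("2var"))      # False: starts with a digit
print(is_identifier("my-var"))    # False: "-" isn't allowed
print(is_identifier("café"))      # False: "é" isn't an ASCII letter
```

Note that Python’s own str.isidentifier() method isn’t an exact match for this grammar. It also accepts non-ASCII letters, so "café".isidentifier() returns True.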
For you as a programmer, reading BNF rules can be a pretty useful skill. For example, you’ll often find that the official documentation of many programming languages includes the BNF grammar of the languages, in whole or in part. So, being able to read BNF allows you to better understand the language syntax and intricacies.
From this point on, you’ll learn how to read Python’s BNF variation, which you’ll find in several parts of the language documentation.
Python uses a custom variation of the BNF notation to define the language’s grammar. In many parts of the Python documentation, you’ll find portions of BNF grammar. These snippets can help you better understand any syntactic construct that you’re studying.
Python’s BNF variation uses the following style:
Symbol | Meaning |
---|---|
name | Holds the name of a rule or nonterminal |
::= | Means expand into |
`\|` | Separates alternatives |
* | Accepts zero or more repetitions of the preceding item |
+ | Accepts one or more repetitions of the preceding item |
[] | Accepts zero or one occurrence, which means that the enclosed item is optional |
() | Groups options |
"" | Defines literal strings |
space | Is only meaningful to separate tokens |
These symbols define Python’s BNF variation. One notable difference from what regular BNF rules look like is that Python doesn’t use angle brackets ( <> ) to enclose nonterminal symbols. It only uses the nonterminal identifier or name. Arguably, this makes rules cleaner and more readable.
Also note that the square brackets ([]) have a different meaning for Python. Up to this point, you’ve used them to enclose sets of characters like [a-z]. In Python, these brackets mean that the enclosed element is optional. To define something like [a-z] in Python’s BNF variation, you’ll use "a"..."z" instead.
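As a quick illustration, here’s a sketch that translates the name rule from the start of this tutorial into a Python regular expression, so you can compare the notations side by side. The regex and the matches_name_rule() helper are written for this example; the Python documentation doesn’t provide them.

```python
import re

# lc_letter ::= "a"..."z"                  ->  [a-z]
# name      ::= lc_letter (lc_letter | "_")*  ->  [a-z][a-z_]*
NAME = re.compile(r"[a-z][a-z_]*")


def matches_name_rule(text: str) -> bool:
    """Return True if the whole string matches the name rule."""
    return NAME.fullmatch(text) is not None


print(matches_name_rule("snake_case"))  # True
print(matches_name_rule("_private"))    # False: must start with a lowercase letter
print(matches_name_rule("Name"))        # False: uppercase letters aren't allowed
print(matches_name_rule("x1"))          # False: digits aren't allowed
```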
You’ll find many BNF snippets in the Python documentation. Learning how to navigate and read them is quite a useful skill for you as a Python developer. So, in the following sections, you’ll explore a few examples of BNF rules from the Python documentation, and you’ll learn how to read them.
Now that you know the basics of reading the BNF notation and you’ve learned the characteristics of Python’s BNF variation, it’s time for you to start reading some BNF grammar from the Python documentation. This way, you’ll build the required skills to take advantage of this notation to learn more about Python and its syntax.
To kick things off, you’ll start with the pass statement, which is a simple statement that allows you to do nothing in Python. The BNF notation for this statement is like the following:
```
pass_stmt ::= "pass"
```
Here, you have the name of the rule, pass_stmt. Then you have the ::= symbol to indicate that the rule expands to "pass", which is a terminal symbol. This means that this statement consists of the pass keyword on its own. There are no additional syntactic components. So, you end up knowing the syntax for the pass statement:
```python
pass
```
The BNF rule for the pass statement is one of the simplest rules that you’ll find in the documentation. It only contains a terminal that defines the syntax straightforwardly.
Another common statement that you’ll often use in your day-to-day coding is return . This statement is a bit more complex than pass . Here’s the BNF rule for return from the documentation:
```
return_stmt ::= "return" [expression_list]
```
In this case, you have the rule’s name, return_stmt, and the ::= symbol as usual. Then, you have a terminal symbol consisting of the word return. The second component of this rule is an optional list of expressions, expression_list. You know that this second component is optional because it’s enclosed in square brackets.
Having an optional list of expressions after the word return is consistent with the fact that Python allows return statements without an explicit return value. In this case, the language automatically returns None, which is Python’s null value:
```python
>>> def func():
...     return
...
>>> print(func())
None
```
This toy function uses a bare return without providing an explicit return value. In this case, Python automatically returns None for you.
Now, if you click the expression_list variable in the documentation, then you’ll land on the rule below:
```
expression_list ::= expression ("," expression)* [","]
```
Again, you have the rule’s name and the ::= symbol. Then, you have a required nonterminal variable, expression . This nonterminal symbol has its own definition rule, which you can access by clicking on the symbol itself.
Up to this point, you have the syntax of a return statement with a single return value:
```python
>>> def func():
...     return "Hello!"
...
>>> func()
'Hello!'
```
In this example, you use the "Hello!" string as the return value of your function. Note that the return value can be any Python object or expression.
The rule continues by opening parentheses. Remember that BNF uses parentheses to group objects. In this case, you have a terminal consisting of a comma (","), and then you have the expression symbol again. The asterisk after the closing parenthesis indicates that this construct can appear zero or more times.
This part of the rule describes those return statements with multiple return values:
```python
>>> def func():
...     return "Hello!", "Pythonista!"
...
>>> func()
('Hello!', 'Pythonista!')
```
Now, your function returns two values. To do this, you provide a comma-separated series of values. When you call the function, you get a tuple of values.
The final part of the rule is [","] . This tells you that the list of expressions can include an optional trailing comma. This comma may cause tricky results:
```python
>>> def func():
...     return "Hello!",
...
>>> func()
('Hello!',)
```
In this example, you use the trailing comma after a single return value. As a result, your function returns a tuple with a single item. However, note that the comma has no effect if you already have multiple comma-separated values:
```python
>>> def func():
...     return "Hello!", "Pythonista!",
...
>>> func()
('Hello!', 'Pythonista!')
```
In this example, you add a trailing comma to a return statement with multiple return values. Again, you get a tuple of values when you call the function.
Another interesting BNF snippet that you can find in the Python documentation is the one that defines the syntax of assignment expressions, which you build with the walrus operator. Here’s the root BNF rule for this type of expression:
```
assignment_expression ::= [identifier ":="] expression
```
The right-hand part of this rule starts with an optional component that includes a nonterminal called identifier and a terminal consisting of the ":=" operator. Then, you have a required nonterminal called expression.
Note: At first glance, it may be weird that the assignment part is optional, as the whole point of an assignment expression is the assignment itself. However, making this part optional greatly simplifies many of the grammar rules because an assignment expression is allowed almost everywhere a plain expression is. You’ll see an example of this simplification in the following section.
This matches the syntax of an assignment expression with the walrus operator:
```
identifier := expression
```
Note that in the assignment_expression rule, the assignment part is optional. You’ll get the same value out of evaluating the expression whether you perform the assignment or not.
Here’s a working example of an assignment expression:
```python
>>> (length := len([1, 2, 3]))
3
>>> length
3
```
In this example, you create an assignment expression that assigns the number of items in a list to the length variable.
Note that you’ve enclosed the expression in parentheses. Otherwise, it’ll fail with a SyntaxError exception. Check out the Walrus Operator Syntax section from The Walrus Operator: Python’s Assignment Expressions to figure out why you need the parentheses.
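You can confirm this behavior without leaving Python. The following sketch uses the built-in compile() function to check whether a piece of source code is syntactically valid. The compiles() helper is a hypothetical name used only for this example.

```python
def compiles(source: str) -> bool:
    """Return True if source is syntactically valid Python code."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# An unparenthesized assignment expression can't stand alone as a statement
print(compiles("length := len([1, 2, 3])"))    # False
# Wrapping it in parentheses turns it into a valid expression statement
print(compiles("(length := len([1, 2, 3]))"))  # True
```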
Now that you’ve learned how to read the BNF rules for simple expressions, you can jump into compound statements. Conditional statements are pretty common in any piece of Python code. The Python documentation provides the BNF rule for this type of statement:
```
if_stmt ::= "if" assignment_expression ":" suite
            ("elif" assignment_expression ":" suite)*
            ["else" ":" suite]
```
When you start reading this rule, you immediately find the "if" terminal symbol, which you must use to start any conditional statement. Then, you find the assignment_expression nonterminal, which you already studied in the previous section.
Note: The if_stmt rule uses the assignment_expression nonterminal to define the condition. This allows you to use either an assignment expression or a plain expression in the condition. Remember that the assignment part is optional in assignment_expression.
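Because the condition is an assignment_expression, both kinds of condition should be accepted in an if header. Unlike a standalone statement, the walrus doesn’t need parentheses when it makes up the whole condition. This sketch defines a hypothetical compiles() helper built on the built-in compile() function to check each snippet.

```python
def compiles(source: str) -> bool:
    """Return True if source is syntactically valid Python code."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# Condition with the optional assignment part: no parentheses needed
print(compiles("if count := 3:\n    pass"))  # True
# Condition without the assignment part: a plain expression
print(compiles("if 3 > 2:\n    pass"))       # True
# For contrast, the same walrus as a standalone statement is rejected
print(compiles("count := 3"))                # False
```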
Next, you have the ":" terminal. This is the colon that you need to use at the end of a compound statement’s header. This colon denotes that the statement’s header is complete. Finally, you have a required nonterminal called suite, which is a set of indented statements.
Following this first part of the rule, you end up with the following Python syntax:
```
if assignment_expression:
    suite
```
This is a bare-bones if statement. It starts with the if keyword. Then, you have an expression that Python evaluates for truth value. Finally, you have a colon followed by an indented block that works as the suite.
The second line of the BNF rule defines the syntax of elif clauses. In this line, you have the elif keyword as a terminal symbol. Then, you have an expression, a colon, and again, a suite of indented code:
```
if assignment_expression:
    suite
elif assignment_expression:
    suite
```
You can have zero or more elif clauses in a conditional statement, which you know because of the asterisk after the closing parenthesis. All of them follow the same syntax.
The final part of the conditional BNF rule is the else clause, which consists of the else keyword followed by a colon and an indented suite of code. Here’s how this translates to Python syntax:
```
if assignment_expression:
    suite
elif assignment_expression:
    suite
else:
    suite
```
The else clause is also optional in Python. In the BNF rule, you know that because of the square brackets surrounding the final line of the rule.
Here’s a toy example of a working conditional statement:
```python
>>> def read_temperature():
...     return 25
...
>>> if (temperature := read_temperature()) < 10:
...     print("The weather is cold!")
... elif 10 <= temperature <= 25:
...     print("The weather is nice!")
... else:
...     print("The weather is hot!")
...
The weather is nice!
```
In the if clause, you use an assignment expression to grab the current temperature value. Then, you compare the current value with 10. Next, you reuse the temperature value to create the expression in the elif clause. Finally, you have the else clause for those cases where the temperature is hot.
Loops are another commonly used compound statement in Python. You have two different loop statements in Python: for loops and while loops.
The BNF grammar for Python’s for loop is the following:
```
for_stmt ::= "for" target_list "in" starred_list ":" suite
             ["else" ":" suite]
```
The first line defines the loop header, which starts with the "for" terminal. Then you have the target_list nonterminal. In short, this nonterminal represents the loop variable or variables.
Next, you have the "in" terminal, which represents the in keyword. The starred_list nonterminal symbol represents an iterable object. Finally, you have a colon followed by an indented block of code, suite.
Note: Python’s grammar is constantly evolving. For example, in Python 3.10, the for loop rule was written as:
```
for_stmt ::= "for" target_list "in" expression_list ":" suite
             ["else" ":" suite]
```
Here, instead of starred_list, you have expression_list. In Python 3.11, starred lists became valid in for loops, so the grammar changed.
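To see the newer starred_list rule in action, here’s a minimal sketch that unpacks two lists directly in a for loop header. It assumes Python 3.11 or later; earlier versions are expected to reject this syntax with a SyntaxError.

```python
# Requires Python 3.11+, where the for statement's grammar uses starred_list
evens = [0, 2]
odds = [1, 3]

collected = []
# *evens, *odds builds a single iterable from both lists
for number in *evens, *odds:
    collected.append(number)

print(collected)  # [0, 2, 1, 3]
```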
Again, you can click any nonterminal symbol to navigate to its defining BNF rule and dive deeper into its definition and syntax. For example, if you click the target_list symbol, then you’ll be presented with the following BNF rules:
```
target_list ::= target ("," target)* [","]
target      ::= identifier
                | "(" [target_list] ")"
                | "[" [target_list] "]"
                | attributeref
                | subscription
                | slicing
                | "*" target
```
In the first line, you can see that target_list consists of one or more target objects separated by commas. This list can include an optional trailing comma. With two or more targets, the trailing comma doesn’t alter the result. With a single target, however, it turns the target into a one-item tuple, so Python unpacks each item into it. In practice, target objects can be an identifier (variable), a tuple, a list, or any of the other provided options. The pipe characters (|) let you know that all these values are separate alternatives.
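Here’s a quick sketch of how a trailing comma after a single target changes a for loop. The list of one-item tuples is made up for this example.

```python
one_item_tuples = [("a",), ("b",)]

# No trailing comma: item receives each tuple as-is
as_is = []
for item in one_item_tuples:
    as_is.append(item)

# Trailing comma: the target becomes a one-item tuple target,
# so Python unpacks each tuple and item receives its only element
unpacked = []
for item, in one_item_tuples:
    unpacked.append(item)

print(as_is)     # [('a',), ('b',)]
print(unpacked)  # ['a', 'b']
```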
The second line of the BNF rule for a for loop defines the syntax of the loop’s else clause. This clause is optional, which you learned from the enclosing square brackets. The line consists of the "else" terminal, followed by a colon and a suite of indented code.
You can translate the above BNF rule to the following Python syntax:
```
for target_list in starred_list:
    suite
else:
    suite
```
The loop has a series of comma-separated loop variables in target_list and an iterable of data represented by starred_list.
Here’s a quick example of a for loop:
```python
>>> high = 5
>>> for number in range(high):
...     if number > 5:
...         break
...     print(number)
... else:
...     print("range covered")
...
0
1
2
3
4
range covered
```
This loop iterates over a range of numbers that goes from 0 up to, but not including, high. In this example, high is 5, so the break statement doesn’t run, and the else clause runs at the end of the loop. If you change the value of high to 10, then the break statement will run, and the else clause won’t.
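To compare both outcomes without editing the loop by hand, the following sketch wraps the same loop in a hypothetical scan() function. It records which numbers the loop processed and reports whether the else clause ran.

```python
def scan(high):
    """Run the loop from the previous example and report how it ended."""
    processed = []
    for number in range(high):
        if number > 5:
            break
        processed.append(number)
    else:
        # Only reached when the loop runs out of items without a break
        return processed, "range covered"
    return processed, "stopped by break"


print(scan(5))   # ([0, 1, 2, 3, 4], 'range covered')
print(scan(10))  # ([0, 1, 2, 3, 4, 5], 'stopped by break')
```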
Note: It doesn’t make sense to have a loop with an else clause if the loop’s main suite doesn’t have a break statement. If you find yourself in this situation, then remove the else: header and unindent its suite.
When it comes to while loops, their BNF rule is the following:
```
while_stmt ::= "while" assignment_expression ":" suite
               ["else" ":" suite]
```
Python’s while loops start with the while keyword, which is the first component in the right-hand part of the rule. Then, you need an assignment_expression, a colon, and a suite of indented code:
```
while assignment_expression:
    suite
else:
    suite
```
Note that the while loop also has an optional else clause that works the same as in for loops. Can you come up with a working example of a while loop?
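Here’s one possible answer, sketched with a hypothetical first_long_line() helper. Its condition uses an assignment expression, which the assignment_expression nonterminal allows without parentheses, and its else clause only runs when the loop ends without a break.

```python
import io


def first_long_line(text, limit):
    """Return the first line longer than limit characters, or None."""
    stream = io.StringIO(text)
    # The condition reads a line and tests it in one step.
    # readline() returns "" at the end of the stream, which ends the loop.
    while line := stream.readline():
        if len(line.rstrip("\n")) > limit:
            break
    else:
        # Only reached when the loop ends without hitting break
        return None
    return line.rstrip("\n")


print(first_long_line("BNF\ngrammar\nrules\n", 4))  # grammar
print(first_long_line("BNF\nrule\n", 4))            # None
```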
When you’re reading Python’s BNF rules in the documentation, you can follow a few best practices to improve your understanding. Here are a few recommendations for you:
If you apply these recommendations to your BNF reading adventure, then you’ll feel way more comfortable with Python’s BNF rules. You’ll be able to better understand the rules and improve your Python skills in the process.
Now you know what BNF notation is and how Python uses it in the official documentation. You’ve learned the basics of Python’s version of the BNF notation and how to read it. This is a fairly advanced skill that will help you better understand the language’s syntax and grammar.
In this tutorial, you’ve:
Knowing how to read the BNF notation in the Python documentation will give you a better and deeper understanding of Python’s syntax and grammar. Go for it!
About Leodanis Pozo Ramos
Leodanis is an industrial engineer who loves Python and software development. He's a self-taught Python developer with 6+ years of experience. He's an avid technical writer with a growing number of articles published on Real Python and other sites.