PHP and Zend Engine Internals

By Kaushik Pal

on April 28, 2014

Overview

Version 1.0 of the Zend Engine functions as the heart and brain of PHP 4.0. It provides the infrastructure and services that the function modules rely on, and it also implements the language syntax. Zend Engine 1.0 is actually the second revision of the PHP scripting engine, but it is still based on the same rules as the PHP 3.0 engine, which was essentially Zend Engine 0.5. This allowed an easy migration path from PHP 3.0 to 4.0, and development continued in the same ‘state of mind’ as PHP 3.0. We feel it is the right time to start working towards the next revision of the Zend Engine, one that would incorporate new structures and solutions to some of the most difficult problems faced by PHP designers and developers.

In this article I will discuss the internals of the Zend Engine that powers PHP.

Introduction

The Zend Engine is an open source scripting engine that acts as an interpreter for the PHP programming language. It was initially developed by two students at the Technion – Israel Institute of Technology. The Zend engine is a virtual machine (VM); as we know, a virtual machine is software that simulates a physical computer. The Zend engine consists of multiple components, such as a compiler, the ZFMI (Zend Function Module Interface) and a virtual CPU, or executor.

How Zend Engine Works

The Zend engine consists of three major components:

→ Lexical analyzer or lexer

→ Parser

→ Executor

Zend is a scripting engine that works as an interpreter. Let us look at the phases a script goes through inside the Zend engine. The script passes through the following steps before it is finally executed:

→ Step 1: Lexical Analysis – In this step the script is passed through a lexical analyzer, also known as a lexer. The human-readable script is broken into tokens, the smallest meaningful units of the language (keywords, variable names, literals, operators), which the parser can work with. The tokens are then passed to the parser.

→ Step 2: Parsing – In this step, the parser processes the tokens it receives from the lexer and generates instructions for the Zend engine. The Zend engine is a virtual machine (VM) with its own instruction set, similar to an assembly language, which it executes. In PHP 7 and later, the parser first builds an abstract syntax tree (AST), which is then handed to the code generator. This whole mechanism is collectively called compilation. The output of compilation is intermediate code, a machine-independent code for the Zend virtual machine. This intermediate code is an array of instructions for the Zend virtual machine, known as operation codes or opcodes. Opcodes are three-address codes: two operands for the input and one for the output. In addition, each opcode carries a handler, the function that processes its operands. Opcodes cover all sorts of operations, ranging from basic ones that combine the two inputs and store the output in the third operand, to complex ones that implement flow control.

→ Step 3: Execution – Once the intermediate code is generated, it is passed to the executor which reads each of the instructions from the array and executes them.
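To see Step 1 in practice, PHP's tokenizer extension exposes the engine's scanner through the real functions token_get_all() and token_name(). The sketch below assumes the tokenizer extension is enabled (it is in standard builds) and PHP 7.1 or later; the helper name lex() is our own, hypothetical choice. It skips whitespace tokens and labels single-character tokens such as = and ; with the character itself.

```php
<?php
// Step 1 (lexical analysis) seen through PHP's tokenizer extension,
// which is built on the engine's own scanner.

/** Hypothetical helper: returns [token name, token text] pairs, skipping whitespace. */
function lex(string $source): array
{
    $out = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Multi-character tokens arrive as [token id, matched text, line number].
            if ($token[0] === T_WHITESPACE) {
                continue;
            }
            $out[] = [token_name($token[0]), $token[1]];
        } else {
            // Single characters such as '=' and ';' arrive as plain strings.
            $out[] = [$token, $token];
        }
    }
    return $out;
}

foreach (lex('<?php $name = \'Ricardo\'; echo $name;') as [$name, $text]) {
    printf("%-28s %s\n", $name, $text);
}
```

Run on a one-line version of the script used later in Listing 1, it reports the open tag, T_VARIABLE for $name, the = operator, a T_CONSTANT_ENCAPSED_STRING for 'Ricardo', T_ECHO, and so on.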

The compilation and execution phases are handled by two separate functions within the Zend engine: zend_compile_file() and zend_execute().
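The split between a compile step and an execute step can be sketched with a toy model. This is not Zend's real data structure or API: toy_compile() and toy_execute() are hypothetical names, the input is a tiny expression tree rather than PHP source, only ADD and MUL are supported, and PHP 7.4+ syntax is assumed. It does show the three-address form described in Step 2, where each opcode is [opcode, op1, op2, result], and an executor that dispatches each opcode to its handler.

```php
<?php
// Toy compiler: turns a nested [operator, left, right] tree into
// three-address opcodes appended to $ops. Returns the operand holding the value.
function toy_compile($node, array &$ops, int &$tmp)
{
    if (!is_array($node)) {
        return $node; // a literal is used directly as an operand
    }
    $names = ['+' => 'ADD', '*' => 'MUL'];
    [$op, $left, $right] = $node;
    $op1 = toy_compile($left, $ops, $tmp);
    $op2 = toy_compile($right, $ops, $tmp);
    $result = 'T' . $tmp++; // fresh temporary, playing the role of a VM register
    $ops[] = [$names[$op], $op1, $op2, $result];
    return $result;
}

// Toy executor: walks the opcode array and calls each opcode's handler.
function toy_execute(array $ops)
{
    $handlers = [
        'ADD' => fn($a, $b) => $a + $b,
        'MUL' => fn($a, $b) => $a * $b,
    ];
    $temps = [];
    // Temporaries (strings like 'T0') are looked up; literals pass through.
    $read = function ($operand) use (&$temps) {
        return is_string($operand) ? $temps[$operand] : $operand;
    };
    foreach ($ops as [$opcode, $op1, $op2, $result]) {
        $temps[$result] = $handlers[$opcode]($read($op1), $read($op2));
    }
    $last = $ops[count($ops) - 1];
    return $temps[$last[3]]; // value of the final opcode's result
}

$ops = [];
$tmp = 0;
toy_compile(['+', 2, ['*', 3, 4]], $ops, $tmp); // 2 + 3 * 4
foreach ($ops as $op) {
    echo implode(' ', $op), "\n";
}
echo toy_execute($ops), "\n";
```

For 2 + 3 * 4 the toy compiler emits MUL 3 4 T0 followed by ADD 2 T0 T1, and the executor returns 14. Keeping the two phases separate is also what makes it possible to cache compiled opcodes and skip compilation on later runs.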

Web Server Interaction Involving the Zend Engine

The internal architecture of the Zend engine is shown below in the diagram.

Figure 1: Zend engine Architecture

Internal Components of the Zend Engine

Now let us look at the internal components of the Zend Engine one by one.

→ ZFMI or Zend Function Module Interface: This interface acts as the communication channel between the Zend engine and the function modules. Function modules are PHP extensions that package a set of functions and make them available to scripts.

→ Opcode Cache: An opcode cache hooks into the Zend engine and caches the compiled opcodes of a file. If the file is requested again and has not changed, its cached opcodes are executed directly, skipping the lexing and compilation steps.
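The caching rule described above can be sketched as follows, assuming PHP 7.4+. ToyOpcodeCache is a hypothetical class, and the $compile callable stands in for the real compiler; in the usage below token_get_all() is passed only as a placeholder. As a simplifying assumption, the sketch detects changes by hashing the file contents, whereas production caches such as OPcache typically compare file modification timestamps.

```php
<?php
// Hypothetical toy opcode cache: compiled output is stored per file path
// and reused until the file's contents change.
final class ToyOpcodeCache
{
    /** @var array<string, array{hash: string, ops: mixed}> */
    private array $entries = [];
    private $compile;          // placeholder for the compilation step
    public int $compilations = 0;

    public function __construct(callable $compile)
    {
        $this->compile = $compile;
    }

    public function get(string $path)
    {
        $hash = sha1_file($path);
        $entry = $this->entries[$path] ?? null;
        if ($entry !== null && $entry['hash'] === $hash) {
            return $entry['ops']; // cache hit: skip lexing and compiling
        }
        // Cache miss or changed file: compile and remember the result.
        $this->compilations++;
        $ops = ($this->compile)(file_get_contents($path));
        $this->entries[$path] = ['hash' => $hash, 'ops' => $ops];
        return $ops;
    }
}

$path = tempnam(sys_get_temp_dir(), 'toy');
$cache = new ToyOpcodeCache('token_get_all'); // placeholder "compiler"
file_put_contents($path, '<?php echo 1;');
$cache->get($path);                    // first request: compiled
$cache->get($path);                    // unchanged file: served from the cache
file_put_contents($path, '<?php echo 2;');
$cache->get($path);                    // contents changed: recompiled
echo $cache->compilations, "\n";       // 2
unlink($path);
```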

Some Examples

Let us look at some examples to see the different phases PHP code goes through in the Zend Engine.

First, consider the simple example shown below.

Listing 1: Sample PHP file

<?php
$name = 'Ricardo';
echo $name;
?>

When the above PHP code is run through the Zend engine, it is compiled into the following opcodes:

Figure 2: Showing generated opcode

The executor of the Zend engine reads these opcodes one at a time and executes each one according to its instruction. The above code is executed in the following manner:

→ Opnum 0 or Opcode 0 – In this step, ‘ZEND_FETCH_W’ (where W stands for write) fetches a pointer to the variable $name and stores it in Register 0, because the variable is about to be written to.

→ Opnum 1 or Opcode 1 – In this step, the ZEND_ASSIGN handler assigns the value 'Ricardo' to the variable $name through the pointer held in Register 0. Register 1, which receives the result of the assignment, is also set but never used. It would have been used if the result of the assignment were part of a larger expression, such as:

if ($name = 'Ricardo') { }

→ Opnum 2 or Opcode 2 – In this step, we fetch the value of $name again, this time into Register 2. We use the opcode ZEND_FETCH_R because the variable is used in a read-only context.

→ Opnum 3 or Opcode 3 – In this step, the instruction ‘ZEND_ECHO’ prints the value of Register 2 by sending the value to the output buffering system.

→ Opnum 4 or Opcode 4 – In this step, the instruction ‘ZEND_RETURN’ sets the return value of the script to 1. Even when a script has no explicit return statement, as is the case here, it ends with an implicit return 1.
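The five steps above can be replayed with a small PHP model. It is a simplified sketch under our own assumptions, not Zend's real zend_op layout: each opcode is written as [opcode, op1, op2, result], registers are array slots, and the hypothetical function run_listing1() uses a PHP reference in place of the pointer that ZEND_FETCH_W stores in Register 0.

```php
<?php
// Toy replay of the opcodes for Listing 1 (simplified model).
function run_listing1(): array
{
    $ops = [
        ['FETCH_W', 'name', null, 0],     // Register 0 <- pointer to $name
        ['ASSIGN', 0, 'Ricardo', 1],      // *Register 0 = 'Ricardo'; result in Register 1
        ['FETCH_R', 'name', null, 2],     // Register 2 <- value of $name
        ['ECHO', 2, null, null],          // print Register 2
        ['RETURN', 1, null, null],        // implicit return 1
    ];
    $symbols = [];  // the script's variables
    $regs = [];     // temporary registers
    $output = '';   // stands in for the output buffering system

    foreach ($ops as [$opcode, $op1, $op2, $result]) {
        switch ($opcode) {
            case 'FETCH_W':
                // Write context: take a reference, creating the variable if needed.
                $regs[$result] = &$symbols[$op1];
                break;
            case 'ASSIGN':
                $regs[$op1] = $op2;     // writes through the reference into $symbols
                $regs[$result] = $op2;  // result of the assignment (unused here)
                break;
            case 'FETCH_R':
                // Read context: copy the current value into a register.
                $regs[$result] = $symbols[$op1];
                break;
            case 'ECHO':
                $output .= $regs[$op1];
                break;
            case 'RETURN':
                return ['return' => $op1, 'output' => $output, 'name' => $symbols['name']];
        }
    }
    throw new LogicException('opcode array ended without RETURN');
}

$state = run_listing1();
echo $state['output'], "\n";
```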

Now let us look at a slightly more complicated example:

Listing 2: Sample PHP file with conversion to upper case

<?php
$name = 'Ricardo';
echo strtoupper($name);
?>

This script initializes a variable and then prints its value converted to upper case. The intermediate code dump for this script is quite similar to the previous one.

Figure 3: Showing generated opcode

The opcodes in the above two examples are quite similar except for the following:

→ Opnum 3 or Opcode 3 – In this step, the instruction ‘ZEND_SEND_VAR’ pushes a pointer to Register 2, which holds the variable $name, onto the argument stack. The called function receives its arguments from this stack in the order in which they were pushed.

→ Opnum 4 or Opcode 4 – In this step, the instruction ‘ZEND_DO_FCALL’ calls the strtoupper function and specifies that its return value should be stored in Register 3.
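The argument stack used by these two opcodes can be modelled in the same toy style. In this hypothetical run_call() sketch, SEND_VAR copies the value of a register onto the stack (the engine pushes a pointer; copying keeps the sketch simple), and DO_FCALL calls the named function with the pushed arguments, first pushed first, storing the return value in the result register. Opcode names are shortened and each opcode is simplified to [opcode, op1, result].

```php
<?php
// Toy model of SEND_VAR / DO_FCALL with an explicit argument stack.
function run_call(array $ops, array $regs): array
{
    $argStack = [];
    foreach ($ops as [$opcode, $op1, $result]) {
        switch ($opcode) {
            case 'SEND_VAR':
                $argStack[] = $regs[$op1]; // push the value held in register op1
                break;
            case 'DO_FCALL':
                // Call the named function; arguments arrive in push order.
                $regs[$result] = $op1(...$argStack);
                $argStack = [];            // the call consumes its arguments
                break;
        }
    }
    return $regs;
}

// Register 2 already holds the value of $name, as in the previous listing.
$regs = run_call(
    [
        ['SEND_VAR', 2, null],
        ['DO_FCALL', 'strtoupper', 3],
    ],
    [2 => 'Ricardo']
);
echo $regs[3], "\n"; // RICARDO
```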

The following diagram shows the workflow as a PHP script passes through the Zend engine.

Figure 4: Showing work flow in Zend engine

About the Author

Kaushik Pal is a technical architect with 15 years of experience in enterprise application and product development. He has expertise in web technologies, architecture/design, Java/J2EE, open source and big data technologies. You can find more of his work at www.techalpine.com.