Protocol buffers offer a single way to define a structured data format that is then serialized and exchanged: you declare a message type in a .proto file. Each protocol buffer message represents a logical record of information in this format and contains a series of name-value pairs.
Here is a very basic example of a .proto file that defines a message containing information
for an address book entry:

As you can see, the message format is rather straightforward. Each message type has at
least one uniquely numbered field, and each field has a name and a value type. Value
types can be numbers (integer or floating-point), booleans, strings, raw bytes, or other
protocol buffer message types (as in the previous example). Nesting message types lets
you structure data hierarchically, so the format can be tailored to the needs of the
application. Fields can also be marked as optional, required, or repeated, as the previous
example shows with the repeated PhoneNumber field, which may occur any number of times
(including zero) in a single message.
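To make the role of the field numbers more concrete, here is a minimal Python sketch of how a message is laid out in the protocol buffer binary wire format. It is hand-written, not compiler output; the function names and the sample record (field 1 a name, field 2 an id, field 4 a repeated string) are hypothetical. It shows that each value is preceded by a key built from the field number and a wire type, (field_number << 3) | wire_type, encoded as a base-128 varint. The field's name never appears in the bytes.

```python
# Hand-written sketch of the protocol buffer wire format (not generated code).
# Function names and the sample record below are hypothetical.

WIRE_VARINT = 0            # wire type for int32/int64/bool/enum values
WIRE_LENGTH_DELIMITED = 2  # wire type for strings, bytes, nested messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint.

    Each byte carries 7 bits, least-significant group first; the high bit
    is set on every byte except the last.
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """The key written before every field: the field number, never its name."""
    return encode_varint((field_number << 3) | wire_type)


def encode_int_field(field_number: int, value: int) -> bytes:
    return encode_key(field_number, WIRE_VARINT) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return (encode_key(field_number, WIRE_LENGTH_DELIMITED)
            + encode_varint(len(data)) + data)


# Hypothetical record: field 1 = name, field 2 = id, field 4 = a repeated
# string, which is simply written once per element with the same key.
record = (
    encode_string_field(1, "Ann")
    + encode_int_field(2, 150)
    + encode_string_field(4, "555-1234")
    + encode_string_field(4, "555-5678")
)
print(record.hex(" "))
```

Because only numbers go over the wire, renaming a field leaves the encoding unchanged, whereas reusing a field number for a different purpose would break existing readers.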
Once the data structure types and formats are defined, protocol buffer compiler tools
can generate source code from these types for writing and reading the data. These tools
are available for a variety of languages, including Java, Python, Perl, and C++. Once
the messages have been defined, one of the protocol buffer compilers is run for a
particular target language. The compiler takes the .proto file or files as input and
generates data access classes. These include accessor functions for each field (e.g.,
query() and set_query()) as well as methods to serialize the defined data structures to
raw bytes and parse them back. For instance, if your chosen language is C++, running the
compiler on the earlier example will generate a class called PersonalRecord. You can
then use this class in your application to populate, serialize, send, and retrieve these
protocol buffer messages.

One very cool feature of protocol buffers is that code generated for receiving
messages ignores any fields that are not defined in the version of the .proto file it was
compiled from. This means that the two sides of an exchange do not need to run exactly
the same version of the message definitions. The designers built in this tolerance because
APIs were evolving rapidly and they wanted to avoid having to upgrade all of the servers
at once. On a small scale, this might not seem like a big deal, but for a company like
Google, upgrading tens of thousands of servers in a short period of time can definitely
be an issue.
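This tolerance follows from the wire format itself: every field carries its number and a wire type, and the wire type tells a reader how many bytes to step over, even for a field it has never seen. The Python sketch below is a hypothetical "version 1" reader that knows only field 1 (name) and field 2 (id); it is hand-written rather than generated, and all names are illustrative. This sketch simply drops unknown fields, whereas real protocol buffer libraries can retain them so they survive re-serialization.

```python
# Hypothetical "version 1" reader that skips fields it does not know.
# Hand-written illustration of the wire format, not generated code.

WIRE_VARINT = 0
WIRE_FIXED64 = 1
WIRE_LENGTH_DELIMITED = 2
WIRE_FIXED32 = 5


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def decode_varint(buf, pos):
    """Decode the varint starting at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def field(number, wire_type, payload):
    """Prefix an already-encoded payload with its key."""
    return encode_varint((number << 3) | wire_type) + payload


def string_payload(text):
    """Length-prefix a UTF-8 string for a length-delimited field."""
    data = text.encode("utf-8")
    return encode_varint(len(data)) + data


def parse_v1_record(buf):
    """Parse as a reader that knows only field 1 (name) and field 2 (id).

    Returns (record, skipped_field_numbers).
    """
    record, skipped = {}, []
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        # The wire type alone says how many bytes the value occupies.
        if wire_type == WIRE_VARINT:
            value, pos = decode_varint(buf, pos)
        elif wire_type == WIRE_LENGTH_DELIMITED:
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == WIRE_FIXED64:
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == WIRE_FIXED32:
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")

        if number == 1:
            record["name"] = value.decode("utf-8")
        elif number == 2:
            record["id"] = value
        else:
            skipped.append(number)  # unknown to this version: ignore it
    return record, skipped


# A hypothetical "version 2" writer has added field 3 (an email address).
v2_message = (
    field(1, WIRE_LENGTH_DELIMITED, string_payload("Ann"))
    + field(2, WIRE_VARINT, encode_varint(150))
    + field(3, WIRE_LENGTH_DELIMITED, string_payload("ann@example.com"))
)
record, skipped = parse_v1_record(v2_message)
print(record, skipped)  # {'name': 'Ann', 'id': 150} [3]
```

The older reader still recovers every field it understands, which is what lets servers be upgraded gradually rather than all at once.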
Getting back to the example, if you examine the newly generated C++ code, you can
imagine populating these classes and using them to transmit a message such as the
following: