MQTT Handling of Disallowed Unicode Code Points Version 1.0
Committee Note 01
19 April 2018
Brian Raymor
Richard J Coppen
Andrew Banks
Ed Briggs
Ken Borgendale
Rahul Gupta
This Committee Note describes identified exposures in the handling of disallowed Unicode code points. Users of MQTT are alerted to the possibility that some combinations of MQTT Clients and Servers might allow properly authorized publishing Clients to cause the disconnection of properly authorized subscribing Clients. We describe how to identify if this risk is present and how to eliminate it.
This document was last revised or approved by the OASIS Message Queuing Telemetry Transport (MQTT) TC on the above date. The level of approval is also listed above. Check the "Latest version" location noted above for possible later revisions of this document.
Technical Committee (TC) members should send comments on this document to the TC's email list. Others should send comments to the TC's public comment list, after subscribing to it by following the instructions at the "" button on the TC's web page at
When referencing this document the following citation format should be used:
MQTT Handling of Disallowed Unicode Code Points Version 1.0. Edited by Andrew Banks, Ed Briggs, Ken Borgendale, and Rahul Gupta. 19 April 2018. OASIS Committee Note 01. http://docs.oasis-open.org/mqtt/disallowed-chars/v1.0/cn01/disallowed-chars-v1.0-cn01.html. Latest version: http://docs.oasis-open.org/mqtt/disallowed-chars/v1.0/disallowed-chars-v1.0.html.
Copyright © OASIS Open 2018. All Rights Reserved.
All capitalized terms in the following text have the meanings assigned to them in the OASIS Intellectual Property Rights Policy (the "OASIS IPR Policy"). The full Policy may be found at the OASIS website.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published, and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this section are included on all such copies and derivative works. However, this document itself may not be modified in any way, including by removing the copyright notice or references to OASIS, except as needed for the purpose of developing any document or deliverable produced by an OASIS Technical Committee (in which case the rules applicable to copyrights, as set forth in the OASIS IPR Policy, must be followed) or as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be revoked by OASIS or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and OASIS DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY OWNERSHIP RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Section 1.5.3, "UTF-8 encoded strings", of the MQTT V3.1.1 specification and ISO/IEC 20922:2016 describe the set of Unicode Control Codes and Unicode Noncharacters which should not be included in a UTF-8 Encoded String. The specifications do not require a Client or Server implementation to validate that these code points are not used in UTF-8 Encoded Strings, in particular in Topic Names. We refer to these code points as Disallowed Unicode code points in this document.
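As a minimal sketch, the Python functions below classify the Disallowed Unicode code points, assuming the ranges listed in section 1.5.3 of MQTT V3.1.1: the control codes U+0001..U+001F and U+007F..U+009F, plus the Unicode Noncharacters (U+FDD0..U+FDEF and the last two code points of every plane, such as U+FFFE and U+FFFF). U+0000 is left out because the specification forbids it outright rather than merely discouraging it. The names `is_noncharacter`, `is_disallowed` and `find_disallowed` are hypothetical and are not part of any MQTT library.

```python
def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode Noncharacters."""
    # U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in each of the 17 planes.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_disallowed(cp: int) -> bool:
    """True if cp is one of the Disallowed Unicode code points."""
    return 0x0001 <= cp <= 0x001F or 0x007F <= cp <= 0x009F or is_noncharacter(cp)


def find_disallowed(text: str) -> list[tuple[int, str]]:
    """Return (index, "U+XXXX") for every Disallowed Unicode code point in text."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if is_disallowed(ord(ch))]


print(find_disallowed("sensors/temp"))      # []
print(find_disallowed("sensors/\x07temp"))  # [(8, 'U+0007')]
```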
If the Server does not validate the code points in a UTF-8 encoded string but a subscribing Client does, then a second Client might be able to cause the subscribing Client to disconnect by publishing on a Topic Name that contains a Disallowed Unicode code point. This document recommends some steps that can be taken to prevent this eventuality.
An implementation would normally choose to validate UTF-8 Encoded Strings, checking that the Disallowed Unicode code points are not used, either to avoid implementation difficulties, such as those caused by libraries that are sensitive to these code points, or to protect applications from having to process them.
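An implementation that chooses to validate might do so while decoding each UTF-8 Encoded String from a packet. The sketch below assumes the MQTT V3.1.1 wire layout (a two-byte big-endian length followed by that many bytes of UTF-8 data) and relies on Python's strict UTF-8 decoder to reject ill-formed data, including encoded surrogates. The function name `read_utf8_string` and the choice to raise `ValueError` are illustrative assumptions; a real Client or Server would typically respond by closing the Network Connection.

```python
import re
import struct

# Control codes U+0001..U+001F and U+007F..U+009F, U+FDD0..U+FDEF, and the
# last two code points (nFFFE, nFFFF) of each of the 17 Unicode planes.
_DISALLOWED = re.compile(
    "[\x01-\x1f\x7f-\x9f\ufdd0-\ufdef"
    + "".join(chr((plane << 16) | 0xFFFE) + chr((plane << 16) | 0xFFFF) for plane in range(17))
    + "]"
)


def read_utf8_string(buf: bytes, offset: int = 0) -> tuple[str, int]:
    """Decode one MQTT UTF-8 Encoded String at buf[offset:].

    Returns (text, offset_after_string); raises ValueError if it is not acceptable.
    """
    if len(buf) < offset + 2:
        raise ValueError("truncated length prefix")
    (length,) = struct.unpack_from(">H", buf, offset)  # two-byte big-endian length
    start, end = offset + 2, offset + 2 + length
    if len(buf) < end:
        raise ValueError("truncated string data")
    try:
        # Strict decoding rejects ill-formed UTF-8, overlong forms and encoded surrogates.
        text = buf[start:end].decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"ill-formed UTF-8: {exc.reason}") from None
    if "\x00" in text:
        raise ValueError("U+0000 is not permitted")
    bad = _DISALLOWED.search(text)
    if bad:
        raise ValueError(f"Disallowed Unicode code point U+{ord(bad.group()):04X} at index {bad.start()}")
    return text, end


print(read_utf8_string(b"\x00\x05a/b/c"))  # ('a/b/c', 7)
```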
Validating that these code points are not used removes some security exposures. There are possible security exploits which use control characters in log files to mask entries in the logs or to confuse the tools which process log files. The Unicode Noncharacters are commonly used as special markers, and allowing them into UTF-8 Encoded Strings could permit similar exploits.
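To illustrate the log-file exposure, the hypothetical sketch below writes one line per publication without escaping. A crafted Topic Name containing a line feed (U+000A, one of the control codes) produces a second, forged log entry; rejecting the Topic Name during validation keeps it out of the log. The log format, the function names and the Client identifiers are invented for this example.

```python
import io


def has_control_code(text: str) -> bool:
    """True if text contains a control code (U+0001..U+001F or U+007F..U+009F)."""
    return any(0x01 <= ord(ch) <= 0x1F or 0x7F <= ord(ch) <= 0x9F for ch in text)


def log_publish(log: io.StringIO, client_id: str, topic: str) -> None:
    """Naive line-oriented log writer: one line per publication, no escaping."""
    log.write(f"PUBLISH client={client_id} topic={topic}\n")


# Hypothetical Topic Name crafted to forge a second log entry.
topic = "a/b\nPUBLISH client=admin topic=c/d"

unchecked = io.StringIO()
log_publish(unchecked, "mallory", topic)
print(unchecked.getvalue().splitlines())
# ['PUBLISH client=mallory topic=a/b', 'PUBLISH client=admin topic=c/d']

checked = io.StringIO()
if not has_control_code(topic):
    log_publish(checked, "mallory", topic)
# Otherwise the publication is rejected and never reaches the log.
print(repr(checked.getvalue()))  # ''
```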
The publisher of an Application Message normally expects that the Servers will forward the message to subscribers, and that these subscribers are capable of processing the messages.
Here we describe the set of conditions which allow a publishing Client to cause the disconnection of subscribing Clients. Consider a situation where:
· A Client publishes an Application Message using a Topic Name containing one of the Disallowed Unicode code points.
· The publishing Client library allows the Disallowed Unicode code point to be used in a Topic Name rather than rejecting it.
· The publishing Client is authorized to send the publication.
· A subscribing Client is authorized to use a Topic Filter which matches the Topic Name. Note that the Disallowed Unicode code point might occur in a part of the Topic Name matching a wildcard character in the Topic Filter.
· The Server forwards the message to the matching subscriber rather than disconnecting the publisher.
In this case, the subscribing Client might:
o Disconnect, because it does not allow the use of Disallowed Unicode code points. If the Client reconnects and resumes its existing Session (CleanSession set to 0), and the message is QoS=1 or QoS=2, the message will be sent again, causing the Client to disconnect again.
o Accept the Application Message but fail to process it because it contains one of the Disallowed Unicode code points.
o Successfully process the Application Message.
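The note about wildcards is easy to overlook because the Topic Filter itself can be entirely clean. The following simplified sketch of Topic Filter matching assumes the MQTT V3.1.1 rules (`+` matches exactly one level, `#` matches the remaining levels including the parent level, and a wildcard in the first level does not match Topic Names beginning with `$`). It does not validate Topic Filter syntax. It shows a Disallowed Unicode code point arriving through a `+` level. `topic_matches` is a hypothetical helper, not a library API.

```python
def topic_matches(topic_filter: str, topic_name: str) -> bool:
    """Simplified MQTT V3.1.1 Topic Filter matching."""
    f_levels = topic_filter.split("/")
    t_levels = topic_name.split("/")
    # A wildcard in the first level does not match Topic Names that start with '$'.
    if topic_name.startswith("$") and f_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(f_levels):
        if level == "#":
            return True  # matches this level, everything below it, and the parent level
        if i >= len(t_levels):
            return False
        if level != "+" and level != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


# The Topic Filter is clean; the U+0007 arrives through the '+' level.
print(topic_matches("sensors/+/temp", "sensors/\u0007/temp"))  # True
```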
The potential for Client disconnection might go unnoticed until a publisher uses one of the Disallowed Unicode code points.
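The redelivery loop described above can be modeled in a few lines. This is a deliberately simplified, hypothetical simulation: it assumes QoS 1 messages held in a resumed Session and delivered one at a time in order, and a Client that disconnects before sending PUBACK whenever its `rejects` predicate matches a Topic Name. A strict Client never gets past the offending message, while a tolerant one drains the queue on its first connection. The function and parameter names are illustrative only.

```python
from collections import deque
from typing import Callable


def simulate_redelivery(pending: list[str], rejects: Callable[[str], bool],
                        max_connects: int = 5) -> tuple[list[str], int]:
    """Model QoS 1 delivery from a resumed Session to one subscribing Client.

    pending: Topic Names of queued messages, in delivery order.
    rejects: returns True if the Client disconnects when it sees this Topic Name.
    Returns (Topic Names acknowledged with PUBACK, number of connections made).
    """
    queue = deque(pending)
    acked: list[str] = []
    connects = 0
    while queue and connects < max_connects:
        connects += 1
        while queue:
            if rejects(queue[0]):
                break  # Client disconnects before PUBACK; the message stays queued
            acked.append(queue.popleft())  # PUBACK received; the Server drops the message
    return acked, connects


def strict_rejects(topic: str) -> bool:
    # Simplified strict Client: only checks the C0 control codes.
    return any(0x01 <= ord(c) <= 0x1F for c in topic)


pending = ["a/ok", "a/\x07", "a/later"]
print(simulate_redelivery(pending, strict_rejects))   # (['a/ok'], 5)
print(simulate_redelivery(pending, lambda _: False))  # (['a/ok', 'a/\x07', 'a/later'], 1)
```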
If there is a possibility that a Disallowed Unicode code point could be included in a Topic Name delivered to a Client, the solution owner can adopt one of the following suggestions:
1) Change the Server implementation to one that disconnects a publisher which uses a Disallowed Unicode code point in a Topic Name.
2) Restrict the authorization rules for the publisher so that it cannot publish Application Messages using Topic Names which contain Disallowed Unicode code points.
3) Restrict the Topic Filters authorized to subscribers so that a Client cannot use Topic Filters containing Disallowed Unicode code points. If a Client is allowed to make a subscription containing a wildcard character, ensure that the Server is configured so that publishers cannot make publications where a Disallowed Unicode code point would match the wildcard.
4) Change the Client library used by the subscribers to one that tolerates the use of Disallowed Unicode code points. The Client can either process or discard messages with Topic Names that contain Disallowed Unicode code points, as long as it continues to follow the protocol rather than disconnecting.
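Suggestions 1 and 4 can be sketched as two small handlers. The code below is illustrative only: `server_on_publish` stands in for a Server that disconnects an offending publisher instead of forwarding the message, and `subscriber_on_publish` stands in for a tolerant Client that discards such messages but still returns the acknowledgement its QoS level requires (PUBACK for QoS 1, PUBREC for QoS 2), so the Server has no reason to send the message again. Both names and their string return values are assumptions made for this example.

```python
from typing import Callable, Optional


def has_disallowed(text: str) -> bool:
    """Control codes U+0001..U+001F, U+007F..U+009F, and Unicode Noncharacters."""
    def bad(cp: int) -> bool:
        return (0x01 <= cp <= 0x1F or 0x7F <= cp <= 0x9F
                or 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE)
    return any(bad(ord(ch)) for ch in text)


def server_on_publish(topic: str) -> str:
    """Suggestion 1: disconnect the publisher rather than forward the message."""
    return "disconnect" if has_disallowed(topic) else "forward"


def subscriber_on_publish(topic: str, payload: bytes, qos: int,
                          handle: Callable[[str, bytes], None]) -> Optional[str]:
    """Suggestion 4: discard the message but keep following the protocol."""
    if not has_disallowed(topic):
        handle(topic, payload)
    # Acknowledge in either case so the Server does not send the message again.
    return {0: None, 1: "PUBACK", 2: "PUBREC"}[qos]


received: list[str] = []
print(server_on_publish("a/\x07"))  # disconnect
print(subscriber_on_publish("a/\x07", b"x", 1, lambda t, p: received.append(t)))  # PUBACK
print(received)  # []
```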
The following individuals have participated in the creation of this specification and are gratefully acknowledged:
[Pouyan Sepehrdad, Qualcomm | Non Member]
[Davide Quarta, Qualcomm | Non Member]
13 February 2018