IEEE 1900.8 Working Group on Semantic Specification of Datasets for RFML-Based Spectrum Awareness
The IEEE 1900.8 Working Group is currently developing a draft standard for the semantic specification of datasets used in training and evaluating Radio Frequency Machine Learning (RFML) models.
The standard is in active development, and we welcome participation from stakeholders across industry, academia, and government who are involved in RF data collection, RFML model development, spectrum management, or toolchain integration. Contributions, use cases, and feedback are encouraged.
Interested individuals may join the working group through the IEEE SA myProject portal or contact the working group chair for more information.
Purpose
The purpose of this standard is to enable the production and consumption of semantically well-defined RF datasets for use in training, testing, and evaluating machine-learned models. These models are designed to extract information from RF signals and support spectrum awareness tasks. By providing a shared schema and metadata structure, this standard facilitates interoperability among tools, reproducibility of results, and reuse of datasets across organizations. It also lowers the barrier to entry for new contributors by offering multiple levels of conformance, allowing users to adopt the standard incrementally based on their needs and capabilities. The standard supports consistent interpretation of dataset contents regardless of file format, enabling more effective collaboration across the RF machine learning ecosystem.
Need
No existing standard defines how datasets used to train and test Radio Frequency Machine Learning (RFML) models should be semantically structured, annotated, and described. A variety of data storage formats are in use across government, academic, and commercial communities, but these formats do not provide a shared schema or standardized vocabulary for describing RF data, metadata, or derived features. The absence of such a semantic framework hinders interoperability, increases development costs, and limits the reusability and comparability of RFML datasets across organizations and tools.
Openly available RFML datasets often lack consistent metadata for characterizing RF environments, hardware, and signal properties. They also lack standardized structures for relating raw RF data, processed features, and annotated signal events. These limitations reduce the fidelity of model evaluation and make it difficult to share, benchmark, or integrate datasets into existing workflows.
This standard addresses these issues by defining a schema-based approach for the semantic specification of RFML datasets. It enables dataset producers and consumers to apply a common semantic schema without mandating a specific file format. The standard supports multiple levels of conformance to accommodate a broad spectrum of users and use cases while promoting progressive adoption. By improving dataset clarity, structure, and semantic consistency, the standard will facilitate model reproducibility, reduce duplication of effort, and promote wider collaboration and dataset sharing across the RFML community.
Scope
This standard defines a schema-based approach for the semantic specification of datasets used to train and test machine-learned models that process radio frequency (RF) data. The standard specifies a common vocabulary, metadata structure, and hierarchical data model to describe RF signal content, measurement configurations, environmental conditions, and derived features. It does not prescribe a specific file format, but enables dataset producers to structure their data in a format-independent, semantically consistent manner. The standard also defines conformance levels to accommodate varying use cases and implementation complexity, supporting both minimal and comprehensive metadata specifications. It is applicable to datasets used in the development of models that perform one or more RF signal understanding tasks, including detection, classification, characterization, identification, and geolocation.
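As a purely illustrative sketch of what a format-independent description with conformance levels might look like, the Python below models the four metadata groups and five tasks named in the scope. Everything else is an assumption made for illustration and is not taken from the draft standard: the class and field names, the use of plain dictionaries, the level names `minimal` and `comprehensive`, and the rule that decides between them.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Task(Enum):
    # The RF signal understanding tasks listed in the scope.
    DETECTION = "detection"
    CLASSIFICATION = "classification"
    CHARACTERIZATION = "characterization"
    IDENTIFICATION = "identification"
    GEOLOCATION = "geolocation"


@dataclass
class DatasetDescription:
    # Hypothetical structure: field names are illustrative only.
    name: str
    tasks: List[Task]
    signal_content: dict = field(default_factory=dict)
    measurement_configuration: Optional[dict] = None
    environmental_conditions: Optional[dict] = None
    derived_features: Optional[dict] = None


def conformance_level(desc: DatasetDescription) -> str:
    """Classify a description under an assumed two-level rule."""
    # Assumed rule: name, tasks, and signal content are always required.
    if not (desc.name and desc.tasks and desc.signal_content):
        return "nonconformant"
    optional_groups = [
        desc.measurement_configuration,
        desc.environmental_conditions,
        desc.derived_features,
    ]
    # Assumed rule: every optional group must be non-empty for the top level.
    if all(optional_groups):
        return "comprehensive"
    return "minimal"


basic = DatasetDescription(
    name="example-capture",
    tasks=[Task.DETECTION],
    signal_content={"sample_format": "complex64"},
)
full = DatasetDescription(
    name="example-capture",
    tasks=[Task.DETECTION, Task.CLASSIFICATION],
    signal_content={"sample_format": "complex64"},
    measurement_configuration={"center_frequency_hz": 915e6},
    environmental_conditions={"setting": "indoor"},
    derived_features={"spectrogram": {"fft_size": 1024}},
)
print(conformance_level(basic), conformance_level(full))
```

Because the description is held in memory as plain objects, the same record could be serialized to any file format, which mirrors the scope's separation of semantics from storage.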
Working group procedures
- Policies and Procedures for IEEE 1900.8 Working Group
- Contribution templates
Working group documents
TBD
Contacts
- Alex Lackpour (IEEE 1900.8 WG Chair)
- Jesse Caulfield (IEEE 1900.8 WG Vice Chair)
- Adnan Shahid, PhD (IEEE 1900.8 WG Secretary)
