IEEE 1900.8 Working Group on Semantic Specification of Datasets for RFML-Based Spectrum Awareness
The 1900.8 Working Group is tasked with standardizing the semantic specification of datasets used in training and evaluating Radio Frequency Machine Learning (RFML) models in the context of spectrum situational awareness.

This standard is in active development, and we welcome participation from stakeholders across industry, academia, and government who are involved in RF data collection, RFML model development, spectrum management, or toolchain integration. Contributions, use cases, and feedback are encouraged.
Interested individuals may join the working group through the IEEE SA myProject portal or contact the working group chair (Alex Lackpour) for more information.
Purpose
The purpose of this standard is to enable the production and consumption of semantically well-defined RF datasets for use in training, testing, and evaluating machine-learned models. These models are designed to extract information from RF signals and support spectrum awareness tasks. By providing a shared schema and metadata structure, this standard facilitates interoperability among tools, reproducibility of results, and reuse of datasets across organizations. It also lowers the barrier to entry for new contributors by offering multiple levels of conformance, allowing users to adopt the standard incrementally based on their needs and capabilities. The standard supports consistent interpretation of dataset contents regardless of file format, enabling more effective collaboration across the RF machine learning ecosystem.
Need
No existing standard defines how datasets used in training and testing Radio Frequency Machine Learning (RFML) models should be semantically structured, annotated, and described. While a variety of data storage formats are in use across government, academic, and commercial communities, these formats do not provide a shared schema or standardized vocabulary for describing RF data, metadata, or derived features. The absence of such a semantic framework hinders interoperability, increases development costs, and limits the reusability and comparability of RFML datasets across different organizations and tools.
Openly available RFML datasets often lack consistent metadata for characterizing RF environments, hardware, and signal properties. They also lack standardized structures for relating raw RF data, processed features, and annotated signal events. These limitations reduce the fidelity of model evaluation and make it difficult to share, benchmark, or integrate datasets into existing workflows.
This standard addresses these issues by defining a schema-based approach for the semantic specification of RFML datasets. It enables dataset producers and consumers to apply a common semantic schema without mandating a specific file format. The standard supports multiple levels of conformance to accommodate a broad spectrum of users and use cases while promoting progressive adoption. By improving dataset clarity, structure, and semantic consistency, the standard will facilitate model reproducibility, reduce duplication of effort, and promote wider collaboration and dataset sharing across the RFML community.
Scope
This standard defines a schema-based approach for the semantic specification of datasets used to train and test machine-learned models that process radio frequency (RF) data. The standard specifies a common vocabulary, metadata structure, and hierarchical data model to describe RF signal content, measurement configurations, environmental conditions, and derived features. It does not prescribe a specific file format, but enables dataset producers to structure their data in a format-independent, semantically consistent manner. The standard also defines conformance levels to accommodate varying use cases and implementation complexity, supporting both minimal and comprehensive metadata specifications. It is applicable to datasets used in the development of models that perform one or more RF signal understanding tasks, including detection, classification, characterization, identification, and geolocation.
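As an informal illustration of format-independent metadata with conformance levels, the sketch below checks a plain dictionary against two tiers of required fields. Everything in it is an assumption made for illustration: the field names (center_frequency_hz, sample_rate_hz, receiver_hardware, environment, annotations), the level names ("minimal", "comprehensive"), and the conformance_level helper are hypothetical and are not taken from the standard, whose vocabulary and conformance levels are still under development.

```python
# Hypothetical field sets for illustration only; the standard's actual
# vocabulary and conformance levels are not defined here.
MINIMAL_FIELDS = ("center_frequency_hz", "sample_rate_hz")
COMPREHENSIVE_FIELDS = MINIMAL_FIELDS + (
    "receiver_hardware",
    "environment",
    "annotations",
)


def conformance_level(record: dict) -> str:
    """Return the highest illustrative level whose fields are all present."""

    def has_all(fields) -> bool:
        # A field counts as present when its key exists with a non-None value.
        return all(record.get(name) is not None for name in fields)

    if has_all(COMPREHENSIVE_FIELDS):
        return "comprehensive"
    if has_all(MINIMAL_FIELDS):
        return "minimal"
    return "non-conformant"


# Example records (all values are made up for the sketch).
minimal_record = {"center_frequency_hz": 2.45e9, "sample_rate_hz": 20e6}
full_record = {
    **minimal_record,
    "receiver_hardware": "generic SDR",
    "environment": "indoor lab",
    "annotations": [{"label": "wifi", "start_sample": 0, "num_samples": 4096}],
}

for name, record in [("minimal_record", minimal_record), ("full_record", full_record)]:
    print(name, "->", conformance_level(record))
```

Because the record is an ordinary dictionary, the same check could in principle be applied to metadata loaded from any storage format that a dataset producer chooses, which is the sense in which the sketch is format-independent.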
Stakeholders of interest:
- Machine Learning Engineers and RFML Model Developers: To train and evaluate models more effectively using structured, interoperable datasets with consistent semantics, reducing time spent on data wrangling and preprocessing.
- RF System Researchers and Academic Labs: To share, benchmark, and reproduce RFML experiments using standardized datasets that support rigorous scientific validation.
- SDR Testbed Operators and RF Dataset Generators: To structure and annotate collected RF data using a shared vocabulary that improves long-term usability and integration with downstream machine learning pipelines.
- Spectrum Sharing and Dynamic Spectrum Access Researchers: To develop and validate spectrum awareness models with datasets that reflect realistic RF environments, propagation effects, and interference conditions.
- Defense and National Spectrum Management Agencies: To evaluate RFML-based sensing and characterization solutions using trustworthy, standards-compliant datasets for radar, communications, and electronic warfare (EW) signals.
- Telecommunications and Equipment Vendors: To streamline the development and testing of RFML-enabled products using standardized data interfaces and reusable training datasets.
Working group procedures
- Policies and Procedures for IEEE 1900.8 Working Group
- Contribution templates
Working group documents
TBD
Contacts
- Alex Lackpour (IEEE 1900.8 WG Chair)
- Jesse Caulfield (IEEE 1900.8 WG Vice Chair)
- Adnan Shahid, PhD (IEEE 1900.8 WG Secretary)