Note
AI Translation Notice
This document was automatically translated by the hunyuan-turbos-latest model and is provided for reference only.
Source document: kernel/net/napi_and_netns_poll.md
Translation time: 2026-01-07 14:52:23
Translation model: hunyuan-turbos-latest
Please report issues via the Community Channel.
Design Documentation of the NAPI and NetNamespace Polling Mechanisms in DragonOS
This document describes the design and implementation of the NAPI (New API) mechanism and the NetNamespace polling/scheduling mechanism in the current DragonOS network subsystem.
1. Mechanism Overview
DragonOS adopts an "Event-Driven + Precise Timing" hybrid model for network packet processing. Separate kernel threads handle the packet-reception work triggered by hardware interrupts and the protocol stack's timing work, giving both fast response to network traffic and sensible use of CPU time.
The core design consists of two main parts:
NAPI Subsystem: Handles high-throughput packet reception and transmission using a "Bounded Polling" mechanism.
NetNamespace Scheduler: Manages the protocol stack's timed events (such as TCP retransmission), acting as a precise timer.
2. Core Component Design
2.1 NAPI Subsystem (kernel/src/driver/net/napi.rs)
NAPI is the core packet reception mechanism in the DragonOS network driver layer.
NapiStruct: Each NAPI-capable network interface (Iface) is bound to a NapiStruct. It maintains the state of the NAPI instance (such as the SCHED scheduling bit) and its weight.
Weight: Defines the maximum number of packets (the budget) that this interface may process within a single scheduling round, preventing a single NIC from monopolizing the CPU.
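As a rough illustration, the sketch below models such a structure in plain std Rust: an atomic state word carrying the SCHED bit plus a fixed weight. The field names, the bit layout, and the try_schedule/complete helpers are assumptions made for the example, not the actual DragonOS definitions.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Hypothetical bit layout: bit 0 marks "already queued for polling".
const NAPI_STATE_SCHED: u32 = 1 << 0;

pub struct NapiStruct {
    state: AtomicU32, // holds the SCHED scheduling bit (and possibly others)
    weight: usize,    // per-round budget: max packets handled in one poll
}

impl NapiStruct {
    pub fn new(weight: usize) -> Self {
        Self { state: AtomicU32::new(0), weight }
    }

    /// Atomically set the SCHED bit; returns true if the caller should
    /// enqueue this instance (i.e. it was not already scheduled).
    pub fn try_schedule(&self) -> bool {
        (self.state.fetch_or(NAPI_STATE_SCHED, Ordering::AcqRel) & NAPI_STATE_SCHED) == 0
    }

    /// Clear the SCHED bit once a poll round finishes under budget.
    pub fn complete(&self) {
        self.state.fetch_and(!NAPI_STATE_SCHED, Ordering::AcqRel);
    }

    pub fn weight(&self) -> usize {
        self.weight
    }
}

fn main() {
    let napi = NapiStruct::new(64);
    assert!(napi.try_schedule());  // first schedule wins and enqueues
    assert!(!napi.try_schedule()); // already queued: do not enqueue twice
    napi.complete();
    assert_eq!(napi.weight(), 64);
}
```

Checking the old value returned by the atomic fetch is what lets the interrupt path call try_schedule repeatedly without double-queuing the same interface.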
Global NAPI Manager (NapiManager):
Currently implemented as a singleton (GLOBAL_NAPI_MANAGER).
Maintains a global pending queue, napi_list.
Provides the napi_schedule() interface, which the NIC interrupt handler calls to add a NAPI instance to the queue and wake the processing thread.
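A minimal sketch of this queue-plus-wakeup pattern follows, using a std Mutex-guarded VecDeque and a Condvar as stand-ins for the kernel's queue lock and thread wake-up. Apart from napi_schedule, napi_list, and GLOBAL_NAPI_MANAGER, the names are illustrative, and the SCHED-bit deduplication is omitted.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex, OnceLock};

pub struct Napi {
    pub id: usize,
}

pub struct NapiManager {
    napi_list: Mutex<VecDeque<Arc<Napi>>>, // global pending queue
    wakeup: Condvar,                       // wakes the napi_handler thread
}

static GLOBAL_NAPI_MANAGER: OnceLock<NapiManager> = OnceLock::new();

fn manager() -> &'static NapiManager {
    GLOBAL_NAPI_MANAGER.get_or_init(|| NapiManager {
        napi_list: Mutex::new(VecDeque::new()),
        wakeup: Condvar::new(),
    })
}

/// Called from the NIC interrupt path: enqueue the instance and wake the
/// processing thread. Deduplication via the SCHED bit is left out here.
pub fn napi_schedule(napi: Arc<Napi>) {
    let m = manager();
    m.napi_list.lock().unwrap().push_back(napi);
    m.wakeup.notify_one();
}

fn main() {
    napi_schedule(Arc::new(Napi { id: 0 }));
    println!("pending = {}", manager().napi_list.lock().unwrap().len());
}
```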
NAPI Processing Thread (napi_handler):
A dedicated kernel thread created during system startup.
Working logic: continuously takes scheduled NAPI instances from napi_list and calls their poll() method.
Cyclic scheduling: if poll() returns true (the budget is exhausted but packets remain), the thread puts the instance back at the tail of the queue to wait for the next scheduling round.
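The handler's core loop might look like the sketch below. It assumes poll(budget) returns true when the budget was exhausted with work left, and it exits when the queue is empty so the example terminates (the kernel thread would sleep instead); the NapiPoll trait and the Dummy device are hypothetical.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

trait NapiPoll {
    /// Process at most `budget` packets; return true if more work remains.
    fn poll(&self, budget: usize) -> bool;
    fn weight(&self) -> usize;
}

fn napi_handler(queue: &Mutex<VecDeque<Arc<dyn NapiPoll + Send + Sync>>>) {
    loop {
        // Take the next scheduled instance; the real thread sleeps here when
        // the list is empty instead of returning.
        let napi = match queue.lock().unwrap().pop_front() {
            Some(n) => n,
            None => break,
        };
        if napi.poll(napi.weight()) {
            // Budget exhausted but packets remain: re-queue at the tail so
            // other interfaces get their fair share of the CPU.
            queue.lock().unwrap().push_back(napi);
        }
        // Otherwise the instance clears its SCHED bit and waits for the next
        // interrupt to schedule it again.
    }
}

struct Dummy;
impl NapiPoll for Dummy {
    fn poll(&self, _budget: usize) -> bool { false } // nothing left to do
    fn weight(&self) -> usize { 64 }
}

fn main() {
    let queue = Mutex::new(VecDeque::from([
        Arc::new(Dummy) as Arc<dyn NapiPoll + Send + Sync>,
    ]));
    napi_handler(&queue);
}
```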
2.2 NetNamespace Scheduler (kernel/src/process/namespace/net_namespace.rs)
Each network namespace (NetNamespace) has an independent polling thread (netns_poll), which in the current design mainly serves as a “Timer” and “Fallback Scheduler”.
Precise Timing: the thread tracks the poll_at_us value (the next point in time that needs handling) of every NIC in the namespace, computes the nearest deadline, and sleeps precisely until it (wait_event_timeout).
Timeout Trigger: when the sleep times out (i.e. a protocol-stack timed event such as a TCP RTO is due), the thread does not process packets itself; it calls napi_schedule() and hands the work over to the NAPI thread. This keeps the heavy protocol-stack processing uniformly on the NAPI thread.
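A sketch of this timer role, assuming std thread::sleep stands in for wait_event_timeout and an AtomicU64 holds each interface's poll_at_us (u64::MAX meaning "no pending timer"); everything except the poll_at_us and napi_schedule names is illustrative.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

struct IfaceTimer {
    /// Next protocol-stack event for this interface, in microseconds;
    /// u64::MAX means "no pending timer".
    poll_at_us: AtomicU64,
}

/// Nearest deadline across all interfaces in the namespace, if any.
fn nearest_deadline_us(ifaces: &[IfaceTimer]) -> Option<u64> {
    ifaces
        .iter()
        .map(|i| i.poll_at_us.load(Ordering::Relaxed))
        .filter(|&t| t != u64::MAX)
        .min()
}

fn netns_poll_once(ifaces: &[IfaceTimer], now_us: u64) {
    if let Some(deadline) = nearest_deadline_us(ifaces) {
        // In the kernel this is wait_event_timeout(); an early wake-up would
        // mean a new, earlier deadline was registered and must be recomputed.
        std::thread::sleep(Duration::from_micros(deadline.saturating_sub(now_us)));
        // Timeout reached (e.g. a TCP RTO is due): do not touch packets here;
        // delegate the heavy protocol-stack work to the NAPI thread.
        for _iface in ifaces {
            // napi_schedule(_iface) would go here for each expired interface.
        }
    }
}

fn main() {
    let ifaces = [IfaceTimer { poll_at_us: AtomicU64::new(100) }];
    netns_poll_once(&ifaces, 0);
}
```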
2.3 Bounded Polling
The polling interface poll_napi(budget) adapted for NAPI is implemented in kernel/src/driver/net/mod.rs.
Logic:
Calls smoltcp's poll_ingress_single to process the receive queue, with the number of iterations limited by budget.
Executes poll_egress once to advance the transmit queue.
Updates the poll_at_us timestamp for reference by the NetNamespace scheduler.
Features: The execution time of each scheduling round is bounded, avoiding long periods with interrupts disabled and preventing thread starvation.
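The overall shape of poll_napi(budget) can be sketched as below. The smoltcp calls are hidden behind a hypothetical BoundedIface trait because the exact smoltcp signatures are not reproduced here; only the method names mirror those used above, and FakeIface exists purely to make the example runnable.

```rust
trait BoundedIface {
    /// Process one received packet; false means the RX queue is empty.
    fn poll_ingress_single(&mut self) -> bool;
    /// Flush whatever the sockets have queued for transmission.
    fn poll_egress(&mut self);
    /// Next time (in microseconds) the protocol stack needs attention, if any.
    fn poll_at_us(&mut self) -> Option<u64>;
}

/// Returns true when the budget was exhausted with packets still pending,
/// telling the NAPI handler to re-queue this interface.
fn poll_napi<I: BoundedIface>(iface: &mut I, budget: usize, next_poll: &mut Option<u64>) -> bool {
    // 1. Bounded ingress: at most `budget` packets per scheduling round.
    let mut processed = 0;
    while processed < budget && iface.poll_ingress_single() {
        processed += 1;
    }
    // Budget fully consumed: assume more packets may be waiting and request
    // re-queuing rather than probing the RX ring again.
    let more_work = processed == budget;

    // 2. Advance the transmit side once per round.
    iface.poll_egress();

    // 3. Publish the next protocol-stack deadline for the netns scheduler.
    *next_poll = iface.poll_at_us();

    more_work
}

// Toy interface with a fixed number of pending RX packets.
struct FakeIface {
    rx_pending: usize,
}

impl BoundedIface for FakeIface {
    fn poll_ingress_single(&mut self) -> bool {
        if self.rx_pending > 0 {
            self.rx_pending -= 1;
            true
        } else {
            false
        }
    }
    fn poll_egress(&mut self) {}
    fn poll_at_us(&mut self) -> Option<u64> { None }
}

fn main() {
    let mut iface = FakeIface { rx_pending: 100 };
    let mut next_poll = None;
    assert!(poll_napi(&mut iface, 64, &mut next_poll));  // leftover work
    assert!(!poll_napi(&mut iface, 64, &mut next_poll)); // queue drained
}
```

Treating a fully used budget as "more work remains" is a deliberate over-approximation in this sketch: it costs at most one spare re-queue, but it never lets a flooded interface starve its neighbours.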
3. Workflow Illustration
The current system’s network processing data flow and control flow are as follows:
flowchart TD
%% Hardware and driver layer
subgraph Hardware_Driver [Hardware and Driver Layer]
NIC[Physical NIC]
IRQ_Handler[Interrupt handler]
NIC -->|Packet arrives| IRQ_Handler
end
%% NAPI subsystem
subgraph NAPI_System [NAPI Subsystem]
NapiManager[NapiManager global queue]
NapiThread[NapiHandler thread]
IRQ_Handler -->|1. napi_schedule| NapiManager
NapiManager -->|2. Wake up| NapiThread
NapiThread -->|3. Dequeue NapiStruct| NapiThread
end
%% Protocol stack and timer
subgraph Network_Stack [Protocol Stack and Timer]
Smoltcp[smoltcp protocol stack]
NetnsThread[Netns scheduling thread]
NapiThread -->|"4. poll_napi(budget)"| Smoltcp
Smoltcp -->|5. Handle ingress/egress| Smoltcp
Smoltcp -->|6. Update poll_at_us| NetnsThread
NetnsThread -.->|7. Sleep until poll_at| NetnsThread
NetnsThread -->|8. Timeout triggers napi_schedule| NapiManager
end
%% Feedback loop
Smoltcp -.->|"9. Packets remain, budget exhausted"| NapiManager
4. Current Implementation Status
As of the current version, the network mechanism of DragonOS has the following implementation characteristics:
Single-Queue NAPI Management:
The current NapiManager is globally unique; all CPUs share the same pending queue.
A Linux-style per-CPU NAPI queue has not been implemented yet (a TODO is left in the code).
Thread Model:
napi_handler: performs the actual packet processing and protocol-stack advancement; a compute-intensive thread.
netns_poll: handles time management and event dispatch; an I/O- and sleep-intensive thread.
Driver Support:
Drivers such as Virtio-Net, Veth, and Loopback have been integrated into this mechanism and actively call napi_schedule from their interrupt handlers or transmit paths, as sketched below.
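From the driver's point of view the integration is little more than calling napi_schedule at the right moments. In the following sketch, napi_schedule is stubbed out, and the VirtioNetLike type and its methods are invented for illustration, not the DragonOS driver API.

```rust
struct NapiHandle {
    id: usize,
}

// Stub: in the kernel this sets the SCHED bit, pushes the instance onto the
// global napi_list, and wakes the napi_handler thread.
fn napi_schedule(napi: &NapiHandle) {
    println!("napi_schedule(iface {})", napi.id);
}

struct VirtioNetLike {
    napi: NapiHandle,
}

impl VirtioNetLike {
    /// Interrupt handler: acknowledge the IRQ, then defer all packet work.
    fn handle_irq(&self) {
        // ...acknowledge/clear the device interrupt here...
        napi_schedule(&self.napi);
    }

    /// Transmit path: after queuing a frame, schedule a poll so egress is
    /// flushed promptly even without an incoming interrupt.
    fn transmit(&self, _frame: &[u8]) {
        // ...place the frame on the TX ring here...
        napi_schedule(&self.napi);
    }
}

fn main() {
    let dev = VirtioNetLike { napi: NapiHandle { id: 0 } };
    dev.handle_irq();
    dev.transmit(&[0u8; 64]);
}
```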
Smoltcp Adaptation:
smoltcp is wrapped through IfaceCommon, converting its unbounded poll() into the bounded, NAPI-adapted poll_napi().