Introduction

Ubiquitous Mobile Intelligence

Modern society depends on transportation, yet transportation today faces four major challenges:

  • Safety: Transportation accidents are the leading cause of death for ages 15–29;
  • Mobility: US commuters waste an average of 97 hours per year in traffic and lose $160 billion in total to urban congestion;
  • Environment: Transportation accounts for 29% of US greenhouse gas emissions, the largest source;
  • Space: Today’s cars sit unused 95% of the time, demanding a large amount of parking space.

The Fourth Industrial Revolution (AI and cyber-physical systems) is driving a paradigm shift toward intelligence-centric mobility.

  • Vehicles: Autonomous and connected, with perception, reasoning, and decision-making capabilities.
  • Infrastructure: Smart roads, adaptive traffic lights, and predictive maintenance.
  • Logistics & Supply Chain: End-to-end optimization via AI-powered routing, inventory forecasting, and multimodal orchestration.
  • Urban Mobility: Integration of public transit, ride-sharing, and micro-mobility through AI-driven platforms (“Mobility as a Service”).
  • Policy & Sustainability: Data-driven transport policy, carbon optimization, and AI-assisted urban planning.

When, where, and how should Large Language Models (LLMs) be applied to autonomous driving?

Hierarchical decision-making in autonomous driving:

•Long-term / Strategic: Route and mission planning
– Time scale: seconds to minutes
– Examples: path through city, highway exit, detour, mission-level reasoning
•Mid-term / Tactical: Behavioral planning
– Time scale: 100–500 ms
– Examples: lane changes, overtaking, merging, stop/yield behavior
•Short-term / Reactive: Control and safety loops
– Time scale: 10–50 ms
– Examples: obstacle avoidance, braking, steering control, emergency actions

LLM for high-level and mid-level planning: reasoning, interpretation, and multimodal context understanding
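
One way to read the hierarchy above is as a latency budget: a model can only serve a layer if its inference time fits within that layer's cycle time. The following is a minimal sketch under stated assumptions, not a deployment rule. The names PlanningLayer and layers_within_budget are hypothetical, the numeric bounds chosen for "seconds to minutes" (1 s to 10 min) are an assumption, and the 300 ms model latency is an illustrative figure rather than a measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanningLayer:
    name: str
    function: str
    min_period_ms: float  # fastest cycle time listed for the layer
    max_period_ms: float  # slowest cycle time listed for the layer


# Time scales follow the hierarchy above. The numeric bounds for
# "seconds to minutes" (1 s to 10 min) are an assumption.
LAYERS = [
    PlanningLayer("strategic", "route and mission planning", 1_000, 600_000),
    PlanningLayer("tactical", "behavioral planning", 100, 500),
    PlanningLayer("reactive", "control and safety loops", 10, 50),
]


def layers_within_budget(model_latency_ms: float) -> list[str]:
    """Return the layers whose slowest allowed cycle still covers the latency."""
    return [
        layer.name
        for layer in LAYERS
        if model_latency_ms <= layer.max_period_ms
    ]


if __name__ == "__main__":
    # 300 ms is a hypothetical end-to-end LLM inference latency.
    print(layers_within_budget(300))
    # 20 ms would be needed before the reactive loop becomes an option.
    print(layers_within_budget(20))
```

With a latency in the hundreds of milliseconds, only the strategic and tactical layers qualify, which matches the placement of LLMs at the high and mid levels while the reactive loop stays with conventional control.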

VLA: Vision-Language-Action Model

Advantages:
•Unified multimodal perception and reasoning
•Natural-language reasoning, explainability, and adaptive learning
•Generalization and flexible task adaptation
•Hierarchical and interpretable action planning
•Broad applicability to other autonomous systems
Challenges:
•Training requires enormous GPU/TPU clusters
•Inference is also demanding: difficulty for real-time deployment on embedded or safety-critical systems like vehicles
•No single standard metric for measuring how well a VLA model understands and acts

Central vs. Distributed Training?

Central training
•Pros:
– Abundant computing resources
– Aggregated large data set
•Cons:
– Data privacy and security
– Single point of failure, not scalable, not personalized
– Communication overhead and delay
End (on-device) training
•Cons:
– Limited computing resources
– Limited data set
•Pros:
– Data privacy and security
– Personalized model
– No communication overhead

Federated learning for VLA [1]

Advantages:
•Distributed intelligence
– Diversity improves generalization
– Scalability: resources in end, edge, and cloud
•Privacy preserving
•Reducing communication
•Supporting continuous, on-device learning and personalization
•Fast reactions to real-time, location-relevant needs
•……
Challenges:
•Multi-modal sensing information encoding
•Heterogeneous vehicles with different resource constraints
– Model distillation, fine tuning, and personalization
•Communication bottleneck in training and inference
•Delay, delay, delay
•Safety, security, accountability
•……
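
To make the idea of aggregating on-vehicle training concrete, here is a minimal sketch of federated averaging (FedAvg), a widely used baseline aggregation rule. It is not claimed to be the method of [1]. The function name federated_average, the flat list-of-floats weight format, and the two example vehicles are all illustrative assumptions. Each vehicle trains locally and uploads only its model weights and sample count, so raw sensor data stays on the vehicle.

```python
def federated_average(client_updates):
    """Sample-weighted average of client model weights (FedAvg-style).

    client_updates: list of (weights, num_samples) pairs, where weights is
    a flat list of floats. This flat format is an assumption made for
    illustration; real models would carry structured tensors.
    """
    total_samples = sum(num_samples for _, num_samples in client_updates)
    if total_samples <= 0:
        raise ValueError("at least one client with samples is required")

    dim = len(client_updates[0][0])
    averaged = [0.0] * dim
    for weights, num_samples in client_updates:
        if len(weights) != dim:
            raise ValueError("all clients must share the same model shape")
        share = num_samples / total_samples  # this client's weight in the average
        for i, value in enumerate(weights):
            averaged[i] += value * share
    return averaged


if __name__ == "__main__":
    # Two hypothetical vehicles with different amounts of local data.
    vehicle_a = ([1.0, 2.0], 100)
    vehicle_b = ([3.0, 4.0], 300)
    print(federated_average([vehicle_a, vehicle_b]))
```

Weighting by sample count lets data-rich vehicles influence the global model more. The sketch assumes every client shares one model shape, which is exactly what the heterogeneity challenge above calls into question.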

[1] Tianao Xiang, Mingjian Zhi, Yuanguo Bi, Lin Cai, and Yuhao Chen, "FLAD: Federated Learning for LLM-based Autonomous Driving in Vehicle-Edge-Cloud Networks," https://arxiv.org/abs/2511.09025