Despite rapid progress in video diffusion models (VDMs), ensuring semantic adherence and physical commonsense remains a fundamental challenge: even state-of-the-art systems frequently violate real-world physical laws in domains such as dynamics, thermodynamics, and optics. While Direct Preference Optimization (DPO) has become a popular post-training strategy for aligning generative models, existing video DPO methods for improving physical commonsense suffer from high computational cost, poorly matched preference pairs, and ambiguous global supervision that fails to localize physical violations. We propose PhysPO, a physics-aware local preference optimization framework for physically consistent video generation. First, we introduce CounterPhyPipe, a physics-aware counterfactual data construction pipeline that forms video preference pairs sharing the same global semantics but differing in local physical violations, enabling meaningful comparisons for preference learning. Second, we use high-frequency decomposition to derive a physical mask that localizes regions where physical state transitions may occur. Finally, PhysPO applies a physics-aware DPO objective that concentrates supervision on physically relevant regions while enforcing neutrality on background regions. This mechanism reduces gradient noise and mitigates shortcut optimization, encouraging the model to focus on genuine physical discrepancies rather than superficial cues. Extensive experiments demonstrate that PhysPO significantly improves physical commonsense without compromising semantic adherence.
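To make the mask and the local preference objective described above more concrete, the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The abstract does not specify how the high-frequency decomposition or the neutrality constraint is computed, so the FFT high-pass filter, the quantile threshold, the squared-deviation background penalty, and all names (`high_freq_mask`, `masked_dpo_loss`, `cutoff`, `quantile`, `beta`, `lam`) are hypothetical choices made for illustration. The preference term is written in the general style of diffusion DPO objectives over per-pixel denoising errors.

```python
import numpy as np


def high_freq_mask(video, cutoff=0.1, quantile=0.8):
    """Hypothetical physical mask from a high-frequency decomposition.

    video: float array of shape (T, H, W).
    Returns a boolean array of the same shape that is True where the
    high-pass response is among the strongest (above the given quantile).
    """
    _, h, w = video.shape
    fy = np.fft.fftfreq(h)[:, None]            # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]            # horizontal frequencies
    high_pass = np.sqrt(fy**2 + fx**2) > cutoff  # assumed ideal high-pass filter
    spectrum = np.fft.fft2(video, axes=(-2, -1))
    high = np.abs(np.fft.ifft2(spectrum * high_pass, axes=(-2, -1)))
    return high > np.quantile(high, quantile)


def masked_dpo_loss(err_w, err_w_ref, err_l, err_l_ref, mask, beta=1.0, lam=0.1):
    """Hypothetical mask-weighted preference loss.

    err_*: per-pixel denoising errors for the preferred (w) and rejected (l)
    videos under the trained model and a frozen reference model.
    """
    m = mask.astype(float)
    fg = m.sum() + 1e-8                        # size of the masked region
    bg = (1.0 - m).sum() + 1e-8                # size of the background
    d_w = err_w - err_w_ref                    # model vs. reference, preferred
    d_l = err_l - err_l_ref                    # model vs. reference, rejected
    # Preference margin averaged over the masked (physically relevant) region.
    margin = ((d_w - d_l) * m).sum() / fg
    # -log(sigmoid(-beta * margin)) written in a numerically stable form.
    preference = np.logaddexp(0.0, beta * margin)
    # Assumed neutrality term: keep the model close to the reference on background.
    neutrality = ((d_w**2 + d_l**2) * (1.0 - m)).sum() / bg
    return preference + lam * neutrality
```

When the trained model matches the reference everywhere, both terms vanish except for the constant `log 2` from the preference term; lowering the preferred video's error inside the mask reduces the loss, while changes outside the mask are only penalized by the neutrality term.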