Robust binary and multinomial logit models for classification with data uncertainties,European Journal of Operational Research

European Journal of Operational Research (IF 6.0), Pub Date: 2025-05-22, DOI: 10.1016/j.ejor.2025.05.013
Baichuan Mo, Yunhan Zheng, Xiaotong Guo, Ruoyun Ma, Jinhua Zhao

Binary logit (BNL) and multinomial logit (MNL) models are the two most widely used discrete choice models for travel behavior modeling and prediction. In many scenarios, however, the data collected for these models are subject to measurement errors. Previous studies on measurement errors mostly focus on “better estimating model parameters” with training data. In this study, we focus on using BNL and MNL for classification problems, that is, on “better predicting the behavior of new samples” when measurement errors occur in the testing data. To this end, we propose a robust BNL and MNL framework that accounts for data uncertainties in both features and labels. The models are based on robust optimization theory, which minimizes the worst-case loss over a set of uncertain data scenarios. Specifically, for feature uncertainties, we assume that the ℓp-norm of the measurement errors in the features is smaller than a pre-established threshold. We model label uncertainties by limiting the number of mislabeled choices to at most Γ. Based on these assumptions, we derive a tractable robust counterpart. The derived robust-feature BNL and robust-label MNL models are exact, whereas the formulation of the robust-feature MNL model is an approximation of the exact robust optimization problem; an upper bound on the approximation gap is provided. We prove that the robust estimators are inconsistent but have a larger trace of the Fisher information matrix. Because the scale of their estimated parameters is shrunk, they are preferable when out-of-sample data contain errors. The proposed models are validated on a binary choice data set and a multinomial choice data set, respectively. Results show that the robust models (for both feature and label uncertainties) can outperform the conventional BNL and MNL models in prediction accuracy and log-likelihood. We show that robustness works like “regularization” and therefore yields better generalizability.
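The two kinds of worst-case loss described above can be illustrated with a short sketch. It is not the authors' implementation: it assumes the standard result for ℓp-bounded feature noise in binary logistic regression, where labels are coded as −1/+1 and the worst perturbation with ‖δ‖_p ≤ ρ lowers every margin y·βᵀx by exactly ρ‖β‖_q, with q the dual exponent (1/p + 1/q = 1). For labels, it evaluates the inner maximization for fixed MNL utilities: each sample's worst relabeling is to its least likely alternative, and the adversary flips the Γ samples whose loss would rise the most. The function names (`robust_feature_bnl_loss`, `robust_label_mnl_loss`, `dual_norm`) and the threshold symbol ρ (`rho`) are hypothetical, and parameter estimation and the approximate robust-feature MNL formulation are omitted.

```python
import math
from typing import List, Sequence


def dual_norm(beta: Sequence[float], p: float) -> float:
    """Return ||beta||_q, where q is the dual exponent of p (1/p + 1/q = 1)."""
    if p == 1:
        return max(abs(b) for b in beta)  # dual of l1 is l_inf
    if math.isinf(p):
        return sum(abs(b) for b in beta)  # dual of l_inf is l1
    q = p / (p - 1)
    return sum(abs(b) ** q for b in beta) ** (1.0 / q)


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def robust_feature_bnl_loss(X: Sequence[Sequence[float]], y: Sequence[int],
                            beta: Sequence[float], rho: float, p: float) -> float:
    """Hypothetical sketch: worst-case logistic loss when every feature vector
    x_i may be shifted by some delta_i with ||delta_i||_p <= rho.
    Labels y_i are assumed to be coded as -1 or +1."""
    # The adversary lowers every margin y_i * beta^T x_i by rho * ||beta||_q.
    penalty = rho * dual_norm(beta, p)
    total = 0.0
    for x_i, y_i in zip(X, y):
        margin = y_i * sum(b * x for b, x in zip(beta, x_i))
        total += softplus(penalty - margin)
    return total


def log_softmax(u: Sequence[float]) -> List[float]:
    """Log choice probabilities of an MNL model with utilities u."""
    m = max(u)
    lse = m + math.log(sum(math.exp(v - m) for v in u))
    return [v - lse for v in u]


def robust_label_mnl_loss(U: Sequence[Sequence[float]], labels: Sequence[int],
                          gamma: int) -> float:
    """Hypothetical sketch: worst-case negative log-likelihood when at most
    gamma observed choices may be mislabeled.
    U[i][j] is the (given) utility of alternative j for sample i."""
    nll = 0.0
    increments = []
    for u_i, y_i in zip(U, labels):
        logp = log_softmax(u_i)
        nll -= logp[y_i]
        # Relabeling sample i to its least likely alternative raises its loss most.
        worst = max(-lp for j, lp in enumerate(logp) if j != y_i)
        increments.append(worst + logp[y_i])
    # The adversary spends its budget on the gamma largest positive increments.
    increments.sort(reverse=True)
    return nll + sum(d for d in increments[:gamma] if d > 0)
```

With `rho=0` and `gamma=0` both functions reduce to the ordinary BNL and MNL negative log-likelihoods. In the feature case the term ρ‖β‖_q enters the loss much like a norm penalty, which is one way to read the regularization analogy drawn in the abstract.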

Updated: 2025-05-22