Assessing different cross-validation schemes for predicting novel traits using sensor data: An application to dry matter intake and residual feed intake using milk spectral data

Yilmaz Adkinson, ASİYE; Abouhawwash, M.; VandeHaar, M.J.; Parker Gaddis, K.L.; Burchard, J.; Peñagaricano, F.; White, H.M.; Weigel, K.A.; Baldwin, R.; Santos, J.E.P.; Koltes, J.E.; Tempelman, R.J.

doi:10.3168/jds.2024-24701

Journal of Dairy Science, vol. 107, no. 10, pp. 8084-8099, 2024 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 107 Issue: 10
Publication Date: 2024
DOI: 10.3168/jds.2024-24701
Journal Name: Journal of Dairy Science
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Periodicals Index Online, Agricultural & Environmental Science Database, Analytical Abstracts, BIOSIS, Business Source Elite, Business Source Premier, CAB Abstracts, Chemical Abstracts Core, Environment Index, Food Science & Technology Abstracts, Veterinary Science Database, Directory of Open Access Journals, DIALNET
Page Numbers: pp. 8084-8099
Keywords: mid-infrared spectrum, dry matter intake, cross-validation
Open Archive Collection: AVESİS Open Access Collection
Affiliated with Erciyes University: Yes

Abstract

Feed efficiency is important for economic profitability of dairy farms; however, recording daily DMI is expensive. Our objective was to investigate the potential use of milk mid-infrared (MIR) spectral data to predict proxy phenotypes for DMI based on different cross-validation schemes. We were specifically interested in comparisons between a model that included only MIR data (model M1); a model that incorporated different energy sink predictors, such as body weight, body weight change, and milk energy (model M2); and an extended model that incorporated both energy sinks and MIR data (model M3). Models M2 and M3 also included various cow-level variables (stage of lactation, age at calving, parity) such that any improvement in model performance from M2 to M3, whether through a smaller root mean squared error (RMSE) or a greater squared predictive correlation (R²), could indicate a potential benefit of MIR to predict residual feed intake. The data used in our study originated from a multi-institutional project on the genetics of feed efficiency in US Holsteins. Analyses were conducted on 2 different trait definitions based on different period lengths: averaged across weeks versus averaged across 28 d. Specifically, there were 19,942 weekly records on 1,812 cows across 46 experiments or cohorts and 3,724 28-d records on 1,700 cows across 43 different cohorts. The cross-validation analyses involved 3 different k-fold schemes. First, a 10-fold cow-independent cross-validation was conducted whereby all records from any one cow were kept together in either training or test sets. Similarly, a 10-fold experiment-independent cross-validation kept entire experiments together, whereas a 4-fold herd-independent cross-validation kept entire herds together in either training or test sets. Based on cow-independent cross-validation for both weekly and 28-d DMI, adding MIR predictors to energy sinks (model M3 vs. M2) significantly (P < 10⁻¹⁰) reduced average RMSE to 1.59 kg and increased average R² to 0.89. However, adding MIR to energy sinks (M3) to predict DMI either within an experiment-independent or herd-independent cross-validation scheme seemed to demonstrate no merit (P > 0.05) compared with an energy sink model (M2) for either R² or RMSE (respectively, 0.68 and 2.55 kg for M2 in herd-independent scheme). We further noted that with broader cross-validation schemes (i.e., from cow-independent to experiment-independent to herd-independent schemes), the mean and slope bias increased. Given that proxy DMI phenotypes for cows would need to be almost entirely generated in herds having no DMI or training data of their own, herd-independent cross-validation assessments of predictive performance should be emphasized. Hence, more research on predictive algorithms suitable for broader cross-validation schemes and a more earnest effort on calibration of spectrophotometers against each other should be considered.
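
The three cross-validation schemes described in the abstract differ only in which grouping level (cow, experiment, or herd) is held together when folds are formed. The following is a minimal Python sketch of that idea, not the authors' implementation: the abstract does not say what software was used or how groups were assigned to folds. The function names (`group_kfold`, `rmse`, `squared_correlation`), the random assignment of groups to folds, and the toy records with their nesting of cows in experiments in herds are all assumptions made for illustration.

```python
import random


def group_kfold(groups, k, seed=0):
    """Yield (train_idx, test_idx) pairs for a k-fold split in which all
    records sharing a group label fall in the same fold.
    Assumption: groups are shuffled and dealt round-robin into folds."""
    unique = sorted(set(groups))
    if k > len(unique):
        raise ValueError("k cannot exceed the number of distinct groups")
    random.Random(seed).shuffle(unique)
    fold_of = {g: i % k for i, g in enumerate(unique)}
    for fold in range(k):
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        yield train, test


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n) ** 0.5


def squared_correlation(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    return s_op ** 2 / (s_oo * s_pp)


# Hypothetical toy data: 40 cows with 3 records each, nested in
# 12 experiments, which are in turn nested in 4 herds.
records = []
for cow in range(40):
    experiment = cow % 12
    herd = experiment % 4
    for _ in range(3):
        records.append({"cow": cow, "experiment": experiment, "herd": herd})

# Fold counts follow the abstract: 10-fold for cow- and
# experiment-independent schemes, 4-fold for the herd-independent scheme.
schemes = {"cow": 10, "experiment": 10, "herd": 4}
folds = {
    level: list(group_kfold([r[level] for r in records], k))
    for level, k in schemes.items()
}
```

In each scheme, a model would be fitted on the training indices and scored on the test indices with `rmse` and `squared_correlation`, then averaged over folds. Libraries such as scikit-learn provide a comparable grouped splitter (`GroupKFold`), which could replace the hand-written one here.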