Machine learning (ML) offers great potential for expanding the applied economist’s toolbox. Recent overview papers have pointed to the potential for big data and ML to improve farm management (Raj et al., 2015; Shekhar et al., 2017; Coble et al., 2018; Kamilaris and Prenafeta-Boldú, 2018) and economic analysis more broadly (Einav and Levin, 2014; Varian, 2014; Bajari et al., 2015; Grimmer, 2015; Monroe et al., 2015; Athey and Imbens, 2016). ML tools are beginning to be employed in economic analysis (März et al., 2016; Crane-Droesch, 2017; Athey, 2019), while some researchers raise concerns about their transparency, interpretability and use for identifying causal relationships (Lazer et al., 2014). In this review paper, we introduce ML to applied economists by placing it in the context of standard econometric and simulation methods. We identify shortcomings of current methods used in agricultural and applied economics, and discuss both the opportunities and challenges afforded by ML to supplement our existing approaches.
What is ML? The terms ML, artificial intelligence (AI) and deep learning (DL) are often used interchangeably. ML is part of artificial intelligence, which in turn is a discipline in computer science. ML aims to learn from data using statistical methods. DL is a specific subset of ML that uses a hierarchical approach, where each step converts information from the previous step into more complex representations of the data (Goodfellow et al., 2016). Many of the newest advances in machine learning are in the area of DL (LeCun, Bengio and Hinton, 2015).
Why introduce ML to agricultural and applied economics now? First, data availability has dramatically increased in many different areas, including agriculture, environment and development (Shekhar et al., 2017; Coble et al., 2018). Along with helping process data from these novel sources, ML methods are well equipped to exploit large volumes of data more efficiently than traditional statistical methods. Second, since the early 2000s, the use of multi-processor graphics cards (graphics processing units, or GPUs) has greatly sped up computer learning (Schmidhuber, 2015), and many ML methods can be parallelised and exploit the potential of GPUs. Third, the ML/DL research community from both academia and industry is rapidly developing the tools users need to apply these methods. Researchers have developed and improved algorithms that push the boundaries of ML/DL (Schmidhuber, 2015). The community has a strong open-source tradition, including powerful DL libraries (e.g. tensorflow.org, pytorch.org) and pretrained models (e.g. VGGNet, ResNet), increasing the potential for adoption. Last but not least, economists have begun to realise that the predictive power of ML methods may not only be used as such, but can also improve causal identification (Athey, 2019).
How can ML be helpful for agricultural and applied economics? Our models often contain little prior information about functional form, have large potential heterogeneity across units of observation and frequently have multiple outputs. For example, imagine one wants to estimate the effect of a fertiliser subsidy on the yield of crops. Yield is determined by a complex combination of soil quality, weather, inputs, input timing and other management choices, replete with non-linearities and interactions. Or suppose one wants to ask how subsidies affect farm structure, where both policy and structure may be complex and multidimensional. In demand system estimation, one might have access to daily, product-level scanner data or data on housing sales to estimate preferences for local amenities, or one may want to estimate the effect of pollution on multiple measures of health. While our traditional methods have allowed us to approach these questions, ML increases the flexibility with respect to both data and functional form, as well as processing efficiency, opening up other avenues for analysis.
Often ML approaches are perceived as something special or even mysterious, potentially due to associated terms such as AI, neural networks (NN) or DL, which evoke associations with human intelligence. As we lay out in Section 2, ML tools are ‘just’ statistical tools and in many ways are natural extensions of the econometric toolbox. In the next section we introduce central ML approaches, not aiming for textbook coverage, but rather to present them from an applied econometric perspective highlighting similarities and differences with our traditional methods. One distinction is that ML focuses primarily on predictive accuracy and forecast errors while econometricians focus on deriving statistical properties of estimators for hypothesis testing (see also Mullainathan and Spiess, 2017). In Section 2.1, we present the ML approach to predictive accuracy and to control for overfitting. We also present central supervised learning approaches for regression tasks (Section 2.2) and unsupervised learning approaches for dimensionality reduction (Section 2.3). Often there are concerns about ML models being a ‘black box’ and we reflect on the trade-off between model complexity and interpretability in Section 2.4, including tools to help interpret ML models.
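To make the distinction between in-sample fit and predictive accuracy concrete, the following is a minimal sketch, not taken from the paper, that compares a simple least-squares line with a one-nearest-neighbour predictor on simulated data. The data-generating process (a linear relationship with Gaussian noise), the sample sizes, the seed and all function names are assumptions chosen purely for illustration. The nearest-neighbour predictor memorises the training sample, so its in-sample error is zero by construction, while its error on a held-out sample is not.

```python
import random


def fit_ols(xs, ys):
    """Return (intercept, slope) of a simple least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_ols(params, x):
    intercept, slope = params
    return intercept + slope * x


def predict_1nn(train_x, train_y, x):
    """Predict with the outcome of the closest training observation."""
    nearest = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[nearest]


def mse(predictions, ys):
    return sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(ys)


def simulate(n, rng):
    # Hypothetical data-generating process: y = 2 + 0.5 x + noise.
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 + 0.5 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


rng = random.Random(0)
train_x, train_y = simulate(50, rng)  # sample used for fitting
test_x, test_y = simulate(50, rng)    # held-out sample used for evaluation

params = fit_ols(train_x, train_y)

# For each method: (in-sample MSE, out-of-sample MSE).
results = {
    "ols": (
        mse([predict_ols(params, x) for x in train_x], train_y),
        mse([predict_ols(params, x) for x in test_x], test_y),
    ),
    "1nn": (
        mse([predict_1nn(train_x, train_y, x) for x in train_x], train_y),
        mse([predict_1nn(train_x, train_y, x) for x in test_x], test_y),
    ),
}

for name, (in_sample, out_of_sample) in results.items():
    print(f"{name}: in-sample MSE = {in_sample:.3f}, "
          f"out-of-sample MSE = {out_of_sample:.3f}")
```

Judged by in-sample fit alone, the nearest-neighbour predictor would look perfect. Under the assumed data-generating process its held-out error would typically exceed that of the simpler line, which is why predictive accuracy is assessed on data not used for fitting.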
Section 3 then takes a closer look at limitations of our current set of econometric tools and simulation methods, and explores to what extent ML approaches can overcome them. We frame this section in terms of current challenges faced in applied economic analysis; while there may be some overlap in the ML solutions, the problems being addressed are different. Functional forms employed in econometric analysis often lack theoretical grounding and are not sufficiently flexible to capture the multiple interactions, non-linearities and heterogeneity so common to biological or social processes in agricultural and environmental systems. ML tools allow for highly flexible estimation, address model uncertainty and efficiently deal with large sample sizes (Section 3.1). Our current methods limit the full use of novel unstructured data sources, such as remote sensing images, cellular phone records or text from news and social media. ML approaches may reduce the reliance on limited ‘hand-crafted’ features to make better use of the available data (Section 3.2). Similarly, ML offers opportunities in situations in which we have a very large number of potential explanatory variables or observe explanatory variables at high temporal or spatial resolution, for which our current approaches of aggregating data into a standard panel form imply a loss of information (Section 3.3). One common objection from economists is that ML tools are of only limited use as they focus on prediction while economists are primarily interested in answering causal questions. While it is true that ML tools are primarily developed for prediction, there are recent contributions, particularly from economists, that exploit the prediction capabilities of ML tools for causal inference. We provide an overview of these approaches and how they can help to overcome limitations of the current tools for causal inference (Section 3.4).
Beyond enhancing econometric methods, ML can help alleviate current constraints of simulation models. Partial or general equilibrium models or Agent Based Models (ABMs) are often computationally limited in their degree of complexity. Further, empirical calibration of equilibrium models or ABMs is challenging. ML methods are beginning to be employed to overcome these computational limitations and to improve calibration (Section 3.5). In Section 4, we discuss potential limitations of ML approaches and what economists can add to overcome these limitations. Finally, we identify some relevant frontier developments in ML for economic analysis (Section 5).
While some of the issues reviewed in this paper have been raised in the general economics literature, and several authors have already highlighted the potential of ‘big data’ for agricultural economics, no overview of the existing and potential applications of ML methods for agricultural and applied economics analysis yet exists. We believe these methods hold particular promise for researchers in our field due to the frequent linkages with complex biological or physical processes, uses of non-traditional data sources such as those derived from remote sensing and the frequent use of simulation methods. While, like other reviews, we briefly introduce ML methods, we do so from the perspective of our standard econometric and simulation tools to aid understanding and appropriate application. Unlike earlier reviews, we highlight how ML tools can fill gaps in our existing methodological toolbox, focusing on what long-standing challenges they can solve. We place particular emphasis on NNs because, despite holding significant potential for capturing complex spatial and temporal relationships, they are still not widely used in economic analysis. Further, we review the application of ML tools in policy simulation, which, to our knowledge, has not yet been extensively covered. We hope that relating ML methods to our current approaches and their shortcomings will allow this paper to serve as a guide for applied economists interested in expanding their methodological toolbox.
Source: University of Bonn: Innovation and Technology for Sustainable Futures, Hugo Storm, Kathy Baylis, Thomas Heckelei, Machine learning in agricultural and applied economics, European Review of Agricultural Economics, 21.09.2019