A Toolkit to Strengthen Government Budget Surveillance

In this paper we develop a comprehensive short-term fiscal forecasting system for the real-time monitoring of the Spanish government's borrowing requirement. Spain has been at the centre of the recent European sovereign debt crisis, not least because of sizeable failures to meet public deficit targets. The system comprises a suite of models with different levels of disaggregation (bottom-up vs top-down; general government vs sub-sectors), which are suitable for the automatic processing of the large amount of monthly/quarterly fiscal data currently published by the Spanish statistical authorities. Our tools are instrumental in the ex-ante detection of risks to official projections, and can thus help reduce the ex-post reputational costs of budgetary slippage. On the basis of our results, we discuss how official monitoring bodies could expand, on the one hand, their toolkit to evaluate regular adherence to targets (moving beyond a legalistic approach) and, on the other, their communication policies as regards sources of risks to (ex-ante) compliance with budgetary targets.

The Working Paper Series seeks to disseminate original research in economics and finance. All papers have been anonymously refereed. By publishing these papers, the Banco de España aims to contribute to economic analysis and, in particular, to knowledge of the Spanish economy and its international environment.
The opinions and analyses in the Working Paper Series are the responsibility of the authors and, therefore, do not necessarily coincide with those of the Banco de España or the Eurosystem.
The Banco de España disseminates its main reports and most of its publications via the Internet at the following website: http://www.bde.es.

Introduction
Government accountability is an essential principle of democracy, whereby elected and non-elected officials are obliged to explain their decisions and actions, and the consequences thereof, to parliament and to the public in general (see, e.g., IMF, 2012; Hameed, 2005; Leal et al., 2008). Most developed countries have in place a framework of political, legal and administrative mechanisms designed to control government bodies and officials. Typically, these controls focus on the adherence of policies to the extant legal framework; they are designed to influence neither the ex-ante design of policies nor the ex-post responsibility of policy-makers for the performance of their actions.
Planning errors can have an enormous influence on the economy, in particular those related to budgetary policies. For example, systematic and/or sizeable budgetary forecast errors may spur waves of lack of confidence in current government policies and, as a consequence, affect the economy as a whole, for instance by tightening the constraints on firms' financing channels. In addition, financing public debt at costs above those warranted by fundamentals imposes a burden on future generations of taxpayers. Indeed, it can be argued that the reputational costs associated with the lack of adherence of budgetary outcomes to ex-ante budgetary targets were among the fundamental drivers of the recent sovereign debt crisis in the particular case of Spain, within the broader euro-area crisis.
As a consequence, a significant change in the fiscal governance framework has taken place in Spain since the end of 2011: an enhanced framework of national budgetary surveillance entered into force as of mid-2012, enshrined in high-ranking legal texts (the Constitution and an Organic Law), including a huge leap forward in the availability of fiscal statistics and in the procedures governing all stages of budgetary planning, including the design and implementation phases. By now the link between the quality of fiscal frameworks and budgetary discipline is well established from an international perspective (see, e.g., von Hagen, 2010). In the case of Spain, though, the new budgetary surveillance framework has yet to prove its usefulness in controlling the behavior of policy makers on budgetary matters, in particular in the face of upcoming electoral periods. But even assuming that policy makers were willing to fully implement all the legal procedures in place and to publish timely and non-controversial real-time fiscal data, two potential weaknesses would remain.
First, more information does not necessarily mean better understanding and trust from users, including private and public analysts. On the contrary, it has become increasingly difficult for private and public analysts alike to follow and interpret the continuous flow of monthly fiscal data currently published by official statistical agencies in Spain. For instance, in March 2012 the only monthly publication on the government's budgetary execution referred to the central government, while since the beginning of 2013 the Spanish statistical authorities have disseminated monthly information following national accounts definitions for the central, regional and social security sectors, including individual regional governments. This level of dissemination of data on public accounts has no parallel in Europe today. At the same time, however, significant and not clearly explained revisions of headline annual fiscal data for past years occurred in 2012 and 2013, raising doubts among private investors and analysts. As an illustration, a Reuters news item published as recently as November 2013 read: "Spain's erratic reporting of fiscal figures, especially from its regional authorities, and repeated revisions to data have fuelled investor mistrust in the government's effort to reduce one of the euro zone's largest public deficits". 1 Second, the budgetary control and monitoring procedures and institutions currently in place in Spain 2 place too much weight on the ex-post adherence of policy outcomes to certain legal and administrative clauses, rather than on the ex-ante design and reporting of policy actions and the real-time monitoring of budgetary execution. This is the standard approach in continental Europe, but it remains to be seen whether it can detect fiscal slippage in a timely fashion and send early-warning signals that could help trigger timely corrective actions.
It is against this background that in this paper we propose a broad set of models and tools suitable for the real-time monitoring of fiscal plans, including the assessment of the probability of meeting fiscal targets. They allow a quick and efficient processing of a vast amount of incoming monthly, quarterly and annual information pertaining to most revenue and expenditure categories, and to all sub-sectors of the General Government. The models are time-series, mixed-frequency models along the lines of Harvey and Chung (2000), Moauro and Savio (2005), Proietti and Moauro (2006), and Pedregal and Pérez (2010). 3 These papers use a temporal aggregation method that relies on the information contained in related indicators observed at the desired higher frequency.
2 [...] Authority" that should help in the monitoring process of budgetary plans.
3 Other approaches for modeling data at different sampling intervals are the methods based on regression techniques (Chow and Lin, 1971; Guerrero, 2003), the MIDAS (MIxed DAta Sampling) approach (see Ghysels, Santa-Clara and Valkanov, 2004; Clements and Galvão, 2007), the state space approaches of Liu and Hall (2001) and Mariano and Murasawa (2003), or the ARMA model with missing observations of Hyung and Granger (2008).
The statistical treatment of structural time series models is based on the state space form and the Kalman filter. In our case, this approach allows the estimation of monthly and quarterly models using annual, quarterly and monthly observations, and permits changes over time arising from an increase in sample size. The state space framework also allows us to build constrained forecasts in order to evaluate how feasible a given annual target is. By setting future values of the relevant variable equal to the actual future policy target, we force a given model to converge to that very target, regardless of how improbable such a target might be. The interesting part of this analysis is that the model then produces a path of indicators compatible both with the targets and with all the information available at each moment in time. In other words, the model shows the limiting monthly/quarterly path of the indicators necessary to meet the policy targets. We go one step further and adapt the approach of Gómez and Guerrero (2006) to test whether official targets are compatible with the unconditional forecast of a given statistical model. Such a test provides evidence on the chances of meeting the target.
It is important to stress upfront that we see our models as a benchmark for the interpretation of newly available data, and not as a substitute for the in-depth analysis normally carried out by fiscal experts in policy institutions. Detailed knowledge of institutional and special factors is a key ingredient of the short-term analysis of fiscal data, and could be further exploited in conjunction with the toolkit presented in this paper. 4 Nonetheless, while aware that budget planning and implementation is more an art than a science (as claimed, for example, by Leal et al., 2008), we are convinced that looking at short-term fiscal data (i.e. data on the actual implementation of fiscal plans) through the lens of the tools and models we put forward could provide a neutral and transparent assessment of the adherence of observed budgetary data to the monthly/quarterly path consistent with the achievement of annual fiscal targets. On the basis of our results, we discuss how official monitoring bodies could expand, on the one hand, their toolkit to evaluate regular adherence to targets (moving beyond a legalistic approach) and, on the other, their communication policies as regards sources of risks to (ex-ante) compliance with budgetary targets, and the advisability of launching, when needed, ex-ante corrective actions.
4 Along the same lines see also the discussion of Leal, Pedregal and Pérez (2010).
The rest of the paper is organized as follows. In Section 2 we briefly review the related literature, stressing the contributions of the current paper. The system presented in our paper could serve as a complementary tool within the monitoring steps prescribed by current national fiscal rules in Spain. To make this clear, in Section 3 we describe a number of institutional issues, namely the territorial organization of Spain, one of the most fiscally decentralized countries in Europe, and the extant framework of national fiscal rules. We also describe the ex-ante targets set in real time by governments over the period 2008-2013, as well as their adherence to ex-post, published figures, to make explicit the need to incorporate further tools into the national surveillance process. Then, in Section 4 we turn to the description of the available data and the publication lags of official information, before discussing, in Section 5, the methodological details of our models and formally presenting their potential uses for policy-makers. In Section 6 we show some counterfactual empirical results that support the validity of our approach, in particular by means of a truly real-time example for the fiscal year 2011, a year in which the ex-post public deficit outcome hugely exceeded ex-ante policy targets, taking by surprise not only international organizations but also private sector analysts. Finally, in Section 7 we provide the main conclusions of the paper.

For the case of Spain, some recent contributions to the latter literature are Fernández-Caballero, Pérez and Pedregal (2012), who look at sub-national governments' budgetary data, 6 Leal and Pérez (2009; 2005), who focus on the central government sector, 7 and Leal, Pérez and Pedregal (2010).
The latter paper constructs multivariate, state-space mixed-frequency models for the main components of the Spanish General Government sector, made up of blocks for each of its sub-sectors.
The contribution of our paper can be seen, from the methodological point of view, as a step forward from Leal, Pérez and Pedregal (2010). We explore a much broader set of models, comparing their highly aggregated models with a number of single-variable models, which enables a clean comparison of bottom-up versus top-down approaches to the problem at hand: monitoring and forecasting general government borrowing requirements. A second contribution is that we cover the whole euro area period, from 1999Q1 to 2012Q4, i.e. in particular all crisis years up to the last fiscal year for which data were available at the cut-off date of our paper. Third, we integrate in our models the substantial amount of statistical data that has been made available by the Spanish statistical authorities since the end of 2010. Finally, and quite notably, our paper goes beyond previous, purely research-oriented academic contributions: in addition to the research contribution, it provides a fully implementable toolbox for the real-time monitoring of public finances. As regards the latter, a database of monthly and quarterly fiscal indicators, fully updated every week, is provided with the paper, 8 as well as a fully documented MATLAB toolbox that loads the data, runs all the models and provides standardized output files, also available from the authors upon request. All in all, some 20 different models are run over a dataset including more than 200 fiscal variables, and this is done automatically in a few hours of computing time.
[...] are prepared in annual terms, given an annual budgetary cycle, and the discretionary nature of many government measures set up for the entire year, have traditionally limited the role of high-frequency fiscal data for monitoring annual budgetary targets in the course of the year.
6 They make three main contributions. First, they compile a dataset on quarterly and monthly sub-national government spending variables and indicators by reviewing all available, scattered sources, and put together a database usable for economic analysis. Second, they exploit the compiled information by fitting time-series, mixed-frequency models to the data, and show the forecasting and monitoring capabilities of the selected short-term spending indicators. Third, they show that official annual budgetary targets provided useful guidance as to the actual course of sub-national fiscal spending, in particular when combined with short-term indicator-based forecasts.
7 Leal and Pérez (2005) assess adherence to announced budgetary targets of a set of fiscal revenue data for the Autonomous Community of Andalusia, employing the methodology of Kanda (2002) to evaluate the likelihood of meeting annual fiscal revenue targets, given partial-year monthly data.

Some institutional and policy issues
A number of institutional matters have to be discussed before turning to the econometric methodology. This is so because, first, Spain is a highly fiscally decentralized country, but it became so gradually, in successive waves of fiscal decentralization since the early 1980s. This devolution process is also

The territorial structure
Spain is currently one of the most decentralized countries in the European Union. As an example, in 2010 close to 50% of general government expenditure was carried out by sub-national governments (see, e.g., Hernández de Cos and Pérez, 2013a). This is the result of a gradual transfer of responsibilities for the management of specific services from the Central Government to the Regional Governments since the beginning of the 1980s. The transfer of expenditure responsibilities from the Central Government to the regions has, however, proceeded neither at the same pace nor on the same scale for all of them. The main differences concern the time at which the various regions took over education and health competencies. In parallel with this devolution of expenditure responsibilities to the regions, a financing system for sub-national governments was also progressively developed. Again, the process was not completely homogeneous across regions, and the system was changed roughly every five years. In particular, the latest reform of the financing agreements was approved at the end of 2009. The new system raised the share of taxes transferred (to 50% in the case of personal income tax and VAT, and to 58% in the case of excise duties on the manufacturing of alcohol, tobacco and hydrocarbons) 9 and gave the regions additional powers to modify the rates of some of these taxes. 10 In the case of local governments, the spending responsibilities assigned to them are regulated by the Local Government Act of 1985, which establishes a minimum list of services they must provide, the so-called compulsory services. This list increases with population size, 11 and, as a result, the financing system of local governments also varies with size.
The long process of devolution of spending and revenue powers to the regions has several implications for the purposes of our project. First, the changes in the revenue system and in spending responsibilities (substantial up to 2002 for spending, and up to 2009 for revenue) induced structural breaks in the time series of the general government sub-sectors, and make it difficult to evaluate the performance of dedicated models on the basis of past data. Regional/local government data of sufficient quality at the quarterly/monthly frequency have only been disseminated quite recently, and their usefulness in general-purpose models is still limited (see the discussion in Fernández-Caballero et al., 2010). Second, notwithstanding the previous comment, it is possible to set up models for the general government sub-sectors, albeit more from a forward-looking point of view: it is hard to validate them by means of standard out-of-sample forecasting exercises, but this does not invalidate their potential use for current policy-making.

The budgetary surveillance framework
In the year and a half in which the BSL has been applied so far, quarterly reports have been issued on non-compliant regional governments. Even though this has been an improvement with respect to previous practice, these reports tend to adopt a backward-looking perspective, and no preemptive action has been requested, despite repeated non-compliance with pre-set fiscal targets. It is precisely in this type of quarterly report that tools of the kind discussed in our paper could be of help. In

The European budgetary surveillance framework
Beyond national fiscal rules, EU countries are subject to the scrutiny of the European Commission, on the one hand, and to the multilateral surveillance of their peer EU countries, on the other, following procedures defined within the so-called Stability and Growth Pact (SGP). Moreover, in response to the 2010-2011 European sovereign debt crisis, an extensive reform of the SGP and of the broader economic governance framework was adopted in 2011 (see European Commission, 2013).
Six pieces of legislation (the "Six-Pack") reformed both the preventive and corrective arms of the SGP. This reform followed the introduction of the so-called "European Semester", a substantially improved surveillance system covering not only fiscal monitoring but also broader macroeconomic issues.

[...] regularly publish quarterly accounts of all general government sub-sectors in terms of ESA95.
Also, since October 2012 the statistical agencies have been regularly publishing monthly regional government accounts in budgetary terms and, since March 2013, monthly regional and Social Security accounts in national accounts terms.
From the point of view of fitting empirical models, the newly published time series, which cover at most a year and a half, are of limited use. Nevertheless, they will become the reference series in the future, and must somehow be linked to the other series, for which a wealth of historical information is available.
Table 1

Publication lags and timing convention
Annual fiscal outturns for a given year t are published at the very end of March of year t + 1.
Quarterly non-financial accounts for the General Government and all its sub-sectors are published regularly with a delay of 90 days. Monthly data for the State sector ("Estado") are published with a lag of one month, as are the data on shared-tax collection and social security system outturns, in both cases in cash terms. As regards the newly available information, monthly national accounts data for the Central Government, the regions and the Social Security are published with a delay of roughly two months.
For the counterfactual forecasting exercises presented later in the paper, we use the information in the previous paragraph to replicate the constraints faced by real-time fiscal policy analysts, adopting the timing rules displayed in Table 2, which follow the standard dissemination dates of data at the different frequencies. We consider this convention a fair heuristic representation of reality, on average.
Nevertheless, it is worth mentioning that in a first exercise we will not use real-time data, but [...] for which we have compiled a truly real-time database. Table 3 shows the exact information available at each specific reference date.
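The publication lags listed in this subsection can be turned into a simple rule that says, for a given forecast origin, which reference periods are already in the analyst's information set. The sketch below is a hypothetical encoding of those lags, not a reproduction of Table 2 (whose exact dates are not shown here). It assumes that forecast origins are the last month of each quarter and that a release published at the end of month p is usable only from month p + 1 onward; this assumption is chosen because it reproduces the information sets described later in the text (annual figure for t − 1 known at the Q2 origin, only the first quarter of general government accounts at the Q3 origin, the first half of the year at the Q4 origin). The series labels are illustrative names, not official codes.

```python
from typing import Dict, Tuple

# Approximate publication lags (months) taken from the text.
# The keys are hypothetical labels, not official series codes.
MONTHLY_LAGS = {
    "state_cash": 1,                 # State sector ("Estado"), cash terms
    "shared_taxes_cash": 1,          # shared taxes' collection, cash terms
    "social_security_cash": 1,       # social security outturns, cash terms
    "national_accounts_monthly": 2,  # Central Government, regions, Social Security
}
QUARTERLY_LAG = 3                    # roughly 90 days after the end of the quarter


def _index(year: int, month: int) -> int:
    return year * 12 + (month - 1)


def _year_month(index: int) -> Tuple[int, int]:
    return index // 12, index % 12 + 1


def latest_month(year: int, month: int, lag: int) -> Tuple[int, int]:
    """Last reference month usable at origin (year, month).
    Assumption: a release published at the end of month p is usable from p + 1."""
    return _year_month(_index(year, month) - lag - 1)


def latest_quarter(year: int, month: int, lag: int = QUARTERLY_LAG) -> Tuple[int, int]:
    """Last reference quarter (year, q) usable at origin (year, month)."""
    y, m = latest_month(year, month, lag)
    q = m // 3  # number of complete quarters of year y up to month m
    return (y - 1, 4) if q == 0 else (y, q)


def latest_year(year: int, month: int) -> int:
    """Annual outturn for t is released at the very end of March of t + 1."""
    return year - 1 if month >= 4 else year - 2


def information_set(year: int, month: int) -> Dict[str, object]:
    info: Dict[str, object] = {
        name: latest_month(year, month, lag) for name, lag in MONTHLY_LAGS.items()
    }
    info["quarterly_accounts"] = latest_quarter(year, month)
    info["annual_accounts"] = latest_year(year, month)
    return info


# Example: the Q3 origin of 2011 (end of September)
print(information_set(2011, 9))
```

Calling `information_set(2011, 9)` places the quarterly accounts at 2011Q1 and the monthly national accounts at June 2011, in line with the Q3 description given later in the text; other lag assumptions would shift these dates.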

A suite of models
Given the different sampling frequencies of the time series included in our dataset, we estimate multivariate, mixed-frequency models of the unobserved components type (along the lines of Harvey and Chung, 2000; Moauro and Savio, 2005; Proietti and Moauro, 2006; and Pedregal and Pérez, 2010). These papers use a temporal aggregation method that relies on the information contained in related indicators observed at the desired higher frequency. The statistical treatment of structural time series models is based on the state space form and the Kalman filter (see Harvey, 1989). In our case, this approach allows the estimation of a monthly model using annual, quarterly and monthly observations, and permits changes over time arising from an increase in sample size.
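To illustrate how a monthly model can absorb annual and monthly observations at once, here is a minimal univariate sketch; it is not the paper's multivariate model. It assumes a monthly local level plus irregular model for a single flow variable and uses a cumulator state, in the spirit of Harvey (1989), that adds up the monthly flows and restarts every January. Annual totals are observed in December, monthly values only for the months already published, and NaN marks every missing observation, so the Kalman filter simply skips the update there. The variances are hypothetical fixed values rather than maximum likelihood estimates, and the function names are illustrative.

```python
import numpy as np

# Hypothetical variances of the level disturbance (eta) and of the irregular
# (eps); in practice they would be estimated by maximum likelihood.
SIGMA2_LEVEL, SIGMA2_IRREG = 1.0, 4.0

# State vector: [mu_t (level), eps_t (irregular), C_t (year-to-date cumulator)].
R = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
STATE_COV = R @ np.diag([SIGMA2_LEVEL, SIGMA2_IRREG]) @ R.T
Z_MONTH = np.array([1.0, 1.0, 0.0])  # monthly flow x_t = mu_t + eps_t
Z_YEAR = np.array([0.0, 0.0, 1.0])   # annual total = C_t, observed in December


def transition(month: int) -> np.ndarray:
    rho = 0.0 if month == 1 else 1.0  # the cumulator restarts every January
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0],
                     [1.0, 0.0, rho]])


def filter_cumulator(months, y_month, y_year):
    """Kalman filter on a monthly grid; NaN marks a missing observation.
    Returns mean and variance of C_t at the last grid point."""
    a = np.zeros(3)
    P = np.diag([1e7, 0.0, 0.0])  # large variance: approximately diffuse level
    for t, month in enumerate(months):
        T = transition(month)
        a, P = T @ a, T @ P @ T.T + STATE_COV  # prediction step
        for z, y in ((Z_MONTH, y_month[t]), (Z_YEAR, y_year[t])):
            if np.isnan(y):
                continue  # missing value: no update
            F = z @ P @ z
            K = P @ z / F
            a = a + K * (y - z @ a)
            P = P - np.outer(K, z @ P)
    return a[2], P[2, 2]


def example_forecast(n_past_years: int = 5):
    """Hypothetical data: past annual totals of 1200 and Jan-Jun of the
    current year at 100 per month; Jul-Dec are left missing (forecast)."""
    months = list(range(1, 13)) * (n_past_years + 1)
    y_month = np.full(len(months), np.nan)
    y_year = np.full(len(months), np.nan)
    for k in range(n_past_years):
        y_year[12 * k + 11] = 1200.0
    y_month[12 * n_past_years:12 * n_past_years + 6] = 100.0
    return filter_cumulator(months, y_month, y_year)


mean, var = example_forecast()
print(f"annual forecast {mean:.2f}, variance {var:.2f}")
```

With these hypothetical inputs the forecast of the current year's total is close to 1,200, and its variance reflects the six months not yet observed. Quarterly observations would enter the same way, through an additional row of the measurement equation that reads a quarterly cumulator.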

Evaluating the probability of meeting the budgetary target
The methodology used, i.e. the state space framework, makes some simple but relevant exercises straightforward.
It is straightforward to build constrained forecasts in order to evaluate how feasible a given target [...] where r is the length of the series minus the number of parameters involved in the model. Two cases may be distinguished: (i) the targets are considered binding constraints, i.e. they must be met exactly, which is equivalent to $u = 0$; (ii) the targets are non-binding constraints, i.e. they are met "statistically" rather than exactly ($u \neq 0$). In the latter case a feasible estimate of $\Sigma_u$ is needed to carry out the test. It is not clear how a reliable estimate of this covariance can be obtained in general; the most obvious option (though not free from problems) is to rely on a sufficiently long record of past forecast errors.
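In a Gaussian linear model, setting the future value of a variable equal to the target and re-running the filter is equivalent to conditioning the forecast distribution on that constraint. The sketch below shows this for the binding case ($u = 0$) under a deliberately simple assumption: the months still to be observed follow a monthly random walk starting from the last observed value. It returns the unconditional annual forecast $Y^F$, the constrained monthly path that hits the target exactly, and a generic Wald-type statistic $(R - Y^F)^2 / \mathrm{Var}(Y^F)$, which is $\chi^2(1)$ under the Gaussian assumption. This statistic is only an illustrative stand-in; it is not the finite-sample Gómez and Guerrero (2006) statistic with r degrees of freedom referred to above. The function name and inputs are hypothetical.

```python
import math

import numpy as np


def constrained_path(ytd_sum, last_value, months_left, sigma2, target):
    """Random-walk forecast of the remaining months, then the same forecast
    conditioned on the annual total being exactly `target` (binding case)."""
    steps = np.arange(1, months_left + 1)
    mean = np.full(months_left, float(last_value))     # no-change forecast
    cov = sigma2 * np.minimum.outer(steps, steps)      # RW: Cov = s2 * min(i, j)
    ones = np.ones(months_left)
    annual_fcst = ytd_sum + ones @ mean                # unconditional Y_F
    annual_var = ones @ cov @ ones                     # Var(Y_F)
    gap = target - annual_fcst
    path = mean + (cov @ ones) * gap / annual_var      # Gaussian conditioning
    stat = gap ** 2 / annual_var                       # ~ chi2(1) under H0
    p_value = math.erfc(math.sqrt(stat / 2.0))         # P(chi2(1) > stat)
    return path, annual_fcst, stat, p_value


# Hypothetical example: Jan-Jun observed at 100 each, target of 1260.
path, annual_fcst, stat, p_value = constrained_path(600.0, 100.0, 6, 4.0, 1260.0)
print(np.round(path, 2), annual_fcst, round(stat, 2), round(p_value, 4))
```

The constrained path rises over the remaining months because, under a random walk, later months carry more forecast uncertainty and therefore absorb a larger share of the adjustment needed to reach the target.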
A way of sidestepping the estimation of $\Sigma_u$, which is also more informative, is to use the distribution of the statistic above to calculate the probability of meeting the targets. This is the approach followed in our paper. In this regard, public deficit and revenues, on the one hand, and public expenditures, on the other, must be treated differently.
Meeting a deficit or revenue target means in practice achieving a value greater than or equal to the target; hence, the further the unconditional forecast $Y^F$ lies above the target $R$, the greater the probability of meeting it. Formally, the probability of meeting the target is $P(Y^F \geq R)$. If the unconditional forecast hits the target exactly, the probability of meeting it is 0.5 (for a symmetric forecast distribution). For expenditures, by contrast, an unconditional forecast further below the target is evidence of a high probability of meeting it, so we compute $P(Y^F \leq R)$.
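Both probabilities can be computed in a few lines once a forecast distribution is available. The sketch assumes, for illustration, that $Y^F$ is Gaussian with mean equal to the unconditional forecast and standard deviation taken from the model's forecast variance; the function name and the `kind` labels are hypothetical.

```python
import math


def prob_meeting_target(forecast: float, std: float, target: float, kind: str) -> float:
    """P(Y_F >= R) for balance/revenue targets, P(Y_F <= R) for spending,
    assuming a Gaussian forecast distribution N(forecast, std**2)."""
    z = (target - forecast) / std
    p_below = 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal CDF: P(Y_F <= R)
    if kind in ("balance", "revenue"):
        return 1.0 - p_below
    if kind == "expenditure":
        return p_below
    raise ValueError(f"unknown target kind: {kind!r}")


# A forecast exactly on target gives 0.5, as stated in the text.
print(prob_meeting_target(100.0, 5.0, 100.0, "revenue"))
```

Because a balance is measured with its sign (a deficit is a negative balance), the same "greater than or equal" rule applies to deficit and revenue targets, which is why they share a branch.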

Some general remarks
We perform a rolling forecasting exercise in which the selection of the forecast origin and the information set available at each date are carefully controlled for. In particular, we evaluate the forecasts generated from four forecast origins per year from March 1999 to December 2012, which yields up to 14 × 4 = 56 projections at each forecast horizon. The first forecast origin is March 1999, following the timing convention outlined before (see Table 2) [...]. For the nominal public balance, the forecast error committed for year t by model J from forecast origin Q is defined as [...], where Ω refers to the information set available at the time of generating a given forecast, as described in Table 2. For revenue and expenditure items, the error committed in year t for item I by model J from forecast origin Q is defined as [...]. We compute two standard measures of forecasting performance for a number of pseudo-real-time forecasting exercises. On the one hand, we compute the ratio of the Root Mean Squared Error (RMSE) of each alternative model to that of an annual random walk (i.e. no-change) forecast. On the other hand, we look at a qualitative measure of forecast performance, namely whether or not the predicted change coincided in direction with the actual change observed in the variable of interest. As discussed in a previous section, we also present a truly real-time exercise focused on the 2011 fiscal year, given the size of the budgetary deviation observed that year.
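The two performance measures can be sketched as follows. The sketch assumes the forecast errors are already computed for each forecast (their exact definitions are not reproduced in this section), takes the four origins per year to be March, June, September and December, and builds the random-walk benchmark from the previous year's outturn. In the paper's real-time set-up the last annual figure known depends on the origin, which this simplified benchmark ignores. All names are illustrative.

```python
import math

# Four forecast origins per year (assumed: last month of each quarter), 1999-2012.
ORIGINS = [(year, month) for year in range(1999, 2013) for month in (3, 6, 9, 12)]


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def random_walk_errors(outturns):
    """Errors of the annual no-change forecast: Y_{t-1} used as forecast of Y_t."""
    return [prev - cur for prev, cur in zip(outturns[:-1], outturns[1:])]


def relative_rmse(model_errors, rw_errors):
    """Values below 1 mean the model beats the annual random walk."""
    return rmse(model_errors) / rmse(rw_errors)


def directional_accuracy(predicted_changes, actual_changes):
    """Share of forecasts whose predicted change has the same sign as the
    actual change (zero changes are counted as non-positive)."""
    hits = sum(1 for p, a in zip(predicted_changes, actual_changes) if (p > 0) == (a > 0))
    return hits / len(actual_changes)


print(len(ORIGINS), relative_rmse([1.0, -1.0], [2.0, -2.0]))
```

A relative RMSE of 0.5, as in the toy call above, would mean the model's errors are half the size of those of the no-change benchmark.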

Bottom-up vs top-down models
The results of a first exercise are presented in Table 4, which shows the root mean squared error of our models relative to the annual random walk extrapolation for a number of cases: (i) pooling the forecast errors generated for the whole year from all forecast origins (baseline); (ii) forecast errors for forecasts made from the first (Q1), second (Q2), third (Q3) and fourth (Q4) quarter origins separately; (iii) all of the above, for the whole sample used in the rolling forecasting exercise ("Full sample", 1999-2012) and for the crisis sample (2008-2012). The following messages can be highlighted from Table 4 for a subset of the results, namely the aggregated results for the general government balance, revenues and expenditures.
First, when looking at the full sample and pooling all forecast errors (from forecast origins Q1 to Q4), the most aggregated models (i.e. those that directly model the budget deficit), model 1 (general government) and model 6 (sub-sectors), perform best. All other models are close to these, with the exception of model 4. This relative ranking of models is broadly maintained when looking at forecasts from each origin (Q1 to Q4) individually. Thus, with the exception of model 3 versus model 4, in the other two cases more aggregated models outperform more disaggregated ones. This can be taken as evidence that bottom-up approaches are not necessarily better than top-down ones, at least as regards forecast accuracy. Of course, in real time, bottom-up approaches have the advantage of giving a more comprehensive view, which can be an asset in cases like the current one, in which overall performance is not overwhelmingly different across models. The main results on bottom-up versus top-down approaches hold when looking at sub-samples.
Second, in general, the forecast accuracy of all models is better in the crisis sample than in the "expansion" one. This may reflect the fact that the models can do a fair job in periods of significant changes, while in a period with no fiscal stress and persistent economic growth, it is more difficult to beat a simple extrapolation of the past.
Third, across quarters, the forecasting performance of all models improves as more information about revenue collection and the implementation of spending plans kicks in. This is quite clear for forecasts prepared in the second half of the year compared with those prepared in the first half. In particular, in Q3 a fair amount of information for the first half of the year is assumed to be available, but only the first quarter of the general government accounts, while in Q4 the first half of the year is fully known. For projections prepared from forecast origin Q2, things are quite different. In our timing convention this is the quarter in which the annual figure for year t − 1 becomes known. This seems to create a discontinuity in how models process incoming information, as forecast accuracy is worse than for Q1-based forecasts, a fact that may be linked to the realization of past data revisions, including the appearance of "hidden spending" not reflected in monthly/quarterly indicators.
Fourth, when looking at revenue and expenditure errors, the same general results hold as regards full-sample versus crisis-sample forecasts, and first-half versus second-half forecasts. Interestingly, models add more information relative to the simple random walk baseline for expenditures than for revenues, i.e. relative RMSEs for expenditures tend to be lower across models, though not in all cases. This provides some evidence of the ability of models to accommodate purely within-the-year discretionary policy changes.
Some interesting results can be highlighted from the particular set of revenue/expenditure projections. In the case of total revenues: (i) the aggregated, joint revenue-expenditure model (model 1, "M1") tends to be the best performer, in particular in the crisis period, both against the bottom-up approach, M3, and against model M4, which does not exploit the link between revenues and expenditures; (ii) as regards the other models, M3 seems to be the best in general, even though this is not clear-cut along all the dimensions considered (full sample/pre-crisis, across quarters). As regards total expenditures: (i) bottom-up approaches seem to provide marginally better results, as is clear from the comparison of M3 versus M1; this makes sense given that fiscal consolidations, in particular, tend to have a differentiated profile across spending components: while social payments tend to increase in crisis times (unemployment benefits) or, at most, to stay stable (pensions), other components, like public investment or the wage bill, tend to move in the opposite direction in fiscal adjustment periods, a feature that distinguishes spending from government revenues, which are subject to similar macroeconomic shocks, even though tax hikes can be uneven across revenue aggregates; (ii) the aggregation of sectoral models (M5) performs quite badly in the case of TOE, which is not surprising given that central government transfers to the other sectors, in particular to the social security, tend to occur during the year, thus distorting the genuine signals in sectoral data.
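The pooling scheme behind Table 4 (a baseline that pools all origins, each origin Q1 to Q4 separately, and the full versus crisis sample) can be sketched as a simple grouping of forecast errors. The record layout, labels and function names below are hypothetical; each record carries the target year, the origin label, the model's error and the random walk's error for the same forecast.

```python
import math
from collections import defaultdict


def rel_rmse(pairs):
    """Relative RMSE of (model_error, random_walk_error) pairs."""
    model = math.sqrt(sum(e * e for e, _ in pairs) / len(pairs))
    rw = math.sqrt(sum(b * b for _, b in pairs) / len(pairs))
    return model / rw


def table_cells(records, crisis_start=2008):
    """records: iterable of (year, origin, model_error, rw_error), with origin
    in 'Q1'..'Q4'. Returns relative RMSEs keyed by (sample, origin group)."""
    groups = defaultdict(list)
    for year, origin, e_model, e_rw in records:
        samples = ["Full sample"] + (["Crisis"] if year >= crisis_start else [])
        for sample in samples:
            groups[(sample, "Baseline")].append((e_model, e_rw))  # all origins pooled
            groups[(sample, origin)].append((e_model, e_rw))      # single origin
    return {key: rel_rmse(pairs) for key, pairs in groups.items()}


toy = [(2000, "Q1", 1.0, 2.0), (2000, "Q2", 1.0, 1.0), (2009, "Q1", 2.0, 2.0)]
print(table_cells(toy))
```

Pooling squared errors within each cell before taking the ratio, rather than averaging per-forecast ratios, keeps each cell comparable to a single relative RMSE computed over that subset.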
Beyond the comparison of alternative models across the several dimensions considered in this subsection, it is by now well established in the literature that combining alternative models tends to outperform individual models. In the next subsection we exploit this feature of our models.

The usefulness of the combination of models
In Tables 5 and 6 we compare the performance of the combination of models' forecasts with the forecasts of the European Commission (EC henceforth). As shown in Artis and Marcellino (2001) and Keereman (1999), the forecast record of the EC is among the best of the international organizations producing regular forecasts for European countries, and in particular for Spain (others include the International Monetary Fund and the OECD). EC forecasts tend to make use of all the information available at the time they are made: not only observed data, but also all available forward-looking information on budgetary plans, including additional corrective packages enacted by governments in the course of the year. EC forecasts are based on a bottom-up approach and use both macroeconomic models and expert judgement.
That is why checking the performance of the models (specifically, a combination of them) against EC forecasts should be a demanding criterion. Bearing in mind that we are comparing against a difficult-to-beat benchmark, our objective with this exercise is to check how useful the models are as a complement to an approach that takes into account both backward- and forward-looking information, as the EC forecast does.
For revenue and expenditure items, the error committed in year t for item I by model J from forecast origin Q is defined as

$$\varepsilon^{I,J}_{t,Q} \equiv \left( \frac{\hat{X}^{I,J}_{t \mid \Omega_{Q}}}{X^{I}_{t}} - 1 \right) \times 100,$$

where $\hat{X}^{I,J}_{t \mid \Omega_{Q}}$ denotes the forecast, $X^{I}_{t}$ the outturn, and Ω makes reference to the information set available at the time of generating a given forecast, as described in Table 2.

In Table 5 we show the relative RMSE of each alternative with respect to the random walk approach. Overall, for the full sample and when the errors from all forecast horizons are pooled, EC government balance forecast errors are lower than those of the mean and the median of the alternative models. This is also the case for full-sample government expenditure errors, while for public revenue the opposite happens. This full-sample picture also holds when looking at the crisis sample. Quite interestingly, though, Q3- and Q4-based forecast errors are systematically lower for the combination of models than for the EC forecast, both for the whole sample and for the crisis sample. This means that as soon as a sufficient amount of data is available on the implementation of spending plans and/or the behavior of revenue collection (under our quite restrictive timing convention, in Q3 only the first quarter of the headline general government variables, and the first half-year of the indicators, are available), the models are able to process and extrapolate these data in a quite informative way; so informative, in fact, that they tend to outperform a full-information approach that explicitly incorporates forward-looking elements (policy measures affecting future quarters). In particular, in Q4 the mean and the median of the models display remarkable forecast accuracy. Turning to government revenues, the relatively poor accuracy of EC government revenue growth projections is surprising; this result is dominated by the significant forecast errors around GDP turning points, related to the double-dip crisis.
Turning now to Table 6, the qualitative results shown there convey messages similar to those of the quantitative case. Specifically, the table presents the percentage of correctly predicted changes in the case of the government balance, and the percentage of correctly predicted signs of the growth rate in the case of government revenues and expenditures.

Real-time forecasting exercise for 2011
In this subsection we present an additional exercise, this time a truly real-time one, i.e. based on the exact dataset available at each point in time, which is shown in Figure 3. We focus on the fiscal year 2011, a difficult year, as discussed in the descriptive section above.
As discussed, from each forecast origin, and conditional on the short-term information available, we can compute unconditional forecasts (such as the ones shown in the previous examples) and also assess the consistency of government targets with these forecasts. In particular, each panel of Figure 3 shows how the unconditional forecasts (in white) are updated as new information becomes available, in relation to the actual final data (dots), the targets (thick red line) and the uncertainty in each case (fan chart up to 99% confidence). The two top rows of charts in Figure 3 show the […] indicated at that moment an improvement in the budget balance in 2011 compared with 2010, but the probability associated with meeting the target was still only 24%, while in the case of Model 2 (joint model of revenues and expenditures) the probability assigned to meeting the target was zero, i.e. the target lay outside the confidence bands of the model forecast. In the latter case, the probability assigned to meeting the government revenue target was not null, though it was as small as 5%. The publication of the general government figures for Q1 and of short-term indicators up to July (September 2011 forecast origin) marginally improved the revenue projection, while in the case of the direct-deficit model (model 1) the probability assigned to meeting the target was around 10%; still, in qualitative terms the forecast signalled an improvement in the government balance, a direction consistent with the government's objective.
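Probabilities of this kind can be read off a predictive density. As a hedged sketch, assume that a model's predictive distribution for the annual government balance (% of GDP) is Gaussian with mean `mu` and standard deviation `sigma`; the paper's fan charts need not be exactly Gaussian. Under that assumption the probability of meeting a balance target is the mass above the target, and a target lying outside the 99% band receives an essentially zero probability. The function names and numbers below are hypothetical.

```python
from statistics import NormalDist


def prob_meeting_target(mu: float, sigma: float, target: float) -> float:
    """P(balance >= target) under a Gaussian predictive density (an assumption)."""
    return 1.0 - NormalDist(mu, sigma).cdf(target)


def band(mu: float, sigma: float, level: float = 0.99) -> tuple:
    """Central prediction interval, e.g. the outer 99% band of a fan chart."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (mu - z * sigma, mu + z * sigma)


# Hypothetical forecast: balance of -8.0% of GDP, sd 0.7 pp, target -6.0% of GDP.
mu, sigma, target = -8.0, 0.7, -6.0
p = prob_meeting_target(mu, sigma, target)
lower, upper = band(mu, sigma)
print(f"P(target met) = {p:.4f}; 99% band = [{lower:.2f}, {upper:.2f}]")
print("target outside 99% band:", not (lower <= target <= upper))
```

The sign convention (a balance, so "meeting the target" means ending at or above it) is an assumption; for a deficit expressed as a positive number the inequality would flip.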
In November 2011 (information set: general government variables up to the second quarter, short-term indicators up to September), though, the situation changed considerably. Both models were assigning a zero probability to the event "the target is met", and at the same time, both models […] illustrative of the potential uses of our system. The signals from the models were strong and clear, in particular after the summer. From November onwards, the two selected models were clearly signalling a deviation of around 2% of GDP with respect to the official government target. In the case of the revenue/spending model, in addition, the slippage was almost fully related to revenue shortfalls, while expenditures, although forecast to exceed the target, were relatively close to it in quantitative terms. These results contrast with the "herding behavior" observed in Figure 2, and indicate that the data on the implementation of revenue targets were already hinting at a sizeable slippage, despite the corrective measures adopted at the end of the summer. Indeed, a broadening of corporate tax bases was approved in August and entered into force in Q4, but it was not able to offset the strong downward trend in tax collection that was visible at least since November 2011 (with data available up to September at the latest).

Conclusions and policy discussion
In this paper we present a comprehensive fiscal forecasting system, based on all short-term fiscal data available for the Spanish case. Our system is made of a suite of models, with different levels of disaggregation (bottom-up vs top-down; general government vs sub-sectors) suitable for the automatic processing of the large amount of monthly/quarterly fiscal data published nowadays by Spanish statistical authorities.
Beyond presenting the tools as such, in this paper we show some examples of their potential applications for the real-time monitoring of public finances. In particular, we show how the combination of models provides very accurate signals, in both quantitative and qualitative terms, once information pertaining to the first half of the year is available. Surprisingly enough, the models contain information that seems not to have been factored into the European Commission's fiscal forecasts, which are among the best performers within the set of international organizations and supposedly incorporate not only past data but also forward-looking information on approved, but not yet […]

From a policy-making point of view, we also argue that official monitoring bodies could incorporate into their toolkit for evaluating regular adherence to targets more formal elements, of the kind presented in our paper, in order to move the standard evaluation procedure beyond the extant, more "legalistic" approach. In addition, presenting model-based results, or uncertainty tests around government targets such as the ones shown in our real-time exercise, may help convey to the public the risks surrounding fiscal projections (on this subject see also Clark et al., 2013). Incorporating these types of elements may help improve communication policies as regards sources of risks to (ex-ante) compliance with budgetary targets, as well as the reasons for (ex-post) budgetary deviations should they occur.
[…] terms expressed at an annual and quarterly sampling interval (depending on availability) for our objective time series, and $u_t$ represents the vector of quarterly indicators.

In this type of multivariate model, the usual way to ensure identifiability is to build SUTSE (Seemingly Unrelated Structural Time Series) models. This means that components of the same type interact with each other across the different time series, but are independent of any component of a different type. In addition, statistical relations are only allowed through the covariance structure of the noise vectors, never directly through the system matrices. As a result, the trends of different time series may be related to each other, but all of them are independent of both the seasonal and the irregular components. The full model is a standard BSM (basic structural model) that may be written in state-space form as (see Harvey, 1989)

$$x_t = \Phi x_{t-1} + E w_t \tag{5}$$

[…] where $\epsilon_t \sim N(0, \Sigma_{\epsilon})$ and $v_t \sim N(0, \Sigma_{v_t})$. The system matrices $\Phi$, $E$, $H$ and $H_u$ in equations […] In this way, system (5)-(6) becomes (8)-(9). Note that by setting $C_t = 0$ we actually return […]
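To make the state-space machinery concrete, the following Python sketch runs the Kalman filter on the simplest structural model, a univariate local level: the state equation (5) with Φ = 1 and E = 1, plus an assumed measurement equation y_t = x_t + ε_t. It is not the paper's multivariate, mixed-frequency SUTSE model; it only shows how the filter skips the update step when an observation is missing (passed as `None`), which is the standard way state-space models accommodate series observed at different frequencies. The noise variances `q` and `r` are hypothetical inputs rather than estimated values.

```python
from typing import List, Optional


def local_level_filter(
    y: List[Optional[float]],
    q: float,
    r: float,
    x0: float = 0.0,
    p0: float = 1e6,
) -> List[float]:
    """Kalman filter for x_t = x_{t-1} + w_t and y_t = x_t + eps_t.

    q: variance of the state noise w_t; r: variance of the measurement noise eps_t.
    x0, p0: initial state mean and a large (near-diffuse) initial variance.
    Missing observations are passed as None.
    """
    x, p = x0, p0
    filtered = []
    for obs in y:
        # Prediction step: with Phi = 1 the mean is unchanged and the variance grows by q.
        p = p + q
        if obs is not None:
            # Update step: blend prediction and observation using the Kalman gain k.
            k = p / (p + r)
            x = x + k * (obs - x)
            p = (1.0 - k) * p
        # With a missing observation, the filtered state is just the prediction.
        filtered.append(x)
    return filtered


# Hypothetical quarterly indicator with one value not yet published.
series = [1.0, 1.2, None, 0.9]
print(local_level_filter(series, q=0.1, r=0.5))
```

In the multivariate case the scalars become the matrices Φ, E, H and the covariance matrices above, and the same predict/update recursion applies; in practice these models are usually estimated with a dedicated state-space library rather than a hand-written loop.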