Large dynamic factor models, forecasting, and nowcasting in Statsmodels
@gustavogpf

Great application and very useful notebook. In statsmodels 0.12, will it also be possible to work with other frequency combinations (for example weekly/monthly, perhaps as a DynamicFactorWM) in the same way as shown for monthly/quarterly?

@ChadFulton
Author

Thanks! For v0.12, it will only be monthly/quarterly. In the future, we will hopefully add support for additional mixed frequency combinations.

@yashwanth11

Very useful notebook. Weekly/Monthly frequencies will be super helpful. Thanks!

@ChadFulton
Author

Thanks, I'm glad it's helpful. Yes, I hope that we can add additional frequencies soon.

@samvanmeer

Hi, I cannot seem to open the URL https://s3.amazonaws.com/files.fred.stlouisfed.org/fred-md. When running this notebook, I get: HTTP Error 403: Forbidden. Am I doing something wrong? Thanks in advance!

@SebKrantz

Change the base URL in the notebook to: base_url = 'https://files.stlouisfed.org/files/htdocs/fred-md'

@JNMedina75

Hello, I'm trying to run the notebook, but I ran into an issue with the 'fredmd_definitions.csv' and 'fredqd_definitions.csv' files. I tried to access the URL in the notebook, but I was met with an error message. I then went to the FRED website and downloaded the CSV files, but they seem to have different column names, which is creating issues with the grouping.

Would you be able to share the CSV appendix files used for the notebook?

Thank You!

@bigcodata

Hi, does the nowcasting model predict the quarter-on-quarter growth rate of GDP? If it is a quarter-on-quarter growth rate, can it be predicted directly, or can it be calculated?

@jeaniek

jeaniek commented Apr 4, 2024

Hi,
Thank you very much for sharing your work. I wonder if DynamicFactorMQ can also work with mixed-frequency data that combines yearly and quarterly series? And more importantly, can this package also work with panel data (i.e. multiple countries with data observed across years)? Many thanks again!

@lladamartin

Hi,
Thank you very much for your work. I wonder how DynamicFactorMQ deals with missing data when you have non-synchronized release indexes. I couldn't find documentation about this issue. Could you please tell me a little more about this?

@ChadFulton
Author

Hi @jeaniek, yes, you could use DynamicFactorMQ with quarterly / annual data, and with multiple countries and multiple years. Note that the package does not do anything special in those cases; it would just be a typical dynamic factor model.

@ChadFulton
Author

Hi @lladamartin, I'm not sure I understand exactly what you're asking. Generally, the model allows for arbitrary patterns of missing observations. For example, if you have data through 2024Q1 for one dataset, while another dataset has only been released through 2023Q4, then you would have a NaN value for the second dataset for 2024Q1. The model would run just fine in this case.
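
A minimal sketch of what such a ragged edge might look like in pandas. The series names `released_early` and `released_late` and all values are made up for illustration; only the NaN pattern matters here.

```python
import numpy as np
import pandas as pd

# Hypothetical quarterly data: names and values are assumptions for illustration.
index = pd.period_range(start='2023Q1', end='2024Q1', freq='Q')
data = pd.DataFrame({
    'released_early': [1.0, 1.2, 0.9, 1.1, 1.3],     # available through 2024Q1
    'released_late': [0.5, 0.7, 0.6, 0.8, np.nan],   # only released through 2023Q4
}, index=index)

# Last period with an actual observation, per series.
last_observed = {name: data[name].last_valid_index() for name in data.columns}

# Which series are still missing in the most recent period.
missing_latest = data.iloc[-1].isna()

print(last_observed)
print(missing_latest)
```

A DataFrame shaped like this, with NaN marking values that have not been released yet, is the kind of input the comment above describes.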

@lladamartin

Thank you for your response. I apologize for the lack of clarity in my question. I have monthly information on a set of indicators. These indicators have 1, 2, and up to 3 months of lag. I want to obtain the latent factor of this information set. How does the model estimate the value of the factor in April if many indicators have missing data for that month and previous months? How does the model handle those missing values?

@ChadFulton
Author

Because it is a state space model, where the unobserved state has a defined transition equation, it can produce an estimate for the factor in April even if you had no data for the month (i.e. it just estimates April using its estimate for March combined with the definition of how the state transitions between periods). As you start to observe parts of the data for April, it updates its estimate using whatever data is available. A more detailed description of how this works can be found in, e.g., "Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data" (Bańbura and Modugno).
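
The mechanism described above can be illustrated with a toy Kalman filter for a single factor observed through a single noisy series. This is only a sketch under simplifying assumptions, not the DynamicFactorMQ implementation: the function `filter_factor` and the parameter values `a`, `q`, `r`, `f0`, `p0` are hypothetical.

```python
import math

def filter_factor(observations, a=0.8, q=1.0, r=0.5, f0=0.0, p0=1.0):
    """Toy scalar Kalman filter (hypothetical helper, for illustration only).

    State:       f_t = a * f_{t-1} + eta_t,  Var(eta_t) = q
    Observation: y_t = f_t + eps_t,          Var(eps_t) = r
    Missing observations are passed as None or NaN.
    """
    f, p = f0, p0
    estimates = []
    for y in observations:
        # Prediction step: uses only the transition equation.
        f_pred = a * f
        p_pred = a * a * p + q
        if y is None or math.isnan(y):
            # No data this period: the prediction is the estimate.
            f, p = f_pred, p_pred
        else:
            # Update step: blend the prediction with the new observation.
            gain = p_pred / (p_pred + r)
            f = f_pred + gain * (y - f_pred)
            p = (1.0 - gain) * p_pred
        estimates.append((f, p))
    return estimates

# March is observed, April has not been released yet.
march, april = filter_factor([1.0, float('nan')])
print('March estimate, variance:', march)
print('April estimate, variance:', april)
```

With April missing, the April estimate is just `a` times the March estimate, and its variance is larger than March's. Once an April value arrives, the update step pulls the estimate toward it and shrinks the variance.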

@lladamartin

I'll read the paper! Thanks!

@tri2911

tri2911 commented Jan 16, 2025

Hello @ChadFulton ,

First of all, I want to extend my deep thanks for your work on DynamicFactorMQ. I have been using this model and, overall, it has been a good experience. However, I encountered an issue when it comes to nowcasting.

Here are the details of the problem:

I trained the model using data up to the beginning of Q4 2024 with my vintage dataset. When I apply a new vintage to the trained model, I can still get the nowcast for "Dec 2024".

At the end of December, I have some monthly indicators for December, and I update my results with the new data. However, I am then unable to generate a nowcast anymore. The error message I receive is:

ValueError: Prediction must have end after start.

When I run the forecast in this scenario, it only generates forecasts starting from January 2025 instead of December 2024. My quarterly variable for Q4 2024 in endog_quarterly is still NaN, but I am unable to generate a nowcast for this period.

I would appreciate any guidance or suggestions you might have to resolve this issue. Thank you again for your valuable work and support.

Best regards, Tri

@ChadFulton
Author

ChadFulton commented Jan 20, 2025

Hi @tri2911, in this case, instead of the forecast method (which only produces out-of-sample forecasts), you can use the predict method (which allows for both in-sample and out-of-sample predictions). For example, if you wanted a prediction for the months of the fourth quarter, you could do:

res.predict(start='2024-10', end='2024-12')

Or if you really want a prediction only for December, you would do:

res.predict(start='2024-12', end='2024-12')
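
The error reported above is consistent with this distinction: once December data has been added, December is inside the sample, so an out-of-sample forecast ending in December would end before it starts. A rough sketch of that check, using only pandas. The helper `is_in_sample` and the sample start date are assumptions for illustration and are not part of statsmodels.

```python
import pandas as pd

def is_in_sample(sample_index, target):
    """Hypothetical helper: True if `target` lies within the span of `sample_index`."""
    period = pd.Period(target, freq=sample_index.freq)
    return sample_index[0] <= period <= sample_index[-1]

# Assumed monthly samples: before and after December indicators are added.
before_update = pd.period_range('2000-01', '2024-11', freq='M')
after_update = pd.period_range('2000-01', '2024-12', freq='M')

choice = {}
for label, idx in [('before', before_update), ('after', after_update)]:
    # In-sample targets need `predict`; `forecast` only covers later periods.
    choice[label] = 'predict' if is_in_sample(idx, '2024-12') else 'forecast'

print(choice)
```

Since `predict` handles both cases, calling it with explicit start and end dates avoids having to switch methods as new data arrives.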

@robertoaquino

"Hello @ChadFulton, I hope you're doing well. I want to express my gratitude to you for providing this high-quality content for those interested. I'm learning a lot from this material.

I'm trying to run the model codes and encountered a stumbling block that I'm having difficulty resolving.

It refers to the replacement of names and the mapping of variables. It seems that the names are not being assigned properly and this is causing problems further on, with variables not being found.

I've reviewed the data, variables, names, etc., and apparently everything is correct, but when I run the code, problems occur further on.

If there have been any important changes to the data or how to execute it, or any updates to this part of the code, I believe I might have missed them. If you could take a look at this part and guide me on how to proceed, I would be very grateful.

# Definitions from the Appendix for FRED-MD variables
defn_m = pd.read_csv(r'C:\Users\Cliente\Documents\fredmd_definitions.csv', encoding='windows-1252')
defn_m.index = defn_m.fred

# Definitions from the Appendix for FRED-QD variables
defn_q = pd.read_csv(r'C:\Users\Cliente\Documents\fredqd_definitions.csv', encoding='windows-1252')
defn_q.index = defn_q['FRED MNEMONIC']

# Example of the information in these files:
print(defn_m.head())
print(defn_q.head())

# Replace the names of the columns in each monthly and quarterly dataset
map_m = defn_m['description'].to_dict()
map_q = defn_q['DESCRIPTION'].to_dict()
for date, value in dta.items():
    value.orig_m.columns = value.orig_m.columns.map(map_m)
    value.dta_m.columns = value.dta_m.columns.map(map_m)
    value.orig_q.columns = value.orig_q.columns.map(map_q)
    value.dta_q.columns = value.dta_q.columns.map(map_q)

# Get the mapping of variable id to group name, for monthly variables
groups = defn_m[['description', 'group']].copy()

# Re-order the variables according to the definition CSV file
# (which is ordered by group)
columns = [name for name in defn_m['description']
           if name in dta['2024-10'].dta_m.columns]
for date in dta.keys():
    dta[date].dta_m = dta[date].dta_m.reindex(columns, axis=1)

# Add real GDP (our quarterly variable) into the "Output and Income" group
gdp_description = defn_q.loc['GDPC1', 'DESCRIPTION']
new_row = pd.DataFrame([{'description': gdp_description, 'group': 'Output and Income'}])
groups = pd.concat([groups, new_row], ignore_index=True)

# Display the number of variables in each group
(groups.groupby('group', sort=False)
       .count()
       .rename({'description': '# series in group'}, axis=1))"
