Auto-Generating .py Files Using Python

Prithoo Medhi
Mar 12, 2022

Ever wondered how frameworks like Django auto-generate boilerplate code? Well, so did I, until I had to write a script that does more or less the same thing for one of my practice projects.

Background

Recently I was working on a hobby project that used spaCy for…reasons. Long story short, spaCy models can be used to identify the language of a text blob, since, you know, processing natural-language text kind of depends on knowing which language you are processing. So here I was, using this by-product of an advanced NLP library to detect the language of text blobs.

The thing is, the spaCy function returns the ISO 639-1 code for the language. That is fine if you are a philologist or a linguist who probably knows the codes by heart at this point, but seeing as I am just some idiot with an internet connection, this would not do for me. I mean, how is zh the code for Mandarin Chinese?

So I did what we do whenever we come across a problem we do not know how to solve:

I threw a hash table at it.

Well, I threw a dictionary at it, since we are working with Python here. I even threw in an

if __name__ == "__main__":

in there for good measure.

It looked something like this:

LANG_DICT = {
    'en': 'English', 'es': 'Spanish', 'fr': 'French', 'de': 'German',
    'it': 'Italian', 'pt': 'Portuguese', 'ru': 'Russian', 'ja': 'Japanese',
    'zh': 'Chinese', 'ko': 'Korean', 'ar': 'Arabic', 'tr': 'Turkish',
    'vi': 'Vietnamese', 'hi': 'Hindi', 'id': 'Indonesian', 'ms': 'Malay',
    'th': 'Thai', 'fa': 'Persian', 'bn': 'Bengali', 'tl': 'Tagalog',
    'ta': 'Tamil', 'te': 'Telugu', 'ml': 'Malayalam', 'kn': 'Kannada',
    'mr': 'Marathi', 'pa': 'Punjabi', 'gu': 'Gujarati', 'or': 'Oriya',
    'sa': 'Sanskrit', 'ur': 'Urdu', 'as': 'Assamese', 'my': 'Burmese',
    'am': 'Amharic', 'ne': 'Nepali', 'sd': 'Sindhi', 'si': 'Sinhala',
    'lo': 'Lao', 'km': 'Khmer', 'bo': 'Tibetan'
}

if __name__ == "__main__":
    pass

Now obviously anyone who has ever written code in their life will see the problem with the block above:

That dictionary is unsorted: the keys are in no particular order.

If I ever had to come back and do maintenance on it or look up a key in there, I was done for. But as I said earlier, I am an idiot; so I only noticed this after my code was already using and depending on this dictionary.
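For context, "using and depending on" a table like this usually boils down to a single lookup. A minimal sketch, assuming a hypothetical helper called `language_name` and a fallback of "Unknown" for codes that are not in the table (the sorted output later in the post does include a "default": "Unknown" entry, but exactly how the author used it is not shown):

```python
# A trimmed-down copy of the lookup table, for illustration only.
LANG_DICT = {"en": "English", "zh": "Chinese", "hi": "Hindi", "as": "Assamese"}


def language_name(iso_code: str, fallback: str = "Unknown") -> str:
    """Map an ISO 639-1 code to a language name (hypothetical helper).

    dict.get() returns `fallback` instead of raising KeyError
    when the code is missing from the table.
    """
    return LANG_DICT.get(iso_code.lower(), fallback)


print(language_name("zh"))  # Chinese
print(language_name("ZH"))  # Chinese (the code is lower-cased first)
print(language_name("xx"))  # Unknown
```

Lookups like this work the same whether or not the keys are sorted; sorting only matters to the human maintaining the file.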

Working Theory

Now, if I were a sane person, I would just sort the dictionary, print it to the console, paste it into a JSON linter, then paste it back in place of the original dictionary. But if I were a sane man, I would not be sitting here at 3 am writing this.

Simply put, that seemed like way too much work to me (do not ask me how it made sense at the time), and I wanted to automate every single aspect of the task:

from sorting the dictionary to formatting it and copying it to a new Python file.

Now how would I do that, you ask? Well, the algorithm is extremely simple, and frameworks like Django do something broadly similar (rendering template files) to generate boilerplate whenever we create a new project or application:

  • Sort the dictionary and save it to a variable.
  • Save all the other text/code blocks that are to be written in other variables using formatted strings.
  • Turn the new dictionary variable into a JSON string using json.dumps().
  • Concatenate all the strings in proper order and save them to a new data variable.
  • Open the file in 'w+t' mode (write, text). It is the 'w' that creates the file if it does not already exist (and truncates it if it does); the '+' only adds read access, so plain 'w' would do just as well. Make sure the file's extension is '.py', since that is what lets both the IDE and Python's import system recognise it as a Python module later on.
  • Write the data variable to the opened file.

There, now we have a brand new Python file that we can import the now sorted dictionary from.

Code

Ahh, the part you clicked on the link for.
Here goes:

Import the original language dictionary and Python's json module:

import json

# 'code' is this project's own package (note that the name clashes with the stdlib 'code' module)
from code.choices import LANG_DICT

Declare the necessary global constants:

# Keep the name different from the original for now, so the file this script imports from is not overwritten.
# This is a file path, not a variable name; forward slashes also work on Windows.
FILE_PATH = "code/choices2.py"
# The name of the output dictionary that will be written to the .py file.
DICT_NAME = "LANG_DICT = "
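A note on the path: written with a backslash, as "code\choices2.py", it only works by accident. A backslash in a normal string literal starts an escape sequence, and `\c` just happens not to be a recognised one (recent Python versions warn about it), whereas a file called, say, `new.py` would silently turn into a newline. A sketch of two safer spellings, keeping the post's folder and file names:

```python
from pathlib import Path

# Path objects are joined with the "/" operator and rendered with the
# current OS's separator, so this works on Windows and POSIX alike.
FILE_PATH = Path("code") / "choices2.py"
DICT_NAME = "LANG_DICT = "

# A raw string is the other common fix if you want to keep the backslash.
WINDOWS_STYLE = r"code\choices2.py"

print(FILE_PATH.name)    # choices2.py
print(FILE_PATH.suffix)  # .py
```

`open()` accepts `Path` objects directly, so the rest of the script would not need to change.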

Sort the original dictionary and save the output to a new dictionary variable:

lang_dict = sorted(LANG_DICT)  # a sorted list of the keys
LANG_DICT_2 = {}
for key in lang_dict:
    LANG_DICT_2[key] = LANG_DICT[key]  # dicts keep insertion order (Python 3.7+)
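The loop works because dictionaries remember insertion order (guaranteed since Python 3.7). A sketch of two shorter equivalents, using a trimmed-down dictionary: rebuild the dict from its sorted (key, value) pairs, or skip the intermediate dict entirely and let `json.dumps` sort while serialising via its `sort_keys` parameter:

```python
import json

LANG_DICT = {"zh": "Chinese", "en": "English", "as": "Assamese"}

# Loop version, as in the post.
LANG_DICT_2 = {}
for key in sorted(LANG_DICT):
    LANG_DICT_2[key] = LANG_DICT[key]

# One-liner: sorting the (key, value) tuples orders them by key,
# since keys are unique and tuples compare element by element.
LANG_DICT_3 = dict(sorted(LANG_DICT.items()))

# Or let json.dumps do the sorting while it formats the text.
text = json.dumps(LANG_DICT, indent=4, sort_keys=True)

print(list(LANG_DICT_3))  # ['as', 'en', 'zh']
```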

Define any other text block(s) we need for the new file and save them to their own string variables.

if_name_block = '''

if __name__ == "__main__":
    pass
'''

Join the string variables in the order they should appear in the final file and save the result to a new string variable; let us call it op_string.

op_string = DICT_NAME + json.dumps(LANG_DICT_2, indent=4) + if_name_block

Here I used json.dumps() to convert the sorted dictionary to a string, because write() on a text-mode file only accepts strings; the indent=4 argument also puts each key-value pair on its own line.
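One caveat: the JSON text is valid Python here only because every key and value is a string. JSON spells booleans and null as `true`, `false` and `null`, which are undefined names in Python. A minimal sketch comparing `json.dumps` with `pprint.pformat` (which emits Python literals), checking each candidate with `ast.literal_eval`; the helper `is_python_literal` and the `mixed` dictionary are hypothetical:

```python
import ast
import json
from pprint import pformat


def is_python_literal(text: str) -> bool:
    """Return True if `text` parses as a Python literal (hypothetical helper)."""
    try:
        ast.literal_eval(text)
        return True
    except (ValueError, SyntaxError):
        return False


strings_only = {"en": "English", "fr": "French"}
mixed = {"en": "English", "rtl": False, "script": None}

print(is_python_literal(json.dumps(strings_only, indent=4)))  # True
print(is_python_literal(json.dumps(mixed, indent=4)))  # False: false/null are not Python names
print(is_python_literal(pformat(mixed)))  # True: False/None are Python literals
```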

Write op_string to the file.

with open(FILE_PATH, 'w+t', encoding='utf-8') as op_file:
    op_file.write(op_string)
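Put together and wrapped in functions so it can be reused for other dictionaries, the steps look roughly like this. A sketch only: the function names `render_module` and `write_module` are hypothetical, and the sorting uses `dict(sorted(...))` in place of the explicit loop:

```python
import json


def render_module(name: str, mapping: dict) -> str:
    """Build the source text of a .py file that defines `name` as a key-sorted `mapping`."""
    # Sort by key; dicts keep insertion order, so the new dict is sorted.
    sorted_mapping = dict(sorted(mapping.items()))
    # Any other block that should appear in the generated file.
    main_block = '\n\nif __name__ == "__main__":\n    pass\n'
    # Serialise the dict (one entry per line) and concatenate in order.
    return f"{name} = " + json.dumps(sorted_mapping, indent=4) + main_block


def write_module(path: str, source: str) -> None:
    # "w" creates the file if it is missing and truncates it otherwise.
    with open(path, "w", encoding="utf-8") as f:
        f.write(source)


if __name__ == "__main__":
    demo = {"fr": "French", "en": "English", "de": "German"}
    print(render_module("LANG_DICT", demo))
    # In the project this would be something like:
    # write_module("code/choices2.py", render_module("LANG_DICT", LANG_DICT))
```

Keeping the rendering separate from the file I/O makes the generated text easy to inspect (or test) before anything touches the disk.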

Conclusion

And there you have it: a brand-new Python file, ready to be processed by the interpreter.

See how the dictionary keys are now so neatly sorted in proper alphabetical order, just ripe for general maintenance and manual lookup.

LANG_DICT = {
    "am": "Amharic",
    "ar": "Arabic",
    "as": "Assamese",
    "bn": "Bengali",
    "bo": "Tibetan",
    "de": "German",
    "default": "Unknown",
    "en": "English",
    "es": "Spanish",
    "fa": "Persian",
    "fr": "French",
    "gu": "Gujarati",
    "hi": "Hindi",
    "id": "Indonesian",
    "is": "Icelandic",
    "it": "Italian",
    "ja": "Japanese",
    "km": "Khmer",
    "kn": "Kannada",
    "ko": "Korean",
    "lo": "Lao",
    "ml": "Malayalam",
    "mr": "Marathi",
    "ms": "Malay",
    "my": "Burmese",
    "ne": "Nepali",
    "no": "Norwegian",
    "or": "Oriya",
    "pa": "Punjabi",
    "pt": "Portuguese",
    "ru": "Russian",
    "sa": "Sanskrit",
    "sd": "Sindhi",
    "si": "Sinhala",
    "ta": "Tamil",
    "te": "Telugu",
    "th": "Thai",
    "tl": "Tagalog",
    "tr": "Turkish",
    "ur": "Urdu",
    "vi": "Vietnamese",
    "zh-cn": "Chinese"
}

if __name__ == "__main__":
    pass
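Inside the project, the sorted dictionary is then just a normal import away (`from code.choices2 import LANG_DICT`, given the file path used earlier). To try the idea outside that project layout, a generated file can also be imported straight from its path with `importlib`. A minimal sketch, assuming a throwaway temporary directory and a hypothetical module name `choices_sorted`; note that `spec_from_file_location` chooses its loader from the file extension, which is one more reason the '.py' suffix matters:

```python
import importlib.util
import json
import tempfile
from pathlib import Path


def load_module_from_path(module_name: str, path: Path):
    """Import a .py file by path, without it having to be on sys.path."""
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


with tempfile.TemporaryDirectory() as tmp:
    generated = Path(tmp) / "choices_sorted.py"  # hypothetical file name
    source = "LANG_DICT = " + json.dumps(
        {"en": "English", "de": "German"}, indent=4, sort_keys=True
    ) + "\n"
    generated.write_text(source, encoding="utf-8")
    choices = load_module_from_path("choices_sorted", generated)

print(choices.LANG_DICT)  # {'de': 'German', 'en': 'English'}
```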

So that was how I handled a curious little hiccup I faced today. Was there an easier solution to this? Well…of course there was. In fact, there must be numerous better solutions out there, but none of them would have been as interesting to me.
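For the curious, one such easier route is the standard library's `pprint` module. `pprint.pformat` sorts dictionary keys by default (its `sort_dicts` parameter, added in Python 3.8, defaults to True) and puts one entry per line once the output exceeds `width`. A minimal sketch with a trimmed-down dictionary; it still leaves the copy-and-paste step to you:

```python
from pprint import pformat

LANG_DICT = {"fr": "French", "en": "English", "es": "Spanish", "de": "German"}

# width=40 is small enough that this dict wraps to one key per line;
# keys come out sorted because sort_dicts defaults to True.
snippet = "LANG_DICT = " + pformat(LANG_DICT, width=40)
print(snippet)
```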

And with that, I bid good night to you, my brave young folks who have stayed until the end of this post.

