Preserving Quotes in YAML

For a bit of context, I’m working on a project that spits out a YAML file that has properties with UUIDs. These UUIDs are linked to different text files and image files. The goal was to have some kind of web component consisting of an SVG with a text overlay. The text was separated out for third-party localization.

I wrote a Python script and successfully populated the YAML file with what I needed for now.

id: 5214484c-42d2-11ee-bb4b-a0cec89b3edd
type: "graphic"
subtype: ""
metadata:
  media-not-started: "false"
title: ""
transcript: ""
assets:
- 52117bbc-42d2-11ee-bb4b-a0cec89b3edd
- 521285a2-42d2-11ee-bb4b-a0cec89b3edd
- 5213159e-42d2-11ee-bb4b-a0cec89b3edd
ui:
  instructions: ""
  images: []
  svgs:
  - text1: 52117bbc-42d2-11ee-bb4b-a0cec89b3edd
    text2: 521285a2-42d2-11ee-bb4b-a0cec89b3edd
    text3: 5213159e-42d2-11ee-bb4b-a0cec89b3edd
    file: ""
    caption: ""
ratio: "822/861"
stylesheet: 5214b930-42d2-11ee-bb4b-a0cec89b3edd

Every UUID seen here has a corresponding directory, YAML file, and Markdown file named after that same UUID. For example, the UUID in text1 maps to:

52117bbc-42d2-11ee-bb4b-a0cec89b3edd/ #directory
|
|-- 52117bbc-42d2-11ee-bb4b-a0cec89b3edd.yml
|
+-- 52117bbc-42d2-11ee-bb4b-a0cec89b3edd.md
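As a minimal sketch of that layout, the paths for a given UUID could be derived like this. The asset_paths helper and its root parameter are hypothetical names for illustration, not part of my actual script.

```python
from pathlib import Path


def asset_paths(uuid: str, root: Path = Path(".")) -> dict:
    """Return the directory, YAML file, and Markdown file for one UUID.

    Hypothetical helper: assumes every asset lives in a directory named
    after its UUID, directly under `root`.
    """
    directory = root / uuid
    return {
        "directory": directory,
        "yaml": directory / f"{uuid}.yml",
        "markdown": directory / f"{uuid}.md",
    }


paths = asset_paths("52117bbc-42d2-11ee-bb4b-a0cec89b3edd")
```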

But there’s an issue. If you take a look at the YAML file, some values are wrapped in double quotes and some are not. In my situation, every value in the YAML file needs to be wrapped in double quotes.

Initially, I went with the PyYAML library to help me wrangle YAML with Python. Everything was working well up until this point. After diving into the docs, I can tell some programmers find writing documentation hard, because it did not help at all. How can I RTFM if the manual is useless? Dive into the source code? I really shouldn’t need to do that for something as simple as this. Even after digging through the source code, it was still unclear.

Tip: When writing technical documentation, imagine yourself as NOT the person who developed it. Also, imagine you’re explaining it to your mom. Thanks.

After a bit of research, I switched to the ruamel.yaml library. The API was similar enough for my needs and the source code was easy to navigate. Refactoring was minimal and it solved my missing double quote issue.

from typing import Dict, List, Tuple

from ruamel.yaml import YAML # 1
from ruamel.yaml.scalarstring import DoubleQuotedScalarString # 2

yaml = YAML()
yaml.preserve_quotes = True # 3


def media_insert_text_field(fields: List[Dict]) -> Tuple[List, Dict]:
    assets = []
    ui = {}
    for field in fields:
        # 4
        double_quoted_scalar = DoubleQuotedScalarString(field['uuid'])
        assets.append(double_quoted_scalar)
        ui[f"text{field['index']}"] = double_quoted_scalar

    return (assets, ui)
...
  1. Import YAML from ruamel.yaml.

  2. Import DoubleQuotedScalarString to force double quotes on scalar values.

  3. Set preserve_quotes to True.

  4. Force double quotes on field['uuid'] by wrapping it in DoubleQuotedScalarString.

You may be asking yourself why I needed DoubleQuotedScalarString if preserve_quotes was set to True. I’m still asking myself that question. I triple-checked the data, and it followed the same format as the other properties I’m injecting into the final YAML, so I’m not sure why. But ruamel.yaml made it easy to force the quotes. That’s all that matters to me.

I’ll end by reiterating a tip to write technical documentation like your mom will be reading it. (Not that I’m a mother, but you get the gist.)

Be sure to follow me @chrisparaiso.
