April 21, 2021

Bash Scripting to Update Blogger Posts for Local Changes through API

On this blog, I edit drafts in a local text editor rather than in the Blogger CMS, and I update existing posts more often than I publish new ones. To simplify updating, I wrote a Bash script that automatically extracts specific elements from local HTML files and sends them in PATCH requests through the Blogger API v3.

This post assumes the following local HTML files:

  • Each div.post element in a file:
    • It has an empty line before and after it, so that its changes are reported separately from those of neighboring elements.
    • The value of its id attribute is the known Blogger resource (page or post) ID.
    • Its start tag is on a single line.
    • Its post body (div.post-body) contains paragraphs.
  • Git tracks changes in these files.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Document Title</title>
  </head>
  <body>

    <div class="post" id="RESOURCE_ID">
      <h3>Post Title</h3>
      <div class="post-body">
        <p>The first paragraph.</p>
        <p>The second paragraph.</p>
      </div>
    </div>

    <div class="post" id="RESOURCE_ID">
      <h3>Post Title</h3>
      <div class="post-body">
        <p>The first paragraph.</p>
        <p>The second paragraph.</p>
      </div>
    </div>

  </body>
</html>

Detect Local Changes using Git and Specify Elements

First, list the changed local HTML files using git diff with the --name-only option. For each file, extract the starting line number of each difference from the hunk headers, which begin with @@ characters.
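To illustrate the hunk-header step in isolation, the following sketch runs the same kind of sed expression over a hand-written sample instead of real git diff output. The function name hunk_start_lines and the sample diff are assumptions made for this example only.

```shell
# Print the starting line number (new-file side) of every hunk read from stdin.
# hunk_start_lines is a hypothetical helper name, not part of Git.
hunk_start_lines() {
    sed -nE 's/^@@+ -[0-9]+(,[0-9]+)? \+([0-9]+)(,[0-9]+)? @@+.*$/\2/p'
}

# Hand-written sample in the shape of `git diff -U0` output.
sample_diff='diff --git a/index.html b/index.html
--- a/index.html
+++ b/index.html
@@ -26 +26 @@ <h3>Post Title</h3>
-        <p>The first paragraph.</p>
+        <p>The first paragraph, revised.</p>
@@ -34,0 +35,2 @@
+        <p>A new paragraph.</p>
+        <p>Another new paragraph.</p>'

# Prints 26 and 35, one per line.
printf '%s\n' "$sample_diff" | hunk_start_lines
```

Only lines that begin with @@ match, so the file headers and the changed lines themselves are skipped.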

For each line number, restrict sed to the address range from line 1 up to that line and collect the resource IDs from the start tags of div.post elements that have an id attribute. Then, assign the last one found, which is the nearest to the changed line, to the resource_id variable, because that element contains the difference.

If resource_id is empty, the nearest div.post element does not have an id attribute. If its value is equal to the previous one, the difference lies within the same element as the previous difference. Skip both cases.

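id_regex='^.*<div(.*\s+id="([0-9]+)")?.*\s+class="post"(.*\s+id="([0-9]+)")?.*$'
id_replacement='\2\4'
previous_resource_id=
# Note: the word splitting below assumes file names without whitespace.
for file in $(git diff --name-only); do
    # -U0 drops context lines, so each hunk header points at a changed line.
    for line_number in $(git diff -U0 "$file" |
                             sed -nE 's/^@@+ -[0-9]+(,[0-9]+)? \+([0-9]+)(,[0-9]+)? @@+.*$/\2/p'); do
        # The last ID printed up to the changed line is the nearest one.
        resource_id=$(sed -nE "1,${line_number}s/$id_regex/$id_replacement/p" \
                      "$file" |
                      tail -1) || exit
        if [ -n "$resource_id" ] &&
               [ "$resource_id" != "$previous_resource_id" ]; then
            # Extract the post title and body.
            :  # placeholder so that the if body is not empty
        fi
        previous_resource_id=$resource_id
    done
done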
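To try the ID lookup on its own, the sed expression can be wrapped in a small function and run against a sample file. This is a minimal sketch: the name nearest_resource_id, the sample file, and the IDs 111 and 222 are all made up for illustration, and [[:space:]] stands in for \s so that the pattern does not depend on the GNU shorthand.

```shell
# Print the id of the last div.post start tag found in lines 1..LINE of FILE.
# nearest_resource_id is a hypothetical helper name.
nearest_resource_id() {  # usage: nearest_resource_id FILE LINE
    local ws='[[:space:]]+'
    local regex="^.*<div(.*${ws}id=\"([0-9]+)\")?.*${ws}class=\"post\"(.*${ws}id=\"([0-9]+)\")?.*\$"
    sed -nE "1,${2}s/$regex/\\2\\4/p" "$1" | tail -n 1
}

# Sample file with two posts; the id may come before or after the class.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<body>

  <div class="post" id="111">
    <h3>First</h3>
  </div>

  <div id="222" class="post">
    <h3>Second</h3>
  </div>

</body>
EOF

nearest_resource_id "$sample" 4   # a change on line 4 belongs to post 111
nearest_resource_id "$sample" 8   # a change on line 8 belongs to post 222
rm -f "$sample"
```

A line number that precedes every div.post start tag yields an empty string, which is the case the loop above skips.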

Extract Specified Elements using XML Parser

Extract the post title from the div.post element whose id matches the resource ID found in the previous section. I use xmlstarlet for this: its fo -H command reads the HTML input and reformats it as XML, and its sel -c option copies the selected nodes.

xmlstarlet fo -H "$file" 2>/dev/null |
    xmlstarlet sel -t -c "//div[@id=\"$resource_id\"]/h3/node()"

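Similarly, extract the post body and modify it if necessary. Then, pass it to jq with the -s and -R options, which read the whole input as a single raw string and output it as a JSON string with reserved characters escaped. In the following example, I add the <!-- more --> comment, which Blogger turns into the “Read more” link, after the first paragraph.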

xmlstarlet fo -H "$file" 2>/dev/null |
    xmlstarlet sel -t -c \
               "//div[@id=\"$resource_id\"]/div[@class=\"post-body\"]/node()" |
    sed -E "0,/<\/p>/s//<\/p>\n\n<!-- more -->/" |
    jq -sR .
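The 0,/regex/ address used in the sed command above is a GNU sed extension. Where GNU sed is not available, one possible substitute is the awk sketch below, which inserts the comment after the first closing p tag only. The function name add_jump_break is an assumption for this example, and this is a swapped-in technique, not the command my script uses.

```shell
# Insert "<!-- more -->" after the first </p> read from stdin.
# add_jump_break is a hypothetical helper name.
add_jump_break() {
    awk '
        !done && /<\/p>/ {
            # Replace only the first </p> on the first matching line.
            sub(/<\/p>/, "</p>\n\n<!-- more -->")
            done = 1
        }
        { print }
    '
}

printf '<p>The first paragraph.</p>\n<p>The second paragraph.</p>\n' |
    add_jump_break
```

The result can be piped to jq in the same way as the output of the sed command.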

Send PATCH Requests through Blogger API

Next, send PATCH requests using curl through the Blogger API with the contents extracted in the previous section. Because the following methods of the API use OAuth 2.0 for authorization, prepare an access token as described in my preceding post.

The API does not seem to update draft posts. Therefore, list the draft posts using the list method with the status=draft parameter, and check whether the resource ID of the post being updated is among them.

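# Exit status 0 means that the post is among the listed drafts.
# The ? after [] keeps jq quiet when the response has no items.
curl -s -H "Authorization: Bearer $access_token" -X GET \
    "https://www.googleapis.com/blogger/v3/blogs/$blog_id/posts?status=draft" |
    jq -r '.items[]?.id' |
    grep -qxF -- "$resource_id"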

If the status is draft, temporarily publish the post using the publish method.

curl -H "Authorization: Bearer $access_token" -X POST \
    "https://www.googleapis.com/blogger/v3/blogs/$blog_id/posts/$resource_id/publish"

Now, send the title and body of the element using the patch method. I send the post title along with the post body because the title is also updated occasionally and is much shorter than the body. Note that the Blogger API does not appear to support a description property.

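# $content is already a JSON string (from jq -sR); $title is inserted verbatim,
# so it must not contain unescaped double quotes or backslashes.
curl -d "{\"title\": \"$title\", \"content\": $content}" \
    -H "Authorization: Bearer $access_token" \
    -H 'Content-Type: application/json; charset=utf-8' -X PATCH \
    "https://www.googleapis.com/blogger/v3/blogs/$blog_id/posts/$resource_id"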
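Because the title is interpolated into the JSON by hand, a title containing a double quote would produce an invalid request body. One way around this, shown as a sketch and not as what my script does, is to let jq assemble the whole object. The function name build_payload is hypothetical. The second argument is assumed to be the already encoded JSON string produced by jq -sR, which is why it is passed with --argjson and not --arg.

```shell
# Build the PATCH body with jq so that the title is escaped as well.
# build_payload is a hypothetical helper name.
build_payload() {  # usage: build_payload TITLE CONTENT_AS_JSON_STRING
    jq -cn --arg title "$1" --argjson content "$2" \
        '{title: $title, content: $content}'
}

# Example with a title that contains double quotes.
build_payload 'The "Read more" link' '"<p>The first paragraph.</p>\n"'
```

The output can then be passed to curl with -d "$(build_payload "$title" "$content")".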

If the post was temporarily published as described above, revert it to the draft status using the revert method.

curl -H "Authorization: Bearer $access_token" -X POST \
    "https://www.googleapis.com/blogger/v3/blogs/$blog_id/posts/$resource_id/revert"
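The publish, patch, and revert calls depend on one another: if the patch fails after a temporary publish, the draft would stay public. The following sketch shows one way to order the calls so that the revert always runs. The names api_base, blogger_post, blogger_patch, and update_post are assumptions for this example. blog_id and access_token are expected to be set as in the commands above, and the caller passes 1 as the third argument when the draft check succeeded.

```shell
# Hypothetical helpers; blog_id and access_token must be set by the caller.
api_base="https://www.googleapis.com/blogger/v3/blogs"

blogger_post() {  # usage: blogger_post RESOURCE_ID publish|revert
    curl -sS --fail -X POST -H "Authorization: Bearer $access_token" \
        "$api_base/$blog_id/posts/$1/$2" >/dev/null
}

blogger_patch() {  # usage: blogger_patch RESOURCE_ID JSON_PAYLOAD
    curl -sS --fail -X PATCH -d "$2" \
        -H "Authorization: Bearer $access_token" \
        -H 'Content-Type: application/json; charset=utf-8' \
        "$api_base/$blog_id/posts/$1" >/dev/null
}

update_post() {  # usage: update_post RESOURCE_ID JSON_PAYLOAD IS_DRAFT(0|1)
    local status=0
    if [ "$3" = 1 ]; then
        # Without a successful publish, the patch would not apply.
        blogger_post "$1" publish || return
    fi
    blogger_patch "$1" "$2" || status=$?
    if [ "$3" = 1 ]; then
        # Revert even when the patch failed, so the draft does not stay public.
        blogger_post "$1" revert || status=$?
    fi
    return "$status"
}
```

With --fail, curl returns a nonzero exit status on HTTP errors, so update_post reports a failed patch to its caller while still attempting the revert.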

Create Commit using Git

Finally, stage the local files and create a new commit using git commit.

git commit -a -m "Update $resource_id"

Bash Script Example

Combining the above steps, you can update posts without manually copying and pasting changes from local HTML files into the Blogger CMS. As a real-world example, I have published the update_blogger.sh Bash script on GitHub.
