The Accessing Bing Webmaster Tools API using cURL post demonstrated how to submit URLs in batches to Bing. All you need to do is pass the appropriate URLs to this API. In this post, I will create a Bash script that automatically specifies the URLs of updated pages by referring to the last modified dates in the sitemap.
Retrieve Sitemap and Extract Newer Entries
First, retrieve the sitemap using curl, and extract newer entries than the last submitted entry. Because jq can filter JSON data interactively and concisely in Bash scripting, transcode XML to JSON using the xq
command from the yq package. Suppose the following sitemap and transcoded one:
<?xml version='1.0' encoding='utf-8'?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.com/page-0</loc>
<lastmod>2020-12-31T00:00:00Z</lastmod>
</url>
<url>
<loc>https://example.com/page-1</loc>
<lastmod>2021-01-01T00:00:00Z</lastmod>
</url>
<url>
<loc>https://example.com/page-2</loc>
<lastmod>2021-01-02T00:00:00Z</lastmod>
</url>
</urlset>
{
"urlset": {
"@xmlns": "http://www.sitemaps.org/schemas/sitemap/0.9",
"url": [
{
"loc": "https://example.com/page-0",
"lastmod": "2020-12-31T00:00:00Z"
},
{
"loc": "https://example.com/page-1",
"lastmod": "2021-01-01T00:00:00Z"
},
{
"loc": "https://example.com/page-2",
"lastmod": "2021-01-02T00:00:00Z"
}
]
}
}
Then, filter all the elements of the array of the url
key using jq. Let the date of the last submitted entry be 2020-12-31T00:00:00Z. Select elements having a lastmod
key with a newer date than this one. Then, construct an array from the entire stream using the -s
option, and sort them in ascending order by the date of the lastmod
key. Because of descending time units, you can compare these dates in ISO 8601 format as strings.
curl https://example.com/sitemap.xml |
xq |
jq '.urlset.url[] | select(.lastmod > "2020-12-31T00:00:00Z")' |
jq -s 'sort_by(.lastmod)'
The resulting JSON object is:
[
{
"loc": "https://example.com/page-1",
"lastmod": "2021-01-01T00:00:00Z"
},
{
"loc": "https://example.com/page-2",
"lastmod": "2021-01-02T00:00:00Z"
}
]
The number of newer entries is the length
of this array. Suppose you assign this object to a newer_list
variable:
echo $newer_list | jq length
Request Daily Quota for URL Submission
Additionally, request the remaining daily quota for URL submission from Bing using the GetUrlSubmissionQuota
method of the Bing Webmaster API. These methods require an API key. The DailyQuota
key is in the d
key of the JSON response.
curl -X GET "https://ssl.bing.com/webmaster/api.svc/json/GetUrlSubmissionQuota?siteUrl=https://example.com/&apikey=$api_key" |
jq .d.DailyQuota
Add Newer Entries to URL List
Next, suppose you assign the URLs of newer entries to a url_list
variable that contains comma-separated URLs for a JSON request. The number of these URLs should be less than or equal to the daily quota specified in the previous section. A loc
variable is assigned the URL stored in the loc
key at the index
. A lastmod
variable updates the date of the newest entry in the current submission. You can remove unnecessary quotes from the value of the lastmod
key using the -r
option.
index=0
while [ "$index" -lt "$newer_length" ] && [ "$index" -lt "$daily_quota" ]; do
loc=$(echo $newer_list | jq .[$index].loc)
lastmod=$(echo $newer_list | jq -r .[$index].lastmod)
if [ -z "$url_list" ]; then
url_list=$loc
else
url_list="$url_list, $loc"
fi
((index++))
done
The resulting value of the url_list
variable is:
"https://example.com/page-1", "https://example.com/page-2"
Submit URL List and Store Submitted Entry
Finally, submit the value of the url_list
variable above to Bing using the SubmitUrlBatch
method of the Bing Webmaster API.
curl -d "{\"siteUrl\": \"https://example.com/\", \"urlList\": [$url_list]}" \
-H 'Content-Type: application/json; charset=utf-8' -X POST \
"https://ssl.bing.com/webmaster/api.svc/json/SubmitUrlBatch?apikey=$api_key"
Additionally, store the value of the lastmod
variable above as the date of the last submitted entry for the next submission that repeats from the beginning of this post.
Python Script Example
Combining the above processes, you can submit the URLs of the updated pages without manually specifying them. While this post discusses the process in the context of Bash scripting, the same principles apply to a real-world Python application. I have published the submit_urls.py
Python script on GitHub, originally a Bash script.
No comments:
Post a Comment