I want to automate the printing of initiative tents for all the characters in my campaigns using their most up-to-date stats from DnDBeyond. I did this by writing a Python web scraper using the Selenium WebDriver. To make sure I never scraped anything licensed, I didn't authenticate and just went through the public/anonymous access when pulling down the base stats for the characters in my party.
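A minimal sketch of what I mean (the URL pattern, character IDs, and CSS class name here are placeholders I'm assuming for illustration, not the site's actual markup):

```python
# Sketch: pull each public character sheet with Selenium and read the
# rendered page. No login is performed -- anonymous/public access only.
import time


def character_url(character_id: str) -> str:
    """Public (unauthenticated) character sheet URL -- assumed pattern."""
    return f"https://www.dndbeyond.com/characters/{character_id}"


def scrape_sheets(character_ids, delay_seconds=5.0):
    # Selenium is imported lazily so the helper above stays importable
    # even without a browser/driver installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Firefox()
    try:
        for cid in character_ids:
            driver.get(character_url(cid))
            # The sheet is rendered client-side, so wait for an element
            # to appear rather than parsing the initial HTML payload.
            # "ct-character-header" is a hypothetical class name.
            WebDriverWait(driver, 20).until(
                EC.presence_of_element_located((By.CLASS_NAME, "ct-character-header"))
            )
            yield cid, driver.page_source
            time.sleep(delay_seconds)  # space out loads between characters
    finally:
        driver.quit()
```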
After letting it run a few times (maybe 100 webpage calls over a day) to finish debugging ... I now get a 500 error "Whoops! We rolled a 1 on our Retrieve Character check. We're heading into town to visit the blacksmith for repairs. Try again after a Short Rest."
I checked from another machine with a different IP address and I don't get the error. So it looks like if you scrape even your own character sheets from DnDBeyond, your IP address will get blocked.
However, if I authenticate, the page works (but I didn't want to do that, as I didn't want to accidentally download any licensed materials).
Has anybody else run into the 500-error blocks for hitting the page too many times? Does anybody know the sweet spot I can set this to so I stay under the blocker's radar? (Now that debugging is done, I don't see needing more than one load every day or two.)
If I'm violating any terms of service by pulling down my own stats, let me know and I won't do it... but even Facebook lets me do this, and they are tools.
Yes, there is a throttling mechanic in place. People were running scraping bots to automate data extraction from the site, and it was trashing database performance, making the site far less usable for everyone.
That was a different issue though - people were trying to cycle through all shared characters on the system to pull stats on them.
There are some theories about why people might do that, but I'm not sure anyone really knows for sure.
I know that the developers would certainly appreciate you keeping any automated updates to a minimum though.
Is your script pulling the entire page, or just the XML data for the character sheet?
Thank you. That sounds reasonable. I put in a sleep call between each page load to make sure the page loads don't hit back to back.
It loads the full page. I didn't know there was a URL for the XML of the data (it seems like the stats come from a JavaScript call).
Do you know if the throttle is a daily limit, and if so, what it might be? I have 9 characters in one party and 7 in the other. Even if I don't find the XML URL, loading 16 full pages a day shouldn't tax even the worst server (I manually click around WAY more than 16 times a day on this website). But I understand why they need to do this.
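For what it's worth, the pacing I put in looks roughly like this (the delay numbers are arbitrary choices on my part, not a known limit):

```python
# Sketch: iterate over items with a randomized pause between each, so the
# 16 daily page loads never arrive back to back or on a fixed clock.
import random
import time


def paced(items, base_delay=30.0, jitter=10.0):
    """Yield items, sleeping base_delay plus up to jitter seconds between
    each one. No pause before the first item."""
    for i, item in enumerate(items):
        if i:
            time.sleep(base_delay + random.uniform(0.0, jitter))
        yield item
```

Usage would be something like `for cid in paced(character_ids): scrape(cid)`.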
This information has been discovered and posted about by several members of the community already - if you append /json to the end of the character sheet URL, it returns the JSON for the sheet, which should be easier on the servers.
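As a sketch of that pattern (note the URL shape is an assumption, the endpoint required authentication per the reply below, and a later post in this thread says it has since been removed, so treat this purely as illustration):

```python
# Sketch: fetch one small JSON payload instead of rendering the full page.
import json
from urllib.request import urlopen


def character_json_url(character_id: str) -> str:
    """Assumed URL shape for the (since-removed) /json endpoint."""
    return f"https://www.dndbeyond.com/characters/{character_id}/json"


def fetch_character_json(character_id: str) -> dict:
    # This call will fail today: the endpoint was removed for security
    # reasons, and it required an authenticated session while it existed.
    with urlopen(character_json_url(character_id)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```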
I won't discuss the hosting solution, but you can be assured that 16 calls a day is going to be fine - the throttling was to stop people who were making hundreds of calls per minute.
I am very familiar with the /json file, but that requires authentication, which I was trying to avoid. I misunderstood and thought you were referencing a flat XML file somewhere in the character sheet.
Has the json option been removed? I have tried appending /json to both the character sheet URL and the shareable link, but both result in a page not found.
Yeah, ages ago, because it was a security risk.