How to use Google Speech API


Summary

This article is a set of notes on how to use the Google Speech API, which converts a voice file to text.

Why I am interested in this API

In 2014, Google announced that 50% of teenagers in America search Google by voice, so I think voice search will become common within a few years. That is why I want to use this API and build a service on top of a voice API.

API

This API comes in two flavors: the Google Cloud Speech API and the Speech API. I guess both use the same recognition engine, because their accuracy is almost the same.

Comparing the Google Cloud Speech API and the Speech API

Google Cloud Speech API

・Official
・The documentation covers everything in detail
・Runs on Google Cloud Platform, and Google provides a $300 free-trial coupon

Speech API

・Unofficial
・Documentation is sparse
・Works from your own laptop with just an API key

Below, I'll introduce the Speech API.

How to use the Speech API

Summary

  1. Get an API key from the Google Developers Console
  2. Record your voice on your Mac
  3. Convert the voice file to text with the API

Usage

① Join the chromium-dev group, which you need to do before you can get an API key

https://groups.google.com/a/chromium.org/forum/?fromgroups#!forum/chromium-dev
Just click “Join this group”.

② Search for “Speech API” in the Google Developers Console and enable it.

https://console.developers.google.com/project

③ Create an API key on the Credentials page of the Google Developers Console
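If you don't want to paste the key into every command, one option is to keep it in an environment variable. The sketch below assumes a variable named SPEECH_API_KEY, which is my own choice and not something the API requires; the function just prints the key, or stops with a message when it is missing.

```shell
#!/bin/sh
# Assumption: the key from step ③ is exported as SPEECH_API_KEY (a name
# chosen for this sketch), e.g.
#   export SPEECH_API_KEY='<your api key>'

# Print the key, or complain on stderr and return non-zero if it is unset.
require_api_key() {
  if [ -z "${SPEECH_API_KEY:-}" ]; then
    echo "SPEECH_API_KEY is not set" >&2
    return 1
  fi
  printf '%s\n' "$SPEECH_API_KEY"
}
```

In a script you could write `key=$(require_api_key) || exit 1` and then use "$key" wherever the commands below say <your api key>.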

④ Test whether the API key is valid.

$ git clone https://github.com/gillesdemey/google-speech-v2.git  
$ cd google-speech-v2/  
$ curl -X POST --data-binary @'audio/hello (16bit PCM).wav' --header 'Content-Type: audio/l16; rate=16000;' 'https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=<your api key>'

Replace <your api key> with the API key you created in step ③. audio/hello (16bit PCM).wav is a sample file in the cloned repository: a recording of someone saying “Hello Google”.
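The request in step ④ can be wrapped in a couple of shell functions so that the file, language and key become parameters. This is a minimal sketch: speech_url and recognize are names I made up, while the endpoint, the Content-Type header and the curl options are taken from the command above (plus -s, which only hides curl's progress meter). It still assumes the audio is 16-bit PCM at 16 kHz, because that is what the header declares.

```shell
#!/bin/sh
# speech_url LANG KEY  (hypothetical helper)
# Prints the v2 recognize URL used in the curl command above.
speech_url() {
  printf 'https://www.google.com/speech-api/v2/recognize?output=json&lang=%s&key=%s\n' "$1" "$2"
}

# recognize FILE LANG KEY  (hypothetical helper)
# POSTs FILE as raw bytes and prints the JSON response.
# Assumes FILE is 16-bit PCM at 16 kHz, matching the Content-Type header.
recognize() {
  curl -s -X POST --data-binary @"$1" \
    --header 'Content-Type: audio/l16; rate=16000;' \
    "$(speech_url "$2" "$3")"
}
```

For the sample file this becomes `recognize 'audio/hello (16bit PCM).wav' en-us '<your api key>'`.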

⑤ Record your voice on your Mac

$ brew install sox
$ rec --encoding signed-integer --bits 16 --channels 1 --rate 16000 test.wav
rec WARN formats: can't set sample rate 16000; using 44100
rec WARN formats: can't set 1 channels; using 2

Press Ctrl+C to stop recording.
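Because rec can print warnings like the ones above, it may be worth checking what actually ended up in test.wav before sending it, since the curl command declares rate=16000. This sketch reads the sample rate, channel count and bit depth straight from the WAV header with od. wav_format is a hypothetical helper; it assumes the usual layout where the "fmt " chunk immediately follows the 12-byte RIFF header, and a little-endian machine, because od -t u2/u4 decode in host byte order.

```shell
#!/bin/sh
# wav_format FILE  (hypothetical helper)
# Prints "<sample rate> <channels> <bits per sample>" from a WAV header.
# Assumptions: "fmt " chunk starts at byte 12 (standard layout) and the
# host is little-endian (Intel and Apple Silicon Macs are).
wav_format() {
  # Byte offsets inside the standard header: channels at 22 (2 bytes),
  # sample rate at 24 (4 bytes), bits per sample at 34 (2 bytes).
  rate=$(od -An -t u4 -j 24 -N 4 "$1" | tr -d ' \n')
  channels=$(od -An -t u2 -j 22 -N 2 "$1" | tr -d ' \n')
  bits=$(od -An -t u2 -j 34 -N 2 "$1" | tr -d ' \n')
  echo "$rate $channels $bits"
}
```

For the file from step ⑤, `wav_format test.wav` should print `16000 1 16` if it matches the Content-Type header. Since sox is already installed, `soxi -r test.wav`, `soxi -c test.wav` and `soxi -b test.wav` report the same three values without relying on the header layout.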

⑥ Convert the voice file to text

$ curl -X POST --data-binary @'test.wav' --header 'Content-Type: audio/l16; rate=16000;' 'https://www.google.com/speech-api/v2/recognize?output=json&lang=ja-JP&key=<your api key>'
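If you have several recordings, a loop can send each one and keep the raw responses. This is a hypothetical helper (transcribe_dir is not part of any tool): it reuses the endpoint and header from the command above, writes each response to a .json file next to the .wav, and still assumes 16-bit, 16 kHz audio. Keep in mind that every file is one request against the daily limit.

```shell
#!/bin/sh
# transcribe_dir DIR LANG KEY  (hypothetical helper)
# Sends every DIR/*.wav to the v2 endpoint used above and saves the
# response as DIR/<name>.json.
transcribe_dir() {
  for f in "$1"/*.wav; do
    # If DIR has no .wav files the pattern stays unexpanded; skip it.
    [ -e "$f" ] || continue
    curl -s -X POST --data-binary @"$f" \
      --header 'Content-Type: audio/l16; rate=16000;' \
      "https://www.google.com/speech-api/v2/recognize?output=json&lang=$2&key=$3" \
      > "${f%.wav}.json"
  done
}
```

For example, `transcribe_dir recordings ja-JP '<your api key>'` would turn recordings/a.wav into recordings/a.json.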

Results (Japanese)

「テストテスト」 ("test test")

{"result":[{"alternative":[{"transcript":"テスト テスト","confidence":0.98299336},{"transcript":"TEST TEST"},{"transcript":"テントテスト"},{"transcript":"Z テスト"},{"transcript":"test テスト"}],"final":true}],"result_index":0}

「すもももももももものうち」 ("Plums and peaches are both kinds of peach", a Japanese tongue twister)

{"result":[{"alternative":[{"transcript":"すもももももももものうち","confidence":0.99271148},{"transcript":"スモモも桃も桃のうち"},{"transcript":"すももも桃も桃のうち"},{"transcript":"すももももももももの家"},{"transcript":"スモモも桃も桃の家"}],"final":true}],"result_index":0}

「イカが二貫」 ("Two pieces of squid", as in sushi)

{"result":[{"alternative":[{"transcript":"イカが二貫","confidence":1},{"transcript":"いかがにか"},{"transcript":"如何にか"},{"transcript":"いかが 2巻"},{"transcript":"如何に花"}],"final":true}],"result_index":0}

「みたろ、ドラえもん、かったんだよ。ぼくひとりで。もう安心して帰れるだろう、ドラえもん」 ("See that, Doraemon? I won. All by myself. Now you can go home without worrying, Doraemon.")

{"result":[{"alternative":[{"transcript":"みたろ ドラえもん 買ったんだよ 僕ひとりでもう安心して帰れるだろ ドラえもん"},{"transcript":"見たろ ドラえもん 買ったんだよ 僕一人でもう安心して帰れるだろ ドラえもん"},{"transcript":"見たろ ドラえもん 買ったんだよ 僕ひとりでもう安心して帰れるだろ ドラえもん"},{"transcript":"みたろ ドラえもん 買ったんだよ 僕一人でもう安心して帰れるだろ ドラえもん"},{"transcript":"見たろ ドラえもん簡単だよ 僕一人でもう安心して帰れるだろ ドラえもん"}],"final":true}],"result_index":0}

「嘘みたいだろ。死んでるんだぜ、それで。」 ("Hard to believe, isn't it? He's dead, like that.")

{"result":[{"alternative":[{"transcript":"嘘みたいだろ 死んでるんだぜ それで","confidence":0.98469388},{"transcript":"ウソみたいだろ 死んでるんだぜ それで"},{"transcript":"うそみたいだろ 死んでるんだぜ それで"}],"final":true}],"result_index":0}

The top transcripts are almost perfect, even though the speech is in Japanese.
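Every response above has the same shape: result holds a list of alternative entries, each with a transcript, and in these samples only the first alternative carries a confidence (the Doraemon one has none at all). If you only need the best guess, a rough sketch with grep, head and cut is enough. It is not a real JSON parser: it assumes transcripts contain no escaped quotes, and for anything more serious a tool like jq would be safer. Because it takes the first match across all input lines, it also passes over any line whose result list is empty.

```shell
#!/bin/sh
# top_transcript / top_confidence  (hypothetical helpers)
# Read a v2 response on stdin and print the first "transcript" value or
# the first "confidence" value. Not a JSON parser: assumes no \" inside
# transcripts. top_confidence prints nothing if there is no confidence.
top_transcript() {
  grep -o '"transcript":"[^"]*"' | head -n 1 | cut -d '"' -f 4
}

top_confidence() {
  grep -o '"confidence":[0-9.]*' | head -n 1 | cut -d ':' -f 2
}
```

Piping the output of the curl command in step ⑥ into top_transcript prints only the best transcript, for example テスト テスト for the first sample.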

API limits

・50 requests per day
・Errors become more likely when a voice file is 30 seconds or longer.
・The Cloud Speech API is free for up to 60 minutes of audio per month. Beyond that you have to pay, but I think it's still cheap.
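To avoid wasting requests on long recordings, you could check the duration before sending. This sketch computes it as the data chunk size divided by the byte rate, both read from the WAV header with od. wav_seconds and short_enough are hypothetical helpers, and MAX_SECONDS=30 mirrors my observation above rather than any documented limit. It assumes a canonical 44-byte header (data chunk size at byte 40) and a little-endian machine; `soxi -D test.wav` gives the duration without those assumptions.

```shell
#!/bin/sh
# Threshold taken from the observation above, not from documentation.
MAX_SECONDS=30

# wav_seconds FILE  (hypothetical helper)
# Prints the duration in whole seconds (rounded down):
# data chunk size (byte 40) / byte rate (byte 28).
# Assumes a canonical 44-byte header and a little-endian host.
wav_seconds() {
  data_bytes=$(od -An -t u4 -j 40 -N 4 "$1" | tr -d ' \n')
  byte_rate=$(od -An -t u4 -j 28 -N 4 "$1" | tr -d ' \n')
  echo $((data_bytes / byte_rate))
}

# short_enough FILE: succeeds if FILE is under MAX_SECONDS whole seconds.
short_enough() {
  [ "$(wav_seconds "$1")" -lt "$MAX_SECONDS" ]
}
```

Something like `short_enough test.wav && curl ...` would then skip recordings that are likely to fail.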

Impressions

The accuracy probably depends on how clearly you speak, but when you speak clearly it is really precise. The Speech API is not officially published and has a request limit, so if you build an app on a voice API, I recommend using the Cloud Speech API instead. It is just as sophisticated and cheap.
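For comparison, here is roughly what the same request might look like against the Cloud Speech API. This sketch assumes its v1 REST method speech:recognize and an API key from a project with the Cloud Speech API enabled, so check the official documentation before relying on it. Instead of raw bytes, the audio is sent base64-encoded inside a JSON body together with the encoding, sample rate and language. cloud_speech_body and cloud_recognize are hypothetical helper names.

```shell
#!/bin/sh
# cloud_speech_body FILE LANG  (hypothetical helper)
# Prints a JSON request body for the Cloud Speech API v1 speech:recognize
# method. Assumes FILE is 16-bit PCM (LINEAR16) at 16 kHz.
cloud_speech_body() {
  # base64 may wrap long output; strip newlines so it fits in one JSON string.
  audio=$(base64 < "$1" | tr -d '\n')
  printf '{"config":{"encoding":"LINEAR16","sampleRateHertz":16000,"languageCode":"%s"},"audio":{"content":"%s"}}\n' "$2" "$audio"
}

# cloud_recognize FILE LANG KEY  (hypothetical helper)
# Sends the body from stdin (@-) to the assumed v1 endpoint.
cloud_recognize() {
  cloud_speech_body "$1" "$2" |
    curl -s -X POST --header 'Content-Type: application/json' \
      --data-binary @- \
      "https://speech.googleapis.com/v1/speech:recognize?key=$3"
}
```

Usage would be `cloud_recognize test.wav ja-JP '<your api key>'`. Note that this JSON body carries the whole recording, so it grows by about a third compared with the raw upload.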

By the way, recording my own voice for these tests was really embarrassing. Make sure you are alone when you record yourself.
