IBM Watson


PDF search and reader system based on IBM Watson services (GRIMM BOOT)

Introduction:
The story search and reading system uses IBM Watson cloud services to recognize a spoken word. The recognized word is passed to a search method that looks for the best-matching file, which is then read aloud with IBM Watson services.

Theory:

Python:
Python is a multi-paradigm programming language whose philosophy emphasizes a syntax that favors readable code. Its main advantage for this project is that, because Python is dynamically typed, it is very simple to get started with object-oriented programming without having to declare the types of objects first.

IBM Watson
IBM Watson is an artificial intelligence tool capable of answering questions asked in natural language. Developed by IBM, Watson runs on cloud services and draws on a database that is processed by the different IBM services. Its data are taken from multiple sources, such as encyclopedias, dictionaries, thesauri, news articles, and literary works, as well as external databases, taxonomies, and ontologies.

DEVELOPMENT
For this project we use the following IBM Watson services:

Speech to text:
The IBM® Speech to Text service provides APIs that use IBM's speech recognition capabilities to produce transcripts of spoken audio. The service can transcribe speech in several languages and audio formats. In addition to the basic transcription, the service can produce detailed information on many different aspects of the audio. For most languages, the service supports two sampling rates, broadband and narrowband. It returns all JSON response content in the UTF-8 character set.

For speech recognition, the service supports synchronous and asynchronous Representational State Transfer (REST) interfaces. It also supports a WebSocket interface that provides a full-duplex, low-latency communication channel: clients send requests and audio to the service and receive results asynchronously over a single connection.

The service also offers two customization interfaces. Use the language model customization to expand the vocabulary of a base model with domain-specific terminology. Use the acoustic model customization to adapt a base model to the acoustic characteristics of your audio. For the customization of the language model, the service also supports grammars. A grammar is a formal language specification that allows you to restrict the phrases that the service can recognize.

Language model customization is generally available for production use with most compatible languages. Acoustic model customization is a beta functionality that is available for all supported languages.
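As a rough illustration of the synchronous REST interface, the sketch below assumes the ibm-watson Python SDK. The API key, service URL, and file path are placeholders supplied by the caller, and the helper names make_speech_to_text_client and transcribe_wav are hypothetical, not part of the SDK.

```python
def make_speech_to_text_client(api_key, service_url):
    """Build a Speech to Text client (requires the ibm-watson package)."""
    # Imported lazily so this module loads even without the SDK installed.
    from ibm_watson import SpeechToTextV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    client = SpeechToTextV1(authenticator=IAMAuthenticator(api_key))
    client.set_service_url(service_url)
    return client


def transcribe_wav(client, wav_path):
    """Send a WAV file to the service and return the parsed JSON response."""
    with open(wav_path, "rb") as audio_file:
        response = client.recognize(audio=audio_file, content_type="audio/wav")
    # get_result() gives the JSON body as a Python dictionary.
    return response.get_result()
```

Passing the client in as a parameter keeps the credentials in one place and lets the transcription step be exercised with a stand-in object.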

Text to speech:
The IBM® Text to Speech service provides APIs that use IBM's speech synthesis capabilities to turn text into natural-sounding speech in a variety of languages, dialects, and voices. The service supports at least one male or female voice, sometimes both, for each language. The audio is streamed back to the client with minimal delay.

For speech synthesis, the service supports a synchronous HTTP Representational State Transfer (REST) interface. It also supports a WebSocket interface that accepts plain text and SSML input, including the SSML <mark> element and word timings. SSML is an XML-based markup language that provides text annotations for speech synthesis applications.

The service also offers a customization interface. You can use the interface to define sounds-like or phonetic translations for words. A sounds-like translation consists of one or more words that, when combined, sound like the word. A phonetic translation is based on the SSML phoneme format for representing a word. You can specify a phonetic translation in the standard International Phonetic Alphabet (IPA) representation or in the proprietary IBM Symbolic Phonetic Representation (SPR).

Translation:
IBM Watson™ Language Translator translates text from one language to another. The service offers multiple translation models provided by IBM that you can customize according to your unique terminology and language. Use Language Translator to take news from around the world and present it in your language, communicate with your customers in their own language, and more.


DESIGN:
The following diagram shows the project design built with the chosen services:

Img 1. Design.
PyAudio handles the recording, which lasts 3 seconds.

Img 2. Record function
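The original record function is only available as an image, so the following is a minimal sketch of a 3-second PyAudio recording rather than the project's actual code. The sample rate, buffer size, and the function names chunks_needed, record_frames, and save_wav are assumptions; only the 3-second duration comes from the text.

```python
import wave

RATE = 16000      # samples per second (assumed)
CHUNK = 1024      # frames read per buffer (assumed)
CHANNELS = 1      # mono (assumed)
SAMPLE_WIDTH = 2  # bytes per sample for 16-bit audio
SECONDS = 3       # recording length given in the text


def chunks_needed(rate=RATE, chunk=CHUNK, seconds=SECONDS):
    """Number of full buffers to read to cover `seconds` of audio."""
    return int(rate / chunk * seconds)


def record_frames(seconds=SECONDS):
    """Capture raw 16-bit frames from the default microphone."""
    import pyaudio  # imported lazily: needs the PyAudio package and a microphone

    audio = pyaudio.PyAudio()
    stream = audio.open(format=pyaudio.paInt16, channels=CHANNELS, rate=RATE,
                        input=True, frames_per_buffer=CHUNK)
    try:
        return [stream.read(CHUNK) for _ in range(chunks_needed(seconds=seconds))]
    finally:
        stream.stop_stream()
        stream.close()
        audio.terminate()


def save_wav(frames, path):
    """Write the captured frames to a WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(RATE)
        wav_file.writeframes(b"".join(frames))
```

A typical call would be save_wav(record_frames(), "input.wav"), where the file name is a placeholder.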
The recognize IBM function takes the audio captured from the microphone, saved as a file with the .wav extension, and sends it to the service for recognition.
From the dictionary returned by IBM Watson we extract only the recognized text (our input), as shown in image 3.
Img 3. Recognize IBM
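Pulling the recognized text out of the response dictionary could look like the sketch below. The function name extract_transcript is hypothetical, and the sample dictionary is hand-written to mirror the documented response shape (results, then alternatives, then transcript); it is not real service output.

```python
def extract_transcript(response):
    """Return the top transcript from a Speech to Text response dict, or ""."""
    pieces = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            # Take the first alternative listed for each result.
            pieces.append(alternatives[0]["transcript"].strip())
    return " ".join(pieces)


# Hand-written sample for illustration only.
sample = {
    "result_index": 0,
    "results": [
        {"final": True,
         "alternatives": [{"transcript": "hansel and gretel ", "confidence": 0.94}]}
    ],
}
print(extract_transcript(sample))
```

Returning an empty string when nothing was recognized lets the caller decide whether to record again.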
We call the recognize IBM function, which returns our input. With a for loop we list all the PDF files in our directory, look for the file whose name most closely resembles the input, select it, and concatenate its path.
Img 4. Pdf Search
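The text does not say how similarity is measured, so the sketch below assumes fuzzy matching on file names with difflib from the standard library. The function name find_best_pdf is hypothetical.

```python
import difflib
from pathlib import Path


def find_best_pdf(query, directory):
    """Return the path of the PDF whose file name best matches `query`, or None."""
    # Map lower-cased file names without extension to their full paths.
    candidates = {}
    for path in Path(directory).iterdir():
        if path.suffix.lower() == ".pdf":
            candidates[path.stem.lower()] = path
    # cutoff=0.0 always returns the closest name, however weak the match.
    matches = difflib.get_close_matches(
        query.lower().strip(), list(candidates), n=1, cutoff=0.0
    )
    return candidates[matches[0]] if matches else None
```

Raising the cutoff (the difflib default is 0.6) would make the search return None instead of a poor match, which may be preferable when the transcript is noisy.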

Now that we have the path of the file, we read it and call the IBM Watson Text to Speech service so that our story is read aloud by a voice.
Img 5. Open pdf
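A possible shape for this step is sketched below. It assumes the pypdf package for text extraction (the original may use a different PDF library) and a client object that behaves like TextToSpeechV1 from the ibm-watson SDK. The function names and the default voice are assumptions.

```python
def read_pdf_text(pdf_path):
    """Extract the text of every page (requires the pypdf package)."""
    from pypdf import PdfReader  # imported lazily; pypdf is an assumed choice

    reader = PdfReader(pdf_path)
    # extract_text() may return nothing for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def synthesize_to_wav(client, text, wav_path, voice="en-US_AllisonV3Voice"):
    """Ask Text to Speech for WAV audio of `text` and save it to `wav_path`."""
    response = client.synthesize(text, voice=voice, accept="audio/wav")
    with open(wav_path, "wb") as audio_file:
        # The binary audio is in the `content` attribute of the result.
        audio_file.write(response.get_result().content)
    return wav_path
```

The service limits how much text a single request can carry, so a long story may need to be split into several calls.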
To start, we initialize the class and call the methods for recording, searching for stories, and opening the PDF. os.system lets us open a file with the system's default program; in this case it opens the file with the .wav extension.
Img 6. Main 
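Passing a bare file name to os.system relies on how the operating system's shell handles it. As a hedged alternative to the os.system call described above, the sketch below picks a launcher command per platform and runs it with subprocess. The function names are hypothetical, and xdg-open is assumed to be available on Linux desktops.

```python
import subprocess
import sys


def build_open_command(path, platform=sys.platform):
    """Return the command that opens `path` with the default application."""
    if platform.startswith("win"):
        # `start` is a cmd.exe built-in; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]
    return ["xdg-open", path]  # assumed launcher on Linux desktops


def open_with_default_app(path):
    """Open `path` and wait for the launcher command to return."""
    subprocess.run(build_open_command(path), check=True)
```

Keeping the command construction separate from the call makes the platform choice easy to check without actually launching a player.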
The result of reading the PDF:
Img 7. Result

Conclusion:

IBM Watson services help us reduce the number of lines of code, which makes it possible to build lighter, more robust, higher-quality software.



Resources

https://www.youtube.com/watch?v=VaSOKWsyeV8
