Sockets are used in networking. The idea of a socket is to aid in the communication between two entities. When you view a website, you are opening a port and connecting to that website via sockets. In this, you are the client, and the website is the server. Quite literally, you are served data.
What are Ports and what are Sockets?
A natural point of confusion here is the difference between sockets and ports. You can think of a port much like a shipping port, where boats dock at the port and unload goods. Then, you can think of the ship itself as the socket. The ocean is the internet. Much like a ship assigned to a shipping port, a socket (our ship in this metaphor) is bound to a specific port. Docking at a different port is not allowed, for ships or sockets!
Now, let's go ahead and play with ports and sockets in Python! This can be a slightly confusing topic, so I will do my best to document everything. The video should help as well if you are finding yourself confused.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(s)
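So, we must import socket to use it. This module is included with your Python 3 distribution.
Next, socket.socket returns a socket object, which we store in "s". Here, AF_INET means the socket uses IPv4 addresses, and SOCK_STREAM means it is a TCP socket. Under the hood, the object wraps a "socket descriptor" handed out by the operating system. We then print "s" to show what this looks like.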
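To make the pieces of that object a little more concrete, here is a minimal sketch that inspects it. The variable names (family, kind, descriptor, closed_descriptor) are just for illustration, and the descriptor number you see will vary from run to run.

```python
import socket

# AF_INET selects IPv4 addressing; SOCK_STREAM selects a TCP (stream) socket.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

family = s.family          # the address family we asked for
kind = s.type              # the socket type we asked for
descriptor = s.fileno()    # the OS-level integer handle behind the object

print(family, kind, descriptor)

s.close()                        # release the descriptor when finished
closed_descriptor = s.fileno()   # a closed socket no longer has a valid handle
print(closed_descriptor)
```

Closing the socket when you are done hands the descriptor back to the operating system.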
Generally, we use sockets to communicate between a couple of places, so let's show an example of that. One of the most common transmissions of data is between a "client" and "server," most often in the case of a user visiting a website and being served web-content, much like you are being served this page right now. Sockets did that for you.
server = 'reddit.com'
port = 80

server_ip = socket.gethostbyname(server)
print(server_ip)
Many public websites will have port 80 open, which is for HTTP access. Many servers will also have port 22 open, which is for SSH (secure shell), and some will have 20 and 21 open, which are used for FTP (File Transfer Protocol). If a website uses HTTPS, then port 443 will be open as well. Sometimes HTTPS is required: this website, for example, forces HTTPS, so you can't use regular HTTP.
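If you would rather look these numbers up than memorize them, a small helper can do it. This is only a sketch: port_for and KNOWN_PORTS are names made up for this example, and the fallback table holds nothing beyond the ports mentioned above. socket.getservbyname asks the operating system's services database, which may not exist on every machine, hence the fallback.

```python
import socket

# Fallback table holding only the ports named in the text above.
KNOWN_PORTS = {"ftp-data": 20, "ftp": 21, "ssh": 22, "http": 80, "https": 443}

def port_for(service):
    """Return the conventional TCP port for a service name."""
    try:
        # Consults the operating system's services database, if present.
        return socket.getservbyname(service, "tcp")
    except OSError:
        # No database entry available, so use our small table instead.
        return KNOWN_PORTS[service]

for name in ("ftp", "ssh", "http", "https"):
    print(name, port_for(name))
```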
Here is some more information on open ports and hacking: Open Ports and Hacking
Do open ports mean you are going to be hacked?
It is a common misconception, perpetuated by the media, that an "open port" is all one needs to "hack" something. The truth is, all websites have open ports, but each port is expecting a specific socket (ship in our metaphor from before), and that specific socket's type of payload of data (ship's cargo) is also known and expected before-hand.
Thus, in our metaphor, if we have a ship that is supposed to be bringing 50 crates full of coffee, but has instead brought over 50 crates of swordfish, immediate red flags are thrown. The same is true with sockets and ports. The socket / ship can be denied.
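To see the coffee and swordfish idea in code, here is a toy sketch that runs entirely on your own machine. Everything in it is invented for illustration: the "COFFEE" protocol, the ACCEPTED and DENIED replies, and the run_dock and deliver functions. Real services check their payloads in far more involved ways, but the shape is the same: the program behind the port decides what it will accept.

```python
import socket
import threading

def run_dock(listener):
    """Accept one connection and only welcome the cargo we expect."""
    conn, _ = listener.accept()
    with conn:
        cargo = conn.recv(1024)
        if cargo.startswith(b"COFFEE"):
            conn.sendall(b"ACCEPTED")
        else:
            conn.sendall(b"DENIED")

def deliver(cargo):
    """Start a one-shot local server, send it cargo, and return its reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    worker = threading.Thread(target=run_dock, args=(listener,))
    worker.start()

    chunks = []
    with socket.create_connection(("127.0.0.1", port), timeout=5) as ship:
        ship.sendall(cargo)
        while True:                      # read until the server hangs up
            data = ship.recv(1024)
            if not data:
                break
            chunks.append(data)

    worker.join()
    listener.close()
    return b"".join(chunks)

print(deliver(b"COFFEE x50"))
print(deliver(b"SWORDFISH x50"))
```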
Then how do hackers get in?
The way sockets and ports are abused by hackers is by taking advantage of vulnerabilities in the programs that have opened specific ports. Every program that uses the internet to provide you a service uses ports, and opens them to the world. Take Skype for example. Skype uses ports 80 and 443. You already know what port 80 is for. 443 is normally used for HTTPS, the encrypted counterpart to port 80's HTTP connections, but a program can carry its own kind of traffic over it. Via port 443, Skype is expecting a certain type of data, but maybe their security is not perfect, and people are able to use port 443 maliciously because Skype's protocol is not perfectly secure.
Thus, what hackers tend to do, is scan open ports. From the open ports, many times, they can deduce what programs you are running, and proceed to try various attacks against that program's vulnerabilities, especially the historical ones that are generally made public. This is why it is important to keep your software up to date. Most software updates contain security upgrades, fixes, or patches. Even if not specifically explained, the very act of patching an area of code can alert someone that there was something weak there before.
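Checking whether a single port is open boils down to attempting a connection. The sketch below assumes a helper named is_port_open (not part of the socket module) and only probes a listener that it starts itself on the loopback address, so nothing leaves your machine.

```python
import socket

def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise,
        # instead of raising an exception for a refused connection.
        return probe.connect_ex((host, port)) == 0

# Start a listener we own so there is something to find.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))          # port 0 lets the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

open_while_listening = is_port_open("127.0.0.1", port)
listener.close()
open_after_close = is_port_open("127.0.0.1", port)

print(open_while_listening, open_after_close)
```

A scanner is essentially this check repeated over a range of port numbers.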
So, back in our code, we set the server to reddit.com and the port to 80. We then determined the server's IP address by using gethostbyname(), which performs a DNS lookup. Nothing has connected to port 80 yet.
Now, let's make a request, making sure it is in-line with what the port will find acceptable from our socket:
request = "GET / HTTP/1.1\r\nHost: " + server + "\r\n\r\n"  # HTTP lines end with \r\n
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((server, port))
Above, we defined our request as an HTTP request, where we wanted to "GET" data from the "Host" of reddit.com.
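The request is nothing more than a few lines of text in a format the server expects. Here is one way you might wrap that up in a function; build_get_request is a name invented for this sketch. HTTP/1.1 ends each line with a carriage return and a newline (\r\n), and a blank line marks the end of the headers. The extra "Connection: close" header is my addition: it asks the server to hang up once it has answered.

```python
def build_get_request(host, path="/"):
    """Build the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        "GET " + path + " HTTP/1.1",   # method, path, protocol version
        "Host: " + host,               # which site we want on that server
        "Connection: close",           # ask the server to close when done
        "",                            # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode()

request_bytes = build_get_request("reddit.com")
print(request_bytes)
```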
Next, we defined our socket in the same manner as we had before.
Finally, we make our connection to reddit.com on port 80. This is just a connection. We have defined our request, but not actually made any request, so let's make the request:
s.send(request.encode())
result = s.recv(4096)
print(result)
First we're sending the request, and encoding it.
Then we're using s.recv to receive the resulting data. The 4096 is the buffer size: the maximum number of bytes to receive in one call, so that you receive the data in manageable chunks rather than all at once. Finally, we're just printing the result. (Though it should be noted this is printing only the first chunk of the response, so the buffer in this case is almost a waste.)
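You can watch that limit at work without touching the internet. This sketch uses socket.socketpair(), which gives you two sockets already connected to each other, as a stand-in for a real client and server. The 6000-byte payload is an arbitrary size chosen to be bigger than the buffer.

```python
import socket

# Two sockets already connected to each other, all on this machine.
sender, receiver = socket.socketpair()

payload = b"x" * 6000
sender.sendall(payload)

first = receiver.recv(4096)    # hands back at most 4096 bytes
print(len(first), "of", len(payload), "bytes so far")

sender.close()
receiver.close()
```

Whatever did not fit in the first call is still waiting and takes further recv calls to collect.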
With Python 3, one of the major changes from Python 2 was the differing treatment of strings and bytes. If you want to make a request that is a string, you need to encode it. You will also need to decode any return that you wish to treat like a string. You should just get into the habit mentally that everything you send out over the internet needs to be encoded, and all that you receive needs a .decode, every time! Python 2 implicitly handled this for us. Python 3 requires us to be explicit, which is more Pythonic anyways.
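Here is a small sketch of that round trip, again using a local socketpair() rather than a real website. The variable names are just for illustration.

```python
import socket

text = "GET / HTTP/1.1"
wire = text.encode()     # str -> bytes (UTF-8 by default)
back = wire.decode()     # bytes -> str

a, b = socket.socketpair()
try:
    a.send(text)         # sending a str is refused
    rejected = False
except TypeError:
    rejected = True

a.send(wire)             # sending bytes works
echoed = b.recv(1024).decode()

a.close()
b.close()
print(type(wire), type(back), rejected, echoed)
```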
One of the main pillars of Python is that "explicit is better than implicit." If you have not yet, open a console, and do the following import:
import this
Since I said the buffer was almost a waste, I should probably show how to make the output actually buffer as well. Here's how:
Instead of using print(result), comment or delete that, then do:
# recv returns empty bytes once the server closes the connection, ending the loop
while len(result) > 0:
    print(result)
    result = s.recv(4096)
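If you want the whole response in one piece instead of printed chunk by chunk, the same loop can collect the chunks and join them. This is a sketch under a couple of assumptions: read_all is a name I made up, and the "server" here is just one end of a local socketpair() sending a pretend response. I also use a 1024-byte buffer so the loop visibly runs several times.

```python
import socket

def read_all(sock, bufsize=4096):
    """Collect chunks from sock until the peer closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:              # b"" means the other side has closed
            break
        chunks.append(chunk)
    return b"".join(chunks)

# Local stand-in for a server: a connected pair of sockets.
server_side, client_side = socket.socketpair()
header = b"HTTP/1.1 200 OK\r\n\r\n"
server_side.sendall(header + b"a" * 5000)
server_side.close()                # closing is what lets read_all finish

response = read_all(client_side, bufsize=1024)
client_side.close()
print(len(response))
```

Note that the loop only ends when the other side closes the connection, so against a server that keeps the connection open it will sit and wait.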
If you wanted to do this with a website that forces HTTPS, such as PythonProgramming.net, you would instead do something like:
import socket
import ssl

# PROTOCOL_TLSv1 is deprecated and refused by many servers. The default context
# requires certificate verification and hostname checking, loads the system's
# default certificates, and negotiates the highest TLS version both sides support.
context = ssl.create_default_context()

server = "pythonprogramming.net"
port = 443

server_ip = socket.gethostbyname(server)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s = context.wrap_socket(s, server_hostname=server)

request = "GET / HTTP/1.1\r\nHost: " + server + "\r\n\r\n"  # HTTP lines end with \r\n
s.connect((server, port))
s.send(request.encode())

result = s.recv(4096)
while len(result) > 0:
    print(result)
    result = s.recv(4096)
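The verification settings are the part of this setup that actually protects you, so it can be worth checking what a context will enforce before you connect. A quick sketch, assuming you build the context with ssl.create_default_context(), which is the standard library's helper for client connections:

```python
import ssl

context = ssl.create_default_context()

# Will the server's certificate be verified?
requires_cert = context.verify_mode == ssl.CERT_REQUIRED

# Will the certificate be checked against the hostname we asked for?
checks_hostname = context.check_hostname

print(requires_cert, checks_hostname)
```

This runs without making any connection, so it is a cheap sanity check.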