Excerpt
I've written several articles , Write something that few people write today, but it's very useful ! Remember to like the collection and pay attention to it .
## Preface :Json Data is introduced
Json Is a widely used format for transmitting and exchanging data , It is applied in the database , It's also used for API Request result dataset . Although it is widely used , The machine is easy to read and saves space , But it is not conducive to reading and further data analysis , Therefore, it is usually necessary to obtain json After the data , Convert it to tabular data , To facilitate people to read and understand . common Json The data format is 2 Kind of , Data is stored in the form of key value pairs , It's just that the method of packaging data is different :
### a. commonly JSON object
use {} Enclose the key value pair data , Sometimes there are multiple layers {}

### b. JSON The object list
use [] take JSON object Cover up , To form a JSON List of objects ,JSON There will also be multiple layers in the object {}, There will be [] appear , formation Nested list

This article is mainly about pandas Built in Json Data conversion methods json_normalize(), It can be used for the above two Json Format data to parse , The resulting DataFrame, Then do more operations on the data . The main deconstruction of this paper is as follows :
1. Analyze the most basic Json
2. Parse a with multiple layers of data Json
3. Resolve a with nested lists Json
4. When Key How to ignore the system error when it does not exist
5. Use sep Parameters are nested Json Of Key Set separator
6. Prefix nested list data and metadata
7. adopt URL obtain Json And analyze the data
8. To explore the : Resolution with Multiple nested lists Of Json
### json_normalize() Function parameters
Before the code demonstration, import the corresponding dependent Libraries , Not installed pandas Please install the library by yourself ( This code is in Jupyter Notebook Running in the environment ).
```plain text
from pandas import json_normalize
import pandas as pd
Copy code
```
## 1. Analyze the most basic Json
### a. Parsing general Json object
```plain text
a_dict = {
'school': 'ABC primary school',
'location': 'London',
'ranking': 2
}
pd.json_normalize(a_dict)
Copy code
```
The output is :

### b. Analyze a Json The object list
```plain text
json_list = [
{'class': 'Year 1', 'student number': 20, 'room': 'Yellow'},
{'class': 'Year 2', 'student number': 25, 'room': 'Blue'}
]
pd.json_normalize(json_list)
Copy code
```
The output is :

## 2. Parse a with multiple layers of data Json
### a. Parse a with multiple layers of data Json object
```plain text
json_obj = {
'school': 'ABC primary school',
'location': 'London',
'ranking': 2,
'info': {
'president': 'John Kasich',
'contacts': {
'email': {
'admission': 'admission@abc.com',
'general': 'info@abc.com'
},
'tel': '123456789',
}
}
}
pd.json_normalize(json_obj)
Copy code
```
The output is :

Multi-storey key Use points to separate , Shows all the data , This has been resolved 3 layer , The above writing is similar to pd.json_normalize(json_obj, max_level=3) Equivalent .
If you set max_level=1, The output result is as shown in the figure below ,contacts Part of the data collection is integrated into a column

If you set max_level=2, The output result is as shown in the figure below ,contacts Under the email Part of the data collection is integrated into a column

### b. Parse a with multiple layers of data Json The object list
```plain text
json_list = [
{
'class': 'Year 1',
'student count': 20,
'room': 'Yellow',
'info': {
'teachers': {
'math': 'Rick Scott',
'physics': 'Elon Mask'
}
}
},
{
'class': 'Year 2',
'student count': 25,
'room': 'Blue',
'info': {
'teachers': {
'math': 'Alan Turing',
'physics': 'Albert Einstein'
}
}
}
]
pd.json_normalize(json_list)
Copy code
```
The output is :
If you separate max_level Set to 2 and 3, What should the output results be ? Please try it yourself ~
## 3. Resolve a with nested lists Json
```plain text
json_obj = {
'school': 'ABC primary school',
'location': 'London',
'ranking': 2,
'info': {
'president': 'John Kasich',
'contacts': {
'email': {
'admission': 'admission@abc.com',
'general': 'info@abc.com'
},
'tel': '123456789',
}
},
'students': [
{'name': 'Tom'},
{'name': 'James'},
{'name': 'Jacqueline'}
],
}
pd.json_normalize(json_obj)
Copy code
```
In this case students The value corresponding to the key is a list , Use [] Cover up . The above method is directly used for analysis , The results are as follows :
students Part of the data was not successfully parsed , It can be record_path Set the value , The call mode is pd.json_normalize(json_obj, record_path='students'), In this calling mode , The results obtained only include name Part of the data .
To add information for other fields , It needs to be meta parameter assignment , For example, under the following call mode , The result is as follows :
```plain text
pd.json_normalize(json_obj, record_path='students', meta=['school', 'location', ['info', 'contacts', 'tel'], ['info', 'contacts', 'email', 'general']])
Copy code
```
## 4. When Key How to ignore the system error when it does not exist
```plain text
data = [
{
'class': 'Year 1',
'student count': 20,
'room': 'Yellow',
'info': {
'teachers': {
'math': 'Rick Scott',
'physics': 'Elon Mask',
}
},
'students': [
{ 'name': 'Tom', 'sex': 'M' },
{ 'name': 'James', 'sex': 'M' },
]
},
{
'class': 'Year 2',
'student count': 25,
'room': 'Blue',
'info': {
'teachers': {
# no math teacher
'physics': 'Albert Einstein'
}
},
'students': [
{ 'name': 'Tony', 'sex': 'M' },
{ 'name': 'Jacqueline', 'sex': 'F' },
]
},
]
pd.json_normalize(
data,
record_path =['students'],
meta=['class', 'room', ['info', 'teachers', 'math']]
)
Copy code
```
stay class be equal to Year 2 Of Json In the object ,teachers Under the math The key doesn't exist , Running the above code directly will report the following error , Tips math Keys don't always exist , And the corresponding suggestions are given :Try running with errors='ignore'.
add to errors After the condition , The results of re running are shown in the figure below , No, math The part of the key uses NaN Filled .
```plain text
pd.json_normalize(
data,
record_path =['students'],
meta=['class', 'room', ['info', 'teachers', 'math']],
errors='ignore'
)
Copy code
```
## 5. Use sep Parameters are nested Json Of Key Set separator
stay 2.a Case study , It can be noted that the output result has multiple layers key The header of the data column is . To multilayer key Separating , It can be for sep Assign a value to change the separator .
```plain text
json_obj = {
'school': 'ABC primary school',
'location': 'London',
'ranking': 2,
'info': {
'president': 'John Kasich',
'contacts': {
'email': {
'admission': 'admission@abc.com',
'general': 'info@abc.com'
},
'tel': '123456789',
}
}
}
pd.json_normalize(json_obj, sep='->')
Copy code
```
The output is :
## 6. Prefix nested list data and metadata
stay 3 In the output of example , Each column name has no prefix , for example name I don't know if this column is the data obtained by metadata parsing , Or through student Data from nested lists , Therefore record_prefix and meta_prefix The parameters are assigned respectively , You can add the corresponding prefix to the output result .
```plain text
json_obj = {
'school': 'ABC primary school',
'location': 'London',
'ranking': 2,
'info': {
'president': 'John Kasich',
'contacts': {
'email': {
'admission': 'admission@abc.com',
'general': 'info@abc.com'
},
'tel': '123456789',
}
},
'students': [
{'name': 'Tom'},
{'name': 'James'},
{'name': 'Jacqueline'}
],
}
pd.json_normalize(json_obj, record_path='students',
meta=['school', 'location', ['info', 'contacts', 'tel'], ['info', 'contacts', 'email', 'general']],
record_prefix='students->',
meta_prefix='meta->',
sep='->')
Copy code
```
In this case , Add... To nested list data students-> Prefix , Add... For metadata meta-> Prefix , Will be nested key Change the separator between to ->, The output is :
## 7. adopt URL obtain Json And analyze the data
adopt URL Getting data requires requests library , Please install the corresponding library by yourself .
```plain text
import requests
from pandas import json_normalize
# Through the weather API, Get Shenzhen near 7 Days of the weather
url = 'https://tianqiapi.com/free/week'
# Pass in url, And set the corresponding params
r = requests.get(url, params={"appid":"59257444", "appsecret":"uULlTGV9 ", 'city':' Shenzhen '})
# Convert the obtained value to json object
result = r.json()
df = json_normalize(result, meta=['city', 'cityid', 'update_time'], record_path=['data'])
df
Copy code
```
result The results are as follows , among data For a nested list :
```plain text
{'cityid': '101280601',
'city': ' Shenzhen ',
'update_time': '2021-08-09 06:39:49',
'data': [{'date': '2021-08-09',
'wea': ' Moderate rain to thunderstorm ',
'wea_img': 'yu',
'tem_day': '32',
'tem_night': '26',
'win': ' No sustained wind direction ',
'win_speed': '<3 level '},
{'date': '2021-08-10',
'wea': ' thunder shower ',
'wea_img': 'yu',
'tem_day': '32',
'tem_night': '27',
'win': ' No sustained wind direction ',
'win_speed': '<3 level '},
{'date': '2021-08-11',
'wea': ' thunder shower ',
'wea_img': 'yu',
'tem_day': '31',
'tem_night': '27',
'win': ' No sustained wind direction ',
'win_speed': '<3 level '},
{'date': '2021-08-12',
'wea': ' cloudy ',
'wea_img': 'yun',
'tem_day': '33',
'tem_night': '27',
'win': ' No sustained wind direction ',
'win_speed': '<3 level '},
{'date': '2021-08-13',
'wea': ' cloudy ',
'wea_img': 'yun',
'tem_day': '33',
'tem_night': '27',
'win': ' No sustained wind direction ',
'win_speed': '<3 level '},
{'date': '2021-08-14',
'wea': ' cloudy ',
'wea_img': 'yun',
'tem_day': '32',
'tem_night': '27',
'win': ' No sustained wind direction ',
'win_speed': '<3 level '},
{'date': '2021-08-15',
'wea': ' cloudy ',
'wea_img': 'yun',
'tem_day': '32',
'tem_night': '27',
'win': ' No sustained wind direction ',
'win_speed': '<3 level '}]}
Copy code
```
The output result after parsing is :
## 8. To explore the : Resolution with Multiple nested lists Of Json
When one Json When there is more than one nested list in an object or object list ,record_path Cannot include all nested lists , Because it can only receive one key value . here , We need to start with... Based on multiple nested lists key take Json It's resolved into multiple DataFrame, And I'll put these DataFrame Spliced according to the actual correlation conditions , And remove duplicates .
```plain text
json_obj = {
'school': 'ABC primary school',
'location': 'shenzhen',
'ranking': 2,
'info': {
'president': 'John Kasich',
'contacts': {
'email': {
'admission': 'admission@abc.com',
'general': 'info@abc.com'
},
'tel': '123456789',
}
},
'students': [
{'name': 'Tom'},
{'name': 'James'},
{'name': 'Jacqueline'}
],
# add to university Nested list , add students, The JSON There are... In the object 2 A nested list
'university': [
{'university_name': 'HongKong university shenzhen'},
{'university_name': 'zhongshan university shenzhen'},
{'university_name': 'shenzhen university'}
],
}
# Try to record_path Write the names of two nested lists in , namely record_path = ['students', 'university], The result is of no avail
# So I decided to analyze it twice , Separately record_path Set to university and students, In the end 2 Results combined
df1 = pd.json_normalize(json_obj, record_path=['university'],
meta=['school', 'location', ['info', 'contacts', 'tel'],
['info', 'contacts', 'email', 'general']],
record_prefix='university->',
meta_prefix='meta->',
sep='->')
df2 = pd.json_normalize(json_obj, record_path=['students'],
meta=['school', 'location', ['info', 'contacts', 'tel'],
['info', 'contacts', 'email', 'general']],
record_prefix='students->',
meta_prefix='meta->',
sep='->')
# According to index Associate and remove duplicate Columns
df1.merge(df2, how='left', left_index=True, right_index=True, suffixes=['->', '->']).T.drop_duplicates().T
Copy code
```
The output is :
The part marked in the red box on the way is Json Object .
## summary
json_normalize() Method is extremely powerful , It covers almost all parsing JSON Scene , When it comes to more complex scenes , Can give the existing functions for divergent Integration , for example 8. To explore the The same thing happened in .
With this powerful Json Parsing library , I'll never be afraid to encounter complex Json Data. !
UP Hard sorting , Don't like the collection and pay attention !
> 本文为[Astringent orange]所创,转载请带上原文链接,感谢 https://pythonmana.com/2021/08/20210809141122207S.html